

This week's roundup of errors and effort.

After the upgrade to one of our load-balancing servers, we have had a series of issues to work our way through. They were not all easy to solve, as they ranged from kernel bugs (hard to find and reproduce) to script bugs (easy peasy).

Here is a list of the issues we’re currently working on and the efforts we’re putting in to solve them:

1. The most important of all is the slowness at peak hours, along with a few 500 errors. Though we did promise to get rid of the 500 errors for good prior to our server upgrade, the current 500 errors are caused by a completely different phenomenon. We have decreased the number of PHP processes on our main database server and shifted most of the PHP processing to our new Quad core. This should give MySQL more resource leeway to do its thing on our db server. However, from observation so far, the connection between the Quad and the db server drops for PHP processes when the search begins to reindex every 2 hours.

The effort? Those with some knowledge of how servers work will realize there should be no correlation between the reindexing of the search (which is done via a C++ program) and the connections of the PHP processes. This is why it is such a hard issue to narrow down and reproduce. It was only a few days ago that we realized this, and we have now started playing with the parameters of the search and the PHP processes to narrow it down further. Only by watching how the behavior changes as we change settings can we narrow this down.

Most of my effort so far has gone towards this.
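
For anyone curious how such a correlation can be checked, here is a minimal Python sketch, not our actual tooling. It assumes we log a timestamp whenever a PHP process fails to reach the db server, and it sorts those timestamps into "during a reindex" and "not during a reindex". The 2-hour interval comes from the description above; the 10-minute window length, the port, and the function names are assumptions made up for illustration.

```python
import socket
from datetime import datetime, timedelta


def db_reachable(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout.

    The port (MySQL's default) and the timeout are assumptions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reindex_starts(first_start, interval, until):
    """Yield the scheduled reindex start times from first_start up to until."""
    t = first_start
    while t <= until:
        yield t
        t += interval


def failures_during_reindex(failures, first_start,
                            interval=timedelta(hours=2),
                            window=timedelta(minutes=10)):
    """Split failure timestamps into (inside, outside) a reindex window.

    window is a guess at how long one reindex run takes.
    """
    if not failures:
        return [], []
    starts = list(reindex_starts(first_start, interval, max(failures)))
    inside, outside = [], []
    for ts in failures:
        hit = any(s <= ts < s + window for s in starts)
        (inside if hit else outside).append(ts)
    return inside, outside
```

Calling `db_reachable` from a cron job or loop on the Quad and recording each `False` result would give the failure list. If nearly all failures land in the `inside` list, the reindex is the likely trigger; if they are spread evenly, the cause is probably elsewhere.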

2. Avatars and emails.

Our avatar problem was fixed just 12 hours ago. The cause was that we did not notice we needed to switch servers back from avt2.warez-bb.org to avt.warez-bb.org. Avt2 currently does not have the syncing of avatars set up, so uploaded avatars were not viewable until we made the switch in our script.

Our emailing problem was due to an unknown error between our emailing server (img5) and our servers in HK. We had recently added a secondary IP to img5, set up as eth0:1. For some stupid and still unknown reason, the main IP that the server uses to connect to the outside world switched from eth0 to eth0:1. In all my time working on servers, eth0 (or ethX) has always held the main IP. So if someone is reading this and knows where to configure the main IP on CentOS/Redhat, please contact me by PMing one of our Administrators.

Currently, we’ve authorized our second IP to access our HK servers, which is why emailing works once again. To avoid sending too many emails at once, I have removed all pending emails that were in the queue about new replies and PM notifications. From now on, however, all notifications and emails will be sent out as usual.
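
As an aside on the eth0 versus eth0:1 puzzle: one way to see which local address the kernel picks for outgoing traffic is to ask it from a script. The Python sketch below is only an illustration under assumptions. The function names are made up, and the second function shows how a client could pin its source address explicitly rather than rely on the system's choice; it is not how our mailer is configured.

```python
import socket


def outbound_source_ip(dest_ip, dest_port=25):
    """Return the local IP the kernel would use to reach dest_ip.

    Connecting a UDP socket sends no packets; it only makes the kernel
    choose a route and a source address, which we then read back.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]


def connect_from(source_ip, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection that explicitly originates from source_ip.

    Port 0 lets the OS pick any free local port on that address.
    """
    return socket.create_connection(
        (dest_host, dest_port),
        timeout=timeout,
        source_address=(source_ip, 0),
    )
```

Running `outbound_source_ip` against the address of one of the HK servers would show whether the primary or the alias address is being chosen. If it is the alias, binding the source address in the client, as in `connect_from`, is one possible workaround while the underlying configuration is tracked down.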

3. A large number of our members are receiving email notifications regarding incorrect logins. Though we have moved to firewall-ban all IPs that try to log in to multiple accounts, more aggressive techniques are going to be used to thwart incorrect logins.
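
To give an idea of what "IPs that try to log in to multiple accounts" means in practice, here is a rough Python sketch of the selection step only. It is not our actual script: the input format (pairs of IP and username taken from a failed-login log), the threshold of 3 accounts, and the function name are all assumptions for illustration.

```python
from collections import defaultdict


def ips_to_ban(failed_logins, max_accounts=3):
    """Return IPs whose failed logins span more than max_accounts accounts.

    failed_logins is an iterable of (ip, username) pairs. Repeated
    failures against the same account count once, so a member who
    simply forgot a password is not flagged.
    """
    accounts_by_ip = defaultdict(set)
    for ip, username in failed_logins:
        accounts_by_ip[ip].add(username)
    return sorted(
        ip for ip, names in accounts_by_ip.items()
        if len(names) > max_accounts
    )
```

The resulting list would then be handed to whatever adds the firewall rules. Counting distinct accounts rather than total failures is the design choice here: it targets the guessing behavior without punishing ordinary typos.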
