
Archive for January, 2009

MySQL replication to prove it’s a hardware problem.

We’ll be setting up replication on our servers to see whether the master server really does have a hardware problem. To test our hypothesis, we will bring the site back up and wait for it to crash again. If it does and the slave server is fine, we will know that this is a hardware-related issue.
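The reasoning behind the test can be written down as a small decision table. This is only a sketch of the logic described above: the function name and the result labels are made up for illustration, and the "both servers affected" branch is our own inference rather than something we have observed.

```python
def diagnose(master_crashed: bool, slave_healthy: bool) -> str:
    """Interpret the outcome of the master/slave comparison.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if not master_crashed:
        # No new crash yet, so the experiment has told us nothing so far.
        return "inconclusive"
    if slave_healthy:
        # Same data and same statements, but only the master fell over:
        # the fault most likely sits in the master's hardware.
        return "hardware suspected"
    # Assumption: if the slave breaks too, the fault follows the data or
    # the software rather than one particular machine.
    return "software or data suspected"


if __name__ == "__main__":
    print(diagnose(master_crashed=True, slave_healthy=True))
```

The point of replicating is that the slave replays the same changes on different hardware, so a crash that only ever hits the master points at the machine rather than at MySQL or the data.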

We’ve already restored our database for the squillionth time and got some sleep while we were waiting.

Should this be a hardware problem, the guys at the DC will be able to run tests and replace the hardware for us.

Hardware problem.

We continue to think this is a RAM problem, as the HDDs have already been replaced. We will be running some hardware tests tomorrow when the DC guys are in, to confirm that it is and to replace the faulty hardware. Let’s hope that this is a RAM problem, as it will be easy to fix.

Downtime again. Sigh.

We’re down again due to a MySQL error. Luckily, this time I had the opportunity to witness the corruption process in action. What happens is that MySQL’s CPU usage spikes to 100% on all CPUs of the server. Soon after that, a series of errors is written to the error log and MySQL automatically restarts.

The worst part of this is that the database may become irrecoverable. This has happened in previous crashes. Every crash, however, seems different in terms of the errors that are logged.

But from the looks of things, the backup we’re currently making seems to be getting through.

Reinstallation Update 12.

We’re nearly up and running. The database restoration is nearly finished.

Servers up and restoring DB.

It should take about 2 hours to restore the DB. I made a change to the deadline scheduler today, so it should take less than that to get up and running.
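For anyone wanting to check which scheduler a disk is using, here is a rough sketch. It assumes the change above refers to the Linux deadline I/O scheduler, which the kernel reports per block device in /sys/block/<device>/queue/scheduler with the active one in square brackets. The device name "sda" is a placeholder, not necessarily what our server uses.

```python
import re
from typing import Optional


def active_scheduler(sysfs_text: str) -> Optional[str]:
    """Return the scheduler shown in square brackets, or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_scheduler(device: str = "sda") -> Optional[str]:
    """Read the active I/O scheduler for a block device (Linux only).

    "sda" is an assumed device name used for illustration.
    """
    path = f"/sys/block/{device}/queue/scheduler"
    with open(path) as handle:
        return active_scheduler(handle.read())


if __name__ == "__main__":
    # Example of the text the kernel exposes when deadline is selected.
    print(active_scheduler("noop anticipatory [deadline] cfq\n"))
```

Parsing is kept separate from reading the file so the bracket logic can be checked without touching a real server.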

DC guys are in and working.

Shouldn’t be long before we’re back up again.

Damn.

We encountered the same corruption error again. This time it occurred because of an ungraceful restart. We’re still not sure why this happens; it could perhaps be a MySQL bug. Currently the server is down and is waiting for the DC to reboot it manually so that we can get into it again.

It is roughly 8PM in HK. Let’s hope the techs are working a late-night shift.

No more sluggish WBB.

We faced a problem where zlib was not compiled into PHP on the new CentOS 5.2 install. This has now been fixed and gzipping is working again.

Reinstallation Update 11.

We’re finally back up, but we still have a list of things to complete before the site is fully operational. The site will go up and down a number of times over the next 12 hours.

Reinstallation Update 10. Nearly there edition.

We’re very close to opening up WBB once again. Everything is set up apart from a few other things that we can do whilst the forum is alive and working for all.

ETA 30min.
