2011 October

Users vs. Programmers

By mnott on 2011/10/17 | Code, Computer | 1 comment

When your Server dies…

Having recently upgraded my server with an external enclosure and an additional SCSI controller (an LSI 9750-8e in addition to the 9690SA-8i), I began experiencing unexpected crashes that had never happened on that server before.

The server would simply become unresponsive; sometimes it still answered a ping, sometimes not. Before the upgrade, it had been up since fall 2008, interrupted only by the occasional kernel upgrade.

The server runs on a UPS with an additional filter, and it has a redundant power supply. For that reason I ruled out problems on that side.

Server specs: dual 3.00 GHz Xeon E5472 with 32 GB RAM; 7 x Seagate ST310006400SS 1 TB drives running in a RAID 10 on the internal controller (1 hot spare, leaving 2.7 TB net space); 5 x Seagate ST33000650SS 3 TB drives running in a RAID 10 on the external controller (1 hot spare, leaving 5.8 TB net space). Both controllers have 512 MB of cache each and are battery-backed. Of course, the NIC is also redundant. The server was provided by Silicon Mechanics.

Now, since the problems arose after the upgrade, I immediately suspected the controller. I dumped a log and sent it off to Silicon Mechanics, who responded very quickly, yet they could not see any problems. They suggested that memory might be failing. But because of the correlation with the upgrade, I wasn’t really buying the idea of memory modules failing by coincidence around the same time.

Someone on #linuxger suggested using netconsole. Netconsole is basically a kernel driver that sends kernel log messages (the dmesg output) to a remote host over UDP, starting at a very early stage of the boot process. Since I could not log on to the system once the server had failed, I really wanted to capture some “pre mortem” error messages. So I took out an old Linux notebook, configured netconsole on the server and syslogd on the notebook, and waited.
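
For illustration, the receiving side can be sketched in a few lines of Python. This is not the syslogd setup described above, just a hypothetical stand-in that listens for UDP datagrams and appends each one to a file with a timestamp. It assumes netconsole's default target port 6666; the names `format_record`, `serve` and `main` and the file `netconsole.log` are made up for this sketch.

```python
import socket
from datetime import datetime, timezone
from typing import Optional, TextIO

# Assumption: netconsole's default target port. Change it if the sender
# was configured with a different one.
NETCONSOLE_PORT = 6666


def format_record(payload: bytes, sender: str) -> str:
    """Turn one received datagram into a timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {sender} {text}"


def serve(sock: socket.socket, out: TextIO,
          max_messages: Optional[int] = None) -> int:
    """Read datagrams from a bound UDP socket and write them to `out`.

    Runs forever when max_messages is None; otherwise stops after that
    many datagrams. Returns the number of datagrams written.
    """
    received = 0
    while max_messages is None or received < max_messages:
        payload, (sender, _port) = sock.recvfrom(65535)
        out.write(format_record(payload, sender) + "\n")
        # Flush after every line: the sending machine may die at any
        # moment, and the last messages are the interesting ones.
        out.flush()
        received += 1
    return received


def main() -> None:
    """Listen on all interfaces and append to netconsole.log (hypothetical path)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", NETCONSOLE_PORT))
    with open("netconsole.log", "a", encoding="utf-8") as log_file:
        serve(sock, log_file)
```

Calling `main()` on the notebook and leaving it running until the server next hangs would leave the last kernel messages it sent in `netconsole.log`. Since UDP is fire-and-forget, messages can be lost, so an empty log does not prove that nothing was sent.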

By mnott on 2011/10/14 | Computer


By mnott on 2011/10/10 | Computer | A comment?