This one was not easy to solve. Running SuSE SLES 11.3 as a guest in VirtualBox 4.2.18 on an otherwise entirely stable host (Ubuntu 13.10 with Linux 3.11.0), my 64-bit SLES VMs would randomly get stuck, consuming full CPU on the host for the cores I had assigned to them. If left untouched, they would remain in that state forever. Netconsole to another host did not reveal anything of value. All other machines (about 20 at the same time on any normal day) would continue running with no issues. So what was so special about SLES that made these machines hang?
Finally, I found this post. At the end of the thread, one user suggests:
Analyzing the core dump I saw that the E1000 ethernet card waits for the guest to free more network descriptors. One of the guest CPUs is currently executing code, the other is in halt state. This could be a problem with the E1000 network card emulation. Could you test if your guest works better if you change the network card to PCNet (VM network settings / advanced)?
The next post confirms:
It seems that setting NC to PCnet_Fast_III solves the problem with hanging. With 2 processors, machine worked under load for 2 days without problem with much better performance than with 1 CPU.
I can now confirm this from my side as well. I was able to bring the machine into a state where it would consistently hang after starting some software I had installed, which puts substantial load on the network as well as on file I/O and memory. That was a good thing, as I could just wait for 2 minutes and have the system hang again. The only change that made a difference was changing the network settings. In phpVirtualBox:
I had always used one of the Intel PRO/1000 adapters and had tried the different variants; yet for this particular guest, they would make my virtual machine hang.
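For anyone who prefers the command line over phpVirtualBox, the same adapter change can be scripted. The following is only a minimal sketch: the VM name `sles11-guest` and the helper names `nic_type_command` and `apply` are placeholders of mine, not something from the forum thread. It relies on `VBoxManage modifyvm` with the `--nictype<N>` option, where `Am79C973` is the identifier VirtualBox uses for PCnet-FAST III, and it assumes the VM is powered off when the command runs.

```python
import subprocess

# Adapter identifiers accepted by `VBoxManage modifyvm --nictype<N>`.
NIC_TYPES = {
    "PCnet-PCI II": "Am79C970A",
    "PCnet-FAST III": "Am79C973",
    "Intel PRO/1000 MT Desktop": "82540EM",
    "Intel PRO/1000 T Server": "82543GC",
    "Intel PRO/1000 MT Server": "82545EM",
}


def nic_type_command(vm_name, adapter="PCnet-FAST III", slot=1):
    """Build the VBoxManage call that switches NIC `slot` of a VM to `adapter`."""
    return ["VBoxManage", "modifyvm", vm_name, f"--nictype{slot}", NIC_TYPES[adapter]]


def apply(cmd):
    """Run the command on the host; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(cmd, check=True)


# Hypothetical VM name; print the command first to review it before applying.
cmd = nic_type_command("sles11-guest")
print(" ".join(cmd))
# apply(cmd)  # uncomment on a host with VirtualBox installed and the VM powered off
```

Switching back to an Intel adapter for comparison is the same call with a different `adapter` argument, which makes it easy to reproduce the hang and confirm the fix on a test VM.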