The evening before the Boston VMUG conference, my friends and I were in our hotel winding down with some pizza and TV, discussing which breakout sessions we were going to attend the next day. At 10:50 PM my phone lit up: “VMware vCenter – Alarm alarm.HAhostStatus.” I immediately fired up my VPN connection and launched iLO, only to see another Purple Screen of Death (PSOD).
Damn! Exception 14 again. That’s two hosts within a month! I extracted the logs as described in part 1 and found the following.
2014-06-24T02:49:00.449Z cpu17:33512)@BlueScreen: #PF Exception 14 in world 33512:lpfc_do_work IP 0x4180057c62c6 addr 0x747369 PTEs:0x13e5d1027;0x13e2e9027;0x0;
2014-06-24T02:49:00.449Z cpu17:33512)Code start: 0x418005000000 VMK uptime: 60:12:38:37.042
2014-06-24T02:49:00.449Z cpu17:33512)0x41238ba1da80:[0x4180057c62c6]lpfc_sli4_bpl2sgl@#+0xb6 stack: 0x410a793b2248
2014-06-24T02:49:00.450Z cpu17:33512)0x41238ba1db50:[0x4180057cfef8]__lpfc_sli_issue_iocb_s4@#+0x80 stack: 0x412e80c5d5c0
2014-06-24T02:49:00.450Z cpu17:33512)0x41238ba1db90:[0x4180057c9c54]lpfc_sli_issue_iocb@#+0xc0 stack: 0x418000000002
2014-06-24T02:49:00.450Z cpu17:33512)0x41238ba1dc60:[0x418005768082]lpfc_els_rsp_rls_acc@#+0x142 stack: 0x41238ba1dd00
2014-06-24T02:49:00.451Z cpu17:33512)0x41238ba1ddb0:[0x4180057d45dd]lpfc_sli_handle_mb_event@#+0x129 stack: 0xc5913e00000078
2014-06-24T02:49:00.451Z cpu17:33512)0x41238ba1df30:[0x41800577c810]lpfc_work_done@#+0xbcc stack: 0x225e
2014-06-24T02:49:00.451Z cpu17:33512)0x41238ba1df80:[0x4180057821d4]lpfc_do_work_event@#+0xbc stack: 0x13
This time it wasn’t the HPSA driver that caused the crash, but the LPFC driver used by our HP VirtualConnects. This host was running version 10.0.575.8. VMware tells me they are working with Emulex to fix the driver and, in the meantime, recommends upgrading to LPFC version 10.2.261.7, available here, as there have been no reported PSODs with that version.
Installing the new driver is identical to the process described in part 1 and is well documented on VMware’s site.
To find which version of the LPFC driver you’re running:
- SSH to your ESXi host
- Identify the installed HBAs and make sure you have LPFC interfaces
# esxcfg-scsidevs -a
vmhba0 hpsa link-n/a sas.5001438026743f90 (0:3:0.0) Hewlett-Packard Company Smart Array P220i
vmhba1 lpfc link-up fc.5001438002a30041:5001438002a30040 (0:4:0.2) ServerEngines Corporation Emulex OneConnect OCe11100 FCoE Initiator
vmhba2 lpfc link-up fc.5001438002a30043:5001438002a30042 (0:4:0.3) ServerEngines Corporation Emulex OneConnect OCe11100 FCoE Initiator
- Now that we’ve verified the host has two LPFC interfaces, run the following to get the driver version
# vmkload_mod -s lpfc | grep Version
Version: 10.0.575.8-1OEM.550.0.0.1198611
If you have version 10.0.575.8-1OEM.550.0.0.1198611 like the example, you’d best get patching.
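If you have more than a couple of hosts to check, the two steps above can be rolled into a small script. This is only a rough sketch: the function names (lpfc_adapters, lpfc_version, is_affected) are made up for this post, and it assumes the output of both commands looks like the examples above and that awk, sed, and head are available in the ESXi shell. It only flags the exact version that crashed here, so treat anything else it reports as "unknown" rather than "safe".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any VMware tooling.

AFFECTED="10.0.575.8"   # the driver version that PSOD'd in this post

# Print the names of adapters whose driver column is "lpfc".
# Reads `esxcfg-scsidevs -a` output on stdin.
lpfc_adapters() {
    awk '$2 == "lpfc" { print $1 }'
}

# Print the bare driver version (e.g. 10.0.575.8), dropping the OEM/build suffix.
# Reads `vmkload_mod -s lpfc` output on stdin.
lpfc_version() {
    sed -n 's/^[[:space:]]*Version:[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed only if the given version is the affected one.
is_affected() {
    [ "$1" = "$AFFECTED" ]
}

# Only call the real tools when they exist, i.e. when run on an ESXi host.
if command -v vmkload_mod >/dev/null 2>&1; then
    echo "LPFC adapters: $(esxcfg-scsidevs -a | lpfc_adapters | tr '\n' ' ')"
    ver=$(vmkload_mod -s lpfc | lpfc_version)
    if is_affected "$ver"; then
        echo "lpfc $ver: affected, upgrade to 10.2.261.7"
    else
        echo "lpfc ${ver:-not loaded}: not the affected version"
    fi
fi
```

Because the parsing lives in functions that read stdin, you can also feed them saved output from a host instead of running the script on the host itself.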