Broadcom BCM57810 FCoE and ESXi

January 29, 2015  •  12 Comments

While working with Dell support on an iSCSI storage issue, the question came up of getting the firmware up to date on our hosts to rule out an undocumented bug or interaction related to being approximately a year behind.  While skeptical that it would actually resolve the issue, I proceeded with the updates after giving the release notes for the various components a cursory look for any obvious show stoppers.  Imagine my surprise when, after installing the requisite updated ESXi VIB, my hosts had FCoE offload functionality that was both unexpected and undesired.  This wasn't much of a problem until another issue arose and I needed to view the host logs, which the Broadcom driver was spamming with FCoE VLAN discovery failure messages.

I rebooted a host and poked around in the BIOS and the Lifecycle Controller: there was absolutely nothing for disabling FCoE offload short of implementing NPAR, which isn't a feature we want in our environment.  Right-clicking the FCoE device under Storage Adapters and selecting Remove proved frustratingly fruitless, as the adapters were still showing up after a reboot.  We dug into the issue more with support, going through syslogs from the impacted hosts, and stumbled upon a script that was running during ESXi's boot process.  Said script was triggering an FCoE discovery on the NICs and restoring the functionality we were trying to remove.

After finding the script, support took a closer look at the release notes for the Broadcom VIB and found this little tidbit just tossed in.

Problem: After installing the driver, the NIC interface associated with
         the FCoE functionality needs to be triggered for discovery. This
         is done manually by executing a few commands on the ESX server.

Cause:   The bnx2fc is a dependent FCoE solution wherein the NIC interface
         is used for part of the slow path processing.

Change:  Carry a init script with the driver vib which will be executed
         automatically after the driver is deployed, which executes the
         required commands on the ESX server.

To resolve this, on each host navigate to /etc/rc.local.d/ and remove 99bnx2fc.sh.  Then execute the following esxcli commands to disable FCoE discovery and remove the VIB, and reboot.  It's probably also a good idea to modify VMware's Non-Critical Patch baseline to exclude the bnx2fc VIB to prevent its accidental reinstallation.

Determine which vmnics have FCoE capability
esxcli fcoe nic list

Replace vmnicX with each NIC name returned by the above command, running this once per NIC.
esxcli fcoe nic disable -n vmnicX

You will also need to remove the Broadcom FCoE VIB prior to rebooting.
esxcli software vib remove --vibname=scsi-bnx2fc
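
For hosts with several FCoE-capable ports, the steps above could be scripted roughly as follows. This is a sketch, not something from Dell or Broadcom: the function names fcoe_vmnics and cleanup_cmds are made up for illustration, and it assumes `esxcli fcoe nic list` starts each NIC entry with its vmnic name at the beginning of a line, which is what the vendor script's own grep "^vmnic" relies on. It only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: prints the cleanup commands instead of running them.
# Assumption: `esxcli fcoe nic list` starts each NIC entry with its vmnic
# name at the beginning of a line (the vendor script greps for "^vmnic").

# Hypothetical helper: read `esxcli fcoe nic list` output on stdin and
# print one NIC name per line.
fcoe_vmnics() {
    awk '/^vmnic/ { print $1 }'
}

# Hypothetical helper: read NIC names on stdin and print the full
# command sequence described in this post, in order.
cleanup_cmds() {
    echo "rm /etc/rc.local.d/99bnx2fc.sh"
    while read -r nic; do
        echo "esxcli fcoe nic disable -n $nic"
    done
    echo "esxcli software vib remove --vibname=scsi-bnx2fc"
}

# On a host, review the output before running any of it:
#   esxcli fcoe nic list | fcoe_vmnics | cleanup_cmds
```

Printing rather than executing is deliberate: removing a VIB and disabling discovery on the wrong NIC are both things worth eyeballing first, and the reboot is left as a manual step.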

And here are the contents of 99bnx2fc.sh, if anyone is interested.

/etc/rc.local.d # cat 99bnx2fc.sh
#!/bin/sh
# QLogic bnx2fc FCoE NIC discovery Script
esxcli fcoe nic list | grep -i "^vmnic" > /scratch/vmnic_fc.txt

count=$(esxcli fcoe nic list | grep -c vmnic)

echo "$count interfaces are FCoE capable"
echo ""

for i in $(seq 1 1 $count)
do
        vmnic=$(sed -n "$i"p /scratch/vmnic_fc.txt)
        ethtool -i $vmnic | grep "bnx2" &> /dev/null
        # Only Run the command for Broadcom/QLogic NX-2 adapters
        if [ $? -eq 0 ]; then
                echo "Broadcom/QLogic Adapter : $vmnic"
                echo "esxcli fcoe nic discover -n $vmnic";
                esxcli fcoe nic discover -n $vmnic;
        fi
        echo ""
        sleep 1;
done
rm -f /scratch/vmnic_fc.txt
exit 0

Comments

12.Almero(non-registered)
Hi guys, just sharing for other people who may have the same problem.

I set Auto Discover = 1, and that did indeed turn all my CNAs into FCoE vmhbas.
So not a good idea if you are mixing CNA functions on the same ESXi hosts.

Then I updated only BNX2FC to a slightly newer version and all my problems went away.
I can now reboot hosts, and normal NICs, iSCSI adapters, and FCoE NICs remain unchanged.
Also no observed log spam.

You require BNX2FC 1.713.20.v60.2, which is now available on the VMware website:
https://my.vmware.com/web/vmware/details?productId=491&downloadGroup=DT-ESX60-QLOGIC-BNX2X-271310V60.4
10.Almero(non-registered)
Hi guys/DavideDG.

My ESX6.02 (A02 DELL Build) Dell M630 blade hosts have 6 57810 adapters, two of which we nominate to be FCoE adapters. After rebooting, they are normal NICs again, and storage is gone.

I assume this relates to the Autodiscovery=1 below, but how can I GET the current value for that? I want to see if it's on/off at the moment, but the config files are binaries.

Also, how do I ensure that my 4 other 57810 adapters don't try to become FCoE as well? We had an issue a while back with that. Dell support told us to go to the A02 ISO build and all would be good. Clearly not. Does it apply at the driver level, or only to the NIC selected on esxcli fcoe enable?

Any advice will be appreciated.
8.DavideDG(non-registered)
We had the opposite issue: bnx2fc_autodiscovery is not present, so FCoE devices are disappearing every 2 reboots.
Luckily this is still a pre-production environment...
We are trying with
esxcli system module parameters set --module bnx2fc --parameter-string bnx2fc_autodiscovery=1

to see if the FCoE devices are now persistent.

So unbelievable that this issue is not documented anywhere in the Dell/VMware KBs.
Thanks!!