2018-07-09

Do not use "downlink delay" on Cisco Nexus if vPC peer-keepalive is done through the access ports

The Cisco Nexus 3000 series switches with 1GE copper interfaces support the "downlink delay" feature, which looks really helpful at first glance, since it blocks traffic on the access ports until the switch is connected to the core. However, you should be very careful when combining it with vPC if the peer-keepalive link is built over the access copper ports or over the downlink ones (an unlikely scenario, since you usually can't spare even one downlink port in the usual 4-port configuration) instead of over mgmt0, which is the default recommendation.
With downlink delay configured, the access ports come up only after the specified delay (30s default), so peer-keepalive stays down for that period. When one of the switches goes down and then comes back up, the second switch (its vPC peer) sees peer-keepalive down while peer-link is up, concludes that it should not become primary, and in fact shuts down all of its local vPCs. So, whenever you reload either of the vPC peers, all your vPCs are down on both switches for the duration of the downlink delay.
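One way to confirm this sequence is to watch the vPC state on the switch that stays up while its peer is reloading. The commands below are standard NX-OS show commands; the output is omitted here because it depends on the platform and release.

```
! run on the surviving vPC peer while the other peer is booting
show vpc
show vpc peer-keepalive
show vpc role
```

During the delay window, the keepalive status and the state of the local vPCs are the things to look at.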
The solution is simple: either disable downlink delay (we went this way, and none of the problems we had anticipated when we first enabled the setting have shown up), or use the mgmt0 ports for the vPC peer-keepalive.
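As a rough sketch of the second option, the fragment below moves peer-keepalive to mgmt0 in the management VRF. The domain ID and the addresses are placeholders, not values from our setup: 192.0.2.1 stands for this switch's mgmt0 and 192.0.2.2 for the peer's.

```
! placeholders: vPC domain 10, 192.0.2.1 = local mgmt0, 192.0.2.2 = peer mgmt0
interface mgmt0
  vrf member management
  ip address 192.0.2.1/24

vpc domain 10
  peer-keepalive destination 192.0.2.2 source 192.0.2.1 vrf management
```

The peer switch needs the mirrored configuration, with source and destination swapped. For the first option, assuming the feature was turned on with "downlink delay enable", the "no" form of that command should turn it off; check the interfaces configuration guide for your release before relying on that syntax.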