- Software drops
- Hardware drops
- The PCIe slot is of the wrong generation (e.g. the card is plugged into a Gen 2 slot)
- The PCIe slot is of the wrong width (e.g. there is only x1 available instead of the required x8)
- The PCIe slot is not attached directly to a CPU root port (e.g. it hangs off the chipset or a PCIe switch)
- The card is plugged into a PCIe slot on the wrong NUMA node
- Power-saving settings (e.g. deep CPU sleep states) are getting in the way, causing intermittent latency spikes
- On a card with many ports, e.g. an X40, it is possible for the combined ingress rate across all ports to exceed the available PCIe bandwidth
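The last point is easy to quantify with back-of-the-envelope arithmetic. Here's a minimal sketch, assuming for illustration an 8-port 10G card; the per-lane rates are the raw PCIe figures after line-code overhead, and real throughput is lower still once packet and protocol overhead are counted:

```python
# Illustrative arithmetic only: raw PCIe link rates per lane, after
# line-code overhead (8b/10b for Gen 1/2, 128b/130b for Gen 3+).
# Actual throughput is lower still due to TLP/DLLP protocol overhead.
LANE_GBPS = {
    1: 2.5 * 8 / 10,     # 2.0  Gbit/s
    2: 5.0 * 8 / 10,     # 4.0  Gbit/s
    3: 8.0 * 128 / 130,  # ~7.88 Gbit/s
    4: 16.0 * 128 / 130, # ~15.75 Gbit/s
}

def link_capacity_gbps(gen: int, width: int) -> float:
    """Raw post-encoding capacity of a PCIe link, in Gbit/s."""
    return LANE_GBPS[gen] * width

def oversubscribed(gen: int, width: int, ports: int, port_gbps: float) -> bool:
    """True if combined line-rate ingress exceeds the PCIe link."""
    return ports * port_gbps > link_capacity_gbps(gen, width)

# An 8-port 10G card behind a Gen 3 x8 link: 80 Gbit/s of ingress
# against ~63 Gbit/s of PCIe, so drops are guaranteed at line rate.
print(oversubscribed(3, 8, ports=8, port_gbps=10.0))   # True
# The same card on a Gen 3 x16 link (~126 Gbit/s) has headroom:
print(oversubscribed(3, 16, ports=8, port_gbps=10.0))  # False
```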
To be clear, the NIC does not touch main memory - the RX and TX regions live on the card itself. However, when delivering frames to the host it does contend for L3 cache space (on Intel platforms with DDIO, inbound DMA lands directly in the L3), which means the host has to write evicted lines back out to main memory - causing memory bandwidth pressure. It is also possible to have L3 cache pressure without memory bandwidth pressure, for example with multiple cards receiving at line rate. In that case, you'll see packet drops even though there is enough PCIe bandwidth for all of the cards.
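One way to reason about this is as a capacity problem: frames DMA'd into L3 that the host hasn't consumed yet occupy cache, and once they outgrow the cache share available to I/O, evictions begin. A rough sketch with entirely assumed numbers - the host-lag figures and the 2 MiB I/O share below are illustrative, not measured, and `l3_io_pressure` is a made-up helper:

```python
# Back-of-the-envelope sketch (assumed numbers, not measured ones):
# frame data DMA'd into L3 that the host has not yet consumed occupies
# cache; once it outgrows the share available to inbound I/O, older
# lines are evicted and must be written back to main memory.
def l3_io_pressure(ingress_gbps: float, host_lag_us: float,
                   io_share_bytes: int) -> bool:
    """True if in-flight frame data overflows the L3 share for I/O."""
    in_flight_bytes = ingress_gbps * 1e9 / 8 * host_lag_us * 1e-6
    return in_flight_bytes > io_share_bytes

# Two cards at 40 Gbit/s each, with 2 MiB of L3 effectively
# available to inbound DMA (both numbers assumed for illustration):
print(l3_io_pressure(80.0, 100.0, 2 * 1024 * 1024))  # 1 MB in flight: False
print(l3_io_pressure(80.0, 300.0, 2 * 1024 * 1024))  # 3 MB in flight: True
```

The point of the sketch is only that the threshold depends on how far the host lags behind the wire, not just on the raw ingress rate.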
To determine whether you're running into this, it's worth shutting down non-essential applications (that may be consuming memory bandwidth) and turning off other ports. If hardware drops go away when you do this, you may have an L3 bandwidth issue.
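A hedged sketch of how one might mechanize that before/after comparison, by diffing drop counters as printed by `ethtool -S`. Counter names vary by driver - `rx_dropped` here is a common but not universal example, and `drop_delta` is a hypothetical helper:

```python
# Hypothetical helper: snapshot the NIC's counters (as printed by
# `ethtool -S <iface>`) before and after shutting down the other
# consumers, and diff the drop counters between the two snapshots.
def parse_stats(ethtool_output: str) -> dict:
    """Parse `name: value` lines from `ethtool -S`-style output."""
    stats = {}
    for line in ethtool_output.splitlines():
        if ":" in line:
            name, _, value = line.partition(":")
            value = value.strip()
            if value.lstrip("-").isdigit():
                stats[name.strip()] = int(value)
    return stats

def drop_delta(before: str, after: str, keys=("rx_dropped",)) -> dict:
    """New drops accumulated between two counter snapshots."""
    b, a = parse_stats(before), parse_stats(after)
    return {k: a.get(k, 0) - b.get(k, 0) for k in keys}

before = "NIC statistics:\n  rx_packets: 1000\n  rx_dropped: 40\n"
after  = "NIC statistics:\n  rx_packets: 9000\n  rx_dropped: 40\n"
# Traffic kept flowing but the drop counter stood still, which is
# consistent with the drops being caused by the other consumers.
print(drop_delta(before, after))  # {'rx_dropped': 0}
```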
It is possible to fix this using what Intel calls "Cache QoS Enforcement" (now marketed as Cache Allocation Technology, part of Resource Director Technology), the setup for which is outside the scope of this article.