HPC - Infiniband vs RoCEv2
Learning about HPC (high-performance computing) infrastructure and standards. Supercomputers need high-speed, high-bandwidth transport: we're talking anywhere from 10 Gbps up to 400 Gbps.
Most of the wired OSI L2 we deal with is Ethernet, with its MAC framing overhead etc.
As of June 2022, 62% of supercomputers use the InfiniBand standard, while only about 1% of HPC systems use plain Ethernet.
=== 1st pic
Now, InfiniBand (IB) requires its own fabric architecture and NICs (called HCAs, Host Channel Adapters; see 1st pic), natively provides RDMA*, and has lower latency than Ethernet because its L2 and L3+ layers are designed for the HPC environment rather than for generic Ethernet L2 switching.
IB has its own speed designations (EDR = 100 Gbps, HDR = 200 Gbps, NDR = 400 Gbps, etc.), analogous to Ethernet's 100GbE, 200GbE, 400GbE.
In the Linux kernel, IB's maximum MTU is 4096 bytes.
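The larger MTU matters for efficiency. A back-of-the-envelope sketch (the 42-byte header figure is an illustrative assumption for an Ethernet/IPv4/UDP stack, not an exact wire format):

```python
# Back-of-the-envelope payload efficiency: payload / (payload + headers).
# 42 B ~= Ethernet (14) + IPv4 (20) + UDP (8); an illustrative assumption --
# real RoCEv2/IB framing adds its own transport headers and CRCs on top.
def efficiency(payload_bytes: int, header_bytes: int = 42) -> float:
    return payload_bytes / (payload_bytes + header_bytes)

print(f"1500-byte MTU: {efficiency(1500):.1%} payload")
print(f"4096-byte MTU: {efficiency(4096):.1%} payload")
```

Roughly, going from a 1500-byte to a 4096-byte MTU cuts per-packet header overhead from about 2.7% to about 1%.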
=== 2nd pic
Mellanox (now part of NVIDIA) was the first company to implement the RoCE (RDMA over Converged Ethernet) standard, hence most of the articles I've read refer to them.
So the RoCEv2 (latest) protocol uses Ethernet as L2 (so all the widespread Ethernet HW works) and carries the IBTA transport protocol (the RDMA verbs traffic) encapsulated in UDP/IP, on UDP destination port 4791.
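To make the encapsulation concrete, here is a toy sketch of the 12-byte IB Base Transport Header (BTH) that RoCEv2 carries inside the UDP datagram. The field layout follows my reading of the IBTA spec; it's an illustration, not a production parser:

```python
import struct

ROCEV2_UDP_PORT = 4791  # IANA-registered UDP destination port for RoCEv2

def build_bth(opcode: int, dest_qp: int, psn: int, pkey: int = 0xFFFF) -> bytes:
    """Pack a 12-byte Base Transport Header (BTH).

    Word 0: opcode (8) | SE/M/PadCnt/TVer flags (8, zeroed here) | P_Key (16)
    Word 1: reserved (8) | destination QP number (24)
    Word 2: AckReq + reserved (8, zeroed here) | packet sequence number (24)
    """
    word0 = (opcode & 0xFF) << 24 | (pkey & 0xFFFF)
    word1 = dest_qp & 0xFFFFFF
    word2 = psn & 0xFFFFFF
    return struct.pack(">III", word0, word1, word2)

def parse_bth(data: bytes) -> dict:
    word0, word1, word2 = struct.unpack(">III", data[:12])
    return {
        "opcode": word0 >> 24,
        "pkey": word0 & 0xFFFF,
        "dest_qp": word1 & 0xFFFFFF,
        "psn": word2 & 0xFFFFFF,
    }

bth = build_bth(opcode=0x04, dest_qp=0x12, psn=100)  # 0x04 = RC SEND-only
print(parse_bth(bth))
```

In a real RoCEv2 frame this header sits right after the UDP header, followed by the payload and an invariant CRC (ICRC).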
A Mellanox ConnectX-3 Pro NIC, for example, can be used for RoCE (the exact model depends on your speed/BW needs and port form factor: SFP+, QSFP+, QSFP28, etc.)
There is also a software implementation of RoCE (Soft-RoCE) via the
modprobe rdma_rxe
Linux kernel module.
There is also the "deprecated" iWARP implementation of RDMA over Ethernet but AFAIU, it sucks, so ignore it; use RoCEv2 if you have Ethernet HW.
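On Linux, RDMA devices (hardware HCAs and Soft-RoCE rxe devices alike) register under /sys/class/infiniband once their driver is loaded; a quick sketch to check for them (the sysfs path assumes a mainline kernel with RDMA support built in):

```python
import os

# RDMA devices (including Soft-RoCE "rxe" ones) appear under this sysfs
# path on Linux; on a machine without RDMA support the directory is absent.
SYSFS_IB = "/sys/class/infiniband"

def rdma_devices() -> list:
    return sorted(os.listdir(SYSFS_IB)) if os.path.isdir(SYSFS_IB) else []

devs = rdma_devices()
print(devs if devs else "no RDMA devices found (try: modprobe rdma_rxe)")
```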
=====
A general comparison between IB and RoCE is given in the article linked below.
* RDMA (Remote Direct Memory Access): technology that lets one server read/write another server's memory directly, without involving the target's CPU, in a server-to-server architecture.
#hpc #roce #infiniband
fibermall.com
What is InfiniBand Network and the Difference with Ethernet?