EVL networking in a nutshell
EVL features a simple network stack which currently implements the UDP
protocol from the PF_INET domain (IPv4) and raw Ethernet datagrams
from the PF_PACKET domain, both over an Ethernet medium. This
provides datagram-oriented networking support to applications with
real-time communication requirements, through the common networking
device infrastructure. Out-of-band communication happens via
out-of-band network ports as provided by Dovetail.
Out-of-band network ports
An out-of-band network port as defined by the Dovetail interface is a communication channel hosted by a regular Linux network interface device (netdev), which provides a way for real-time applications to send and receive packets from the out-of-band execution stage. Both real (physical) devices and virtual (VLAN) interfaces sitting on top of the latter can become out-of-band network ports independently from each other. However, when a VLAN device is turned into an out-of-band port, input diversion is automatically enabled on the underlying real device.
Enabling a network device as an out-of-band port can be done in three ways:
- Using the evl net -ei <interface> management command.
- Calling the evl_net_enable_port() service from your application.
- Writing a non-zero value to /sys/class/net/<netdev>/oob_port. Conversely, writing zero disables the port.
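The sysfs method is easy to drive from plain C, for instance from a setup program that runs before the real-time application starts. The sketch below relies only on the /sys/class/net/<netdev>/oob_port attribute described above. The helper names oob_port_path() and set_oob_port() are hypothetical, and writing to sysfs normally requires root privileges.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/net/<netdev>/oob_port" into buf (hypothetical helper).
 * Returns 0 on success, -1 if the path does not fit into buf. */
static int oob_port_path(char *buf, size_t len, const char *netdev)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/oob_port", netdev);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Enable (enable != 0) or disable the out-of-band port of a network
 * device by writing "1" or "0" to its oob_port attribute.
 * Returns 0 on success, a negative errno value on failure. */
static int set_oob_port(const char *netdev, int enable)
{
	char path[256];
	FILE *fp;
	int ret = 0;

	if (oob_port_path(path, sizeof(path), netdev))
		return -ENAMETOOLONG;

	fp = fopen(path, "w");
	if (fp == NULL)
		return -errno;	/* e.g. -ENOENT for an unknown device */

	if (fputs(enable ? "1" : "0", fp) == EOF)
		ret = errno ? -errno : -EIO;

	/* sysfs may only report a rejected value when the buffer is flushed. */
	if (fclose(fp) == EOF && ret == 0)
		ret = errno ? -errno : -EIO;

	return ret;
}
```

For example, set_oob_port("eth0", 1) would be equivalent to writing 1 to /sys/class/net/eth0/oob_port from the shell.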
Warning
Enabling a network device as an out-of-band port does NOT magically make its driver oob-capable. Doing so always requires code changes to the stock kernel driver for the relevant network interface controller. Enabling the port merely allows applications running on the out-of-band stage to submit out-of-band traffic to the network device. If the driver is not oob-capable, ingress packets are (usually) received from the NAPI poll routine running on the in-band stage, then forwarded to out-of-band consumers. Likewise, egress packets are relayed from out-of-band senders to the driver transmit routine running on the in-band stage. In other words, when the driver is not oob-capable, EVL cannot meet any real-time requirement: the worst-case latency depends on the real-time capabilities of the in-band kernel. However, even through a driver which is not oob-capable, out-of-band threads may still receive and transmit packets without risking demotion to the in-band stage.
Redirecting the ingress traffic to EVL
EVL relies on Dovetail, which can redirect ingress traffic between NIC drivers and the EVL core directly, either from the in-band or the out-of-band stage. Which stage is used depends on whether such drivers have been adapted to handle traffic from the out-of-band stage. The following rules for redirecting the incoming Ethernet traffic to the EVL network stack apply in sequence:
- eBPF filtering. If an eBPF program is installed on a network device (VLAN or base/physical interface) which enables an out-of-band port, it receives every ingress packet and decides whether it should be processed by the EVL network stack, processed by the in-band stack, or dropped. The program should be of type BPF_PROG_TYPE_SOCKET_FILTER, receiving a socket buffer. By returning the appropriate status code, the filter program can decide to either:
  - Accept the packet for handling by the EVL network stack (EVL_RX_ACCEPT).
  - Hand over the packet to the in-band stack instead (EVL_RX_SKIP).
  - Defer the decision to the VLAN matching rule (EVL_RX_VLAN).
  - Drop the packet, so that it does not enter any network stack (EVL_RX_DROP).

  When present, the eBPF program takes precedence over the reserved interface and VLAN matching rules. In the absence of an eBPF filter, the latter rules apply in sequence. Packets which EVL neither accepted nor dropped (dropping is possible with eBPF only) are forwarded to the in-band network stack.
For instance, a transparent filter passing all input to the next selection method would look like this:

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <evl/net/bpf-abi.h>

SEC("socket")
int bpf_netrx(struct __sk_buff *skb)
{
	/* Defer the decision to the VLAN matching rule. */
	return EVL_RX_VLAN;
}

char _license[] SEC("license") = "GPL";

Conversely, a filter entirely bypassing EVL for ingress traffic would look like this:

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <evl/net/bpf-abi.h>

SEC("socket")
int bpf_netrx(struct __sk_buff *skb)
{
	/* Hand every ingress packet over to the in-band stack. */
	return EVL_RX_SKIP;
}

char _license[] SEC("license") = "GPL";

We could write a more elaborate filter for redirecting UDP/IPv4 datagrams sent to a particular port (say '42042' in the example below) to the out-of-band network stack, passing the rest of the ingress traffic to the in-band stack. Such a filter would look like this:

#include <linux/types.h>
#include <linux/bpf.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/if_vlan.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include <evl/net/bpf-abi.h>	/* EVL_RX_* return codes */

#define OOB_UDP_PORT 42042

/* VLAN tag (802.1Q or 802.1ad) as found in the packet data. */
struct vlan_hdr {
	__be16 h_vlan_TCI;
	__be16 h_vlan_encapsulated_proto;
} __attribute__((packed));

SEC("socket")
int bpf_netrx(struct __sk_buff *skb)
{
	struct vlan_hdr vhdr;
	struct iphdr iph;
	struct udphdr uh;
	__u16 protocol;
	int offset = 0;

	protocol = bpf_ntohs(skb->protocol);

	/* Skip the outer 802.1ad (QinQ) tag, if any. */
	if (protocol == ETH_P_8021AD) {
		if (bpf_skb_load_bytes(skb, 0, &vhdr, sizeof(vhdr)))
			return EVL_RX_SKIP;
		offset = sizeof(vhdr);
		protocol = bpf_ntohs(vhdr.h_vlan_encapsulated_proto);
	}

	/* Skip the 802.1Q tag, if any. */
	if (protocol == ETH_P_8021Q) {
		if (bpf_skb_load_bytes(skb, offset, &vhdr, sizeof(vhdr)))
			return EVL_RX_SKIP;
		offset += sizeof(vhdr);
		protocol = bpf_ntohs(vhdr.h_vlan_encapsulated_proto);
	}

	if (protocol != ETH_P_IP)
		return EVL_RX_SKIP;

	if (bpf_skb_load_bytes(skb, offset, &iph, sizeof(iph))) {
		bpf_printk("cannot load IP header");
		return EVL_RX_SKIP;
	}

	if (iph.version != 4 || iph.protocol != IPPROTO_UDP)
		return EVL_RX_SKIP;

	/* The IP header length (ihl) is expressed in 32-bit words. */
	if (bpf_skb_load_bytes(skb, offset + iph.ihl * sizeof(__u32), &uh, sizeof(uh))) {
		bpf_printk("cannot load UDP header");
		return EVL_RX_SKIP;
	}

	/* Accept datagrams sent to OOB_UDP_PORT, leave the rest to the in-band stack. */
	return bpf_ntohs(uh.dest) == OOB_UDP_PORT ? EVL_RX_ACCEPT : EVL_RX_SKIP;
}

char _license[] SEC("license") = "GPL";

The following diagram illustrates the flow of an incoming packet from the network interface controller to the service access point in the application, i.e. the socket.
Tip
An eBPF filter can be installed on the device for which an
out-of-band port is active using the
evl net -F<filter.o> -i<interface> management
command.
- Reserved interface. If an out-of-band port is enabled on a base/physical network interface (i.e. not a VLAN device) and no eBPF module is installed on it, all IPv4 packets flowing in from this interface are automatically redirected to the EVL network stack. Typically, when multiple NICs are available on the platform, some of them may be reserved for handling out-of-band traffic exclusively.
- VLAN tagging. If none of the previous rules matched, packets flowing in from an out-of-band VLAN (i.e. a VLAN network device which provides an out-of-band port) are redirected to the EVL network stack. EVL may be requested to accept traffic from any VLAN identifier, except reserved ones (namely #0, #1 and #4095). Using an out-of-band VLAN device is a simple way to share a physical network device for in-band and out-of-band networking at the same time.
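The VLAN identifier (VID) occupies the low-order 12 bits of the 802.1Q Tag Control Information (TCI) field, and the upper bits hold the priority and drop-eligible indicator. The sketch below shows how a VID could be extracted from a TCI and checked against the identifiers EVL refuses (#0, #1 and #4095). The helpers tci_to_vid() and vid_eligible_for_oob() are hypothetical and do not reflect EVL's internal code.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_VID_MASK 0x0fff	/* low 12 bits of the TCI carry the VID */

/* Extract the VLAN identifier from a TCI value in host byte order. */
static inline uint16_t tci_to_vid(uint16_t tci)
{
	return tci & VLAN_VID_MASK;
}

/* Return true if the VID may convey out-of-band traffic, i.e. it is
 * none of the reserved identifiers #0, #1 and #4095. */
static inline bool vid_eligible_for_oob(uint16_t tci)
{
	uint16_t vid = tci_to_vid(tci);

	return vid != 0 && vid != 1 && vid != 4095;
}
```

For instance, a TCI of 0xE02A carries priority 7 and VID 42, which is eligible.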
Note
Since the 802.1Q standard
has been around for quite some time now, most Ethernet switches
should pass on frames whose ethertype is set to the 802.1Q TPID
“as is” to some hardware port. They should also be able to cope with
the four additional octets involved in VLAN tagging without requiring
the MTU to be lowered everywhere (most equipment even supports
jumbo frames these days).
Egress traffic handling
EVL is in charge of building the outgoing packets for conveying the data sent by applications which run on the out-of-band execution stage. Analogously to receiving the ingress traffic, EVL can either send the outgoing packets directly from the out-of-band stage, or hand them over to the in-band network stack, depending on whether the NIC driver has been adapted in order to handle traffic from the out-of-band stage directly.
It is fundamental to understand that enabling an out-of-band port on a network device does not per se ensure end-to-end real-time communication through it. For this guarantee to be provided, the NIC driver must have been made oob-capable, so that packet transmit and receive operations to/from the hardware do happen directly from the out-of-band stage. However, in any case, EVL still guarantees that threads calling out-of-band receive/transmit services won’t demote to the in-band execution stage when doing so.
Typical use cases
Reserving a physical interface for out-of-band networking
When multiple network interfaces are available from the hardware platform, reserving some of them for sending and/or receiving out-of-band traffic is the best way to decrease network latency for real-time applications. This can be done by enabling an out-of-band port on a base/physical device directly (e.g. 'eth0', 'eno2'). In this case, VLAN tagging is not required, since all ingress traffic on this interface can be unconditionally redirected to the EVL network stack.
Sharing a network interface between in-band and out-of-band traffic
The hardware platform might have a single network interface available, in which case VLAN tagging may come in handy to direct incoming packets either to the EVL network stack or the in-band one. Obviously, this comes at a cost with respect to latency, since the in-band traffic might slow down the out-of-band packets at the hardware level. Likewise, the hardware would be shared for transmitting both in-band and out-of-band originated packets. This said, depending on the real-time requirements, that cost may still be within budget for many applications.
Dealing with complex out-of-band selection rules
An eBPF program allows deep inspection of the packet data before issuing any decision about which network stack should handle the traffic. One may rely on this feature to implement complex out-of-band traffic detection rules.
Out-of-band support in NIC drivers
Unlike Xenomai 3 with the RTnet stack, EVL provides a network stack which does not require EVL-specific drivers. Instead, the capabilities of the stock NIC drivers can be extended to support out-of-band I/O operations for dealing with ingress and egress traffic, based on facilities Dovetail provides for that purpose. If a driver does not advertise out-of-band I/O capabilities, EVL automatically offloads the final I/O operations with the network device to the in-band network stack, allowing the application code to keep running on the out-of-band stage in parallel.
Although EVL does not require the NIC driver code to be oob-capable (i.e. to convey ingress and egress traffic directly from the out-of-band execution stage), such support is the only way to obtain a complete, end-to-end real-time networking path, from the Ethernet wire to the application code. In other words, one may still use stock Ethernet controller drivers along with the EVL network stack. The cost is real-time performance, which then depends on the low-latency capabilities of the host kernel.
Out-of-band packet routing
EVL does no input routing. At all. All ingress packets accepted by EVL are either delivered to a receiver on the local host or dropped. Such data is never forwarded to remote hosts in any way.
On the other hand, EVL does a basic form of output routing for IPv4 over Ethernet. This is enough to satisfy the requirements of the UDP protocol: determining the proper output device to submit outgoing packets to, and the MAC address to put into the Ethernet destination field of those packets. However, an out-of-band sender cannot perform direct lookups into the tables maintained by the in-band network stack to find the routing information it needs, since this would contradict the stage separation rule. To address this issue, EVL maintains two front caches containing the related information:
- a table of routes indexed on the IPv4 address of the receiving peer. Each entry refers to the routing information determined by the in-band kernel for those outgoing routes (i.e. a struct rtable item).
- a table of ARP entries indexed on a composite key, which combines the output device found by the in-band routing process for reaching the peer with the IPv4 address where the peer may be reached from that device. Each entry contains the Ethernet MAC address to be written to the destination field of a packet for sending it to the given peer. This cache tracks the updates to its counterpart maintained by the in-band network stack.
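To make the composite key concrete, here is a minimal sketch of an ARP cache indexed on an (output device, IPv4 address) pair. It is not EVL's implementation, and the following are all assumptions made for illustration: the fixed-size open-addressing table, the use of an interface index to identify the device, and the names arp_cache_update() and arp_cache_lookup(). The sketch also omits entry removal and the locking that a cache shared between the in-band and out-of-band stages would need.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ARP_CACHE_SLOTS 64	/* hypothetical capacity, must be a power of two */

struct arp_key {
	int ifindex;		/* output device chosen by in-band routing */
	uint32_t addr;		/* IPv4 address of the peer on that device */
};

struct arp_slot {
	bool used;
	struct arp_key key;
	uint8_t mac[6];		/* Ethernet destination address */
};

static struct arp_slot arp_cache[ARP_CACHE_SLOTS];

/* Mix both parts of the composite key into a slot index. */
static unsigned int arp_hash(const struct arp_key *k)
{
	uint32_t h = k->addr * 2654435761u ^ (uint32_t)k->ifindex * 40503u;

	return h & (ARP_CACHE_SLOTS - 1);
}

/* Insert or refresh an entry, e.g. when the in-band ARP table changes.
 * Uses linear probing; returns false if the table is full. */
static bool arp_cache_update(int ifindex, uint32_t addr, const uint8_t mac[6])
{
	struct arp_key k = { .ifindex = ifindex, .addr = addr };
	unsigned int i, slot = arp_hash(&k);

	for (i = 0; i < ARP_CACHE_SLOTS; i++) {
		struct arp_slot *s = &arp_cache[(slot + i) & (ARP_CACHE_SLOTS - 1)];

		if (!s->used || (s->key.ifindex == ifindex && s->key.addr == addr)) {
			s->used = true;
			s->key = k;
			memcpy(s->mac, mac, 6);
			return true;
		}
	}
	return false;
}

/* Fetch the MAC address of a peer reachable via a given device.
 * Returns false on a cache miss. */
static bool arp_cache_lookup(int ifindex, uint32_t addr, uint8_t mac[6])
{
	struct arp_key k = { .ifindex = ifindex, .addr = addr };
	unsigned int i, slot = arp_hash(&k);

	for (i = 0; i < ARP_CACHE_SLOTS; i++) {
		const struct arp_slot *s = &arp_cache[(slot + i) & (ARP_CACHE_SLOTS - 1)];

		if (!s->used)
			return false;	/* an empty slot ends the probe sequence */
		if (s->key.ifindex == ifindex && s->key.addr == addr) {
			memcpy(mac, s->mac, 6);
			return true;
		}
	}
	return false;
}
```

The same IPv4 address reached through two different devices yields two distinct entries, which is why the device is part of the key.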
In other words, EVL does not calculate any route or determine any destination MAC address by itself. It simply collects this information from the in-band kernel when the latter establishes a new route via some network device. When an out-of-band port is enabled for a device, EVL is notified about new outgoing routes established through that device, and records this information into its own caches, which real-time applications can then access directly from the out-of-band stage.
Peer solicitation
As mentioned above, EVL is able to figure out where to channel packets to an IPv4 peer based on the route and Ethernet addressing information it retrieves from its dedicated front caches. If such information is missing from those caches when an out-of-band sender needs it, the transmit operation is offloaded to the in-band network stack. This drops any real-time guarantee on packet transmission. The out-of-band sender is not demoted to the in-band stage when this happens, though: the packet is simply scheduled for transmission as soon as possible via the in-band network stack. Therefore, to meet real-time requirements consistently, we must make sure this information is already present in the EVL front caches, so that transmission can proceed directly from the out-of-band stage.
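The transmit decision described above amounts to a two-stage lookup. The route cache yields the output device and next hop for the destination, then the ARP cache yields the MAC address of that next hop on that device. A miss at either stage sends the packet down the in-band path. The sketch below models this with small linear tables. All names (route_lookup(), arp_lookup(), select_tx_path()) are hypothetical, and the next-hop field is an assumption covering gatewayed routes. None of this mirrors EVL's internals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum tx_path {
	TX_OOB_DIRECT,		/* both caches hit: transmit from the oob stage */
	TX_INBAND_OFFLOAD,	/* a cache missed: hand over to the in-band stack */
};

struct route_entry {
	uint32_t dst;		/* peer IPv4 address (cache key) */
	int ifindex;		/* output device picked by in-band routing */
	uint32_t nexthop;	/* address to resolve: dst itself or a gateway */
};

struct arp_entry {
	int ifindex;
	uint32_t addr;
	uint8_t mac[6];
};

/* Hypothetical front cache contents, normally learnt from the in-band stack. */
static struct route_entry routes[8];
static size_t nr_routes;
static struct arp_entry arps[8];
static size_t nr_arps;

static const struct route_entry *route_lookup(uint32_t dst)
{
	for (size_t i = 0; i < nr_routes; i++)
		if (routes[i].dst == dst)
			return &routes[i];
	return NULL;
}

static const uint8_t *arp_lookup(int ifindex, uint32_t addr)
{
	for (size_t i = 0; i < nr_arps; i++)
		if (arps[i].ifindex == ifindex && arps[i].addr == addr)
			return arps[i].mac;
	return NULL;
}

/* Decide how a datagram to dst would be sent. On a full hit, return the
 * output device and destination MAC address for building the frame. */
static enum tx_path select_tx_path(uint32_t dst, int *ifindex, uint8_t mac[6])
{
	const struct route_entry *rt = route_lookup(dst);
	const uint8_t *hw;

	if (rt == NULL)
		return TX_INBAND_OFFLOAD;	/* no route learnt yet */

	hw = arp_lookup(rt->ifindex, rt->nexthop);
	if (hw == NULL)
		return TX_INBAND_OFFLOAD;	/* peer not solicited yet */

	*ifindex = rt->ifindex;
	memcpy(mac, hw, 6);
	return TX_OOB_DIRECT;
}
```

Soliciting the peer beforehand is what turns the first call for a new destination from TX_INBAND_OFFLOAD into TX_OOB_DIRECT.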
The process of obtaining this information involves the peer host which should eventually reply to an ARP request with its MAC address, once we have determined through which network device such request should be sent. EVL calls this process peer solicitation. The result of a successful solicitation is recorded into the front caches. There are two ways to solicit a peer:
- Using the evl net -Si <interface> management command.
- Calling the evl_net_solicit() service from your application.
Explicitly soliciting a peer this way may involve broadcasting an ARP request under the hood, just like sending an IP packet would if the MAC address of its destination were not yet known to the in-band stack. Eventually, the in-band stack should receive an ARP reply for that destination. This reply is forwarded to the EVL network stack, which in turn updates its front caches accordingly.
Warning
Sending a UDP packet to a non-solicited peer causes this packet to be offloaded to the in-band network stack, so no real-time guarantee can be expected for its transmission. The in-band stack then resolves the outgoing route before issuing an ARP request, expecting the peer to reply with its MAC address. This information is eventually recorded into the EVL front cache. As a result, a subsequent packet transmit might not need to be offloaded to the in-band stage, unless the ARP information ages enough to be invalidated, at which point we would be back to square one. The only way to guarantee strict out-of-band resolution of outgoing routes is to explicitly solicit the peer before the first packet is issued. Marking the ARP entry as permanent further ensures that this information does not age, preventing its removal after some time.
ARP cache management
Keep in mind that ARP entries age unless made permanent. After
some time, an entry is no longer usable and the corresponding peer
should be probed again to confirm its presence, which might cause
outgoing UDP packets to be offloaded to the in-band stage in the
meantime. For this reason, assuming the topology and addressing of
your real-time network are stable, it is wise to make the ARP entries of
real-time peers permanent, typically by using the evl net -S
command form instead of evl net -s.
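The difference between an ageing entry and a permanent one can be modelled in a few lines. In the sketch below, the arp_fc_entry structure, its permanent flag and the fixed 30-second validity period are hypothetical stand-ins. As described above, the in-band stack decides when a dynamic entry becomes stale, and EVL merely mirrors those updates.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARP_VALIDITY_NS (30ULL * 1000000000ULL)	/* hypothetical lifetime: 30 s */

struct arp_fc_entry {
	uint64_t confirmed_ns;	/* when the peer last confirmed its address */
	bool permanent;		/* e.g. entry set up with evl net -S */
};

/* An entry may be used for direct out-of-band transmission if it is
 * permanent, or if the peer confirmed it recently enough. */
static bool arp_entry_usable(const struct arp_fc_entry *e, uint64_t now_ns)
{
	if (e->permanent)
		return true;

	return now_ns - e->confirmed_ns < ARP_VALIDITY_NS;
}
```

Once a dynamic entry is no longer usable, sends to that peer fall back to the in-band path until it is confirmed again. A permanent entry never takes that detour.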
Since EVL monitors the updates to the ARP table maintained by the in-band network stack, you can also use the common administration tools such as arp(8) to add, remove or set the properties of ARP entries for real-time peers. Those changes will be mirrored into the related EVL front cache.
Flushing all ARP entries from the EVL front cache can be achieved by
writing a non-zero value to /sys/class/evl/net/arp, e.g.:

~# echo 1 > /sys/class/evl/net/arp

Route cache management
Flushing all IPv4 routes from the EVL front cache can be achieved by
writing a non-zero value to /sys/class/evl/net/ipv4_routes, e.g.:

~# echo 1 > /sys/class/evl/net/ipv4_routes

Special addresses
Some IPv4 addresses do not need ARP resolution. However, for strict out-of-band handling, the in-band stack must still establish a route to those destinations before the EVL network stack can transmit packets directly to them. To ensure this, you have to solicit any of the special IPv4 addresses listed below before transmitting packets.
Local host
When EVL is present, the loopback device (aka 'lo') is automatically
turned into an oob-capable device. This means that 'localhost' (127.0.0.1) is a perfectly valid
peer host for end-to-end real-time communication. However, you still
need to enable the out-of-band port on this device, and solicit the local host:

~# evl net -ei lo -S localhost

Local broadcast
The 'local broadcast' address (namely 255.255.255.255) can be used
with out-of-band ports. On the transmit side, you have to bind the
socket to the network device you want the packet to be broadcast
through:
#define SOME_UDP_PORT 42042

struct sockaddr_in dst_addr = { 0 };
struct oob_msghdr msghdr = { 0 };
int on = 1, ret, s;
struct iovec iov;

s = socket(AF_INET, SOCK_DGRAM | SOCK_OOB, 0);
...
ret = setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, "eth0.42", sizeof "eth0.42" - 1);
...
ret = setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));
...
dst_addr.sin_family = AF_INET;
dst_addr.sin_port = htons(SOME_UDP_PORT);
dst_addr.sin_addr.s_addr = htonl(INADDR_BROADCAST);
...
msghdr.msg_name = &dst_addr;
msghdr.msg_namelen = sizeof(dst_addr);
msghdr.msg_iov = &iov;
msghdr.msg_iovlen = 1;
iov.iov_base = "mellow sword";
iov.iov_len = sizeof "mellow sword" - 1;
/* Broadcast message via VLAN device 'eth0.42'. */
ret = oob_sendmsg(s, &msghdr, NULL, 0);

On the receive side, you may listen to broadcast packets from any interface for which an out-of-band port is enabled, or restrict the scope to a particular interface by binding it to the receiving socket:

#define SOME_UDP_PORT 42042

struct sockaddr_in addr = { 0 };
struct oob_msghdr msghdr = { 0 };
char buf[1024];
struct iovec iov = {
	.iov_base = buf,
	.iov_len = sizeof buf,
};
int ret, s;

s = socket(AF_INET, SOCK_DGRAM | SOCK_OOB, 0);
...
addr.sin_family = AF_INET;
addr.sin_port = htons(SOME_UDP_PORT);
addr.sin_addr.s_addr = htonl(INADDR_BROADCAST);
ret = bind(s, (struct sockaddr *)&addr, sizeof(addr));
...
/* Optionally, ask for broadcast packets from device 'eno2' only. */
ret = setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, "eno2", sizeof "eno2" - 1);
...
/* Receive next broadcast message (from network interface 'eno2'). */
msghdr.msg_iov = &iov;
msghdr.msg_iovlen = 1;
ret = oob_recvmsg(s, &msghdr, NULL, 0);

Directed broadcast
Packets can be sent to all hosts within a specific network via the out-of-band stack. For instance, 10.1.2.255 is the IPv4 broadcast address for the 10.1.2.0/24 (sub-)network.
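The directed broadcast address of a subnet is obtained by setting all host bits of the network address, i.e. addr | ~mask. The following sketch computes it from any host address and a prefix length in the range 0 to 32. directed_broadcast() is a hypothetical helper name, and only standard socket API types and byte-order functions are used.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Return the directed broadcast address (network byte order) of the
 * subnet containing addr (network byte order), given its prefix length
 * (0..32). */
static in_addr_t directed_broadcast(in_addr_t addr, unsigned int prefix_len)
{
	uint32_t mask = prefix_len == 0 ? 0 : 0xffffffffu << (32 - prefix_len);

	/* Set every host bit of the network address. */
	return htonl(ntohl(addr) | ~mask);
}
```

For the 10.1.2.0/24 network mentioned above, passing any host address of that subnet (e.g. 10.1.2.37) with a prefix length of 24 yields 10.1.2.255.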
Multicast
The EVL network stack supports multicasting over UDP/IPv4 the same way you would use the in-band stack for the same purpose, e.g.:
#define SOME_UDP_PORT 42042
#define SOME_LOCAL_SRCIP "192.168.0.12"
#define SOME_MCAST_GROUP "239.17.5.1"

struct sockaddr_in dst_addr = { 0 };
struct oob_msghdr msghdr = { 0 };
struct in_addr src_addr = { 0 };
unsigned char ttl = 1;
struct iovec iov;
int ret, s;

s = socket(AF_INET, SOCK_DGRAM | SOCK_OOB, 0);
...
ret = setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
...
/* Send multicast traffic through the interface owning this address. */
src_addr.s_addr = inet_addr(SOME_LOCAL_SRCIP);
ret = setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF, &src_addr, sizeof(src_addr));
...
dst_addr.sin_family = AF_INET;
dst_addr.sin_port = htons(SOME_UDP_PORT);
ret = inet_pton(AF_INET, SOME_MCAST_GROUP, &dst_addr.sin_addr);
...
msghdr.msg_name = &dst_addr;
msghdr.msg_namelen = sizeof(dst_addr);
msghdr.msg_iov = &iov;
msghdr.msg_iovlen = 1;
iov.iov_base = "mellow sword";
iov.iov_len = sizeof "mellow sword" - 1;
ret = oob_sendmsg(s, &msghdr, NULL, 0);