Measuring TCP Latency Using eBPF: Part 3 - User Space

Thu Sep 12 2024

This is Part 3 in a series of posts describing the implementation of tcplat, a Go program which incorporates eBPF to passively measure the latency of TCP flows.

In this post, we’ll implement a Go program which matches SYN/SYN-ACK pairs and calculates their latency (shown in purple in the diagram below).

[Figure: Hybrid Program Structure Diagram]

All the code for this part is located in this directory.

Overview

To facilitate the passive monitoring of TCP latency, our user space implementation must do the following.

  1. Load and attach our tcplat eBPF program
  2. Read the intercepted packets from the ring buffer eBPF map
  3. Scan for SYN/SYN-ACK pairs and calculate the TCP latency of matched pairs

Loading & Attaching

We’re going to encapsulate the logic related to the loading and unloading of our TC classifier into a package, which we’ll call probe. This is where our eBPF program is located. (The probe package will be part of a Go module called tcplat.)

.
├── go.mod
├── go.sum
├── main.go
└── probe
  ├── probe.go
  └── tcplat.bpf.c

Just like before, let’s add a go:generate directive in probe.go.

package probe

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go probe tcplat.bpf.c -- -O2 -Wall

And once generated, we should have the following directory.

probe> go generate
Compiled /home/soda/tcplat/probe/probe_bpfel.o
Stripped /home/soda/tcplat/probe/probe_bpfel.o
Wrote /home/soda/tcplat/probe/probe_bpfel.go
Compiled /home/soda/tcplat/probe/probe_bpfeb.o
Stripped /home/soda/tcplat/probe/probe_bpfeb.o
Wrote /home/soda/tcplat/probe/probe_bpfeb.go

probe> ls
probe_bpfeb.go  probe_bpfeb.o  probe_bpfel.go  probe_bpfel.o  probe.go  tcplat.bpf.c

This is our probe’s skeleton which we’re going to flesh out.

It has two main parts:

  1. A Probe structure will provide a read-only channel for raw bytes from the ring buffer
  2. An AttachProbe() function will be used to attach a probe to a specified interface and populate the channel with ring buffer samples.

type Probe struct {
  Samples <-chan []byte
  // New code here...
}

func AttachProbe(ifaceName string) (*Probe, error) {
  // New code here...
}

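func (p *Probe) Detach() {
  // Samples is receive-only here, so it cannot be closed from this side;
  // the goroutine that writes to the channel closes it.
  // New code here...
}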

Before we attach our probe, we need to verify that the interface we’d like to attach it to exists, and then load the eBPF objects.

iface, err := net.InterfaceByName(ifaceName)
if err != nil {
  return nil, err
}
objs := probeObjects{}
if err := loadProbeObjects(&objs, nil); err != nil {
  return nil, err
}

Next, we need to attach our probe to the interface’s ingress and egress hooks.

Given the last post, you would think that we would do this by (1) programmatically creating clsact qdiscs, (2) attaching them on ingress and egress, and (3) attaching our eBPF program to them as a direct-action TC classifier. This would all be performed using the Netlink API, which is used to configure TC.

Well, as of kernel version 6.6, we can instead use a bpf_link to attach to TCX, a revamped TC datapath in the Linux kernel.

This makes it trivial to attach our eBPF program.

ingressLink, err := link.AttachTCX(link.TCXOptions{
  Interface: iface.Index,
  Program:   objs.Tcplat,
  Attach:    ebpf.AttachTCXIngress,
})
if err != nil {
  objs.Close() // release the loaded program and maps on failure
  return nil, err
}
egressLink, err := link.AttachTCX(link.TCXOptions{
  Interface: iface.Index,
  Program:   objs.Tcplat,
  Attach:    ebpf.AttachTCXEgress,
})
if err != nil {
  ingressLink.Close() // undo the ingress attachment as well
  objs.Close()
  return nil, err
}
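Since TCX is only available from kernel 6.6 onwards, it can be helpful to fail early with a clear message on older kernels. The sketch below is an illustration and not part of tcplat: supportsTCX is a hypothetical helper that parses a kernel release string, such as the one printed by uname -r, and compares it against 6.6. It ignores distribution backports, so treat the result as a hint only.

```go
package main

import "fmt"

// supportsTCX is a hypothetical helper (not part of tcplat or cilium/ebpf).
// It reports whether a kernel release string such as "6.8.0-45-generic"
// is at least 6.6, the first kernel version with TCX.
func supportsTCX(release string) bool {
	var major, minor int
	// Sscanf stops after the two integers; any suffix is ignored.
	n, err := fmt.Sscanf(release, "%d.%d", &major, &minor)
	if err != nil || n != 2 {
		return false
	}
	return major > 6 || (major == 6 && minor >= 6)
}

func main() {
	for _, r := range []string{"5.15.0-91-generic", "6.6.0", "6.8.0-45-generic"} {
		fmt.Printf("%-20s TCX: %v\n", r, supportsTCX(r))
	}
}
```

If the check fails, falling back to the clsact qdisc approach from the previous post is one option.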

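Finally, we start a separate goroutine which reads from the ring buffer and sends the raw samples to our channel. To do this, we use the cilium/ebpf/ringbuf package; reader is a *ringbuf.Reader created with ringbuf.NewReader() from our program’s ring buffer map.

samples := make(chan []byte)
go func() {
  // Closing the channel tells the consumer that no more samples will arrive.
  defer close(samples)
  for {
    event, err := reader.Read()
    if err != nil {
      // ringbuf.ErrClosed only means that Detach() closed the reader.
      if !errors.Is(err, ringbuf.ErrClosed) {
        slog.Error("Failed to read from ring buffer", slog.Any("err", err))
      }
      return
    }
    samples <- event.RawSample
  }
}()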
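To make the hand-off between the reader goroutine and its consumer concrete, here is a minimal sketch of the same pattern that runs without a kernel. The names sampleReader, fakeReader, forward and drain are hypothetical stand-ins invented for this example; fakeReader plays the role of the ring buffer reader by replaying a fixed list of samples and then reporting that it is closed.

```go
package main

import (
	"errors"
	"fmt"
)

// errClosed stands in for the error a closed reader returns.
var errClosed = errors.New("reader closed")

// sampleReader is a hypothetical interface covering the one call we need.
type sampleReader interface {
	Read() ([]byte, error)
}

// fakeReader replays a fixed list of samples, then reports errClosed.
type fakeReader struct{ queue [][]byte }

func (f *fakeReader) Read() ([]byte, error) {
	if len(f.queue) == 0 {
		return nil, errClosed
	}
	s := f.queue[0]
	f.queue = f.queue[1:]
	return s, nil
}

// forward copies samples from r onto a channel and closes the channel
// as soon as Read returns an error.
func forward(r sampleReader) <-chan []byte {
	samples := make(chan []byte)
	go func() {
		defer close(samples)
		for {
			s, err := r.Read()
			if err != nil {
				return
			}
			samples <- s
		}
	}()
	return samples
}

// drain counts samples until the channel is closed.
func drain(samples <-chan []byte) int {
	n := 0
	for range samples {
		n++
	}
	return n
}

func main() {
	r := &fakeReader{queue: [][]byte{{1}, {2}, {3}}}
	fmt.Println("samples received:", drain(forward(r)))
}
```

Because the producer closes the channel, a consumer can tell "no more samples" apart from "no sample yet", either with range as above or with the two-value receive form inside a select.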

That’s our AttachProbe() function done; we just need to return our probe.

return &Probe{
  Samples:     samples,
  bpfObjects:  objs,
  ingressLink: ingressLink,
  egressLink:  egressLink,
  ringbuf:     reader,
}, nil

Leading to this final Probe definition.

type Probe struct {
  Samples <-chan []byte

  bpfObjects  probeObjects
  ingressLink link.Link
  egressLink  link.Link
  ringbuf     *ringbuf.Reader
}

Now, when we detach our program, we need to make sure we unload the eBPF program itself, the TCX links, and the ring buffer.

func (p *Probe) Detach() {
  p.bpfObjects.Close()
  p.ingressLink.Close()
  p.egressLink.Close()
  p.ringbuf.Close()
}

And that’s our probe done!

If we go back to the root of our Go module and edit the main.go file, we can attach our probe and see what the raw sample data looks like.

package main

import (
  "context"
  "fmt"
  "log/slog"
  "os"
  "os/signal"
  "syscall"
  "tcplat/probe"
)

func main() {
  if len(os.Args) < 2 {
    fmt.Println("Please specify a network interface")
    return
  }

  ctx, cancel := context.WithCancel(context.Background())
  stop := make(chan os.Signal, 1)
  signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
  go func() {
    <-stop
    cancel()
  }()

  ifaceName := os.Args[1]
  probe, err := probe.AttachProbe(ifaceName)
  if err != nil {
    slog.Error("Failed to attach probe", slog.Any("err", err))
    return
  }
  defer probe.Detach()

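  for {
    select {
    case <-ctx.Done():
      return
    case raw, ok := <-probe.Samples:
      if !ok {
        // The reader goroutine closed the channel, so there is nothing left to read.
        return
      }
      fmt.Println(raw)
    }
  }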
}

When we run this, we’ll attach the probe to the enp0s1 interface, which has traffic flowing over it on my machine. If you don’t see any output, you can generate TCP traffic with curl, using a protocol that runs over TCP such as HTTP/2.

> go build -o tcplat && sudo ./tcplat enp0s1
[0 0 0 0 0 0 0 0 0 0 255 255 10 0 2 15 0 0 0 0 0 0 0 0 0 0 255 255 142 250 180 3 150 50 1 187 1 0 0 0 73 188 37 49 107 69 0 0]
[0 0 0 0 0 0 0 0 0 0 255 255 142 250 180 3 0 0 0 0 0 0 0 0 0 0 255 255 10 0 2 15 1 187 150 50 1 1 0 0 13 127 226 51 107 69 0 0]

Reading Packet Samples

Each packet we read from the ring buffer via the Samples channel is a byte slice ([]byte). We’re responsible for parsing this data into a user space representation, which will be the following Go structure.

type Timestamp uint64

type Packet struct {
  SrcIP     netip.Addr
  DstIP     netip.Addr
  SrcPort   uint16
  DstPort   uint16
  Syn       bool
  Ack       bool
  Timestamp Timestamp
}

C structures are allocated in a contiguous block of memory; however, there is no guarantee that the fields of a structure are all adjacent to each other. Compilers may pad fields so that they are aligned on certain boundaries, e.g. 4-byte or 8-byte boundaries. Alignment leads to more efficient memory access by reducing cache misses and bus transactions.

We need to take this padding into account when unmarshalling our Packet structure.
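Padding is easy to observe from Go as well. The sketch below uses a hypothetical mirror struct whose fields have the same sizes as our packet's fields (two 16-byte addresses, two 2-byte ports, two 1-byte flags and an 8-byte timestamp). Go applies similar alignment rules to C on common platforms, though the two layouts are not guaranteed to agree in general, so this is only an illustration of the effect.

```go
package main

import (
	"fmt"
	"unsafe"
)

// mirror is a hypothetical struct with the same field sizes as the packet.
// The field sizes add up to 46 bytes.
type mirror struct {
	SrcIP     [16]byte
	DstIP     [16]byte
	SrcPort   uint16
	DstPort   uint16
	Syn       bool
	Ack       bool
	Timestamp uint64
}

// ackOffset returns the byte offset of the Ack field.
func ackOffset() uintptr { return unsafe.Offsetof(mirror{}.Ack) }

// timestampOffset returns the byte offset of the Timestamp field.
func timestampOffset() uintptr { return unsafe.Offsetof(mirror{}.Timestamp) }

// mirrorSize returns the total size of the struct, padding included.
func mirrorSize() uintptr { return unsafe.Sizeof(mirror{}) }

func main() {
	fmt.Println("ack offset:      ", ackOffset())
	fmt.Println("timestamp offset:", timestampOffset())
	fmt.Println("total size:      ", mirrorSize())
}
```

The Ack flag occupies byte 37, yet Timestamp starts at byte 40 rather than 38: the compiler inserts two bytes of padding so that the 8-byte field is aligned, which brings the total to 48 bytes instead of 46.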

To visualise this padding we can use the pahole program on the tcplat object file. Note that we have to compile with the -g flag, which emits extra debugging information into the object file.

> clang -Wall -O2 -target bpf -c tcplat.bpf.c -o tcplat.bpf.o -g

> pahole tcplat.bpf.o
struct packet_t {
  struct in6_addr  src_ip;     /*     0    16 */
  struct in6_addr  dst_ip;     /*    16    16 */
  __be16           src_port;   /*    32     2 */
  __be16           dst_port;   /*    34     2 */
  _Bool            syn;        /*    36     1 */
  _Bool            ack;        /*    37     1 */

  /* XXX 2 bytes hole */

  uint64_t         timestamp;  /*    40     8 */

  /* size: 48 */
};

Using this information we can extract the offset and size of any field we’d like. For example, the timestamp field begins at offset 40, and it is 8 bytes long. It also tells us the total size of our structure, which is 48 bytes long.
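As a quick check of these offsets, the sketch below reads a few fields straight out of the two raw samples printed earlier. The helper names ports and timestamp are made up for this example. The ports are read as big-endian, as the __be16 type indicates, and the timestamp is read as little-endian on the assumption that the capturing host is little-endian.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// The two raw samples printed earlier: a SYN and the SYN-ACK answering it.
var (
	synSample = []byte{
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 10, 0, 2, 15,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 142, 250, 180, 3,
		150, 50, 1, 187, 1, 0, 0, 0,
		73, 188, 37, 49, 107, 69, 0, 0,
	}
	synAckSample = []byte{
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 142, 250, 180, 3,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 10, 0, 2, 15,
		1, 187, 150, 50, 1, 1, 0, 0,
		13, 127, 226, 51, 107, 69, 0, 0,
	}
)

// ports reads the two big-endian port fields at offsets 32 and 34.
func ports(sample []byte) (src, dst uint16) {
	return binary.BigEndian.Uint16(sample[32:34]), binary.BigEndian.Uint16(sample[34:36])
}

// timestamp reads the 8-byte field at offset 40.
// Assumption: the host that captured the sample is little-endian.
func timestamp(sample []byte) uint64 {
	return binary.LittleEndian.Uint64(sample[40:48])
}

func main() {
	src, dst := ports(synSample)
	fmt.Println("SYN ports:", src, "->", dst)

	delta := timestamp(synAckSample) - timestamp(synSample)
	fmt.Println("handshake latency:", time.Duration(delta))
}
```

For these samples the source port decodes to 38450 (150*256 + 50) and the destination port to 443, and the two timestamps are roughly 45.9 ms apart.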

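Let’s unmarshal our packet now. We use the encoding/binary package to parse the multi-byte values: the ports are big-endian (network byte order, hence the __be16 type), while the timestamp is stored in the byte order of the host that wrote it, which is little-endian here.

We’ll insert this function in the main.go file at the root directory of our Go module.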

func UnmarshalPacket(data []byte) (Packet, error) {
  if len(data) != 48 {
    return Packet{}, fmt.Errorf("expected 48 bytes, got %d", len(data))
  }
  // Report malformed addresses as errors, like the length check above.
  srcIP, ok := netip.AddrFromSlice(data[0:16])
  if !ok {
    return Packet{}, fmt.Errorf("invalid source IP")
  }
  dstIP, ok := netip.AddrFromSlice(data[16:32])
  if !ok {
    return Packet{}, fmt.Errorf("invalid destination IP")
  }

  return Packet{
    SrcIP:   srcIP,
    DstIP:   dstIP,
    SrcPort: binary.BigEndian.Uint16(data[32:34]),
    DstPort: binary.BigEndian.Uint16(data[34:36]),
    Syn:     data[36] == 1,
    Ack:     data[37] == 1,
    // 2-byte hole
    Timestamp: Timestamp(binary.LittleEndian.Uint64(data[40:48])),
  }, nil
}

We can go back to the for loop which reads new samples, and insert our unmarshalling logic.

for {
  select {
  case <-ctx.Done():
    return
  case raw, ok := <-probe.Samples:
    if !ok {
      // The channel was closed, so no more samples will arrive.
      return
    }
    p, err := UnmarshalPacket(raw)
    if err != nil {
      slog.Error("Failed to unmarshal packet", slog.Any("err", err))
      continue
    }
    fmt.Println(p)
  }
}

Let’s see its output.

> go build -o tcplat && sudo ./tcplat enp0s1
{::ffff:10.0.2.15 ::ffff:142.250.179.227 35760 443 true false 76653072243496}
{::ffff:142.250.179.227 ::ffff:10.0.2.15 443 35760 true true 76653115073081}

This is pretty cool: we can see that the traffic used TCP over IPv4, shown by the fact that we have IPv4-mapped IPv6 addresses. These two logs actually represent a SYN and SYN-ACK packet pair. Notice that the two packets have inverse four-tuples, where one has only SYN set to true, and the other has both the SYN and ACK fields set to true.

Scanning TCP Connection Pairs

To relate a packet to its four-tuple connection, let’s take its hash. We’re going to use the FNV hash function, which is cheap to compute, so hashing does not limit how quickly our program can process packets.

Importantly, when we hash the packets, we ensure it’s a commutative operation by hashing the (source address, source port) and (destination address, destination port) tuples separately, and then adding them. This guarantees that two TCP packets with inverse four-tuples will hash to the same value.

type Hash uint64

func (p *Packet) Hash() Hash {
  f := func(v []byte) uint64 {
    h := fnv.New64a()
    h.Write(v)
    return h.Sum64()
  }

  src := binary.BigEndian.AppendUint16(p.SrcIP.AsSlice(), p.SrcPort)
  dst := binary.BigEndian.AppendUint16(p.DstIP.AsSlice(), p.DstPort)

  return Hash(f(src) + f(dst))
}
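To check the symmetry property, the sketch below recomputes the hash for both directions of one connection, using the addresses and ports from the output shown earlier. The functions endpointHash and flowHash are hypothetical names for this example; they follow the same idea as Hash() above, but take the addresses as strings so that the example stands alone.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"net/netip"
)

// endpointHash hashes one (address, port) endpoint with 64-bit FNV-1a.
func endpointHash(ip string, port uint16) uint64 {
	addr := netip.MustParseAddr(ip)
	h := fnv.New64a()
	h.Write(binary.BigEndian.AppendUint16(addr.AsSlice(), port))
	return h.Sum64()
}

// flowHash adds the two endpoint hashes. Unsigned addition wraps on
// overflow and is commutative, so swapping source and destination
// yields the same value.
func flowHash(srcIP string, srcPort uint16, dstIP string, dstPort uint16) uint64 {
	return endpointHash(srcIP, srcPort) + endpointHash(dstIP, dstPort)
}

func main() {
	const client, server = "::ffff:10.0.2.15", "::ffff:142.250.179.227"

	syn := flowHash(client, 35760, server, 443)
	synAck := flowHash(server, 443, client, 35760)
	other := flowHash(client, 35761, server, 443)

	fmt.Println("SYN and SYN-ACK share a hash:", syn == synAck)
	fmt.Println("another source port differs: ", syn != other)
}
```

One consequence of this design is that the hash no longer encodes direction, so the SYN and ACK flags are what tell the two packets apart. As with any hash, two distinct connections could in principle collide, although with 64 bits this is very unlikely.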

We’re going to use this hash as the key of a connection table, which tracks the timestamp of each connection’s SYN packet.

type ConnectionTable struct {
  table map[Hash]Timestamp
}

func NewConnectionTable() *ConnectionTable {
  return &ConnectionTable{
    table: make(map[Hash]Timestamp),
  }
}

Now we’re at the final part of our program, figuring out when packets have matched.

This is done in two parts:

  1. If the hash of our packet already exists in the connection table and our packet has the ACK flag set, we have a match, and we calculate the duration
  2. Otherwise, if our packet is a SYN, it is the first packet intercepted for the connection, so we insert its timestamp into the connection table

func (c *ConnectionTable) Match(p Packet) (time.Duration, bool) {
  hash := p.Hash()

  timestamp, ok := c.table[hash]
  if ok && p.Ack {
    // Remove the entry so the table does not grow without bound
    // and each SYN is matched at most once.
    delete(c.table, hash)
    d := time.Duration(p.Timestamp-timestamp) * time.Nanosecond
    return d, true
  }
  // Only a plain SYN (without ACK) opens a new entry.
  if p.Syn && !p.Ack {
    c.table[hash] = p.Timestamp
  }

  return 0, false
}
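To see the two rules in action, here is a small walk-through using the timestamps from the SYN and SYN-ACK printed earlier. It is a trimmed-down stand-in rather than the real types: packet carries a precomputed flow hash instead of addresses and ports, and table is a plain map. Removing the entry after a match is a choice made in this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// packet is a hypothetical, trimmed-down stand-in for Packet:
// the flow hash is precomputed instead of derived from the four-tuple.
type packet struct {
	FlowHash  uint64
	Syn, Ack  bool
	Timestamp uint64 // nanoseconds
}

// table maps a flow hash to the timestamp of that flow's SYN.
type table map[uint64]uint64

// match applies the two rules above: a packet with ACK set completes a
// known flow, while a plain SYN opens a new entry.
func (t table) match(p packet) (time.Duration, bool) {
	if start, ok := t[p.FlowHash]; ok && p.Ack {
		delete(t, p.FlowHash) // each SYN is matched at most once
		return time.Duration(p.Timestamp - start), true
	}
	if p.Syn && !p.Ack {
		t[p.FlowHash] = p.Timestamp
	}
	return 0, false
}

func main() {
	conns := table{}

	// The SYN is recorded but produces no match yet.
	_, matched := conns.match(packet{FlowHash: 1, Syn: true, Timestamp: 76653072243496})
	fmt.Println("SYN matched:", matched)

	// The SYN-ACK for the same flow completes the pair.
	d, matched := conns.match(packet{FlowHash: 1, Syn: true, Ack: true, Timestamp: 76653115073081})
	fmt.Println("SYN-ACK matched:", matched, "latency:", d)
}
```

The two timestamps differ by 42,829,585 ns, so the reported latency for this pair is just under 43 ms.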

Let’s edit the sample handling loop one more time.

Note that we call Unmap() on the IP addresses in case they are IPv4-mapped; this ensures that they are printed as IPv4 addresses instead of IPv6 addresses.

table := NewConnectionTable()
for {
  select {
  case <-ctx.Done():
    return
  case raw, ok := <-probe.Samples:
    if !ok {
      // The channel was closed, so no more samples will arrive.
      return
    }
    p, err := UnmarshalPacket(raw)
    if err != nil {
      slog.Error("Failed to unmarshal packet", slog.Any("err", err))
      continue
    }
    d, matched := table.Match(p)
    if !matched {
      continue
    }
    fmt.Printf("Matched SYN/SYN-ACK, Source: %s, Destination: %s, Latency %s\n",
      p.SrcIP.Unmap(), p.DstIP.Unmap(), d)
  }
}
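The effect of Unmap() in the loop above can be seen in isolation with the net/netip package. In this small sketch, unmapped is a hypothetical helper that parses an address and returns its unmapped form together with a flag saying whether it was IPv4-mapped to begin with.

```go
package main

import (
	"fmt"
	"net/netip"
)

// unmapped is a hypothetical helper: it parses s and returns the address
// with any IPv4-mapped IPv6 prefix removed, plus whether it was mapped.
func unmapped(s string) (string, bool) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "", false
	}
	return addr.Unmap().String(), addr.Is4In6()
}

func main() {
	for _, s := range []string{"::ffff:10.0.2.15", "2001:db8::1"} {
		out, wasMapped := unmapped(s)
		fmt.Printf("%-18s -> %-12s (IPv4-mapped: %v)\n", s, out, wasMapped)
	}
}
```

Unmap() leaves genuine IPv6 addresses untouched, so it is safe to call on every address regardless of the IP version in use.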

And we can run the final program!

> go build -o tcplat && sudo ./tcplat enp0s1
Matched SYN/SYN-ACK, Source: 216.58.204.67, Destination: 10.0.2.15, Latency 45.540912ms
Matched SYN/SYN-ACK, Source: 104.16.132.229, Destination: 10.0.2.15, Latency 38.934399ms
Matched SYN/SYN-ACK, Source: 104.21.79.152, Destination: 10.0.2.15, Latency 45.702502ms
Matched SYN/SYN-ACK, Source: 104.21.79.152, Destination: 10.0.2.15, Latency 41.161913ms

Conclusion

Congratulations! If you’ve followed along, you’ve written your own eBPF program that interfaces with a user space component.

Thank you for reading until the end! For any questions or comments about these posts or the code, please open an issue on GitHub.

© 2024-2025 Nadav Rahimi