Thu Sep 12 2024
This is Part 3 in a series of posts describing the implementation of tcplat, a Go program which incorporates eBPF to passively measure the latency of TCP flows.
In this post, we’ll implement a Go program which matches SYN/SYN-ACK pairs and calculates their latency (as seen below in purple).
All the code for this part is located in this directory.
To facilitate the passive monitoring of TCP latency, our user space implementation must do the following:
- Load and attach the tcplat eBPF program
- Match SYN/SYN-ACK pairs and calculate the TCP latency of matched pairs
We’re going to encapsulate the logic related to the loading and unloading of our TC classifier into a package, which we’ll call probe. This is where our eBPF program is located. (The probe package will be part of a Go module called tcplat.)
├── go.mod
├── go.sum
├── main.go
└── probe
    ├── probe.go
    └── tcplat.bpf.c
Just like before, let’s add a go:generate directive in probe.go.
package probe
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go probe tcplat.bpf.c -- -O2 -Wall
Once the code is generated, the probe directory should contain the following files.
probe> go generate
Compiled /home/soda/tcplat/probe/probe_bpfel.o
Stripped /home/soda/tcplat/probe/probe_bpfel.o
Wrote /home/soda/tcplat/probe/probe_bpfel.go
Compiled /home/soda/tcplat/probe/probe_bpfeb.o
Stripped /home/soda/tcplat/probe/probe_bpfeb.o
Wrote /home/soda/tcplat/probe/probe_bpfeb.go
probe> ls
probe_bpfeb.go probe_bpfeb.o probe_bpfel.go probe_bpfel.o probe.go tcplat.bpf.c
This is our probe’s skeleton which we’re going to flesh out.
It has two main parts:
- The Probe structure will provide a receive-only channel for raw bytes from the ring buffer
- The AttachProbe() function will be used to attach a probe to a specified interface and populate the channel with ring buffer samples
type Probe struct {
	Samples <-chan []byte
	// New code here...
}
func AttachProbe(ifaceName string) (*Probe, error) {
	// New code here...
}
func (p *Probe) Detach() {
	// Samples is receive-only, so it cannot be closed from here.
	// New code here...
}
Before we attach our probe, we need to verify that the interface we’d like to attach it to exists, and then load the eBPF objects.
iface, err := net.InterfaceByName(ifaceName)
if err != nil {
	return nil, err
}
objs := probeObjects{}
if err := loadProbeObjects(&objs, nil); err != nil {
	return nil, err
}
Next, we need to attach our probe to the interface’s ingress and egress hooks.
Given the last post, you would think that we would do this by (1) programmatically creating clsact qdiscs, (2) attaching them on ingress and egress, and (3) attaching our eBPF program to them as a direct-action TC classifier. This would all be performed using the Netlink API which is used to configure TC.
Well, as of kernel version 6.6, we can leverage bpf_link to attach to TCX, which is a revamped TC datapath in the Linux kernel.
This makes it trivial to attach our eBPF program.
ingressLink, err := link.AttachTCX(link.TCXOptions{
	Interface: iface.Index,
	Program:   objs.Tcplat,
	Attach:    ebpf.AttachTCXIngress,
})
if err != nil {
	objs.Close() // release the loaded eBPF objects on failure
	return nil, err
}
egressLink, err := link.AttachTCX(link.TCXOptions{
	Interface: iface.Index,
	Program:   objs.Tcplat,
	Attach:    ebpf.AttachTCXEgress,
})
if err != nil {
	ingressLink.Close() // undo the ingress attachment as well
	objs.Close()
	return nil, err
}
Finally, we start a separate goroutine which reads from the ring buffer and sends the byte samples to our channel. To do this, we use the cilium/ebpf/ringbuf package.
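// reader is a *ringbuf.Reader, created beforehand by passing the ring buffer
// map from objs to ringbuf.NewReader().
samples := make(chan []byte)
go func() {
	defer close(samples)
	for {
		event, err := reader.Read()
		if err != nil {
			// Detach() closes the reader, which is an expected way to stop.
			if !errors.Is(err, ringbuf.ErrClosed) {
				slog.Error("Failed to read from ring buffer", slog.Any("err", err))
			}
			return
		}
		samples <- event.RawSample
	}
}()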
That’s our AttachProbe() function done; we just need to return our probe.
return &Probe{
	Samples:     samples,
	bpfObjects:  objs,
	ingressLink: ingressLink,
	egressLink:  egressLink,
	ringbuf:     reader,
}, nil
Leading to this final Probe definition.
type Probe struct {
	Samples     <-chan []byte
	bpfObjects  probeObjects
	ingressLink link.Link
	egressLink  link.Link
	ringbuf     *ringbuf.Reader
}
Now, when we detach our program, we need to make sure we unload the eBPF program itself, the TCX links, and the ring buffer.
func (p *Probe) Detach() {
	p.bpfObjects.Close()
	p.ingressLink.Close()
	p.egressLink.Close()
	p.ringbuf.Close()
}
And that’s our probe done!
If we go back to the root of our Go module and edit the main.go file, we can attach our probe and see what the raw sample data looks like.
package main
import (
	"context"
	"fmt"
	"log/slog"
	"os"
	"os/signal"
	"syscall"
	"tcplat/probe"
)
func main() {
	if len(os.Args) < 2 {
		fmt.Println("Please specify a network interface")
		return
	}
	// Cancel the context when we receive SIGINT or SIGTERM.
	ctx, cancel := context.WithCancel(context.Background())
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-stop
		cancel()
	}()
	ifaceName := os.Args[1]
	probe, err := probe.AttachProbe(ifaceName)
	if err != nil {
		slog.Error("Failed to attach probe", slog.Any("err", err))
		return
	}
	defer probe.Detach()
	for {
		select {
		case <-ctx.Done():
			return
		case raw, ok := <-probe.Samples:
			if !ok {
				// The probe closed the channel after a ring buffer error.
				return
			}
			fmt.Println(raw)
		}
	}
}
When we run this, we’ll attach the probe to the enp0s1 interface, which has traffic flowing over it on my machine. If you don’t see any output, you can generate TCP traffic by using curl with a protocol that relies on TCP, such as HTTP/2.
> go build -o tcplat && sudo ./tcplat enp0s1
[0 0 0 0 0 0 0 0 0 0 255 255 10 0 2 15 0 0 0 0 0 0 0 0 0 0 255 255 142 250 180 3 150 50 1 187 1 0 0 0 73 188 37 49 107 69 0 0]
[0 0 0 0 0 0 0 0 0 0 255 255 142 250 180 3 0 0 0 0 0 0 0 0 0 0 255 255 10 0 2 15 1 187 150 50 1 1 0 0 13 127 226 51 107 69 0 0]
Each packet we read from the ring buffer via the Samples channel will be a byte slice ([]byte). We’re responsible for parsing this data into a user space representation, which will be the following Go structure.
type Timestamp uint64
type Packet struct {
	SrcIP     netip.Addr
	DstIP     netip.Addr
	SrcPort   uint16
	DstPort   uint16
	Syn       bool
	Ack       bool
	Timestamp Timestamp
}
C structures are allocated in a contiguous block of memory, however, there is no guarantee that the fields of a structure are all adjacent to each other. Compilers may pad fields so that they are aligned on certain boundaries, e.g. 4-byte or 8-byte boundaries. Alignment leads to more efficient memory access, by reducing cache misses and bus transactions.
We need to take this padding into account when unmarshalling our Packet structure.
To visualise this padding we can use the pahole program on the tcplat object file. Note that we have to compile using the -g flag, which emits extra debugging information into the object file.
> clang -Wall -O2 -target bpf -c tcplat.bpf.c -o tcplat.bpf.o -g
> pahole tcplat.bpf.o
struct packet_t {
	struct in6_addr src_ip; /* 0 16 */
	struct in6_addr dst_ip; /* 16 16 */
	__be16 src_port; /* 32 2 */
	__be16 dst_port; /* 34 2 */
	_Bool syn; /* 36 1 */
	_Bool ack; /* 37 1 */
	/* XXX 2 bytes hole */
	uint64_t timestamp; /* 40 8 */
	/* size: 48 */
};
Using this information we can extract the offset and size of any field we’d like. For example, the timestamp field begins at offset 40, and it is 8 bytes long. It also tells us the total size of our structure, which is 48 bytes long.
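Let’s unmarshal our packet now. We use the encoding/binary package to parse the values: the ports are stored in network byte order (big endian), while the timestamp is stored in the host’s byte order, which is little endian on most machines, including x86-64 and arm64.
We’ll insert this function in the main.go file at the root directory of our Go module.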
func UnmarshalPacket(data []byte) (Packet, error) {
	if len(data) != 48 {
		return Packet{}, fmt.Errorf("expected 48 bytes, got %d", len(data))
	}
	srcIP, ok := netip.AddrFromSlice(data[0:16])
	if !ok {
		return Packet{}, fmt.Errorf("invalid source IP")
	}
	dstIP, ok := netip.AddrFromSlice(data[16:32])
	if !ok {
		return Packet{}, fmt.Errorf("invalid destination IP")
	}
	return Packet{
		SrcIP:   srcIP,
		DstIP:   dstIP,
		SrcPort: binary.BigEndian.Uint16(data[32:34]),
		DstPort: binary.BigEndian.Uint16(data[34:36]),
		Syn:     data[36] != 0,
		Ack:     data[37] != 0,
		// data[38:40] is the 2-byte hole
		Timestamp: Timestamp(binary.LittleEndian.Uint64(data[40:48])),
	}, nil
}
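To check the offsets by hand, we can decode part of the first raw sample printed earlier. The sketch below is self-contained and extracts only the addresses and ports; decodeEndpoints is a hypothetical helper written for this illustration, not a function from tcplat.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// sample is the first raw sample printed earlier in this post.
var sample = []byte{
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 10, 0, 2, 15, // src_ip
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 142, 250, 180, 3, // dst_ip
	150, 50, // src_port (big endian)
	1, 187, // dst_port (big endian)
	1, 0, // syn, ack
	0, 0, // 2-byte hole
	73, 188, 37, 49, 107, 69, 0, 0, // timestamp (little endian)
}

// decodeEndpoints extracts the source and destination endpoints from a
// 48-byte sample, unmapping IPv4-mapped IPv6 addresses along the way.
func decodeEndpoints(data []byte) (src, dst netip.AddrPort, ok bool) {
	if len(data) != 48 {
		return src, dst, false
	}
	srcIP, okSrc := netip.AddrFromSlice(data[0:16])
	dstIP, okDst := netip.AddrFromSlice(data[16:32])
	if !okSrc || !okDst {
		return src, dst, false
	}
	src = netip.AddrPortFrom(srcIP.Unmap(), binary.BigEndian.Uint16(data[32:34]))
	dst = netip.AddrPortFrom(dstIP.Unmap(), binary.BigEndian.Uint16(data[34:36]))
	return src, dst, true
}

func main() {
	src, dst, ok := decodeEndpoints(sample)
	fmt.Println(src, dst, ok) // prints: 10.0.2.15:38450 142.250.180.3:443 true
}
```

The source port bytes 150 and 50 decode to 150*256 + 50 = 38450, and the destination port bytes 1 and 187 decode to 443, which is what we would expect for an HTTPS connection.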
We can go back to the for loop which reads new samples, and insert our unmarshalling logic.
for {
	select {
	case <-ctx.Done():
		return
	case raw, ok := <-probe.Samples:
		if !ok {
			// The probe closed the channel after a ring buffer error.
			return
		}
		p, err := UnmarshalPacket(raw)
		if err != nil {
			slog.Error("Failed to unmarshal packet", slog.Any("err", err))
			continue
		}
		fmt.Println(p)
	}
}
Let’s see its output.
> go build -o tcplat && sudo ./tcplat enp0s1
{::ffff:10.0.2.15 ::ffff:142.250.179.227 35760 443 true false 76653072243496}
{::ffff:142.250.179.227 ::ffff:10.0.2.15 443 35760 true true 76653115073081}
This is pretty cool: we can see the traffic used TCP over IPv4, shown by the fact that we have IPv4-mapped IPv6 addresses. These two logs actually represent a SYN and SYN-ACK packet pair. Notice that the two packets have inverse four-tuples, where one has only SYN set to true, and the other has both the SYN and ACK fields set to true.
To relate a packet to its four-tuple connection, let’s take its hash. We’re going to use the FNV hash function, which is fast enough that hashing does not degrade our program’s ability to process packets quickly.
Importantly, when we hash the packets, we ensure it’s a commutative operation by hashing the (source address, source port) and (destination address, destination port) tuples separately, and then adding them. This guarantees that two TCP packets with inverse four-tuples will hash to the same value.
type Hash uint64
func (p *Packet) Hash() Hash {
	f := func(v []byte) uint64 {
		h := fnv.New64a()
		h.Write(v)
		return h.Sum64()
	}
	src := binary.BigEndian.AppendUint16(p.SrcIP.AsSlice(), p.SrcPort)
	dst := binary.BigEndian.AppendUint16(p.DstIP.AsSlice(), p.DstPort)
	return Hash(f(src) + f(dst))
}
We’re going to store this hash in a connection table, which maps each connection’s hash to the timestamp of its SYN packet.
type ConnectionTable struct {
	table map[Hash]Timestamp
}
func NewConnectionTable() *ConnectionTable {
	return &ConnectionTable{
		table: make(map[Hash]Timestamp),
	}
}
Now we’re at the final part of our program, figuring out when packets have matched.
This is done in two parts:
- If the packet’s hash is already in the table and the packet has ACK set, that means we have a match, and we calculate the duration
- Since the SYN will be the first packet intercepted, we insert the SYN into the connection table
func (c *ConnectionTable) Match(p Packet) (time.Duration, bool) {
	hash := p.Hash()
	timestamp, ok := c.table[hash]
	if ok && p.Ack {
		// The handshake has been measured, so drop the entry to keep
		// the table from growing with every connection.
		delete(c.table, hash)
		d := time.Duration(p.Timestamp-timestamp) * time.Nanosecond
		return d, true
	}
	if p.Syn {
		c.table[hash] = p.Timestamp
	}
	return 0, false
}
Let’s edit the sample handling loop one more time.
Note, we make sure to call Unmap() on the IP addresses in case they are mapped. This ensures that they are printed correctly as IPv4, instead of being printed as IPv6 addresses.
table := NewConnectionTable()
for {
	select {
	case <-ctx.Done():
		return
	case raw, ok := <-probe.Samples:
		if !ok {
			// The probe closed the channel after a ring buffer error.
			return
		}
		p, err := UnmarshalPacket(raw)
		if err != nil {
			slog.Error("Failed to unmarshal packet", slog.Any("err", err))
			continue
		}
		d, matched := table.Match(p)
		if !matched {
			continue
		}
		fmt.Printf("Matched SYN/SYN-ACK, Source: %s, Destination: %s, Latency %s\n",
			p.SrcIP.Unmap(), p.DstIP.Unmap(), d)
	}
}
And we can run the final program!
> go build -o tcplat && sudo ./tcplat enp0s1
Matched SYN/SYN-ACK, Source: 216.58.204.67, Destination: 10.0.2.15, Latency 45.540912ms
Matched SYN/SYN-ACK, Source: 104.16.132.229, Destination: 10.0.2.15, Latency 38.934399ms
Matched SYN/SYN-ACK, Source: 104.21.79.152, Destination: 10.0.2.15, Latency 45.702502ms
Matched SYN/SYN-ACK, Source: 104.21.79.152, Destination: 10.0.2.15, Latency 41.161913ms
Congratulations! By following along, you’ve managed to write your own eBPF program that interfaces with a user space component.
Thank you for reading until the end! For any questions or comments about these posts or the code, please open an issue on GitHub.
© 2024-2025 Nadav Rahimi