Measuring TCP Flow Latency Using eBPF: Part 1 - "Hello World!"

Sat Aug 31 2024

This is Part 1 in a series of posts describing the implementation of tcplat, a Go program which incorporates eBPF to passively measure the latency of TCP flows. Here is a complete list:

Part 1: “Hello World!” (this post)
Part 2: Kernel Space
Part 3: User Space

The goal of this series is to help beginners, who may have only limited knowledge of the Linux networking stack and the C programming language, develop the understanding and intuition needed to write eBPF programs, and to help them cross the first few hurdles they may encounter. Disclaimer: this series reflects my own learnings, and is not a one-stop shop for how eBPF or the Linux networking stack works.

In this post, we’ll review foundational eBPF concepts, and write our own eBPF-style “Hello World!” program to get to grips with the eBPF workflow.

All the code for this part is located in this directory.

Finally, this series of blog posts is based on previous work done by Pourya Jamshidi and Mark Pashmfouroush. All credit goes to them for inspiring the idea, and for sharing their knowledge thereby making eBPF a much more accessible technology.

Primer on eBPF

eBPF, (known as the extended Berkeley Packet Filter), is a kernel technology which enables developers to write custom code that can be loaded into the kernel dynamically, affecting the way the kernel behaves — this enables a new generation of highly performant networking, observability, and security tools.

— Liz Rice, Learning eBPF

eBPF programs are typically written in pseudo-C (a restricted subset of the C programming language) or Rust. They are then compiled down to eBPF bytecode, which the Linux kernel verifies and then either JIT-compiles to native machine code or, when the JIT is disabled, runs in an interpreter.

Once loaded into the kernel, programs are event-driven: they are attached to specific events and run each time those events fire, until they are unloaded from the kernel. Note that eBPF programs are not necessarily unloaded when the process which loaded them exits (for example, if they have been pinned), so once loaded they may keep firing indefinitely.

eBPF Syscall Diagram

Attachable events include:

System calls
Network events (packet I/O)
Function entry/exit
Kernel tracepoints

For example, we could attach an eBPF program to an event which fires each time we receive an ingressing packet, and insert some logic which drops this packet if we believe it’s arriving from a malicious source — this is a common approach used in DDoS mitigation.

Critically, all eBPF programs have to pass a verification check which validates that they are safe to run. This check is performed by the eBPF verifier, which ensures that:

The program does not crash or harm the system, (e.g. we do not access out-of-bounds memory)
The program always runs to completion, (e.g. we never block or sit in a loop forever, which could hold up important kernel processing)

eBPF programs which are loaded into the kernel can communicate with user space applications by using constructs called eBPF maps. These are key-value stores which can be read from and written to in both kernel and user space.

Different maps are available to store different types of data, including:

Hash tables
Arrays
Ring buffers
Longest Prefix Match (LPM) tries

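Finally, eBPF programs cannot call arbitrary kernel functions such as strcpy: the kernel's internal functions are not a stable interface, and eBPF programs are meant to be portable across kernel versions. Instead, they call a fixed set of helper functions which the kernel exposes for this purpose.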

eBPF’s “Hello World!”

To illustrate how eBPF programs are written, loaded, and run, let’s write an eBPF equivalent of “Hello World!“.

Remember that we can’t run our eBPF program manually; instead, we’re going to write a program which prints “Hello World!” each time some syscall is invoked. The exact choice of syscall is arbitrary; for this example we’ll choose the execve syscall, which shells use to execute programs. This makes it trivial to trigger the event and observe our program in action, (i.e. to trigger the syscall we just run some program such as ls).

(Note that our output is actually emitted to a trace pseudo-file which acts as a log; this is elaborated on later.)

"Hello World!" Diagram

To do this, we first need to set up an appropriate development environment. We’re going to install the following four packages:

clang — used to compile C source code into eBPF bytecode
llvm — necessary backend infrastructure required to run clang. LLVM generates ELF files which contain necessary information for eBPF loaders such as libbpf to load programs into the kernel
libbpf — C library used to interact with eBPF programs and maps
bpftool — utility for loading, attaching, and inspecting eBPF programs and maps

If you use Nix, these are the specific packages to install:

[
    clang_18
    llvm_18
    libbpf
    bpftools
]

Now, let’s analyse the full source code for hello.bpf.c, our eBPF “Hello World!” program.

(We use the .bpf.c suffix by convention to indicate that a file is written in pseudo-C code for eBPF purposes.)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe/sys_execve")
int hello(void *ctx) {
    bpf_printk("Hello World!");
    return 0;
}

char LICENSE[] SEC("license") = "GPL v2";

We can split up what this program does into three key sections.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

First, we import the BPF header files. These define the eBPF-related structures and functions that our “Hello World!” program is going to use.

SEC("kprobe/sys_execve")
int hello(void *ctx) {
    bpf_printk("Hello World!");
    return 0;
}

Second, we define our eBPF program as a C function called hello. We specify that this program should be run each time the execve syscall is fired by using libbpf’s SEC() macro, (which acts like a Python decorator). Specifically, we attach to the execve syscall using a Kernel Probe (Kprobe), which is why the section name begins with kprobe.

Kprobes are used to dynamically break into kernel routines to collect debugging and performance information non-disruptively.

They are one of many types of eBPF programs; the kernel may restrict or allow certain features depending on the program type, and the verifier enforces such restrictions. For these probes to be loadable, the SEC() macro places our function in a section called kprobe/sys_execve in the compiled ELF object, so that libbpf knows to load our hello program as a Kprobe. (To find the specific function name for the execve syscall on your architecture, you can take a look at the /proc/kallsyms file on your machine, which lists all the kernel symbols, including function names.)
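
As a rough illustration of that kallsyms lookup, here is a minimal Go sketch. The helper name syscallSymbols and the suffix-matching rule are my own assumptions for this example (they are not part of libbpf or any other library); the sketch relies only on each kallsyms line having the form "address type name", and treats type T or t as a function in the text section.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// syscallSymbols (a hypothetical helper) scans kallsyms-formatted text, one
// "address type name [module]" entry per line, and returns the text-section
// symbols whose name ends in "sys_<call>", e.g. "__x64_sys_execve".
func syscallSymbols(kallsyms string, call string) []string {
	suffix := "sys_" + call
	var matches []string
	sc := bufio.NewScanner(strings.NewReader(kallsyms))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue // not a well-formed entry
		}
		typ, name := fields[1], fields[2]
		// "T" and "t" mark global and local symbols in the text (code) section.
		if (typ == "T" || typ == "t") && strings.HasSuffix(name, suffix) {
			matches = append(matches, name)
		}
	}
	return matches
}

func main() {
	data, err := os.ReadFile("/proc/kallsyms")
	if err != nil {
		fmt.Println("could not read /proc/kallsyms:", err)
		return
	}
	for _, name := range syscallSymbols(string(data), "execve") {
		fmt.Println(name)
	}
}
```

Matching on the suffix rather than the full name is a deliberate simplification: it picks up architecture-specific wrappers without listing every prefix, while excluding neighbours such as the execveat syscall.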

We then use libbpf’s bpf_printk macro (a wrapper around the kernel’s bpf_trace_printk helper) to log a “Hello World!” message each time a program is executed. These logs are always emitted to the /sys/kernel/debug/tracing/trace_pipe file. (Note that you need root privileges to read it.)

char LICENSE[] SEC("license") = "GPL v2";

Third, we use another SEC() macro to define the license string; this is a crucial requirement for eBPF programs. Some of the BPF helper functions in the kernel are defined as “GPL only”, so if we want to use any of these functions our BPF code must have a GPL-compatible license.

Now we can compile our program into an ELF object file.

> clang -Wall -O2 -target bpf -c hello.bpf.c -o hello.bpf.o

The clang command is structured as follows:

-Wall — this enables a set of common warnings that can help catch potential issues in code
-O2 — this tells the compiler that it should attempt to optimise the performance of the code in addition to reducing the size of the generated output
-target bpf — this specifies that clang should compile the code for the bpf target architecture
-c hello.bpf.c — we specify the name of our source code file
-o hello.bpf.o — we specify the name of the generated ELF file

For reference, we can also display the contents of the output ELF file. Notice that the fourth section (Idx 3) contains our Kprobe program, and the seventh section (Idx 6) contains the license.

> llvm-objdump -h hello.bpf.o

hello.bpf.o:	file format elf64-bpf

Sections:
Idx Name                          Size     VMA              Type
  0                       00000000 0000000000000000
  1 .strtab               00000077 0000000000000000
  2 .text                 00000000 0000000000000000 TEXT
  3 kprobe/sys_execve     00000030 0000000000000000 TEXT
  4 .relkprobe/sys_execve 00000010 0000000000000000
  5 .rodata               0000000d 0000000000000000 DATA
  6 license               00000007 0000000000000000 DATA
  7 .llvm_addrsig         00000003 0000000000000000
  8 .symtab               00000090 0000000000000000
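
The same section listing can be read programmatically, which is roughly how a loader locates our program by its section name. Below is a minimal sketch using Go's standard debug/elf package; the listSections helper and sectionInfo type are names I made up for this example, and the sketch only inspects section headers rather than loading anything into the kernel.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

// sectionInfo (hypothetical type) describes one ELF section header.
type sectionInfo struct {
	Name string
	Size uint64
}

// listSections (hypothetical helper) parses an ELF object held in memory and
// returns its section names and sizes in header order. It rejects objects
// which were not compiled for the BPF target.
func listSections(obj []byte) ([]sectionInfo, error) {
	f, err := elf.NewFile(bytes.NewReader(obj))
	if err != nil {
		return nil, fmt.Errorf("parsing ELF: %w", err)
	}
	if f.Machine != elf.EM_BPF {
		return nil, fmt.Errorf("not a BPF object: machine is %v", f.Machine)
	}
	sections := make([]sectionInfo, 0, len(f.Sections))
	for _, s := range f.Sections {
		sections = append(sections, sectionInfo{Name: s.Name, Size: s.Size})
	}
	return sections, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sections <object file>")
		return
	}
	obj, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	sections, err := listSections(obj)
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, s := range sections {
		fmt.Printf("%3d %-24s %08x\n", i, s.Name, s.Size)
	}
}
```

Assuming the sketch is built as a binary called sections, it would be run as ./sections hello.bpf.o.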

Now, we can use bpftool to load our program into the kernel. This loads the eBPF program from the compiled object file and pins it to the location /sys/fs/bpf/hello.

Additionally, since our eBPF program is a tracing program, we must specify the autoattach argument so that bpftool attaches our program to the execve Kprobe once it has been loaded into the kernel, (see more discussion regarding autoattach here).

> sudo bpftool prog load hello.bpf.o /sys/fs/bpf/hello autoattach

No output in response to this command indicates success. If we’d like, we can verify that our program has loaded by checking that it exists in the /sys/fs/bpf directory.

> sudo ls /sys/fs/bpf
hello

We can also use bpftool to list the currently loaded eBPF programs. For clarity, I will just show the lines related to our “Hello World!” program, which has been assigned an ID of 44.

> sudo bpftool prog list
...
44: kprobe  name hello  tag 08424f7d1079fa76  gpl
	loaded_at 2024-08-10T17:44:03+0100  uid 0
	xlated 48B  jited 128B  memlock 4096B  map_ids 9

To see the emitted tracing output, we can cat the /sys/kernel/debug/tracing/trace_pipe file. Depending on what is happening on your machine, you may see the tracing output instantly, because other processes could be executing programs using the execve syscall. If you don’t see anything, open a second terminal and execute any commands you like, (I recommend ls), and you’ll see the corresponding trace generated by the program.

> sudo cat /sys/kernel/debug/tracing/trace_pipe
  cat-2322    [001] d...1 12373.067163: bpf_trace_printk: Hello World!
<...>-2323    [002] d...1 12376.370957: bpf_trace_printk: Hello World!
<...>-2325    [001] d...1 12382.956763: bpf_trace_printk: Hello World!

These traces are structured in a specific format:

cat-2322 — refers to the process which triggered the hook and its PID
[001] — is the number of the CPU running the eBPF program
d...1 — the trace event’s flags, (d means that interrupts were disabled)
12373.067163 — the timestamp, in seconds since system boot
bpf_trace_printk: Hello World! — the log itself
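
To make the layout concrete, here is a minimal Go sketch which splits one such line into its fields. The traceEvent type, the parseTraceLine function, and the regular expression are my own assumptions based purely on the sample output above; the kernel's trace format has more variations than this handles.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// traceEvent (hypothetical type) holds the fields of one trace_pipe line.
type traceEvent struct {
	Task      string  // process name, or "<...>" when the kernel has not recorded it
	PID       int     // process ID
	CPU       int     // CPU the eBPF program ran on
	Flags     string  // trace flags, e.g. "d...1"
	Timestamp float64 // seconds since boot
	Function  string  // e.g. "bpf_trace_printk"
	Message   string  // the logged text
}

// traceLine matches "task-pid [cpu] flags timestamp: function: message".
var traceLine = regexp.MustCompile(
	`^\s*(.+)-(\d+)\s+\[(\d+)\]\s+(\S+)\s+(\d+\.\d+):\s+(\w+):\s?(.*)$`)

// parseTraceLine parses a single line (without its trailing newline).
func parseTraceLine(line string) (traceEvent, error) {
	m := traceLine.FindStringSubmatch(line)
	if m == nil {
		return traceEvent{}, fmt.Errorf("unrecognised trace line: %q", line)
	}
	pid, err := strconv.Atoi(m[2])
	if err != nil {
		return traceEvent{}, fmt.Errorf("parsing PID: %w", err)
	}
	cpu, err := strconv.Atoi(m[3])
	if err != nil {
		return traceEvent{}, fmt.Errorf("parsing CPU: %w", err)
	}
	ts, err := strconv.ParseFloat(m[5], 64)
	if err != nil {
		return traceEvent{}, fmt.Errorf("parsing timestamp: %w", err)
	}
	return traceEvent{
		Task: m[1], PID: pid, CPU: cpu, Flags: m[4],
		Timestamp: ts, Function: m[6], Message: m[7],
	}, nil
}

func main() {
	line := "  cat-2322    [001] d...1 12373.067163: bpf_trace_printk: Hello World!"
	ev, err := parseTraceLine(line)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", ev)
}
```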

Finally, to unload the program, we delete it from its pinned location.

> sudo rm /sys/fs/bpf/hello

Programmatic Loading of “Hello World!”

In some cases, we may want to programmatically load and unload eBPF programs instead of using bpftool. For example, consider some hypothetical scenario where there are multiple eBPF programs loaded into the kernel used for tracing different syscalls. We might want some management program to dynamically load and unload these programs at runtime based on user input.

To do this, we will use the cilium/ebpf Go library, which provides utilities to compile, load, and manage eBPF programs. Specifically, we use the bpf2go tool to compile eBPF programs into bytecode, and embed them into our Go source code. Once the Go program is compiled, we have a single Go binary which contains our eBPF bytecode that we can distribute.

"Hello World!" Diagram - Programmatic Version

Ok, let’s write a Go program which encapsulates our “Hello World!” eBPF program. We’ll have it take care of loading and attaching our eBPF program, in addition to streaming the contents of the trace_pipe file to os.Stdout.

So, in the same directory as our eBPF program, let’s create a Go module for our program, and also install the cilium/ebpf library which we will use to compile and load our program.

> go mod init hello
go: creating new go.mod: module hello

> go get -u github.com/cilium/ebpf
go: downloading golang.org/x/sys v0.24.0
go: downloading golang.org/x/exp v0.0.0-20240808152545-0cdaa3abc0fa
go: added github.com/cilium/ebpf v0.16.0
go: added golang.org/x/exp v0.0.0-20240808152545-0cdaa3abc0fa
go: added golang.org/x/sys v0.24.0

Our directory structure now looks like the following.

.
├── go.mod
├── go.sum
└── hello.bpf.c

Before we use cilium/ebpf to compile our hello.bpf.c program, we’re going to prepend //go:build ignore to it, so that the Go compiler ignores it when building the final executable. If we don’t do this, Go won’t be able to build our final program successfully.

It now looks like the following.

//go:build ignore

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe/sys_execve")
int hello(void *ctx) {
    bpf_printk("Hello World!");
    return 0;
}

char LICENSE[] SEC("license") = "GPL v2";

Compiling and embedding our program into Go source code is a two-step process.

First, we create an (almost) empty main.go file with one go:generate directive. This directive dictates how our Go source code should be generated given our hello.bpf.c file. It creates skeleton code for manipulating the eBPF objects, in addition to the .o object files which contain the eBPF bytecode.

package main

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go probe hello.bpf.c -- -O2 -Wall

This directive is split into four parts:

go run github.com/cilium/ebpf/cmd/bpf2go — this invokes the bpf2go tool each time we call go generate
probe — this specifies that the generated files, (both object files and Go source code files), should be prefixed with the probe keyword
hello.bpf.c — this points bpf2go towards our pseudo-C eBPF program
-O2 -Wall — all arguments after the double dash (--) are passed to the Clang compiler which bpf2go uses to compile our eBPF program

> go generate
Compiled probe_bpfeb.o
Stripped probe_bpfeb.o
Wrote probe_bpfeb.go
Compiled probe_bpfel.o
Stripped probe_bpfel.o
Wrote probe_bpfel.go

Notice that bpf2go generated two versions of both the .o bytecode files and the Go source files, one version for big-endian architectures, and one for little-endian ones. These files are suffixed with _bpfeb.* and _bpfel.* respectively.
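
To illustrate the naming scheme, here is a small Go sketch which works out which object file matches the machine it runs on. Both function names are hypothetical, and this is not how bpf2go chooses: as far as I can tell, the generated Go files carry build constraints, so the choice is made when the program is compiled rather than when it runs. The sketch assumes Go 1.21 or newer for binary.NativeEndian.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// hostIsLittleEndian (hypothetical helper) reports whether the machine stores
// the least significant byte first. Requires Go 1.21+ for binary.NativeEndian.
func hostIsLittleEndian() bool {
	// The bytes {0x01, 0x00} decode to 1 on little-endian hosts
	// and to 256 on big-endian ones.
	return binary.NativeEndian.Uint16([]byte{0x01, 0x00}) == 1
}

// objectFileName (hypothetical helper) applies the _bpfel / _bpfeb suffix
// convention seen in the go generate output to a bpf2go prefix.
func objectFileName(prefix string, littleEndian bool) string {
	if littleEndian {
		return prefix + "_bpfel.o"
	}
	return prefix + "_bpfeb.o"
}

func main() {
	fmt.Println(objectFileName("probe", hostIsLittleEndian()))
}
```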

If we take a look at the generated Go code, we see that it contains structures representing our eBPF program. (See how our pseudo-C hello function is represented in the generated Go code as probePrograms.Hello?)

type probePrograms struct {
    Hello *ebpf.Program `ebpf:"hello"`
}

All auto-generated structures are grouped into a probeObjects structure which represents everything that’s being loaded into the kernel.

type probeObjects struct {
    probePrograms
    probeMaps
}

Cool — we’ve completed the first step for embedding our eBPF program. Let’s take a look at what our directory structure looks like now.

.
├── go.mod
├── go.sum
├── hello.bpf.c
├── main.go
├── probe_bpfeb.go
├── probe_bpfeb.o
├── probe_bpfel.go
└── probe_bpfel.o

Now that we have our auto-generated skeleton, we can begin to write code which loads and attaches our “Hello World!” program, in addition to streaming the contents of trace_pipe.

Let’s analyse the full source code for main.go.

package main

import (
    "io"
    "log"
    "os"
    "os/signal"
    "syscall"

    "github.com/cilium/ebpf/link"
)

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go probe hello.bpf.c -- -O2 -Wall

func main() {
    objs := probeObjects{}
    if err := loadProbeObjects(&objs, nil); err != nil {
        log.Fatal(err)
    }
    defer objs.Close()

    kp, err := link.Kprobe("sys_execve", objs.Hello, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer kp.Close()

    tracePipe, err := os.Open("/sys/kernel/debug/tracing/trace_pipe")
    if err != nil {
        log.Fatal(err)
    }
    defer tracePipe.Close()

    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-stop
        tracePipe.Close()
    }()

    io.Copy(os.Stdout, tracePipe)
}

We can divide this program into two key sections.

objs := probeObjects{}
if err := loadProbeObjects(&objs, nil); err != nil {
    log.Fatal(err)
}
defer objs.Close()

kp, err := link.Kprobe("sys_execve", objs.Hello, nil)
if err != nil {
    log.Fatal(err)
}
defer kp.Close()

First, we load our eBPF program using loadProbeObjects which populates the objs struct with relevant eBPF data, including the program’s name, type, license, and eBPF assembly instructions.

Then, (unlike with bpftool), we have to use link.Kprobe to manually attach our eBPF program, referred to as objs.Hello, to the execve syscall, which we specify as sys_execve.

tracePipe, err := os.Open("/sys/kernel/debug/tracing/trace_pipe")
if err != nil {
    log.Fatal(err)
}
defer tracePipe.Close()

stop := make(chan os.Signal, 1)
signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
go func() {
    <-stop
    tracePipe.Close()
}()

io.Copy(os.Stdout, tracePipe)

Second, we open the trace_pipe file for reading, and copy its contents to os.Stdout until our process receives either a SIGINT or a SIGTERM signal, at which point we close the file so that io.Copy returns, (this enables us to use Ctrl+C to stop our program).

Running our “Hello World!” program is now reduced to a single command. Also, since our program is not pinned, it is automatically detached and unloaded when our Go process exits.

> go build -o hello && sudo ./hello
<...>-895185  [001] ...21 3853541.869581: bpf_trace_printk: Hello World!
<...>-895187  [000] ...21 3853541.875591: bpf_trace_printk: Hello World!
<...>-895188  [001] ...21 3853542.488578: bpf_trace_printk: Hello World!
<...>-895190  [000] ...21 3853542.493632: bpf_trace_printk: Hello World!

Congrats, we’ve written our first eBPF program!

What’s Next?

Thank you for reading so far! Please let me know if you have any feedback or corrections.

If you’re looking for more eBPF reading, you should check out:

Cilium’s BPF Reference
eBPF.io’s Introduction

Now you should be ready to move on to Part 2, where we start implementing the pseudo-C eBPF program to measure TCP flow latency.