Sat Aug 31 2024
This is Part 1 in a series of posts describing the implementation of tcplat, a Go program which incorporates eBPF to passively measure the latency of TCP flows.
Here is a complete list:
The goal of this series is to help beginners, who may have only limited knowledge of the Linux networking stack and the C programming language, develop the understanding and intuition needed to write eBPF programs and cross the first few hurdles they may encounter. Disclaimer: this series reflects my own learnings, and is not a one-stop shop for how eBPF or the Linux networking stack work.
In this post, we’ll review foundational eBPF concepts, and write our own eBPF-style “Hello World!” program to get to grips with the eBPF workflow.
All the code for this part is located in this directory.
Finally, this series of blog posts is based on previous work done by Pourya Jamshidi and Mark Pashmfouroush. All credit goes to them for inspiring the idea, and for sharing their knowledge thereby making eBPF a much more accessible technology.
eBPF (the extended Berkeley Packet Filter) is a kernel technology which enables developers to write custom code that can be loaded into the kernel dynamically, affecting the way the kernel behaves. This enables a new generation of highly performant networking, observability, and security tools.
— Liz Rice, Learning eBPF
eBPF programs are typically written in pseudo-C (a restricted subset of the C programming language) or Rust. They are then compiled down to eBPF bytecode, which the Linux kernel either interprets or JIT-compiles to native machine code.
Once loaded into the kernel, programs are event-driven: they are attached to specific events and run each time those events fire, until they are unloaded from the kernel. Note that eBPF programs are not necessarily unloaded when the process which loaded them ends, so once loaded they may keep firing indefinitely.
Attachable events include:
For example, we could attach an eBPF program to an event which fires each time we receive an ingressing packet, and insert some logic which drops this packet if we believe it’s arriving from a malicious source — this is a common approach used in DDoS mitigation.
Critically, all eBPF programs have to pass a verification check which validates that they are safe to run. This check is performed by the eBPF verifier, which ensures that:
eBPF programs which are loaded into the kernel can communicate with user space applications by using constructs called eBPF maps. These are key-value stores which can be read from and written to in both kernel and user space.
Different maps are available to store different types of data, including:
Finally, because eBPF programs are generic over kernel versions, they cannot call arbitrary kernel functions such as strcpy. Instead, eBPF programs make function calls to a set of helper functions.
To illustrate how eBPF programs are written, loaded, and run, let’s write an eBPF equivalent of “Hello World!“.
Remember that we can’t run our eBPF program manually. Instead, we’re going to write a program which prints “Hello World!” each time some syscall is fired. The exact choice of syscall is arbitrary; for this example we’ll choose the execve syscall, which is used by shells to execute programs. This makes it trivial to trigger an event and observe our program in action (i.e. to trigger the syscall we just run some program such as ls).
(Note that our output is actually emitted to a trace pseudo-file which acts as a log, this is elaborated on later.)
To do this, we first need to set up an appropriate development environment. We’re going to install the following four packages:
clang - used to compile C source code into eBPF bytecode
llvm - necessary backend infrastructure required to run clang. LLVM generates ELF files which contain necessary information for eBPF loaders such as libbpf to load programs into the kernel
libbpf - C library used to interact with eBPF programs and maps
bpftool - utility for loading, attaching, and inspecting eBPF programs and maps
If you use nix, these would be the specific packages that you would install.
[
clang_18
llvm_18
libbpf
bpftools
]
Now, let’s analyse the full source code for hello.bpf.c, our eBPF “Hello World!” program. (We use the .bpf.c suffix as a convention to indicate that a file is written in pseudo-C code for eBPF purposes.)
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe/sys_execve")
int hello(void *ctx) {
    bpf_printk("Hello World!");
    return 0;
}

char LICENSE[] SEC("license") = "GPL v2";
We can split up what this program does into three key sections.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
First, we import the BPF header files. These define the eBPF-related structures and functions that our “Hello World!” program is going to use.
SEC("kprobe/sys_execve")
int hello(void *ctx) {
    bpf_printk("Hello World!");
    return 0;
}
Second, we define our eBPF program as a C function called hello. We specify that this program should be run each time the execve syscall is fired by using libbpf’s SEC() macro (which acts like a Python decorator). Specifically, we attach to the execve syscall using a Kernel Probe (Kprobe), which is why the section name begins with kprobe.
Kprobes are used to dynamically break into kernel routines to collect debugging and performance information non-disruptively.
They are one of many types of eBPF programs; the kernel may restrict or allow certain features depending on the program type, and the verifier will enforce such restrictions. In order for these probes to be loadable, the SEC() macro creates a section called kprobe/sys_execve in the compiled ELF object, so that libbpf knows to load our hello program as a Kprobe. (To find the specific function name for the execve syscall on your architecture, you can take a look at the /proc/kallsyms file on your machine, which lists all the kernel symbols including their function names.)
We then use libbpf’s bpf_printk helper to log a “Hello World!” message each time a program is executed. These logs are always emitted to the /sys/kernel/debug/tracing/trace_pipe file. (Note that you need root privileges, e.g. via sudo, to access it.)
char LICENSE[] SEC("license") = "GPL v2";
Third, we use another SEC() macro to define the license string. This is a crucial requirement for eBPF programs: some of the BPF helper functions in the kernel are defined as “GPL only”, so if we want to use any of these functions our BPF code must have a GPL-compatible license.
Now we can compile our program into an ELF object file.
> clang -Wall -O2 -target bpf -c hello.bpf.c -o hello.bpf.o
The clang command is structured as follows:
-Wall - this enables a set of common warnings that can help catch potential issues in code
-O2 - this tells the compiler that it should attempt to optimise the performance of the code in addition to reducing the size of the generated output
-target bpf - this specifies that clang should compile the code for the bpf target architecture
-c hello.bpf.c - we specify the name of our source code file, and tell clang to compile it to an object file without linking
-o hello.bpf.o - we specify the name of the generated ELF file
For reference, we can also display the contents of the output ELF file. Notice that the fourth section (Idx 3) contains our Kprobe program, and the seventh section (Idx 6) contains the license.
> llvm-objdump -h hello.bpf.o
hello.bpf.o: file format elf64-bpf

Sections:
Idx Name                  Size     VMA              Type
  0                       00000000 0000000000000000
  1 .strtab               00000077 0000000000000000
  2 .text                 00000000 0000000000000000 TEXT
  3 kprobe/sys_execve     00000030 0000000000000000 TEXT
  4 .relkprobe/sys_execve 00000010 0000000000000000
  5 .rodata               0000000d 0000000000000000 DATA
  6 license               00000007 0000000000000000 DATA
  7 .llvm_addrsig         00000003 0000000000000000
  8 .symtab               00000090 0000000000000000
Now, we can use bpftool to load our program into the kernel. This loads the eBPF program from the compiled object file and pins it to the location /sys/fs/bpf/hello.
Additionally, since our eBPF program is a tracing program, we must specify the autoattach argument so that bpftool attaches our program to the execve Kprobe once it has been loaded into the kernel (see more discussion regarding autoattach here).
> sudo bpftool prog load hello.bpf.o /sys/fs/bpf/hello autoattach
No output in response to this command indicates success. If we’d like, we can verify that our program has loaded by checking that it exists in the /sys/fs/bpf directory.
> sudo ls /sys/fs/bpf
hello
We can also use bpftool to list the currently loaded eBPF programs. For clarity, I will just show the lines related to our “Hello World!” program, which has been assigned an ID of 44.
> sudo bpftool prog list
...
44: kprobe name hello tag 08424f7d1079fa76 gpl
loaded_at 2024-08-10T17:44:03+0100 uid 0
xlated 48B jited 128B memlock 4096B map_ids 9
To see the emitted tracing output, we can cat the /sys/kernel/debug/tracing/trace_pipe file. Depending on what is happening on your machine, you may see the tracing output instantly, because other processes could be executing programs using the execve syscall. If you don’t see anything, open a second terminal and execute any commands you like (I recommend ls), and you’ll see the corresponding trace generated by the program.
> sudo cat /sys/kernel/debug/tracing/trace_pipe
cat-2322 [001] d...1 12373.067163: bpf_trace_printk: Hello World!
<...>-2323 [002] d...1 12376.370957: bpf_trace_printk: Hello World!
<...>-2325 [001] d...1 12382.956763: bpf_trace_printk: Hello World!
These traces are structured in a specific format:
cat-2322 - refers to the process which triggered the hook and its PID
[001] - is the number of the CPU running the eBPF program
d...1 - the trace event’s flags (a leading d means interrupts were disabled)
12382.956763 - the timestamp, in seconds since system boot
bpf_trace_printk: Hello World! - the log itself
Finally, to unload the program, we delete it from its pinned location.
> sudo rm /sys/fs/bpf/hello
In some cases, we may want to programmatically load and unload eBPF programs instead of using bpftool. For example, consider some hypothetical scenario where there are multiple eBPF programs loaded into the kernel used for tracing different syscalls. We might want some management program to dynamically load and unload these programs at runtime based on user input.
To do this, we will use the cilium/ebpf Go library, which provides utilities to compile, load, and manage eBPF programs. Specifically, we use the bpf2go tool to compile eBPF programs into bytecode, and embed them into our Go source code. Once the Go program is compiled, we have a single Go binary which contains our eBPF bytecode that we can distribute.
Ok, let’s write a Go program which encapsulates our “Hello World!” eBPF program. We’ll have it take care of loading and attaching our eBPF program, in addition to streaming the contents of the trace_pipe file to os.Stdout.
So, in the same directory as our eBPF program, let’s create a Go module for our program, and also install the cilium/ebpf library which we will use to compile and load our program.
> go mod init hello
go: creating new go.mod: module hello
> go get -u github.com/cilium/ebpf
go: downloading golang.org/x/sys v0.24.0
go: downloading golang.org/x/exp v0.0.0-20240808152545-0cdaa3abc0fa
go: added github.com/cilium/ebpf v0.16.0
go: added golang.org/x/exp v0.0.0-20240808152545-0cdaa3abc0fa
go: added golang.org/x/sys v0.24.0
Our directory structure now looks like the following.
.
├── go.mod
├── go.sum
└── hello.bpf.c
Before we use cilium/ebpf to compile our hello.bpf.c program, we’re going to prepend //go:build ignore to it, so that the Go compiler ignores it when building the final executable. If we don’t do this, Go won’t be able to build our final program successfully.
It now looks like the following.
//go:build ignore

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe/sys_execve")
int hello(void *ctx) {
    bpf_printk("Hello World!");
    return 0;
}

char LICENSE[] SEC("license") = "GPL v2";
Compiling and embedding our program into Go source code is a two-step process.
First, we create an (almost) empty main.go file with one go:generate directive. This dictates how our Go source code should be generated given our hello.bpf.c file. It creates skeleton code for manipulating the eBPF objects, in addition to the .o object files which contain the eBPF bytecode.
package main
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go probe hello.bpf.c -- -O2 -Wall
This directive is split into four parts:
go run github.com/cilium/ebpf/cmd/bpf2go - this invokes the bpf2go tool each time we call go generate
probe - this specifies that the generated files (both object files and Go source code files) should be prefixed with the probe keyword
hello.bpf.c - this points bpf2go towards our pseudo-C eBPF program
-O2 -Wall - all arguments after the double dash (--) are passed to the Clang compiler which bpf2go uses to compile our eBPF program
> go generate
Compiled probe_bpfeb.o
Stripped probe_bpfeb.o
Wrote probe_bpfeb.go
Compiled probe_bpfel.o
Stripped probe_bpfel.o
Wrote probe_bpfel.go
Notice that bpf2go generated two versions of both the .o bytecode files and the Go source files, one version for big-endian architectures, and one for little-endian ones. These files are suffixed with _bpfeb.* and _bpfel.* respectively.
If we take a look at the generated Go code, we see that it contains structures representing our eBPF program. (See how our pseudo-C hello function is represented in the generated Go code as probePrograms.Hello?)
type probePrograms struct {
	Hello *ebpf.Program `ebpf:"hello"`
}
All auto-generated structures are grouped into a probeObjects structure which represents everything that’s being loaded into the kernel.
type probeObjects struct {
	probePrograms
	probeMaps
}
Cool — we’ve completed the first step for embedding our eBPF program. Let’s take a look at what our directory structure looks like now.
.
├── go.mod
├── go.sum
├── hello.bpf.c
├── main.go
├── probe_bpfeb.go
├── probe_bpfeb.o
├── probe_bpfel.go
└── probe_bpfel.o
Now that we have our auto-generated skeleton, we can begin to write code which loads and attaches our “Hello World!” program, in addition to streaming the contents of trace_pipe.
Let’s analyse the full source code for main.go.
package main

import (
	"io"
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/cilium/ebpf/link"
)

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go probe hello.bpf.c -- -O2 -Wall

func main() {
	objs := probeObjects{}
	if err := loadProbeObjects(&objs, nil); err != nil {
		log.Fatal(err)
	}
	defer objs.Close()

	kp, err := link.Kprobe("sys_execve", objs.Hello, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer kp.Close()

	tracePipe, err := os.Open("/sys/kernel/debug/tracing/trace_pipe")
	if err != nil {
		log.Fatal(err)
	}
	defer tracePipe.Close()

	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-stop
		tracePipe.Close()
	}()

	io.Copy(os.Stdout, tracePipe)
}
We can divide this program into two key sections.
objs := probeObjects{}
if err := loadProbeObjects(&objs, nil); err != nil {
	log.Fatal(err)
}
defer objs.Close()

kp, err := link.Kprobe("sys_execve", objs.Hello, nil)
if err != nil {
	log.Fatal(err)
}
defer kp.Close()
First, we load our eBPF program into the kernel using loadProbeObjects, which populates the objs struct with relevant eBPF data, including the program’s name, type, license, and eBPF assembly instructions.
Then (unlike with bpftool), we have to use link.Kprobe to manually attach our eBPF program, referred to as objs.Hello, to the execve syscall, which we specify as sys_execve.
tracePipe, err := os.Open("/sys/kernel/debug/tracing/trace_pipe")
if err != nil {
	log.Fatal(err)
}
defer tracePipe.Close()

stop := make(chan os.Signal, 1)
signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
go func() {
	<-stop
	tracePipe.Close()
}()

io.Copy(os.Stdout, tracePipe)
Second, we open the trace_pipe file for reading, and copy its contents to os.Stdout until we receive either a SIGINT or a SIGTERM signal. At that point we close the file, which causes io.Copy to return (this enables us to use Ctrl+C to stop our program).
The execution of our “Hello World!” program is now trivially reduced to running one command. Also, since our program does not need to be indefinitely pinned in order to be loaded into the kernel, it is automatically unloaded each time our Go process ends.
> go build -o hello && sudo ./hello
<...>-895185 [001] ...21 3853541.869581: bpf_trace_printk: Hello World!
<...>-895187 [000] ...21 3853541.875591: bpf_trace_printk: Hello World!
<...>-895188 [001] ...21 3853542.488578: bpf_trace_printk: Hello World!
<...>-895190 [000] ...21 3853542.493632: bpf_trace_printk: Hello World!
Congrats, we’ve written our first eBPF program!
Thank you for reading so far! Please let me know if you have any feedback or corrections.
If you’re looking for more eBPF reading, you should check out:
Now you should be ready to move on to Part 2, where we start implementing the pseudo-C eBPF program to measure TCP flow latency.
© 2024-2025 Nadav Rahimi