Introduction to eBPF - Part 1
Some months ago, I was tasked with building an eBPF-powered TCP proxy, which proved to be a fun experience. The only problem is, I spent quite some time understanding what exactly eBPF was. And so, like the good human that I am (haha), I’ve decided to centralize all my findings for future eBPF noobs. It’s important to note that I am by no means an expert with this technology.
There are 2 parts to this article. The first chunk (this very one you’re reading) is just some history about the tech + a high-level overview. In the second part, however, we’ll work through a simple eBPF-powered program. Spoiler alert! We’ll write some C & Go, glued together with ebpf-go.
The purpose of this article is to take you from “noob” to “ohhh okay, so this is what eBPF is”. With all that in place, it’s time to dive in.
Well, before we dive in, here are some quotes from myself:
“Think of the kernel as a web browser, eBPF as the JS engine embedded in the browser & eBPF programs as the functions passed to JS event listeners” - griff, 2022
“I think of eBPF as a framework that allows you to write middlewares for kernelspace & userspace activities” - griff, 2022
I don’t take myself very seriously, you shouldn’t either. (I swear this article is legit though)
BPF (Berkeley Packet Filter) is a technology provided by Unix-like kernels that allows you to efficiently capture and filter network packets based on custom rules. The rules/filters are not statically linked into the kernel; rather, they are written as programs & injected into the kernel at runtime. These filters are evaluated by the BPF virtual machine, and their result determines whether a packet is accepted or dropped. This version of BPF is also known as cBPF (classic BPF). Think iptables, but more programmable.
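To make the "filter program evaluated by a virtual machine" idea a bit more concrete, here is a toy interpreter in Go. It is only a sketch: the opcodes, the `insn` struct and the `run` function are made up for illustration and are not the real cBPF instruction encoding. The one thing it borrows from classic BPF is the shape of a filter: load a value from the packet into an accumulator, compare it, and return a verdict where 0 means drop. The packet is assumed to be a bare IPv4 header, where byte 9 holds the protocol number (6 is TCP).

```go
package main

import "fmt"

// op is a toy opcode. The names are hypothetical and only loosely
// mirror classic BPF's load / jump / return instructions.
type op int

const (
	opLoadByte op = iota // A = pkt[k]
	opJumpEq             // if A == k skip jt instructions, else skip jf
	opRet                // stop and return k as the verdict
)

type insn struct {
	op     op
	k      uint32
	jt, jf int
}

const (
	drop   uint32 = 0      // a zero verdict drops the packet
	accept uint32 = 0xffff // any non-zero verdict accepts it
)

// run evaluates a filter against one packet and returns the verdict.
// Jumps only move forward, so every filter is guaranteed to finish.
func run(prog []insn, pkt []byte) uint32 {
	var a uint32 // accumulator register
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		switch in.op {
		case opLoadByte:
			if int(in.k) >= len(pkt) {
				return drop // out-of-bounds read: drop the packet
			}
			a = uint32(pkt[in.k])
		case opJumpEq:
			if a == in.k {
				pc += in.jt
			} else {
				pc += in.jf
			}
		case opRet:
			return in.k
		}
	}
	return drop // fell off the end without a verdict
}

// tcpOnly accepts packets whose IPv4 protocol byte (offset 9) is 6 (TCP).
var tcpOnly = []insn{
	{op: opLoadByte, k: 9},
	{op: opJumpEq, k: 6, jt: 0, jf: 1},
	{op: opRet, k: accept},
	{op: opRet, k: drop},
}

func main() {
	tcp := make([]byte, 20)
	tcp[9] = 6
	udp := make([]byte, 20)
	udp[9] = 17

	fmt.Println("tcp accepted:", run(tcpOnly, tcp) != drop)
	fmt.Println("udp accepted:", run(tcpOnly, udp) != drop)
}
```

The point is that the filter is plain data handed over at runtime, not code compiled into the "kernel" (here, the `run` function). Swapping in a different rule means swapping in a different slice of instructions.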
eBPF (extended BPF) is an improved version of cBPF that allows programmers to do more than just packet filtering. cBPF was built purely for packet filtering, but along the way, some smart folks realized they could enhance it to allow kernel-wide programming. eBPF introduced optimizations to the BPF VM and also added data structures (e.g. maps) to facilitate data storage & communication between eBPF programs and userspace. These days, eBPF is no longer treated as an acronym for Berkeley Packet Filter, since that pretty much undersells the technology. It’s best to think of eBPF as a technology for extending the functionality of the kernel.
- Hookpoints - eBPF programs are executed in an event-driven manner. By attaching themselves to the execution path of an operation, they register themselves to be executed. These attachment points are called hookpoints. For example, you could attach an eBPF program to a socket using the SO_ATTACH_BPF socket option (SO_ATTACH_FILTER is its classic BPF counterpart), and your eBPF program will see packets right before they reach the socket. That way, you can filter network traffic on a per-socket basis.
- eBPF program - An eBPF program is code (who would’ve thought), usually written in restricted C, that is compiled & attached to hookpoints in the kernel. Whenever these hookpoints are traversed, the program gets executed. The code is written in restricted C because the BPF virtual machine is resource constrained and the kernel must be able to prove that a program terminates, so there’s an upper bound on the complexity of your programs. A good example is how eBPF programs can’t contain unbounded loops (bounded loops are allowed on newer kernels, and there are other ways around this behaviour, but that’s out of scope).
- program type - This is a “tag” that determines which hookpoints an eBPF program can attach to. An eBPF program that drops packets and a very useful eBPF program that logs all system calls won’t have the same program type, because each needs to be attached to a specific hookpoint to get the information it needs. The former will have to be attached at the XDP hook and the latter will have to be attached as a kprobe.
- helper functions - These are pre-defined functions that provide a safe API for eBPF programs to call into the kernel. Different program types grant you access to different helper functions.
- maps - Maps are data structures used in eBPF programs. The programs store data in maps and also use these maps to communicate with other eBPF or userspace programs.
- userspace program - This is a program that is either powered or complemented by the eBPF program. For example, your eBPF program can collect network metrics & store them in maps, but it wouldn’t be able to push those metrics to your Datadog agent. So, as the programmer that you are, you’ll have to write a lightweight userspace program to pull those metrics from the maps & write them to your metrics database. In my case, my BPF program allowed my proxy to serve requests targeted at multiple ports using only 1 socket (good luck pulling that off without eBPF).
- Loading - Loading is the process of injecting eBPF programs into the kernel for execution. It’s important to note that the program has to be converted to bytecode before loading into the kernel.
- Verifier - The verifier ensures that eBPF programs are safe to run. Due to resource constraints & security reasons, the program is analyzed before it runs to ensure that it’s safe & can’t crash the kernel. It also ensures that the code is guaranteed to finish rather than run forever. With great power comes great responsibility, or whatever that quote says. Bottom line is, there’s a cap on the complexity of eBPF programs and the verifier enforces those limits.
- JIT compiler - The just-in-time compiler transforms verified eBPF bytecode into machine code for faster execution.
- Pinning - BPF objects (programs, maps, etc.) are freed when their reference count drops to 0. This usually happens when the process that loaded them into the kernel exits, which isn’t always what you want. So to get these objects to outlive their initiating process, we pin them to a virtual filesystem (the BPF filesystem, usually mounted at /sys/fs/bpf), which holds a reference on their behalf.
More on pinning. Let’s pretend we’re NASA for a moment. Think of an eBPF object as a spaceship, and the initiating process as a rocket booster. When the rocket boosters get the spaceship to a good altitude, they detach and fall back to earth. Now, imagine if the spaceship followed the rocket boosters back to earth. Clearly that isn’t productive. So to prevent that, we pin our spaceship into space. You can ask the astronauts if that’s a productive approach, but hopefully you get the general idea.
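If the spaceship analogy didn’t land, here is the same idea as a small Go sketch. It’s a toy model running entirely in userspace, not real kernel code: the `kernel` and `object` types and the path `/sys/fs/bpf/counter` are hypothetical names picked for illustration. It also only models two kinds of references (the loader’s handle and a pin), while the real kernel tracks others too, such as a program being attached to a hook.

```go
package main

import "fmt"

// object stands in for a BPF program or map living in the kernel.
type object struct {
	name string
	refs int
}

// kernel is a toy model that tracks live objects and pinned paths.
type kernel struct {
	live map[string]*object // name -> object still in memory
	pins map[string]*object // filesystem path -> pinned object
}

func newKernel() *kernel {
	return &kernel{live: map[string]*object{}, pins: map[string]*object{}}
}

// load creates an object with one reference, as if the loading
// process were holding a file descriptor to it.
func (k *kernel) load(name string) *object {
	o := &object{name: name, refs: 1}
	k.live[name] = o
	return o
}

// put drops one reference and frees the object once none remain.
func (k *kernel) put(o *object) {
	o.refs--
	if o.refs == 0 {
		delete(k.live, o.name)
	}
}

// pin records the object under a path and takes an extra reference.
func (k *kernel) pin(o *object, path string) {
	o.refs++
	k.pins[path] = o
}

// unpin removes the path and drops the reference the pin was holding.
func (k *kernel) unpin(path string) {
	if o, ok := k.pins[path]; ok {
		delete(k.pins, path)
		k.put(o)
	}
}

func (k *kernel) alive(name string) bool {
	_, ok := k.live[name]
	return ok
}

func main() {
	k := newKernel()
	counter := k.load("counter")
	scratch := k.load("scratch")
	k.pin(counter, "/sys/fs/bpf/counter") // hypothetical pin path

	// The loader process exits: its handles are closed.
	k.put(counter)
	k.put(scratch)

	fmt.Println("counter alive:", k.alive("counter"))
	fmt.Println("scratch alive:", k.alive("scratch"))

	k.unpin("/sys/fs/bpf/counter")
	fmt.Println("counter alive after unpin:", k.alive("counter"))
}
```

The pinned object survives its loader because the pin is just one more reference. Remove the pin and the count hits 0, so the object goes away like any other.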
How it works
There are numerous ways you can program the eBPF virtual machine. I’m only going to outline a high-level procedure, but you can read more on it here.
- You write an eBPF program. Mostly in restricted C.
- Compile the program into bytecode using tools like clang.
- Use bpftool or another high-level program to load the bytecode into the kernel.
- The verifier evaluates the eBPF program and ensures it’s safe to run.
- The JIT compiler converts the bytecode to native machine code for faster execution.
- The program is then attached/linked to its hookpoint.
- Anytime the hookpoint is traversed, our attached middleware gets executed.
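The steps above can be sketched as a tiny userspace simulation in Go. To be clear, none of this touches the real kernel or ebpf-go: the `kernel` and `program` types, the `maxInsns` cap of 8 and the hook names are all assumptions invented for the sketch. It only mirrors the order of events: a program is checked when it is loaded, attached to a hookpoint, run whenever that hookpoint fires, and its results are read back from a shared map.

```go
package main

import (
	"errors"
	"fmt"
)

// maxInsns is an arbitrary cap for this sketch; the real verifier's
// limits are different and far more involved.
const maxInsns = 8

var (
	errTooComplex = errors.New("verifier: program too complex")
	errNotLoaded  = errors.New("attach: program was never loaded")
)

// program is a toy stand-in for an eBPF program: a declared size plus
// a handler that receives the event and a shared map.
type program struct {
	name  string
	insns int
	run   func(event string, counts map[string]int)
}

type kernel struct {
	loaded map[string]bool       // programs that passed verification
	hooks  map[string][]*program // hookpoint -> attached programs
	counts map[string]int        // plays the role of an eBPF map
}

func newKernel() *kernel {
	return &kernel{
		loaded: map[string]bool{},
		hooks:  map[string][]*program{},
		counts: map[string]int{},
	}
}

// load "verifies" the program and rejects it if it is too large.
func (k *kernel) load(p *program) error {
	if p.insns > maxInsns {
		return errTooComplex
	}
	k.loaded[p.name] = true
	return nil
}

// attach links a previously loaded program to a hookpoint.
func (k *kernel) attach(hook string, p *program) error {
	if !k.loaded[p.name] {
		return errNotLoaded
	}
	k.hooks[hook] = append(k.hooks[hook], p)
	return nil
}

// fire runs every program attached to the hookpoint being traversed.
func (k *kernel) fire(hook, event string) {
	for _, p := range k.hooks[hook] {
		p.run(event, k.counts)
	}
}

// countEvents is the "eBPF program": it bumps a counter in the map.
func countEvents(event string, counts map[string]int) { counts[event]++ }

func main() {
	k := newKernel()
	small := &program{name: "count_events", insns: 3, run: countEvents}
	huge := &program{name: "too_big", insns: 100, run: countEvents}

	fmt.Println("load small:", k.load(small))
	fmt.Println("load huge:", k.load(huge))
	fmt.Println("attach small:", k.attach("read_syscall", small))

	k.fire("read_syscall", "read")
	k.fire("read_syscall", "read")
	k.fire("packet_received", "packet") // nothing is attached here

	// The "userspace program" reads results back out of the map.
	fmt.Println("reads seen:", k.counts["read"])
}
```

Notice that the attached program never gets called directly by whoever loaded it. It only runs when something traverses the hookpoint, which is exactly the middleware behaviour described in the last step.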
eBPF programs can be attached to almost any kernel or userspace operation. Some of the hookpoints include:
- System Calls - ex. Executing some code when a read() system call is made.
- Network Events - ex. Executing some code when packets are received.
- Function Entry and Exit - ex. Intercepting calls to/from functions.
There’s a crazy range of things you can do with eBPF. For example:
- DDoS Mitigation - An XDP eBPF program can intercept and drop network packets as soon as they arrive from the physical device (NIC), even before they enter the TCP/IP stack.
- Socket Steering - An sk_lookup eBPF program can be used to steer a wide range of TCP/UDP connections to a single socket. You could easily serve all 65,535 ports with just one socket. Damn.
- Security - A seccomp filter can be used to restrict which system calls a process is allowed to execute (strictly speaking, seccomp filters are written in classic BPF rather than eBPF). You can sandbox processes this way.
- Observability - You can aggregate and export metrics on almost any activity right from the kernel. For example, you can attach to network traffic control (tc) and collect metrics on ingress/egress.
- BPF Paper
- Official eBPF website
- Thorough introduction to eBPF
- A very good article I literally found some minutes before publishing this
Hopefully this helped you understand the technology a bit better. I think it’s really powerful & quite underrated. Or maybe I’m just overexcited. Regardless, reach me @gwuah_ on the bird app with any comments or feedback. 👋🏾