OpenBSD celebrated release 3.5 on 1 May 2004. In honor of this release, Federico Biancuzzi interviewed the developers of OpenBSD's PF, a powerful and flexible packet filtering interface. This is the second half of an interview that began in PF Developer Interview, Part 1.
Federico Biancuzzi: Recent research by Paul Watson, entitled Slipping in the Window: TCP Reset Attacks, caused a lot of rumors in the media. After the public presentation at the CanSecWest conference, we understood that the real situation was not so terrible for every OS. How does OpenBSD reduce the risk of this type of attack?
Henning Brauer: OpenBSD is not vulnerable to these attacks. We have used random ephemeral ports since 1996. We have required RSTs to be right on the edge of the window since 1998.
Federico: How can PF protect a vulnerable server from this type of attack?
HB: The scrubber has just been modified to drop RSTs which are not right on the edge of the window. A special form of NAT where only the src port is modified is being worked on.
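To illustrate why the edge-of-window requirement matters, here is a minimal Python sketch comparing a loose check (RST anywhere inside the receive window) with a strict one (RST exactly at the next expected sequence number). The function names and numbers are hypothetical, and "edge" is assumed here to mean the next expected sequence number; this is not the actual pf or OpenBSD stack code.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def rst_in_window(seq, rcv_nxt, rcv_wnd):
    """Loose check: accept an RST whose sequence number falls anywhere
    inside the receive window [rcv_nxt, rcv_nxt + rcv_wnd)."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def rst_on_edge(seq, rcv_nxt):
    """Strict check: accept an RST only if its sequence number is exactly
    the next one the receiver expects (assumed meaning of "edge")."""
    return seq % SEQ_MOD == rcv_nxt % SEQ_MOD


rcv_nxt, rcv_wnd = 1_000_000, 16_384
guess = rcv_nxt + 5_000  # a blind guess that lands inside the window

print(rst_in_window(guess, rcv_nxt, rcv_wnd))  # True
print(rst_on_edge(guess, rcv_nxt))             # False
```

With a 16,384-byte window, the loose check lets a blind attacker cover the whole sequence space in about 2^32 / 16,384 = 262,144 guesses; the strict check leaves a single acceptable value out of 2^32.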
Federico: At the end of April most PF developers flew to Ryan McBride's house to start a short PF hackathon. (See Figure 1 for a photo from Ryan himself.) What are the new features and enhancements that were planned or already committed for the next release?

Figure 1. OpenBSD hackers hard at work at a recent hackathon.
HB: We had an extremely productive four-day hacking party at Ryan's house in the woods near Sechelt (British Columbia, Canada — about 2 hours from Vancouver), yes.
It was really a network hackathon, not limited to pf. Besides a lot of smaller things, the stuff we hacked on includes:
bgpd [...]
bgpd [...]
authpf executes pfctl instead of sharing a lot of code with it, which was a bit fragile
bgpd [...]
bgpd [...]
[...] ifconfig rather than attach time for fxp(4), em(4) and sis(4), others to follow. This reduces the number of mbuf clusters sitting around unused.
bgpd [...]
ifconfig cleanup
authpf [...]
[...] tcpdump about carp
return-rst now works on IPless bridges
authpf now loads the authenticated users' IP addresses into a table too
Ongoing work includes a rework of how anchors work: they can be nested soon, and the ruleset name (which is really just one level of nesting) will go away.
Federico: What type of platform do you use for PF development? Does the code have any optimizations for a particular architecture, such as 64-bit (Athlon64, Sparc64, G5, Prescott)?
HB: I mostly hack pf on my i386 notebook and, less often, my sparc workstation. Tests on sparc64 are mandatory as this is the most picky platform, especially due to its memory alignment requirements. I occasionally test on alpha, hppa, amd64, mvme68k, mvme88k, mac68k and VAX as well.
pf operates completely in MI (Machine Independent) land; there is no MD (Machine Dependent) code whatsoever.
Ryan McBride: I develop on sparc64 and i386. sparc is used for testing, and I have VAX and cats boxes sitting here waiting to be set up for testing (too slow for real development).
Cedric Berger: Mostly i386 gear. When making big changes, I usually try to compile the code on a Sparc64 box since unlike i386, it uses 64-bit, big-endian, and gcc3. If PF code works on Sparc64 and i386, it has a good chance of working well on the other architectures.
Federico: Which platform would you suggest for deploying an OpenBSD-based firewall in various scenarios (home, office, enterprise)?
HB: Well, basically, any.
Vax may not be a good choice for filtering Gigabit speeds, obviously. Among
the modern archs, sparc64, alpha, and amd64 are all good choices. i386 and
powerpc less so, because I like W^X purity, but they should not impose
performance problems either.
RM: For small sites, the Soekris 4501 or 4801 make
excellent devices. For bigger links, an AMD64 based system is an excellent
choice. Even if you're only dealing with 100Mbit links, using gigabit cards
like em(4) or sk(4) will improve performance as these
cards are designed to handle much higher numbers of packets per second than
100Mbit cards.
Federico: Reading the pf@ mailing list I found that the code has some limits regarding hardware usage. What type of limits does PF present in the 3.5 release? How many concurrent states and how much RAM can PF handle?
HB: That thread is full of lies and uninformed guesses. Ignore it.
pf itself doesn't impose many limits. We have the settable state and fragment limits to prevent pool exhaustion; the amount of memory available for the pools used by pf varies depending on the hardware.
I don't have exact numbers, but 50,000 state entries are not a problem on an i386 with 128 MB.
That said, there is ongoing work which changes the way OpenBSD handles kernel memory used for the network stack; pf is not special here. This will allow for more efficient usage, backpressure when needed, and more total memory available to the network stack including pf, thus allowing for much bigger state tables etc.
CB: With this patch by Mike Frantzen, you can use up to 768MB of RAM on i386 to store table entries, versus 64MB previously. We're looking for a similar improvement for the state table, but that's a bit more difficult.
Federico: PF is going to be ported to other BSDs. Daniel, are you proud of this? Are you working on these ports?
Daniel Hartmeier: I've been working with Pyun YongHyeon and Max Laier who do the FreeBSD port. They did all of the work, I merely tried to answer questions and sometimes aid in debugging. This has proven valuable to the base source; several bugs were found and fixed in the process. Maintaining several ports will cause additional work, of course, but the additional user base producing feedback makes it worth the effort.
Federico: Now that PF has been imported into the FreeBSD base system, do you think that some people will consider moving from OpenBSD to FreeBSD for its SMP support and better performance?
HB: Not at all. First, the performance advantage of FreeBSD is largely a myth. All BSDs are rather close to each other with respect to general performance.
Second, I don't expect any OpenBSD/pf user to switch, as pf on OpenBSD will
always have some advantages — as we develop pf here, we can embed it much
deeper. What I expect is FreeBSD users switching to pf from ipf
and ipfw.
RM: On most firewalls, the CPU is not the bottleneck, so adding a second one will not help (and may even slow things down). That being said, SMP support is being worked on actively for OpenBSD, and will likely appear in 1 or 2 releases, depending on how long it takes to get it right.
CB: Competition is good. If FreeBSD is better than OpenBSD for some application, I will use FreeBSD for that application. If OpenBSD performance lacks in some areas, we will try to fix it. That being said, I believe OpenBSD is clearly the best choice for a firewall now, for various reasons.
Federico: Why does PF still lack a basic feature like IPFilter's
return-icmp-as-dest?
HB: Because nobody thought it was needed? There were no requests for this whatsoever, none of us had a need for it, so nobody wrote it. That nobody requested it after, what is it now, 2.5 years?, is a strong sign it is not needed at all.
CB: Because nobody in the developer or user community cared enough to send a working patch.
Implementing that functionality is not easy, but I'm now looking at
implementing return-icmp for pure bridges, when there is no
routing table (return-rst for bridges has just been committed). It
is very likely that once I have return-icmp working on bridges,
adding return-icmp-as-dest support will be trivial.
Federico: Why doesn't PF come with an internal version number to track various updates typical of -stable branches?
HB: What for?
OpenBSD 3.5 is OpenBSD 3.5 is OpenBSD 3.5, period.
Can Erkin Acar: Since the kernel and userland are always synchronized,
there is not much point in adding a version identifier. For external utilities,
such as pftop (which still compiles on OpenBSD 3.0) the OpenBSD
release numbers are usually sufficient.
Federico: What type of support does PF provide for IPv6? Are there any interesting features specifically for IPv6? What features are still missing?
HB: PF has full IPv6 support; there's nothing really special or different compared to v4.
RM: We're missing IPv6 fragment reassembly support, but
this is being worked on actively and will probably be included in 3.6.
pfsync does not support IPv6 as a transport.
Federico: Since the ALTQ merge in the 3.3 release, a lot of people enjoyed shaping the bandwidth with PF rules. Is there anything new for ALTQ, like other types of queues?
HB: Kenjiro and I added HFSC for 3.4 and
polished things a little, and I rewrote the ID allocator. No real changes were
made after that in the queueing arena, and none are planned currently. In
our eyes the current state is quite fine.
Federico: How do PF and IPSEC interact? What type of problems have you resolved and what still needs to be solved?
DH: pf sees IPsec encapsulated traffic both encapsulated on
the real interface as well as decapsulated on enc(4). Filtering
and translation can be done on either, with various effects. Apart from that,
pf doesn't treat IPsec traffic differently from other protocols. It doesn't
filter on SPI or other IPsec specific fields. UDP encapsulation for NAT
traversal has recently been added, but that's outside of pf, in IPsec/stack
code.
Several special requirements, like static source ports for
isakmpd, have been addressed, so pf basically works at least as
well with IPsec as any other packet filter not doing IPsec protocol
inspection.
HB: They do not interact more than pf interacts with anything else network related — pf passes or blocks ipsec traffic.
CB: They interact pretty well. You can filter ESP packets
on the real interface or decrypted packets on enc0. Nat/rdr is
possible on enc0, but that's tricky. What I'd like to do for the next
release is to remove the need for the pass on enc0 proto ipencap
all rule; that is just wrong.
Federico: WiFi networks are becoming widely (and wildly) used. What can PF do for a wireless network? Is there any new idea specific for wireless filtering?
HB: I don't see wireless as much different from wired networks
with respect to pf. authpf can be especially neat in wireless
networks, but it already is neat in wired ones too.
Federico: If I'm not wrong, tools that use raw access to network data bypass PF because the filtering happens after. How can this be solved? Is this a behavior you want to change?
HB: This is not true.
It is true that bpf is outside pf. This is actually very good
for debugging.
We might add a possibility for bpf-based tools to request to be hooked in
before pf. It might be useful for the dhcp programs. But then, that is not a
real-world problem: I have privilege-revoked dhcpd and
dhcrelay so that they don't run as root anymore, and
canacar@ helped out with bpf write filters (we have
read filters already) and locking of the bpf device so that no changes
in those filters are possible anymore. Especially for dhcpd that
means that one very worrisome piece of code is now locked away so nicely that
you don't have to worry much anymore. And of course besides the privdrop and
bpf security work, we cleaned that mess up big time...
The most worrisome of those programs is now dhclient which is
scary, huge and still runs as root, even given we cut about half of its
code out already. I have it running privilege separated on my machine
already...
RM: I don't see this as a problem, and don't think that this will be changed.
CEA: This is by design, and I do not want/see this behavior changing. We have introduced bpf security extensions to solve this problem on a case-by-case basis. We are going through every program in the tree and modifying them to use the security extensions and drop/separate privileges. At some point we may also start looking at critical applications in the ports tree.
Federico: What type of bpf security extensions have been
introduced?
CEA: bpf is a device designed to capture
packets from an interface. It has a filter language for selecting a subset of
packets to be read, used mainly for performance reasons. bpf also
contains some functionality for injecting packets into the network.
Programs use bpf by opening one of the
/dev/bpfX devices, and obtaining a file descriptor. The
access to the devices is restricted to root by default (through file
permissions). The problem happens when a program wishes to drop privileges, or
use privilege separation, after obtaining a bpf descriptor.
Even with dropped privileges, a program can change the filters and the
interface, and thus sniff any interface on the host. Furthermore, if
the descriptor was opened with write access (some daemons require this, and
libpcap does this by default) it is possible to inject packets into
any of the available interfaces.
This had to be solved before any bpf-using program can be safely privilege separated. Two security mechanisms were introduced:
write filtering allows setting bpf filters for write
operations
locking prevents "dangerous" changes to the descriptor such as modifying the read/write filters, and changing the interface. Obviously the descriptor cannot be unlocked once it is locked.
If the descriptor is properly configured and locked before dropping
privileges, an exploit will not be able to further compromise the system
through the bpf descriptor.
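The effect of the two mechanisms can be pictured with a small Python model. It is only a toy: the class, its methods, and the sample packets are hypothetical stand-ins for illustration, not the kernel's bpf interface.

```python
class LockedError(Exception):
    """Raised when a locked descriptor is asked to change its setup."""


class ModelDescriptor:
    """Toy stand-in for a bpf descriptor; not the real kernel interface."""

    def __init__(self, interface):
        self.interface = interface
        self.read_filter = lambda pkt: True   # read everything by default
        self.write_filter = lambda pkt: True  # allow every write by default
        self.locked = False

    def _require_unlocked(self):
        if self.locked:
            raise LockedError("descriptor is locked")

    def set_interface(self, interface):
        self._require_unlocked()
        self.interface = interface

    def set_read_filter(self, predicate):
        self._require_unlocked()
        self.read_filter = predicate

    def set_write_filter(self, predicate):
        self._require_unlocked()
        self.write_filter = predicate

    def lock(self):
        # One-way switch: there is deliberately no unlock().
        self.locked = True

    def write(self, pkt):
        """Return True if the write filter would let this packet out."""
        return bool(self.write_filter(pkt))


# Configure while still privileged, then lock before dropping privileges.
fd = ModelDescriptor("em0")
fd.set_write_filter(lambda pkt: pkt.startswith(b"DHCP"))
fd.lock()

print(fd.write(b"DHCP offer"))  # True
print(fd.write(b"spoofed"))     # False
try:
    fd.set_interface("em1")
except LockedError as err:
    print("refused:", err)      # refused: descriptor is locked
```

After lock(), compromised code holding the descriptor can still use it as configured, but can no longer widen the filters or move to another interface.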
Federico: I've read this thread
on the misc@ mailing list and I'm wondering what the advantages of
tcpdump privilege separation are.
CEA: Network data is untrusted, and parsing it into a
readable form is difficult and error-prone, especially for complex or obscure
protocols, thus tcpdump (and most other sniffers) are complex and
potentially dangerous pieces of code. A look at recently discovered
vulnerabilities in such programs should give an idea. Even saved binary files
may not be safe, and could act as time bombs. Privilege separation is used in
these programs to isolate the dangerous packet parsing code into an
unprivileged chroot jail.
At this point running tcpdump as root in OpenBSD is much
safer than running it unprivileged, since being root allows it to properly
privsep. Hopefully this will be improved to cover unprivileged use, possibly
using setuid after we resolve some signal issues. Yes, there is an irony here,
making a program setuid root to make it safer :)
Federico: Can, could you provide a short history of pflogd?
CEA: The logging mechanism in pf is ingenious. Saving the
raw packet dumps loses minimum information, (usually) uses less space than
ASCII logs, and allows the logs to be analyzed using a variety of tools,
including passive OS fingerprinting recently added to our tcpdump,
and all the other cool analyzers/sniffers available.
The first version of pflogd was imported into the tree about two
months after the pflog interface. It had basic functionality:
dump the logs to a file in binary tcpdump format, make sure the
existing file has the correct header before appending, handle SIGALRM for
flushing logs, and SIGHUP for re-opening the log files for working correctly
with log rotation.
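The "correct header before appending" step can be sketched in Python as a check of the 24-byte savefile header used by the binary tcpdump format. This is an illustrative guess at such a check, not pflogd's actual code; the function name is hypothetical, while the magic number, version 2.4, and link type 117 for pflog are the values libpcap uses.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4   # classic libpcap savefile magic number
PCAP_HEADER_LEN = 24      # size of the global file header in bytes
DLT_PFLOG = 117           # link type libpcap assigns to pflog


def header_ok(header, expected_linktype=DLT_PFLOG):
    """Return True if `header` starts with a pcap file header that has the
    expected version and link type, in either byte order."""
    if len(header) < PCAP_HEADER_LEN:
        return False
    for order in ("<", ">"):
        magic, major, minor, _zone, _sigfigs, _snaplen, linktype = struct.unpack(
            order + "IHHiIII", header[:PCAP_HEADER_LEN])
        if magic == PCAP_MAGIC:
            return (major, minor) == (2, 4) and linktype == expected_linktype
    return False  # magic number not recognised


pflog_file = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 96, DLT_PFLOG)
ethernet_file = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 96, 1)

print(header_ok(pflog_file))     # True
print(header_ok(ethernet_file))  # False
```

A daemon following this pattern would append only when the check passes, and otherwise refuse to touch the file.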
At the last hackathon, right after OpenBSD 3.3 was released, I added
support for the new pflog format. pflogd supports
both, but refuses to overwrite an old file and outputs a 'Move away' warning to
the syslog.
After OpenBSD 3.4 was released, the bpf extensions were ready,
so pflogd was privilege separated. The privileged parent handled
the bpf descriptor stuff and opening/positioning of log files,
while the child running chrooted under the _pflogd user is used for
logging.
Later (January 2004), it was noticed that the pflogd files may
become corrupted if the partition gets full, or after an unclean shutdown. In
this case some or all of the appended logs would become unreadable.
pflogd now scans the complete log file, and detects corruption,
and gives the "Move away" warning, refusing to append until the log is moved or
rotated away.
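A full-file scan of this kind can be approximated in a few lines of Python: walk every record header and make sure the file does not end in the middle of a record. This is a sketch under assumptions (little-endian file, hypothetical function names, simple plausibility rules), not the checks pflogd really performs.

```python
import struct

GLOBAL_HEADER_LEN = 24  # pcap file header
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_usec, caplen, len


def scan_log(data, snaplen):
    """Walk every record in a little-endian pcap byte string.

    Returns (records_read, ok); ok is False when a record is truncated
    or carries sizes that a valid capture would not have."""
    if len(data) < GLOBAL_HEADER_LEN:
        return 0, False
    offset = GLOBAL_HEADER_LEN
    count = 0
    while offset < len(data):
        if offset + RECORD_HEADER_LEN > len(data):
            return count, False  # file ends inside a record header
        _sec, _usec, caplen, wirelen = struct.unpack_from("<IIII", data, offset)
        if caplen > snaplen or caplen > wirelen:
            return count, False  # implausible lengths
        offset += RECORD_HEADER_LEN + caplen
        if offset > len(data):
            return count, False  # file ends inside the packet data
        count += 1
    return count, True


def record(payload):
    """Build one record with zero timestamps (helper for the example)."""
    return struct.pack("<IIII", 0, 0, len(payload), len(payload)) + payload


log = bytes(GLOBAL_HEADER_LEN) + record(b"a" * 40) + record(b"b" * 60)
print(scan_log(log, snaplen=96))        # (2, True)
print(scan_log(log[:-10], snaplen=96))  # (1, False)
```

The second call simulates a partition filling up mid-write: the last record is cut short, so the scan reports the damage instead of letting new records be appended after it.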
Future: I have a (half forgotten) diff that handles the infamous "move away" part by renaming the existing log if a problem is detected. I have also not yet abandoned my plans for having ASCII pflogs. tcpdump is
safe, and more powerful than anything I could put into pflogd, but
lacks the rotation functionality.
Federico: Can, could you provide a short history of the PF logs format?
CEA: The pf logs contain a header and the logged packet itself. In the initial version, the header length was fixed, and very simple. It contained the interface, direction, action, a rule number, and a sub reason (why passed or dropped). This old format contained an unofficial link type (an identifier that determines the interface type and header format) and, having a fixed length with no empty fields, it was not extensible.
After 3.3, we improved the format to contain anchor and ruleset names
and a header length field which will allow the format to be extended later as
required. We also changed the link type to the official id obtained from
the libpcap/tcpdump maintainers.
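Why a header length field makes a format extensible can be shown with a small Python sketch. The field layout below is invented for illustration and is not the real pflog header; the point is only that a reader which honors the length field keeps working when a newer writer appends fields it does not know.

```python
import struct

# Hypothetical layout: header length, action, direction, pad byte, rule number.
KNOWN_FIELDS = struct.Struct("!BBBxI")


def split_record(record):
    """Split a record into (fields, packet) using the header length field."""
    hdr_len, action, direction, rule = KNOWN_FIELDS.unpack_from(record)
    if hdr_len < KNOWN_FIELDS.size or hdr_len > len(record):
        raise ValueError("bad header length")
    fields = {"action": action, "direction": direction, "rule": rule}
    # Skip the whole header, including any trailing fields this reader
    # does not understand.
    return fields, record[hdr_len:]


old_style = KNOWN_FIELDS.pack(8, 1, 0, 42) + b"PACKET"
extended = KNOWN_FIELDS.pack(12, 1, 0, 42) + b"\x00\x00\x00\x07" + b"PACKET"

print(split_record(old_style)[1])  # b'PACKET'
print(split_record(extended)[1])   # b'PACKET'
```

A fixed-length header with no spare fields offers no such escape hatch: any added field shifts the start of the packet and breaks every existing reader.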
Federico: The PF log format has changed over time. Do other operating systems
or common software such as ethereal and the standard
tcpdump support this format?
CEA: I have contributed patches to ethereal,
and snort also supports the format. The standard tcpdump includes
support for the new format since tcpdump_3_8rel2 (although they
dropped backward compatibility with the old format).
Federico: Mike, you wrote most of the stateful engine. How did you audit it?
MF: Auditing is more of a continual process than one would think. Whenever something that you never thought of before comes out, you have to go through all your code again to make sure you're not affected. Often that is a self sustaining process. Inspiration often strikes during the audit and you think of more gotchas so you re-audit and think of more gotchas to audit for. I end up staring [at] a lot of weird and funky traffic at work; corner cases of corner cases. Most of my audits are walking those weird packets or connections through PF's state engine to make sure we don't block a valid packet.
Federico: Mike, you wrote most of the scrub feature. What improvements are planned for the future? What about TCP scrubbing and normalization?
MF: Niels Provos did the initial fragmentation and flag scrubbing support based on Paxson and Handley's USENIX paper. My planned improvements are in the normalization of TCP segments. The TCP protocol was designed so that there are only two active participants in any given connection. Making PF more active in the TCP stream will be done very gradually since mistakes lead to massive ACK storms or connections mysteriously freezing.
Federico: Mike, your job focuses on IDS development. Do you have any ideas for making PF capable of interacting with an IDS like Snort?
MF: There are a few levels of interaction between a
firewall and an IDS. I believe Snort has been able to do intrusion detection on
any packet logged by PF for a year or two now (PF logged packets appear on the
pflog0 virtual interface which you can monitor with many
libpcap based programs like tcpdump or
ethereal).
There are already two ways to emulate Linux's DIVERT sockets and turn an
IDS into an IPS (Intrusion Prevention System). One could use PF to route the
packets to a tunnel device and read them there. Or one could block the packet
in PF and watch the full packet show up on the pflog0 logging
interface.
And a passive IDS running on the firewall could easily tell PF to kill all of an attacker's connections and add his IP address to a blacklist or even redirect any new connections from him to a honeypot.
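As a rough idea of the glue such an IDS would need, the Python sketch below builds the two pfctl invocations involved: adding an address to a table and killing the states that originate from it. The table name "blacklist" and the function name are assumptions, and the ruleset would need a matching rule such as "block in quick from <blacklist>". The commands are only constructed here, not executed.

```python
import ipaddress


def block_attacker_commands(address, table="blacklist"):
    """Build the pfctl command lines that blacklist `address` and drop its
    existing connections. Raises ValueError for anything that is not a
    literal IP address, so IDS output is never passed along unchecked."""
    addr = str(ipaddress.ip_address(address))
    return [
        ["pfctl", "-t", table, "-T", "add", addr],  # add the address to the table
        ["pfctl", "-k", addr],                      # kill states originating from it
    ]


for cmd in block_attacker_commands("192.0.2.10"):
    print(" ".join(cmd))
# pfctl -t blacklist -T add 192.0.2.10
# pfctl -k 192.0.2.10
```

On the firewall itself, each list could be handed to subprocess.run() by a process with permission to use pfctl.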
The tools are all there. All it takes is someone to add the code to Snort. Someone whose employer doesn't compete with Snort :-).
Federico: Are there any plans to develop application proxies for common protocols like HTTP, SMTP and POP3?
DH: They are easily done in userland, see
ftp-proxy(8). We agree to not do it in kernel. If there is enough
demand for protocols besides FTP, someone will step up and do it, I'm sure.
HB: No.
ftp-proxy is needed due to the nature of ftp with its two
connections, the place for other proxies is in ports IMHO. That said, some
stuff is imaginable in-tree, and if somebody steps up and writes good code,
this is certainly welcome. Whether it turns into a port then or goes into the
main tree, I can't say yet.
RM: In the kernel? Certainly not, the risk for compromise
is too great. In userland, we already have ftp-proxy, and there are
3rd party applications which handle many of the other protocols:
apache or squid for HTTP; MTAs such as
sendmail are basically SMTP proxies by nature.
CEA: These protocols are quite firewall friendly since they
use well defined ports. These protocols could benefit from content filtering or
caching. OpenBSD already has spamd which is a proxy for spam :)
and there are a number of http proxies in the ports tree. I am not aware of any
specific plans, but someone might just decide to write one.
CB: I've no idea.
Federico: Some OpenBSD developers wrote various tools to analyze PF logs and statistics (pfstat, Hatchet,...). Is there any project to create a global and unique graphical interface to work with PF?
HB: No.
RM: Not by any OpenBSD developers. Doing a good GUI configuration tool for PF is very difficult because there are so many options, and laying them out intuitively in a graphical interface is nearly impossible.
Federico: Can, could you explain pftop?
CEA: It started as a text-mode realtime display tool for
active pf states. It improved quickly with feedback and patches from the
community. It now has rule and queue pages, and can compute per state
throughput. I should probably find some time to catch up with the new pf
features in 3.5 though. pftop is available in the ports tree as
sysutils/pftop.
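Per-state throughput of the kind mentioned above can be derived from two samples of a state's byte counter. The Python sketch below shows the arithmetic only; the function name and sample values are hypothetical, and it is not taken from pftop's source.

```python
def throughput(prev_bytes, prev_time, cur_bytes, cur_time):
    """Bytes per second between two samples of one state's byte counter."""
    elapsed = cur_time - prev_time
    if elapsed <= 0 or cur_bytes < prev_bytes:
        return 0.0  # no time has passed, or the counter was reset
    return (cur_bytes - prev_bytes) / elapsed


# 15,000 bytes transferred over a 5 second refresh interval.
print(throughput(10_000, 0.0, 25_000, 5.0))  # 3000.0
```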
Federico: Please tell us the whole truth about PF performance tweaking. What are the right settings to build a stable and network optimized kernel?
HB: Use GENERIC.
I totally don't get that tweak tweak tweak attitude. GENERIC
works fine in almost all circumstances. If you have a real problem
with GENERIC, as in a problem that shows up during real world
usage, then post to misc@ and you'll get help.
Of course there are cases where you need to tune; I have machines where I
need to crank nmbclusters a little. BUT: the key point is, as long
as GENERIC works and doesn't show a problem, that is the best you
can get. And making GENERIC work for more and more scenarios is
one of the major things we have been doing for some releases, by switching from
compile time options to runtime controllable stuff (mostly sysctl)
or at least allowing changes from config/ukc.
In fact, a lot of knobs should not ever be touched. This is especially true
for nkmemclusters. The myth about that being needed comes from the old
days where we had a leak in the routing table code under some circumstances...
nowadays you will in almost all cases shoot yourself in the foot by mucking
with nkmemclusters. You increase it, something else needs more
kernel memory, and you run out. You will crash. So don't touch it.
OpenBSD 3.6 will come with big improvements in that area.
Federico: How will PF evolve? More features or performance tweaks?
DH: I think addition of new features will slow down over time; most things are covered by now. New features have to be justified against the instability changes introduce. Changes were frequent at first; I don't mind them settling down. That allows [us] to more carefully search for performance improvements and plain bugs.
MF: My PF wishlist is almost empty. I imagine most of the next big evolutions will be from spontaneous ideas someone has in the bar or us feeding off ideas of the others during the next PF hackathon.
HB: There's ongoing work on bpf security. We
are also looking at further flexibility in the language and some internal
changes that solve little problems.
There are no "big" new features planned for pf; maybe some of the stuff we do outside pf gets to interact with it in some way. We've been at the "pf is done" point quite a few times now, and there have been great ideas later on. pf development slowed down, and will slow down even more — not because we don't have enough developers or something, but simply because it is, well, pretty much done.
Compatibility becomes a much larger issue with every release that contains pf and with each and every pf installation; that's an area where I think we have to get better.
CB: The one thing I'd like to have for 3.6 is the ability
for firewalls working in a bridged environment to send IP packets (i.e.,
firewalls without IP address and/or routing table). Currently, features like
syn-proxy, return-icmp, return-rst don't
work on such a firewall because PF does not know how and where to send packets.
Fortunately, I think there are good 95% solutions to that problem, which is
probably the most requested feature on PF lists.
Besides that, I'm not aware of any "big" things that would come, but we've been saying that for every past release, as far back as I remember :).
RM: Because of the very clean initial design, PF has never had real performance problems — in the vast majority of PF deployments, the CPU sits essentially idle. So there's not much incentive for developers to spend long hours squeezing a bit more performance out of the code — such efforts would likely increase the complexity and thus the chance of bugs.
With regards to new features, it is hard to say; PF is becoming fairly feature complete and eventually development will slow down to a maintenance mode, like many other areas of OpenBSD. On the other hand, put 2 PF developers in a room together, and they immediately begin to come up with new crazy ideas. So I don't see it stopping within the next few releases.
However, I think the bulk of new work being done will not be in adding new features, but in the following 2 areas: fine tuning the features which already exist as we learn more about how they are used in production deployments, and making internal changes which simplify or reduce the code base. The latter is very boring for users, but actually quite important in reducing the potential for bugs.
CEA: I think, with the recent failover work, the feature range is quite complete. There would definitely be some performance tweaks, and improvements/additions to supporting userland tools before new features.
Federico Biancuzzi is a freelance interviewer. His interviews have appeared in publications such as ONLamp.com, LinuxDevCenter.com, SecurityFocus.com, NewsForge.com, Linux.com, TheRegister.co.uk, ArsTechnica.com, the Polish print magazine BSD Magazine, and the Italian print magazine Linux&C.
Copyright © 2009 O'Reilly Media, Inc.