Building Detailed Network Reports with Netflow
flow-stat
Another obvious question is "Who is our biggest traffic consumer?" The flow-stat(1) command allows you to run broad-scale reports on any combination of flow files. The flow-stat man page lists many supported reports. Several of them are not yet implemented, however, and inform you so only when you try to run them. Others strike me as not entirely useful unless you're running BGP. Some of the report formats I find most useful are:
0--Overall summary.
5--TCP/UDP destination port. This counts traffic in both directions, so you must sort it carefully.
10--Source and destination IP. This lets you sort traffic usage between particular machines, so you can identify connections that hog bandwidth.
11--Source or destination IP. This lets you identify particular machines that use large amounts of traffic.
Indicate the report format with the -f option.
The -s option sorts the results in ascending order, while -S sorts in descending order. Both take a single argument, the number of the column to sort on. Flow-stat numbers columns starting at 0.
For example, suppose that you need to identify the most heavily used TCP and UDP ports on your network. This is flow-stat report format 5. When running a new report, I run it once without any sorting option, just to see what the columns are, and then run it a second time sorted on the desired column. Here I want to sort on column 1, which is the number of flows.
# flow-cat -p flowfiles | flow-stat -f 5 -S 1 | less
# --- ---- ---- Report Information --- --- ---
#
# Fields: Total
# Symbols: Disabled
# Sorting: Descending Field 1
# Name: UDP/TCP destination port
#
# Args: flow-stat -f 5 -S 1
#
#
# port flows octets packets
#
443 191969 959951160 2554217
80 42740 345856613 2384044
25 1022 8152412 14890
53 346 57947 730
445 307 20352 424
135 249 11952 249
110 212 111870 2342
44473 203 247298 757
...
This report actually goes on for several pages, with a "trailing edge" of ports that have had only a few contacts.
You can learn a lot about my network here. The most popular port is 443, for SSL web traffic. Ports 80 (HTTP) and 25 (SMTP) are also popular, as well as 53 (DNS). I also get a lot of requests for Microsoft protocol ports, 445 and 135. The big surprise for me is port 110; my network doesn't provide POP3 services! I really need to identify where this traffic is coming from and where it's going. I could use flowdumper for that, or write a more complicated report with flow-nfilter.
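To make the zero-based column numbering concrete, here is a minimal Python sketch that re-sorts rows saved from the report above. It is not part of flow-tools; the helper names parse_rows and sort_rows are hypothetical, and the rows are only the ones shown before the "..." cut, so the result says nothing about the ports further down the report.

```python
# Rows copied from the format 5 report above: port, flows, octets, packets.
REPORT = """\
443 191969 959951160 2554217
80 42740 345856613 2384044
25 1022 8152412 14890
53 346 57947 730
445 307 20352 424
135 249 11952 249
110 212 111870 2342
44473 203 247298 757
"""


def parse_rows(text):
    """Turn whitespace-separated report rows into lists of ints.

    Blank lines and '#' header lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([int(field) for field in line.split()])
    return rows


def sort_rows(rows, column, descending=True):
    """Sort on a zero-based column index (hypothetical helper).

    descending=True imitates -S, descending=False imitates -s.
    """
    return sorted(rows, key=lambda row: row[column], reverse=descending)


rows = parse_rows(REPORT)
by_octets = sort_rows(rows, 2)                       # column 2 is octets
by_flows_asc = sort_rows(rows, 1, descending=False)  # column 1 is flows

for port, flows, octets, packets in by_octets:
    print(port, flows, octets, packets)
```

Sorting the same rows on column 2 instead of column 1 moves port 44473 well up the list, which is a reminder that the "most used" port depends on which column you pick.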
Because Netflow tracks a single TCP/IP connection as two flows (one from the client to the server, and one from the server to the client), you will see lots of smaller entries for oddly numbered, high ports. One flow's source is its mirror's destination, after all.
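The mirroring effect can be shown with a toy model in Python. This is only a sketch: the Flow tuple is a simplification rather than the Netflow record layout, and the client addresses and client-side port numbers are hypothetical values chosen for illustration.

```python
from collections import Counter, namedtuple

# Simplified flow record; real Netflow records carry many more fields.
Flow = namedtuple("Flow", "src_ip src_port dst_ip dst_port")


def mirror(flow):
    """Return the reverse-direction flow of the same connection."""
    return Flow(flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)


# Hypothetical client-to-server flows for three HTTPS connections.
client_flows = [
    Flow("192.0.2.10", 51514, "192.168.88.230", 443),
    Flow("192.0.2.10", 51515, "192.168.88.230", 443),
    Flow("192.0.2.11", 40022, "192.168.88.230", 443),
]

# Each connection is seen twice: once per direction.
all_flows = client_flows + [mirror(f) for f in client_flows]

# Counting by destination port, as report format 5 does.
dst_port_counts = Counter(f.dst_port for f in all_flows)

for port, count in dst_port_counts.most_common():
    print(port, count)
```

The server port collects one count per connection, while each client-side port appears once as the destination of a reply flow. That is the long tail of little-used ports in the report.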
Perhaps my FlowScan graph displays heavy traffic usage, and I want to see what's going on. One possibility is that a particular client is making especially heavy demands upon one of our servers. I want to see which combinations of clients and servers are using the most flows, which is flow-stat report format 10. I'm sorting on column 2 (flows), in descending order. (Again, I don't know that it's column 2 I want until I run the report unsorted; there is no magic way to extract the knowledge of which column I want from the ether.)
# flow-cat -p flowfiles | flow-stat -f 10 -S 2 | less
# --- ---- ---- Report Information --- --- ---
#
# Fields: Total
# Symbols: Disabled
# Sorting: Descending Field 2
# Name: Source/Destination IP
#
# Args: flow-stat -f 10 -S 2
#
#
# src IPaddr dst IPaddr flows octets packets
#
105.157.204.33 192.168.88.230 5167 15766678 52283
192.168.88.230 105.157.204.33 5123 5200858 34541
105.157.204.33 192.168.88.243 4121 27993332 63995
192.168.88.243 105.157.204.33 4112 30655019 53695
109.116.147.7 192.168.88.243 3071 8296022 23541
192.168.88.243 109.116.147.7 3069 13705493 18890
24.105.3.130 192.168.88.230 1533 4718630 16236
192.168.88.230 24.105.3.130 1521 1326074 11375
...
The 192.168.88 addresses are local servers, and everything else is remote. Obviously, the 105.157.204.33 host is the single biggest client--the first four lines involve that host! One thing to realize is that these reports commonly list records in pairs, especially if you're sorting by flows. My biggest flow user is 105.157.204.33 sending traffic to 192.168.88.230, and the second biggest is 192.168.88.230 sending traffic back to 105.157.204.33. That makes sense--the server answers the client about as often as the client talks to the server. In this case, though, one client IP appears in all of the first four records; it's obviously using the resources heavily.
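Because the records arrive in mirrored pairs, it can help to fold each pair into one total per pair of hosts. The following Python sketch does that for the rows shown above. It is an illustration rather than a flow-tools feature, and combine_pairs is a hypothetical helper name.

```python
from collections import defaultdict

# Rows copied from the format 10 report above:
# source IP, destination IP, flows, octets, packets.
REPORT = """\
105.157.204.33 192.168.88.230 5167 15766678 52283
192.168.88.230 105.157.204.33 5123 5200858 34541
105.157.204.33 192.168.88.243 4121 27993332 63995
192.168.88.243 105.157.204.33 4112 30655019 53695
109.116.147.7 192.168.88.243 3071 8296022 23541
192.168.88.243 109.116.147.7 3069 13705493 18890
24.105.3.130 192.168.88.230 1533 4718630 16236
192.168.88.230 24.105.3.130 1521 1326074 11375
"""


def combine_pairs(text):
    """Sum flows, octets and packets over both directions of each host pair."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        src, dst, flows, octets, packets = line.split()
        # Sorting the two addresses gives the same key for A->B and B->A.
        key = tuple(sorted((src, dst)))
        for i, value in enumerate((flows, octets, packets)):
            totals[key][i] += int(value)
    return dict(totals)


pairs = combine_pairs(REPORT)

# List host pairs by combined octets, largest first.
for hosts, (flows, octets, packets) in sorted(
    pairs.items(), key=lambda item: item[1][1], reverse=True
):
    print(hosts[0], "<->", hosts[1], flows, octets, packets)
```

Summed this way, the pair with the most flows (105.157.204.33 and 192.168.88.230) is not the pair that moves the most octets; that is 105.157.204.33 and 192.168.88.243.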
Another common question is "Which server receives the most connections?" A report on single hosts is format 11. Sort this by flows again, which is column 1.
# flow-cat -p flowfiles | flow-stat -f 11 -S 1 | less
# --- ---- ---- Report Information --- --- ---
#
# Fields: Total
# Symbols: Disabled
# Sorting: Descending Field 1
# Name: Source or Destination IP
#
# Args: flow-stat -f 11 -S 1
#
#
# IPaddr flows octets packets
#
192.168.88.230 224442 459688424 1878817
192.168.88.243 153038 1453679456 2635365
192.168.88.247 34503 124729216 291507
An interesting point here is that the host with the most flows (192.168.88.230) is not the host with the most octets of traffic or the greatest number of packets (192.168.88.243). Remember too that this report counts a host whether it is the source or the destination of a flow. You might find octets (column 2) or packets (column 3) a more sensible measure for your situation. Do whatever works for you.
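The difference between the three measures can be checked mechanically. This Python sketch picks the leading host for each column of the rows shown above. The COLUMNS mapping and the leaders helper are hypothetical names, and the column indexes follow the header of this particular report.

```python
# Rows copied from the format 11 report above: IP, flows, octets, packets.
REPORT = """\
192.168.88.230 224442 459688424 1878817
192.168.88.243 153038 1453679456 2635365
192.168.88.247 34503 124729216 291507
"""

# Zero-based column indexes, matching the report header.
COLUMNS = {"flows": 1, "octets": 2, "packets": 3}


def parse_hosts(text):
    """Parse rows into (ip, flows, octets, packets) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ip, *counts = line.split()
        rows.append((ip, *map(int, counts)))
    return rows


def leaders(rows):
    """Return the IP address with the largest value in each column."""
    result = {}
    for name, index in COLUMNS.items():
        result[name] = max(rows, key=lambda row: row[index])[0]
    return result


hosts = parse_hosts(REPORT)
for measure, ip in leaders(hosts).items():
    print(measure, ip)
```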
