HTTP Proxies
A Bit About HTTP
If you've never managed an HTTP server or HTTP proxy before, you may be amazed at the amount of interaction that occurs whenever a web browser connects to a web server. I mentioned in the last article that we would be referring to the HTTP RFC (2616). Let's do a very quick rundown on how the HTTP protocol works; I'll leave it to you to refer to the RFC to fill in the details that interest you.
Whenever you browse a web site, your browser must make a separate request for
every item on that page. For example, if I type slashdot.org into
my browser, I'll see the following entries in my mman cache:
- http://images.slashdot.org:80/topics/topicgamesrts.gif
- http://images.slashdot.org:80/topics/topicinternet.gif
- http://images.slashdot.org:80/title.gif
- http://slashdot.org:80/
- http://images.slashdot.org:80/topics/topicaposx.gif
- http://images.slashdot.org:80/topics/topicms.gif
- http://images.slashdot.org:80/topics/topiccomdex.gif
- http://images.slashdot.org:80/slc.gif
- http://images.slashdot.org:80/pix.gif
- http://images.slashdot.org:80/topics/topicscience.gif
- http://images.slashdot.org:80/greendot.gif
- http://slashdot.org:80/favicon.ico
- http://images.slashdot.org:80/topics/topichardware.gif
Note that every GIF or image is a separate request, as each is stored as a
separate file on the web server. In order for my web browser to display the
main page of Slashdot's site, it had to individually request each of the 11
.gifs, the one .ico, and the HTML page that
explained how to format everything together.
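To make the one-request-per-item point concrete, here is a minimal Python sketch that lists what a browser would have to fetch for a page. The HTML snippet and the ResourceLister class are made up for illustration; a real browser also follows stylesheets, scripts, and frames.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceLister(HTMLParser):
    """Collect the URLs of images and icons referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


# Hypothetical page, loosely modelled on the cache listing above.
SAMPLE_PAGE = """
<html><head><link rel="icon" href="/favicon.ico"></head>
<body>
<img src="http://images.slashdot.org/title.gif">
<img src="http://images.slashdot.org/greendot.gif">
</body></html>
"""

lister = ResourceLister("http://slashdot.org/")
lister.feed(SAMPLE_PAGE)

# One request for the page itself, plus one per referenced item.
requests_needed = [lister.base_url] + lister.resources
for url in requests_needed:
    print("GET", url)
```

Even this tiny page needs four requests: one for the HTML and one for each item it references.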
In HTTP, there are two types of packets, which the RFC calls messages: request packets and response packets. The request packet always comes from the web browser. This makes sense, as a web browser is a client and the job of a client is to make requests. Not surprisingly, the response packets always come from the web server.
A web browser's request packet has three components:
- Method
- Headers
- Body
The method indicates what the client is requesting. The methods are all
listed and explained in the RFC and typically are written in uppercase. The
most common method is the GET method, as typically your web
browser wants to "get" a particular page or image from the web server. If you
take a look at your mman log, or for that matter, the log from any
HTTP proxy or HTTP server, you'll see GET requests:
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topiccomdex.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topiccomdex.gif
<snip>
Here, mman issued the GET request on behalf of my
browser, then placed a copy of the requested item into its cache.
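The three parts of a request can be sketched in a few lines of Python. This is only an illustration of the layout described in the RFC (a request line carrying the method, then headers, a blank line, and an optional body); the helper names build_request and parse_request are my own, and the Host value is an assumption.

```python
def build_request(method, url, headers, body=b""):
    """Assemble request line, headers, blank line, and body into bytes."""
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_request(raw):
    """Split raw request bytes back into (method, url, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, url, _version = request_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, url, headers, body


# A GET request as a browser might send it to a proxy: the full URL
# appears in the request line, and the body is empty.
raw = build_request(
    "GET",
    "http://images.slashdot.org:80/greendot.gif",
    {"Host": "images.slashdot.org:80"},
)
method, url, headers, body = parse_request(raw)
print(method, url, headers, body)
```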
A web server's response packet also has three components:
- Status
- Headers
- Body
That is, the request packet sends a method, and the web server responds with
a status message. Status messages are numerical, and again are listed in the
RFC. You've probably run across a "404 error," as 404 is the status
number representing "not found." The most common status is 200 or
OK. If a web browser issues a GET request and the
server finds the requested resource, it will send it back along with a status
of 200. If it can't find the requested file, it will instead send
a status of 404.
You probably noticed that both request and response packets contain headers
and a body. The body usually contains the requested page or image. So, when my
web browser made a GET request for
http://images.slashdot.org:80/greendot.gif, the web server found
the GIF and sent a response packet with a status of 200 and the
GIF itself in the body of that packet.
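A response can be taken apart the same way. The sketch below uses two made-up responses (the body bytes are placeholders, not real content) and Python's standard http.HTTPStatus table to look up the reason phrase for each status number; parse_response is a hypothetical helper, not part of any proxy.

```python
from http import HTTPStatus


def parse_response(raw):
    """Split raw response bytes into (code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), reason, headers, body


# Hypothetical responses; the body bytes stand in for real content.
found = b"HTTP/1.1 200 OK\r\nContent-Type: image/gif\r\n\r\nGIF89a..."
missing = (
    b"HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n"
    b"<h1>Not Found</h1>"
)

for raw in (found, missing):
    code, reason, headers, body = parse_response(raw)
    # HTTPStatus maps each numeric code to its standard reason phrase.
    print(code, HTTPStatus(code).phrase, headers["Content-Type"], len(body))
```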
Displaying Headers with mman
Headers are the interesting part of HTTP packets. They contain useful
information that helps the web browser and web server to communicate
effectively. They also contain sensitive information about both the web server
and web browser. Here are the results of my clicking on Show Headers in
mman's web interface:
Unfiltered
Type              Value
Host              mman
User-Agent        Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619
Accept            text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language   en-us,en;q=0.5
Accept-Encoding   gzip,deflate,compress;q=0.9
Accept-Charset    ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive        300
Proxy-Connection  keep-alive
Referer           http://mman/headers

Filtered
Type              Value
Host              mman
Accept            text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language   en-us,en;q=0.5
Accept-Encoding   gzip,deflate,compress;q=0.9
Accept-Charset    ISO-8859-1,utf-8;q=0.7,*;q=0.7
Referer           http://mman/headers
User-Agent        Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)
Remember, every HTTP packet includes headers. Here you are seeing the
values that are sent by my web browser. The Unfiltered section
contains the defaults used by my web browser. It clearly shows my operating
system and the version and type of web browser I am using. The
Filtered section shows that mman changed some of
those headers before sending them to the web server. If I don't like those new
values, I can simply click on Config, select header,
and edit, say, the User-Agent. This configuration section is quite
powerful, as you can add, delete, and modify the contents of headers. Don't do
this just for kicks, however. Make sure you've read the RFC and understand the
ramifications of the particular header value you have the urge to muck about
with.
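Comparing the Unfiltered and Filtered listings, the proxy dropped Keep-Alive and Proxy-Connection and substituted a different User-Agent. The Python sketch below mimics that behaviour. The DROP and REPLACE tables are inferred from the listing and are purely illustrative; they are not mman's configuration format or its implementation.

```python
# Rules inferred from the Unfiltered/Filtered listing; an illustration,
# not mman's actual configuration format.
DROP = {"keep-alive", "proxy-connection"}
REPLACE = {
    "user-agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)",
}


def filter_headers(headers):
    """Return a copy of headers with DROP removed and REPLACE applied."""
    filtered = {}
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if key in DROP:
            continue
        filtered[name] = REPLACE.get(key, value)
    return filtered


unfiltered = {
    "Host": "mman",
    "User-Agent": "Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619",
    "Keep-Alive": "300",
    "Proxy-Connection": "keep-alive",
    "Referer": "http://mman/headers",
}
for name, value in filter_headers(unfiltered).items():
    print(f"{name}: {value}")
```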
It's also interesting to see the headers being sent by a web server. To do so, I type the keyword "headers", followed by two periods and the server's address, into my browser:
headers..www.mp3.com
I'll see this:
Server header:
HTTP/1.1 200 OK
Date: Sat, 21 Jun 2003 21:17:43 GMT
Server: Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2 mod_mp3idver/0.12 rwh/1.1 bw/3.37 rewrite/3.3 include/3.6
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
Notice that there aren't any secrets on the server end either. The header clearly indicates the type and version of web server software in use. If you are responsible for maintaining a web server, remember that every HTTP packet leaving your server reveals whether or not you've kept up with your web server patches!
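To see just how much that one header gives away, here is a small Python sketch that pulls every product/version pair out of the Server value shown above. The regular expression and the product_versions helper are my own simplification, not a full parser for the header grammar in the RFC.

```python
import re

# The Server header value from the response above, joined onto one line.
SERVER_HEADER = (
    "Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2 "
    "mod_mp3idver/0.12 rwh/1.1 bw/3.37 rewrite/3.3 include/3.6"
)


def product_versions(server_value):
    """Return {product: version} for every product/version token."""
    return dict(re.findall(r"([^\s/()]+)/(\S+)", server_value))


versions = product_versions(SERVER_HEADER)
for product, version in versions.items():
    print(f"{product:15} {version}")
```

Anyone who can send a request to the server can build the same list and compare it against published advisories.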
Controlling Access
mman also supports features that can be very useful in a
networked environment. One, it can force users to authenticate before they are
allowed to use the Internet. I'll click on config then select
access and add a policy. I'll then be presented with a form.
If I leave the IP address section empty, the access policy will affect every IP address that connects to the proxy. I can then set values in the username and password fields. Before saving the policy, I need to configure what access users will be allowed once they input the correct username and password. My choices are:
- Web interface: This will allow users to configure the proxy, so I will leave this option unchecked.
- Proxy requests: If I check this option, the proxy will accept requests from web browsers that have been manually configured to use the IP address and port number of the proxy.
- CONNECT requests: CONNECT is an HTTP method that is often disabled due to its associated security risks.
- HTTP requests: I want to remember to select this option, or users won't be able to access HTTP servers.
- Transparent proxying: If I check this option, the proxy will intercept web requests, even if the web browser hasn't been configured to use the proxy. This is generally a good thing in a network, as it ensures users won't be able to bypass your proxy server.
- Allow bypassing: mman has keywords that can be included with a URL to bypass restrictions for a particular site. For example, if I wanted to see the popups for a site, I could type this in my browser: bypass[f]..www.mp3.com. If you don't want users bypassing your filters, don't select this option.
If you decide to create your own policy, remember to create a second policy
that will allow you as an administrator to configure mman. If you
plan on configuring mman on the same computer that is running the
proxy software, keep the default policy, but place it below your new policy
that affects your users.
Now, when users open up their web browsers, the browser itself will prompt them for the username and password you created in your policy. If they type it in correctly, they will be able to access the Internet, according to the parameters you set in your policy.
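The article does not say which authentication scheme mman uses, so treat the following as an assumption: with the common HTTP Basic scheme, the browser answers the prompt by sending the username and password, joined by a colon and base64-encoded, in a Proxy-Authorization header. The Python sketch below shows that encoding; the credentials are made up.

```python
import base64


def basic_credentials(username, password):
    """Build the value of a Basic (Proxy-)Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover (username, password) from a Basic header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# Hypothetical credentials, for illustration only.
value = basic_credentials("alice", "secret")
print("Proxy-Authorization:", value)
print(decode_basic(value))
```

Note that base64 is an encoding, not encryption: anyone who can see the header can recover the password, as decode_basic shows.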
The last feature I wish to mention is limits. This
configuration allows you to control Internet access according to month, day,
and time. For example, you could configure a policy that limits Internet
access to the hours of 9:00 to 17:00 on Monday to Friday.
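The logic behind such a limit is simple enough to sketch in Python. This is only an illustration of the 9:00 to 17:00, Monday to Friday example; the names below are hypothetical and have nothing to do with how mman stores or evaluates its limits.

```python
from datetime import datetime, time

ALLOWED_DAYS = range(0, 5)            # Monday=0 through Friday=4
START, END = time(9, 0), time(17, 0)  # access window, END exclusive


def access_allowed(moment):
    """True if moment falls inside the allowed days and hours."""
    return moment.weekday() in ALLOWED_DAYS and START <= moment.time() < END


print(access_allowed(datetime(2003, 6, 23, 10, 30)))  # a Monday morning
print(access_allowed(datetime(2003, 6, 21, 16, 4)))   # a Saturday afternoon
```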
Conclusion
It seems that I've barely scratched the surface of the middleman proxy
server. Perhaps I've piqued your interest and you will try this application for
yourself.
In the next article, I'd like to finish the proxy series by taking a look at SMTP proxies.
Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.



Apache admins, look at the ServerTokens option.