Here's my braindump of setting this up. I don't have time right now to turn this into a more complete wiki page, but maybe later I'll cut & paste it to e-linux or somewhere suitable.
First, the caveats...
If you have a home network with a few hosts, the RPi is a perfectly suitable way to provide a web content filter. If you have a school or business network and you need access logs, etc., then Squid and Dansguardian are still suitable tools, but you're going to want to install them on a more powerful desktop/server.
Also, I won't be explaining how to prevent users from bypassing the RPi and accessing the internet directly, because that is a wider topic and depends on the exact network setup you have. Some suggested ways to prevent this:
- Implement desktop controls to prevent people editing their proxy settings. Seems like overkill at home.
- Block all web access on the internet access router from any host other than the RPi, and don't let anyone use the RPi as a desktop. Often this kind of specific filtering is not possible on a low-cost, provider-supplied broadband modem.
- Add a second (USB) network port to the RPi and make it a gateway between the hosts & broadband router so all traffic has to flow through the RPi.
Now we've got the negative stuff out of the way, let's start at the beginning. I'm assuming you have your RPi installed and connected to a LAN that has the hosts (laptops, PCs, etc.) that you would like to pass through a web content filter.
Step 1 - Add the required packages
This is as simple as:
# apt-get install squid dansguardian
... as root (or with sudo).
Squid is the proxy server and dansguardian is the content filter.
For more info, I recommend the squid and dansguardian home pages.
Step 2 - Configure the installed services
(This is from memory; if it contains errors, I apologise... just reply and I'll answer.)
The default squid config should be OK. The config file is /etc/squid/squid.conf, and you'll need to be root to edit it.
The key line in this file is:
http_port 3128 transparent
which specifies that squid listens on port 3128.
The dansguardian config file is /etc/dansguardian/dansguardian.conf and again, you'll need to be root to edit it. You need to comment out (or delete) this line:
UNCONFIGURED - Please remove this line after configuration
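If you'd rather script that edit than open an editor, here's a minimal sketch. It assumes GNU sed (as on Raspbian) for the in-place -i flag and that the marker line starts at the very beginning of the line; dg_mark_configured is just a name I've made up for the example.

```shell
# Comment out dansguardian's UNCONFIGURED marker line.
# dg_mark_configured is a hypothetical helper; assumes GNU sed (-i edits in place).
dg_mark_configured() {
  conf=${1:-/etc/dansguardian/dansguardian.conf}
  # Prefix the marker with '#'; a line that is already commented is left alone
  sed -i 's/^UNCONFIGURED/#UNCONFIGURED/' "$conf"
}
```

Run it as root with no argument to edit the default file, or pass another path if yours lives elsewhere.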
Just FYI, you'll note there is also a line
proxyport = 3128
which specifies the port the proxy (squid) is running on. Note it also says port 3128, as specified in the squid config. It's rather important that these two values match.
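Since a mismatch here is an easy mistake to make, here's a rough sketch of a check you could run after editing. It assumes the first uncommented http_port line in squid.conf is the one dansguardian should use and that the dansguardian line looks like "proxyport = 3128"; check_ports is a hypothetical helper name.

```shell
# Compare squid's listening port with the port dansguardian forwards to.
# check_ports is a hypothetical helper; pass other paths if yours differ.
check_ports() {
  squid_conf=${1:-/etc/squid/squid.conf}
  dg_conf=${2:-/etc/dansguardian/dansguardian.conf}

  # First uncommented http_port line; the value may be "3128" or "addr:3128"
  # and may be followed by options such as "transparent".
  squid_port=$(awk '$1 == "http_port" { n = split($2, a, ":"); print a[n]; exit }' "$squid_conf")

  # "proxyport = 3128" -> "3128"
  dg_port=$(awk -F'=' '/^[ \t]*proxyport[ \t]*=/ { v = $2; gsub(/[ \t]/, "", v); print v; exit }' "$dg_conf")

  echo "squid http_port: $squid_port, dansguardian proxyport: $dg_port"
  # Succeeds only if both values were found and they match
  [ -n "$squid_port" ] && [ "$squid_port" = "$dg_port" ]
}
```

For example: check_ports && echo "ports match".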
There are plenty of other options. Both config files are thoroughly commented, and you could do worse than simply reading the comments in them.
Step 3 - Tune the setup for RPi
OK, these steps are optional, but here's why I carried them out. The RPi has fairly good network throughput for typical internet traffic and will certainly have enough to handle a typical broadband service (20Mb/s or less). It may struggle on some cable/fibre services running at higher speeds, e.g. 100Mb/s.
What I found the RPi did struggle with was writing logs and caches, which caused a noticeable delay in web access. I tried using a USB disk as the root filesystem, but this did not improve the situation and caused other kernel errors. So I cut caching and logging down to the bare minimum.
First, let's stop squid from writing a local content cache. On a home network there's very little to be gained from a local cache. (FYI, a cache is a copy of the pages people have accessed, so that squid can serve them directly rather than going out to the internet each time to fetch them.) In /etc/squid/squid.conf look for the line:
# OPTIONS FOR TUNING THE CACHE
and in this section add the line:
cache deny all
This stops squid caching anything at all.
Now let's stop squid from logging anything (we'll get dansguardian to log the info we want). In /etc/squid/squid.conf look for the line:
access_log /var/log/squid/access.log squid
Simply commenting this line out isn't enough, because squid then falls back to its built-in default access log. To stop logging, change this config entry to:
access_log none
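Both squid.conf changes can also be applied with a small script. This is a minimal sketch, not something from the squid package: squid_quiet is a hypothetical name, it assumes GNU sed for -i, it appends "cache deny all" at the end of the file rather than in the tuning section, and it uses "access_log none" (squid's documented way to switch off an access log). Take a backup of squid.conf first.

```shell
# Apply the two squid.conf changes: cache nothing, log nothing.
# squid_quiet is a hypothetical helper name; assumes GNU sed for -i.
squid_quiet() {
  conf=${1:-/etc/squid/squid.conf}

  # Cache nothing (appended at the end, so it comes after the acl definitions)
  grep -q '^cache deny all' "$conf" || printf '\ncache deny all\n' >> "$conf"

  # Disable the access log: rewrite any active access_log line, or add one
  if grep -q '^access_log' "$conf"; then
    sed -i 's|^access_log.*|access_log none|' "$conf"
  else
    printf 'access_log none\n' >> "$conf"
  fi
}
```

It's safe to run more than once; the second run leaves the file unchanged.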
Finally, let's switch dansguardian from logging every web request to logging only the ones that are denied. Look for the "Logging Settings" section and set the level to 1 like this:
# Logging Settings
# 0 = none 1 = just denied 2 = all text based 3 = all requests
loglevel = 1
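The same change as a one-off script, in case you're setting up more than one box. A minimal sketch under the same assumptions as before (GNU sed, the setting written as "loglevel = N" at the start of a line); dg_log_denied_only is a made-up name.

```shell
# Set dansguardian to log only denied requests (loglevel = 1).
# dg_log_denied_only is a hypothetical helper; assumes GNU sed for -i.
dg_log_denied_only() {
  conf=${1:-/etc/dansguardian/dansguardian.conf}
  if grep -q '^loglevel *=' "$conf"; then
    # Replace whatever level is currently set; comment lines are untouched
    sed -i 's/^loglevel *=.*/loglevel = 1/' "$conf"
  else
    printf 'loglevel = 1\n' >> "$conf"
  fi
}
```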
Step 4 - Start the services
First, start squid (as root) with:
# service squid start
If squid fails complaining about swap, I think it's because the swap directories have to be created before the first run, by running:
# squid -z
... and then you should be able to start it with:
# service squid start
You should also be able to start dansguardian with:
# service dansguardian start
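The start-up order above (try squid, run squid -z if that fails, then start dansguardian) can be wrapped up like this. It's only a sketch: start_filter is a hypothetical name, it must run as root, and it assumes the service names squid and dansguardian used in this post (on other releases the squid service may be named differently).

```shell
# Start squid, creating its swap directories with "squid -z" if the first
# attempt fails, then start dansguardian. Run as root.
start_filter() {
  if ! service squid start; then
    # Probably a first run: build the swap directories, then try again
    squid -z && service squid start || return 1
  fi
  service dansguardian start
}
```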
Step 5 - Test
OK, at this point you should be able to configure a browser somewhere on your network and point it at dansguardian. E.g. in Windows 7 it's Control Panel -> Network and Internet -> Internet Options -> Connections (tab) -> LAN Settings. In Firefox it's Options -> Advanced (tab) -> Network (tab) -> Settings.
In both cases tick "use a proxy" and enter the IP address of your RPi as the proxy address and "8080" as the port. 8080 is the default port used by dansguardian. If necessary select "use the same proxy server for all protocols".
Now when you access the internet you should be routing through the content filter. If you're accessing a permitted page then you shouldn't notice anything different. If you access something undesirable e.g. an adult site, then you should get the dansguardian "this is blocked" page.
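If you'd like a quick test from a command line before touching any browsers, curl can send a request through the filter. A rough sketch, assuming curl is installed on the machine you test from; check_via_filter is a hypothetical name and 192.168.1.10 below is just a placeholder for your RPi's address.

```shell
# Fetch a page through dansguardian and print the HTTP status code.
# A code of 000 means curl got no response (e.g. wrong IP or port).
check_via_filter() {
  rpi=$1                          # IP address of your RPi
  url=${2:-http://example.com/}   # page to request
  port=${3:-8080}                 # dansguardian's default port
  curl -s -o /dev/null -w '%{http_code}\n' -x "http://$rpi:$port" "$url"
}
```

For example: check_via_filter 192.168.1.10. To see the block page itself, drop the -o /dev/null and request something you expect to be blocked.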
Step 6 - Customising what Dansguardian permits/blocks
Until you have tuned dansguardian to your needs, you will get a lot of false positives, i.e. sites blocked that shouldn't be.
All blocked requests are recorded in /var/log/dansguardian/access.log. To show them, just do a simple search like:
# grep DENIED /var/log/dansguardian/access.log
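Once that log gets long, it helps to see which sites are blocked most often. This is a rough sketch that assumes dansguardian's default log format, where the requested URL is the fifth whitespace-separated field on each line; look at a line of your own log first, since a different logfileformat setting would break it. denied_hosts is a hypothetical name.

```shell
# Count denied requests per host in dansguardian's access log, busiest first.
# Assumes the requested URL is the 5th whitespace-separated field.
denied_hosts() {
  log=${1:-/var/log/dansguardian/access.log}
  # Split "http://host/path" on "/" so that the 3rd piece is the host
  grep DENIED "$log" |
    awk '{ split($5, part, "/"); print part[3] }' |
    sort | uniq -c | sort -rn
}
```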
You'll find two common reasons for a block: a page/website has scored too highly, or a file type is banned. By default dansguardian is quite restrictive, e.g. it won't let you download binary/multimedia files, and it blocks pages that refer to sex, drugs (and rock'n'roll... no, just joking) and proxies.
Luckily, dansguardian has a simple set of files to manage what is blocked.
Dansguardian works on lists of words called "phraselists", which are installed with dansguardian. Some words are considered negative and some are considered positive. Each page is given a total score based on the words it matches, and pages over a certain score are blocked. For example, a site that just talks about "drugs" is likely to get blocked, but a site that uses the word "drugs" together with words like "health" and "advice" is less likely to be.
The word lists are stored under /etc/dansguardian/lists, and the files are sensibly named, so it's easy to see what each one is for. I'll use two common examples.
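To make the scoring idea concrete, here's a toy sketch in awk. It is NOT how dansguardian really parses pages or reads its phraselist files (they have their own format and much smarter matching); the one-"word weight"-pair-per-line file, the weights and the threshold are all made up purely for illustration, and phrase_score is a hypothetical name.

```shell
# Toy illustration of phrase scoring: NOT dansguardian's real algorithm or
# file format. weights_file holds made-up "word weight" pairs, one per line;
# positive weights push a page towards being blocked, negative ones pull it back.
phrase_score() {
  weights_file=$1 page_file=$2 limit=$3
  awk -v limit="$limit" '
    NR == FNR { w[tolower($1)] = $2; next }
    {
      $0 = tolower($0)
      gsub(/[^a-z]+/, " ")
      for (i = 1; i <= NF; i++)
        if ($i in w) score += w[$i]
    }
    END { printf "%d %s\n", score, (score > limit ? "BLOCK" : "ALLOW") }
  ' "$weights_file" "$page_file"
}
```

With weights drugs=50, health=-30 and advice=-20 and a limit of 50, a page saying "drugs, drugs!" scores 100 and is blocked, while "drugs health advice" scores 0 and is allowed, which mirrors the example above.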
If you want to permit all pages from a particular domain, e.g. bbc.co.uk, then edit the "exceptionsitelist" file and add the domain at the bottom of the file:
bbc.co.uk
(Note, you don't include the www.) Once changed, all pages under "www.bbc.co.uk/..." will be permitted no matter what the content. Permitting entire sites is the easiest approach. However, there is also an "exceptionurllist" file if you want to permit a certain page on a site without allowing all the other pages on that site.
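If you end up whitelisting a lot of sites, a small helper saves some typing. A minimal sketch: allow_site is a hypothetical name, it assumes the list takes one bare domain per line (as described above), and it strips any "http://", leading "www." and path so you can paste a full URL. You still need to restart dansguardian afterwards.

```shell
# Add a domain to dansguardian's exceptionsitelist, skipping duplicates.
# allow_site is a hypothetical helper name.
allow_site() {
  list=${2:-/etc/dansguardian/lists/exceptionsitelist}
  # The list wants a bare domain: drop any "scheme://", leading "www." and path
  domain=$(printf '%s\n' "$1" | sed -e 's|^[a-z]*://||' -e 's|^www\.||' -e 's|/.*$||')
  # Make sure the file ends with a newline before appending to it
  if [ -s "$list" ] && [ -n "$(tail -c 1 "$list")" ]; then
    echo >> "$list"
  fi
  # Append only if the exact domain isn't already listed
  grep -qxF "$domain" "$list" || printf '%s\n' "$domain" >> "$list"
}
```

For example, allow_site http://www.bbc.co.uk/news adds the line bbc.co.uk.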
If you want to allow downloads of mp3s or binaries, edit the "exceptionextensionlist" file and add the permitted file types, one per line, e.g. .mp3
The other files work the same way. As well as the "exception..." files, which permit entries, there are also "banned..." files that block whatever entries you add.
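The extension list can be handled the same way. Again just a sketch: allow_extension is a hypothetical name, and it assumes entries are written with a leading dot, one per line (check the existing entries in your own lists to confirm the format).

```shell
# Add a file extension to dansguardian's exceptionextensionlist.
# allow_extension is a hypothetical helper; entries are assumed to be
# written with a leading dot, e.g. ".mp3".
allow_extension() {
  list=${2:-/etc/dansguardian/lists/exceptionextensionlist}
  # Accept "mp3" or ".mp3"
  case $1 in
    .*) ext=$1 ;;
    *)  ext=".$1" ;;
  esac
  # Append only if not already listed
  grep -qxF -- "$ext" "$list" || printf '%s\n' "$ext" >> "$list"
}
```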
Whenever you make changes, you need to restart dansguardian for them to take effect, with either:
# service dansguardian restart
or:
# dansguardian -r
Step 7 - Customising the dansguardian blocked message
... and finally, you may want to change the html that dansguardian displays when something is blocked.
In /etc/dansguardian/dansguardian.conf you probably have the following lines:
# -1 = log, but do not block - Stealth mode
# 0 = just say 'Access Denied'
# 1 = report why but not what denied phrase
# 2 = report fully
# 3 = use HTML template file (accessdeniedaddress ignored) - recommended
reportinglevel = 3
The HTML template it's referring to is a file called template.html, stored in the directory for the language selected in dansguardian.conf (you can change the language used there).
Just edit this template.html to say what you want, and that is what will be displayed.
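If you're not sure which template.html is in use, this sketch works out the path from the config. It assumes dansguardian.conf sets the location with "languagedir" and "language" options, written like languagedir = '/some/dir' and language = 'somelanguage' (check your own file); dg_template_path is a hypothetical name.

```shell
# Print the path of the template.html dansguardian should be using, built
# from the languagedir and language settings in dansguardian.conf.
dg_template_path() {
  conf=${1:-/etc/dansguardian/dansguardian.conf}
  awk -F'=' -v q="'" '
    # Strip spaces, tabs and single quotes from a setting value
    function val(s) { gsub("[ \t" q "]", "", s); return s }
    /^[ \t]*languagedir[ \t]*=/ { dir = val($2) }
    /^[ \t]*language[ \t]*=/    { lang = val($2) }
    END { print dir "/" lang "/template.html" }
  ' "$conf"
}
```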
Like any other change to the dansguardian configuration, this only takes effect after a restart:
# dansguardian -r
That's all I can think of for now.