Use RPi as a proxy / net filter


by drjaking » Tue Jan 24, 2012 12:00 pm
I can't wait to get an RPi for my kids; hopefully it will help wean them off Miniclip (and, by the way, I think game programming has to be one of the gateway activities). But I'm also looking for a little always-on device to use as a parental control system. I've thought about running Squid and DansGuardian on a SheevaPlug, but it strikes me the same might work on an RPi. Am I right? My main worry is whether it could cope with the network traffic in real time.
by mgmt_idiot » Tue Jan 24, 2012 6:51 pm
I looked at something similar. Could really do with multiple NICs for squid and for several other things I've got in mind. The problem is throughput on the second NIC via USB, which would bottleneck my ISP connection if I used it for squid. IIRC, about 30Mbit/s is the most you can get through a USB NIC.

You'd face the same problem on a Sheeva as on a RasPi for this. A slight step up from the Sheeva is the DreamPlug http://www.newit.co.uk/shop/pr.....php?cat=21

My problem is that I'm a fan of Red Hat derivatives like Fedora, rather than Debian.

More NICs would be my requested feature on a Model C. Ideally 3. But I want the RasPi team to stay focussed on current delivery.

The other thing is that for squid you'll want some disk space for the cache, and a spinning (platter) hard disk may cope better than flash with the large number of rewrites involved in web caching.

That's my best understanding of the squid type issues here. The floor is now open for everyone to tell me I'm wrong :-)
by kevco » Wed Jan 25, 2012 4:22 pm
You should be able to make a one-armed proxy by using two IP addresses on the same NIC. Here is a blog post that should get you going. I'm hoping to do this as well.

http://blog.andrisetiawan.com/.....h-one-nic/
by error404 » Wed Jan 25, 2012 6:07 pm
I agree mgmt_idiot, this box is less than ideal for this kind of task. The onboard NIC is USB as well.

There are a few solutions out there that are much more suitable (though indeed more costly) for networking duties. At least something that has an MII interface for its NIC and an 802.1Q capable switch would be better than the Pi, even if its hardware spec is considerably lower. This SoC is not optimized for networking, and there are plenty that are.
by h4ppycl0wn » Thu May 10, 2012 1:28 pm
The thread has been dead for a while, but running squid + dansguardian on the Pi is exactly the reason I want one... so when one arrives I will happily share my experience. Although I registered at 6:03ish, I wasn't in the first 5000 users for RS Components :-( so no Pi yet.

I'm trying to understand the reason for 2 NICs or 2 IP addresses raised above. I'm already happily running squid+dansguardian on a single IP in the same subnet as my home hosts + internet GW. To make the setup secure you do need to restrict HTTP traffic through the GW to only the proxy so that the hosts can't send HTTP direct to the internet. The idea of using the Pi was to provide a low-cost, low-power alternative to a PC as the proxy needs to be always on.
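A quick way to confirm that the gateway really does stop ordinary hosts reaching the web directly is to attempt a plain TCP connection to an outside host on port 80 or 443 from one of those hosts (not from the proxy box). The Python sketch below does just that; the function name is made up for illustration, and a False result only means the connection failed, which is what you want to see if the gateway filter is working.

```python
import socket


def direct_http_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds (no proxy).

    Run this on an ordinary client, not on the proxy box. If the gateway only
    lets the proxy talk to ports 80/443, this should return False.
    """
    try:
        # create_connection resolves the name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, name lookup failure or timeout all land here.
        return False
```

For example, direct_http_reachable("example.com", 80) should return False on a filtered client (example.com is only a placeholder target).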

I'm thinking that a low-cost, configurable web content filter could be useful for home/small networks. I love dansguardian's ability to set a custom error message, and the editable lists for permitting/denying different sites/keywords/domains make for a very flexible solution. Everything will depend on the Pi's throughput, of course. No one wants a bottleneck :-)
by error404 » Thu May 10, 2012 6:58 pm
I think most people want to use the Pi as both the gateway and the proxy, rather than as a separate proxy box alongside an existing gateway.
by emg » Thu May 10, 2012 7:25 pm
2 network cards allow the device to become a 'router' and act as a firewall; this is pretty standard. Yes, you can run with one NIC and some fancy config, but it's probably not suitable for the RPi. For an example of how this is done, have a look at Smoothwall Express, which provides a web proxy/cache. This might be more suitable on an old PC.
by nickheppleston » Thu May 17, 2012 10:08 am
Hi All,

Just to give you an update on the squid3 + DansGuardian question: I have been doing the same on a SheevaPlug and, following the recent delivery of my Pi, now have the same setup running on it.

It's definitely not as quick as the Sheeva, but it holds its own. The current problem I have is around IPv6 and the Squid3 Debian package; there is a config issue here somewhere that I need to track down. Other than that, (plain) HTTP traffic is being filtered through DG and Squid absolutely fine (litmus test: my 13-year-old hasn't complained that Facebook is 'running slowly'!)

I did have to rip out a number of unnecessary packages from the Debian distribution provided at raspberrypi.org/downloads, but that was relatively trivial. I would like to build a custom image at some point without any of the graphical packages installed and running (a real 'bare-bones' installation); however, I will need to get up to speed with debootstrap before I cross that bridge.

If anyone has any questions re. the setup, please drop me a line.

Rgds, Nick.
by abishur » Thu May 17, 2012 8:23 pm
What are you using for your second USB Ethernet device? I tried using one but it didn't play nicely with the Pi (extremely high latency).
by h4ppycl0wn » Mon May 21, 2012 10:01 am
FYI
Just to confirm, I have squid + dansguardian set up on the default Debian image using the stock Debian packages from apt. It works fine. CPU is topping out at 80-90% with a single session, but for my small home network it's a fine replacement for a PC.

Am having some problems with stability. Pi seems to lock up fairly frequently. I've updated the firmware to latest so hopefully this improves.

FYI: I'm using a single NIC. The RPi sits on the LAN and my gateway router provides the filtering that prevents any device other than the RPi from sending HTTP traffic to the internet. I have not attempted to set up the Pi as a gateway with an extra USB-based NIC.
by nickheppleston » Mon May 21, 2012 10:10 am
Hi h4ppycl0wn,

I have pretty much the same setup, although I'm not seeing anything like that CPU usage and I'm running 3 - 4 filter groups.

I also only have the single NIC.

I am, however, seeing lock-ups after 30-60 minutes of usage, and I'm having trouble debugging them as I don't have immediate access to a DVI-enabled monitor (I'm connecting over SSH). I did run the stock Debian RPi image, but have recently moved to the stripped-down version from bootc.net (very minimal in footprint and works well).

Out of interest, are you running with the arm224_start.elf on boot to get the maximum memory for the Pi?

N.
by h4ppycl0wn » Mon May 21, 2012 1:52 pm
Sounds like our setups are pretty much identical. I was seeing lock-ups, though not as often as every hour. I'm also running the device headless and connecting via SSH, so equally, I couldn't debug.

Today I used rpi-update as described here http://hexxeh.net/?p=328117855 because this other page suggested that some kernel panics due to high memory load occur in the stock image (http://www.ctrl-alt-del.cc/2012/05/rasp ... s-ada.html). I wasn't sure this was the cause of my hang-ups, but updating the firmware seemed like a good place to start.

rpi-update by default also switches to the 224MB memory split. CPU load during proxy/dansguardian traffic is now also lower, down to about 30-40%. I was originally using the default 192MB split, so perhaps it was a memory thing and the CPU was also being used to page excess memory... no idea, I'm not that skilled at debugging this kind of stuff, more of a networks guy.

So far today everything is stable, but let's give it a few days and see.
by nickheppleston » Mon May 21, 2012 1:57 pm
Sounds good. I was also looking at the rpi-update tool just this morning and plan on giving it a go this evening after work. Pleased to hear you're no longer having the locking issues.

The only other thing I can point to for the lock-ups is the fact that I am running Wheezy, which is a bit less stable than Squeeze, so there may be a few issues there. I do, however, have my Wheezy install on a separate SD card from my Squeeze install, so I will have a play and see whether I have stability issues with the latest firmware.

N.
by jrg » Wed Dec 12, 2012 8:26 am
h4ppycl0wn -- Any chance you have a blog or other resource with some instructions for getting squid + dansguardian setup and configured?

Thanks!
by h4ppycl0wn » Thu Dec 13, 2012 2:13 pm
Fair request. In the next day or two I'll post a more extended reply here. It had slipped my mind to do so.
by h4ppycl0wn » Mon Dec 17, 2012 11:09 am
Hi,

Here's my braindump of setting this up. I don't have time right now to make this a more complete wiki page anywhere but maybe later I cut & paste this to e-linux or somewhere suitable :-)

First... the caveats ...
If you have a home network with a few hosts, the RPi is a perfectly suitable way to provide a web content filter. If you have a school or business network and you need access logs, etc. then Squid and Dansguardian are still suitable tools but you're going to want to install them on a better powered desktop/server.

Also, I won't be explaining how to prevent users bypassing the RPi and accessing the internet directly, because that is a wider topic and depends on the exact network setup you have. Suggested ways to prevent this are:
- Implement desktop controls to prevent people editing their proxy settings. Seems like overkill at home.
- Block all web access on the internet access router from any host other than the RPi and don't let anyone use the RPi as a desktop. Often specific filtering like this is not possible on a low-cost, provider supplied broadband modem.
- Add a second (USB) network port to the RPi and make it a gateway between the hosts & broadband router so all traffic has to flow through the RPi.

Now we've got the negative stuff out of the way, let's start at the beginning. I'm assuming you have your RPi installed and connected to a LAN that has the hosts (laptops, PCs, etc.) that you would like to pass through a web content filter.

Step 1 - Add the required packages
===========================
This is as simple as:
# apt-get install squid dansguardian
.. as root (or sudo).
Squid is the proxy server and dansguardian is the content filter.

I recommend the home pages of squid & dansguardian for more info:
http://www.squid-cache.org/
http://dansguardian.org/

Step 2 - Configure the installed services
==============================
(This is from memory, if it contains errors, I apologise... just reply and I'll answer)
The default squid config should be ok. The config is at /etc/squid/squid.conf and you'll need to be root to access that directory.
The key line in this file is:
http_port 3128 transparent
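which specifies that squid listens on port 3128. (The "transparent" keyword is for intercepting traffic from clients that have no browser proxy settings; with browsers pointed explicitly at dansguardian as in Step 5, a plain "http_port 3128" should also do.)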

The dansguardian config file is at /etc/dansguardian/dansguardian.conf and again, you'll need to be root to edit it. You need to comment out (or delete) this line:
# UNCONFIGURED - Please remove this line after configuration
Just FYI, you'll note there is also a line
proxyport = 3128
which tells dansguardian which port the proxy (squid) is running on. Note it also says port 3128, as specified in the squid config. It's rather important these two values match. :-)
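Since a mismatch between these two values is an easy mistake, here is a small Python sketch that reads the active (uncommented) http_port lines from squid.conf and the proxyport line from dansguardian.conf and checks that they agree. The function names are hypothetical, and the parsing only handles the simple "http_port 3128 ..." and "proxyport = 3128" forms shown above.

```python
import re
from typing import List, Optional


def squid_http_ports(conf_text: str) -> List[int]:
    """Ports from active http_port lines, e.g. 'http_port 3128 transparent'."""
    ports = []
    for line in conf_text.splitlines():
        # re.match anchors at the line start, so '#http_port ...' is skipped.
        m = re.match(r"\s*http_port\s+(?:[\d.]+:)?(\d+)\b", line)
        if m:
            ports.append(int(m.group(1)))
    return ports


def dansguardian_proxyport(conf_text: str) -> Optional[int]:
    """Value of the first active 'proxyport = N' line, or None if absent."""
    for line in conf_text.splitlines():
        m = re.match(r"\s*proxyport\s*=\s*(\d+)", line)
        if m:
            return int(m.group(1))
    return None


def ports_match(squid_conf: str, dg_conf: str) -> bool:
    """True if dansguardian's proxyport is one of the ports squid listens on."""
    dg_port = dansguardian_proxyport(dg_conf)
    return dg_port is not None and dg_port in squid_http_ports(squid_conf)
```

As root, you would pass it the contents of /etc/squid/squid.conf and /etc/dansguardian/dansguardian.conf (e.g. read with open(...).read()).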

There are plenty of other options. The config files are both written comprehensively and you can do worse than simply reading the comments in the config files.

Step 3 - Tune the setup for RPi
========================
OK, these steps are optional, but here's why I carried them out. The RPi has fairly good network throughput for general internet-sized packets and will certainly have enough to deal with a typical broadband service (20Mbit/s or less). Perhaps it will struggle on some cable/fibre services running at higher rates, e.g. 100Mbit/s.

What I found the RPi did struggle with is writing logs and caches. This caused a noticeable delay in web access. I tried using a USB disk as the root filesystem, but this did not improve the situation and caused other kernel errors. So I limited caching/logging to the bare minimum required.

First, let's disable squid's local content cache. On a home network there's really very little to be gained from a local cache. (FYI, a cache is a copy of the pages people have accessed, so that squid can serve them directly rather than going out to the internet each time to fetch them.) In /etc/squid/squid.conf look for the line:
# OPTIONS FOR TUNING THE CACHE
and in this section add a line
cache deny all
This stops squid caching anything at all.

Now, let's stop squid logging anything (we'll get dansguardian to log the info we want). In /etc/squid/squid.conf look for the line:
access_log /var/log/squid/access.log squid
Comment out this line to stop logging, i.e. change this config entry to:
#access_log /var/log/squid/access.log squid
access_log none

Finally, let's switch dansguardian from logging every web request to logging only those that are denied. Look for the logging settings section and change the log level to "1" like this:
# Logging Settings
#
# 0 = none 1 = just denied 2 = all text based 3 = all requests
loglevel = 1
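The three Step 3 changes can be checked in one go. This Python sketch (function names hypothetical) ignores comments and blank lines and reports which of "cache deny all", "access_log none" and "loglevel = 1" are not yet active; it assumes one directive per line, as in the stock config files.

```python
import re
from typing import List


def active_lines(text: str) -> List[str]:
    """Non-blank, non-comment lines with runs of whitespace collapsed."""
    return [" ".join(line.split()) for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def step3_problems(squid_conf: str, dg_conf: str) -> List[str]:
    """List the Step 3 tuning settings that are not (yet) active."""
    squid = active_lines(squid_conf)
    dg = active_lines(dg_conf)
    problems = []
    if "cache deny all" not in squid:
        problems.append("squid: 'cache deny all' missing")
    if "access_log none" not in squid:
        problems.append("squid: 'access_log none' missing")
    if any(line.startswith("access_log ") and line != "access_log none" for line in squid):
        problems.append("squid: another access_log line is still active")
    levels = [line.split("=", 1)[1].strip() for line in dg
              if re.match(r"loglevel\s*=", line)]
    if levels != ["1"]:
        problems.append("dansguardian: loglevel is not set to 1 (just denied)")
    return problems
```

An empty list means all three changes are in place.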

Step 4 - Start the services
====================
First we start squid with (as root)
# service squid start
If squid fails to start, complaining about swap, I think it's because the swap directories have to be built before the first run by running
# squid -z
.. and then you should be able to run # service squid start

You should also be able to start dansguardian with:
# service dansguardian start

Step 5 - Test
==========
OK, at this point you should be able to configure a browser somewhere on your network and point it at dansguardian. E.g. in Windows 7 it's Control Panel -> Network and Internet -> Internet Options -> Connections (tab) -> LAN Settings. In Firefox it's Options -> Advanced (tab) -> Network (tab) -> Settings.

In both cases tick "use a proxy" and enter the IP address of your RPi as the proxy address and "8080" as the port. 8080 is the default port used by dansguardian. If necessary select "use the same proxy server for all protocols".
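If you'd rather test from a script than a browser, Python's standard urllib can be pointed at dansguardian in the same way. This is a sketch: the function name is hypothetical, the address 192.168.1.10 used below is a placeholder for your RPi's IP, and 8080 is the dansguardian default port mentioned above.

```python
import urllib.request


def filtered_opener(pi_address: str, port: int = 8080) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends requests via dansguardian on the RPi."""
    proxy = f"http://{pi_address}:{port}"
    # Same proxy for HTTP and HTTPS, like "use the same proxy server for all protocols".
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)
```

For example, filtered_opener("192.168.1.10").open("http://example.com/", timeout=10).read() should return the normal page for a permitted site; for a site you expect to be blocked you should get the dansguardian block page instead (urllib raises HTTPError if it is served with an error status).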

Now when you access the internet you should be routing through the content filter. If you're accessing a permitted page then you shouldn't notice anything different. If you access something undesirable e.g. an adult site, then you should get the dansguardian "this is blocked" page.

Step 6 - Customising what Dansguardian permits/blocks
===========================================
Before you have tuned dansguardian to your needs you will get a lot of false positives i.e. sites blocked that shouldn't be blocked.

All sites that are blocked will be recorded in /var/log/dansguardian/access.log. To show the blocked sites just do a simple search like
# grep DENIED /var/log/dansguardian/access.log
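When the log gets long, it helps to summarise which sites are being denied most often. The Python sketch below counts DENIED lines per host name. It deliberately assumes only that each line contains the requested URL as a token starting with http:// or https://, since the exact field layout depends on dansguardian's log format setting; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit

# Assumption: the requested URL appears as one whitespace-free token.
URL_RE = re.compile(r"https?://\S+")


def denied_hosts(lines: Iterable[str]) -> Counter:
    """Count log lines containing DENIED, grouped by the requested host name."""
    counts: Counter = Counter()
    for line in lines:
        if "DENIED" not in line:
            continue
        m = URL_RE.search(line)
        if m:
            host = urlsplit(m.group(0)).hostname
            if host:
                counts[host] += 1
    return counts
```

As root, pass it the open log file, e.g. denied_hosts(open("/var/log/dansguardian/access.log")).most_common(10) for the ten most-denied hosts.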

You'll commonly find one of two reasons for a block: a page/website has scored too highly, or a certain filetype is banned. By default dansguardian is quite restrictive, e.g. it won't let you download binary/multimedia files and it blocks pages that refer to sex, drugs (and rock'n'roll... no, just joking) and proxies.

Luckily, dansguardian has a simple set of files to manage what is blocked.

Dansguardian works on lists of words called "phraselists" which are installed with dansguardian. Some words are considered negative and some are considered positive. Each page is given a total score depending on the words matched and over a certain score, the page is blocked e.g. a site that just talks about "drugs" is likely to get blocked but a site that uses the word "drugs" in combination with words like "health" and "advice" is less likely to get blocked.
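To make the scoring idea concrete, here is a deliberately simplified Python model: each phrase carries a weight (positive for 'bad' words, negative for 'good' ones), the weights of all matches are summed, and the page is blocked if the total goes over a limit. The phrases, weights and the limit of 50 are made-up illustration values, and dansguardian's real matcher (whose threshold is the naughtynesslimit setting) does considerably more than a substring count.

```python
from typing import Dict


def page_score(text: str, weights: Dict[str, int]) -> int:
    """Sum weight x occurrences for every phrase (case-insensitive substring count)."""
    lowered = text.lower()
    return sum(lowered.count(phrase.lower()) * weight for phrase, weight in weights.items())


def is_blocked(text: str, weights: Dict[str, int], limit: int) -> bool:
    """Block when the total score goes over the limit."""
    return page_score(text, weights) > limit


# Illustrative values only; not dansguardian's real phraselist weights.
EXAMPLE_WEIGHTS = {"drugs": 30, "health": -20, "advice": -15}
EXAMPLE_LIMIT = 50
```

With these values "drugs drugs drugs" scores 90 and is blocked, while "Drugs advice: health risks of drugs and where to get health advice" scores 2×30 - 2×20 - 2×15 = -10 and passes, which mirrors the "drugs" versus "drugs + health + advice" example above.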

The word lists are stored under /etc/dansguardian/lists and the files are sensibly named for you to understand. I'll use two common examples.

If you want to permit all pages from a particular domain, e.g. bbc.co.uk, then edit the "exceptionsitelist" file and add the domain to the bottom of the file:
bbc.co.uk
(Note, you don't include the www.) Once changed, all pages under "www.bbc.co.uk/..." will be permitted no matter what the content. Permitting entire sites is the easiest approach. However, there is also an "exceptionurllist" file if you want to permit a certain page on a site without allowing all the other pages on that site.
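The "no www" rule works because a site entry is treated as covering the domain and everything under it. A minimal Python model of that matching (a hypothetical helper illustrating the behaviour described, not dansguardian's own code) is:

```python
from typing import Iterable
from urllib.parse import urlsplit


def site_excepted(url: str, exception_sites: Iterable[str]) -> bool:
    """True if the URL's host equals a listed domain or is a subdomain of it."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    for entry in exception_sites:
        domain = entry.strip().lower()
        if not domain or domain.startswith("#"):
            continue  # skip blank lines and comments
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Here site_excepted("http://www.bbc.co.uk/news", ["bbc.co.uk"]) is True, but "http://notbbc.co.uk/" is not matched, because the comparison is on whole domain labels rather than a plain string suffix.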

If you want to allow downloads of MP3s/binaries, then edit the "exceptionextensionlist" file and add the permitted filetypes, e.g.
# Media
.mp3
.zip
.gz
.bz2
.iso
.exe
.msi
.tar

... and so on with the other files. As well as "exception..." files that permit there are also "banned..." files that will block whatever entries you add.
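Extension lists work on the end of the requested path. A simplified Python model of an exceptionextensionlist check (a hypothetical helper; comment lines such as "# Media" are skipped, and the comparison here is case-insensitive) is:

```python
import posixpath
from typing import Iterable
from urllib.parse import urlsplit


def extension_excepted(url: str, exception_extensions: Iterable[str]) -> bool:
    """True if the URL path ends in one of the listed extensions, e.g. '.mp3'."""
    path = urlsplit(url).path.lower()   # query string is ignored
    ext = posixpath.splitext(path)[1]   # '.gz' for 'a.tar.gz'
    allowed = {e.strip().lower() for e in exception_extensions
               if e.strip() and not e.strip().startswith("#")}
    return bool(ext) and ext in allowed
```

Note that in this model archive.tar.gz is matched by the ".gz" entry, not by ".tar", since only the last extension is compared.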

If you make any changes you will need to restart dansguardian for it to be recognised via:
# service dansguardian restart
OR
# dansguardian -r

Step 7 - Customising the dansguardian blocked message
============================================
... and finally, you may want to change the html that dansguardian displays when something is blocked.

In /etc/dansguardian/dansguardian.conf you probably have the following lines:
# -1 = log, but do not block - Stealth mode
# 0 = just say 'Access Denied'
# 1 = report why but not what denied phrase
# 2 = report fully
# 3 = use HTML template file (accessdeniedaddress ignored) - recommended
#
reportinglevel = 3

The HTML template its referring to is stored at:
/etc/dansguardian/languages/ukenglish/template.html
( You can change the language used in dansguardian.conf )

Just edit this template.html to be what you want and that is what is displayed.

Like any other change to the dansguardian configuration, it only takes effect after a restart:
# dansguardian -r
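Because the block page only changes after a restart, it can be handy to preview an edited template.html first. The Python sketch below substitutes sample values for the placeholders and returns the resulting HTML without touching the real file. It assumes placeholders named -URL- and -REASONGIVEN-, as in the stock template; check your own template.html for the exact names, and treat the function name as hypothetical.

```python
import html
from typing import Dict


def preview_block_page(template: str, values: Dict[str, str]) -> str:
    """Fill placeholders such as -URL- with HTML-escaped sample values."""
    page = template
    for placeholder, value in values.items():
        page = page.replace(placeholder, html.escape(value))
    return page


# Assumed placeholder names; check them against your own template.html.
SAMPLE_VALUES = {"-URL-": "http://example.com/", "-REASONGIVEN-": "Banned site"}
```

Write the returned string to a scratch file (e.g. in /tmp) and open it in a browser to see roughly what users will get.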

===================
That's all I can think of for now.

Cheers.
by abishur » Mon Dec 17, 2012 5:33 pm
So how did you setup your network? Did you do two ethernet adapters or some other method for routing traffic?
by h4ppycl0wn » Tue Dec 18, 2012 8:44 am
My network looks like this:

INTERNET (WAN) --- Router with Switch --- [ RPi & other hosts (LAN / WLAN)

The internet router physically connects to broadband on one side, and on the other side connects all the hosts, including PCs, a wireless AP and the RPi. The LAN is a single IP subnet.

I'm assuming your question is driven by the desire to understand how to avoid the proxy being bypassed.

In my case, the router has a filter that blocks ports 80 & 443 (HTTP & HTTPS) from all IPs other than the RPi. This means if anyone tries to go direct to the internet (no proxy), the traffic is blocked by the router. If they send web traffic via the proxy (RPi), then it works.
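The router rule amounts to a very small decision function. Modelled in Python (hypothetical names, with placeholder addresses in the example), it also makes the limitation visible: only ports 80 and 443 are restricted, so a client could still reach an outside proxy listening on some other port.

```python
from ipaddress import ip_address

WEB_PORTS = {80, 443}  # HTTP and HTTPS, as in the router filter described


def router_permits(src_ip: str, dst_port: int, proxy_ip: str) -> bool:
    """Outbound rule: web ports only from the proxy; everything else passes."""
    if dst_port in WEB_PORTS:
        return ip_address(src_ip) == ip_address(proxy_ip)
    return True
```

With the RPi at 192.168.1.10, a PC at 192.168.1.20 is refused on port 80 but still allowed out on port 8080.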

I am lucky enough in my job (as a networks geek) to have an enterprise router in my house on which these detailed filter lists are possible but I don't think your average service provider supplied, home-user, broadband router would be capable of a filter down to individual IP level.

So, probably the best approach in that situation is to use the RPi with a second, USB-based Ethernet port so that all internet traffic has to go through the RPi, like this:

INTERNET --- Router --- RPi --- Switch ---[ All other hosts, PCs, etc.

While throughput will be limited, assuming like the majority of people you have a broadband connection < 50Mbps it would probably be ok. Not sure as I haven't got into configuring the RPi as an IP gateway but maybe another reader of the forum has & can post a guide....

The final security thing to consider is that a user could configure their browser to connect to the RPi proxy on port 3128 (on which squid runs) rather than 8080 (on which dansguardian runs). This would proxy the user's traffic to the net but not filter anything. The easiest way to avoid this is to configure squid to only allow connections from "localhost" rather than from any local IP subnet. I completely forgot to add this step originally. In short, you change /etc/squid/squid.conf to:
#http_access allow localnet
http_access allow localhost
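Squid evaluates http_access lines top to bottom and the first matching rule wins; if no line matches, it applies the opposite of the last line's action. dansguardian runs on the Pi itself and reaches squid from 127.0.0.1, which is why allowing only localhost still lets filtered traffic through while clients connecting straight to port 3128 are refused. The Python model below illustrates this; the names are hypothetical, and the 192.168.0.0/16 localnet is a placeholder for your own subnet.

```python
from ipaddress import ip_address, ip_network
from typing import Dict, List, Tuple

# Network lists for each ACL name; 'all' is handled specially below.
ACLS: Dict[str, List[str]] = {
    "localhost": ["127.0.0.1/32"],
    "localnet": ["192.168.0.0/16"],  # placeholder for your LAN subnet(s)
}


def http_access(src: str, rules: List[Tuple[str, str]], acls: Dict[str, List[str]]) -> str:
    """Return 'allow' or 'deny' for a client address, squid-style (first match wins)."""
    if not rules:
        return "deny"  # no http_access lines at all: squid denies
    addr = ip_address(src)
    for action, acl in rules:
        if acl == "all" or any(addr in ip_network(net) for net in acls[acl]):
            return action
    # No line matched: use the opposite of the last line's action.
    return "deny" if rules[-1][0] == "allow" else "allow"
```

With rules [("allow", "localhost"), ("deny", "all")], a LAN client at 192.168.1.20 gets "deny" while dansguardian, connecting from 127.0.0.1, gets "allow".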

One last thing, different but related topic. If you want to have content filtering on an RPI desktop...

...you can still do this by pointing the browser on the RPi to "localhost" or "127.0.0.1" and port 8080. In this case, if you want to block users from bypassing the proxy you need to use iptables to filter based on the user who is sending the traffic. There's a good description of how to do this here:
http://www.howtoforge.com/dansguardian- ... .10-karmic
by abishur » Tue Dec 18, 2012 1:25 pm
That was my question, thanks :-) I've set it up using two Ethernet ports. With the setup you've mentioned, the content filter could be bypassed even by just changing the port your browser uses, right? (Which is to say, set the proxy to your router gateway and use a port other than 80.)
by h4ppycl0wn » Tue Dec 18, 2012 3:10 pm
Not sure I'm entirely understanding the question but... no... If you configured your browser proxy settings to point to the router (not the RPi) then the packets will get ignored because the router is not acting (serving) as a proxy and wouldn't know what to do with a TCP packet arriving on port 8080 or similar.

You could, theoretically, re-configure your browser to point to a proxy somewhere else on the public internet and bypass the RPi and router filters. I am aware there are public, anonymising proxies out there.

So.... essentially you're right, if you want to avoid the proxy being bypassed, it must sit as a layer 3 gateway between your hosts & the internet.

How is the performance of the RPi as a gateway with a USB ethernet port? I'm interested.

It seems this would be the most likely setup for home where people will be dealing with a simple DSL modem or similar. Would be nice to know it works ok.
by abishur » Tue Dec 18, 2012 4:33 pm
Yes, you answered the question that I asked horribly (sheesh, that seems to be a theme with me this morning :roll: ). It's been a while since I had my Pi acting as a bridge, and I never got around to setting it up as a content filter in that configuration, but as just a bridge it handled things fairly well. If push came to shove I'd say I saw a performance hit to basic web browsing, but I never got around to running any speed tests, so that might have been a mental perception. :?:

My main thoughts on avoiding proxy bypass via internet sites that do it for you are: use DansGuardian to blacklist the sites I can find (and keep an eye on the logs for any unknown site being accessed over and over again, which would indicate a new proxy-bypass site), and only give kids a basic user account (which is good practice anyway) so they can't change DNS settings to try to bypass the Pi and use some open DNS-type site.

I'm trying to think of ways that the average home user with their random router could plug the Pi into one of the ports and have it serve as a transparent proxy without having to make changes to every PC that connects to it, but sadly I cannot! I have a dinky little router that I've turned into an impressive thing via DD-WRT, but most people wouldn't want to (or be able to) do something like that. *sigh* I'm probably just asking for too much. If only there was a cheap little board like the Pi that had several Ethernet ports and wireless... so basically a router with the power and flexibility of the Pi :lol:
by h4ppycl0wn » Wed Dec 19, 2012 10:01 am
A Pi with lots of LAN ports, that'd be nice .. but seems unlikely.

Here's where I personally think this topic sits. I honestly think that if your average home user is looking for content filtering & desktop protection then they are best off looking at desktop tools & other assistance from their service provider.

The RPi is also never going to be powerful enough to capture all the logging & deal with the high level of traffic for a more formal office/school infrastructure.

However, for any young (or old) geek with the personal motivation to learn network ops, the RPi can act as a cheap training ground for learning about IP networking at home (which seems fairly important, as it's the basis of both the internet and mobile networks), e.g. DHCP, file sharing, name resolution, routing, proxying and filtering. The skills learned could easily be applied to higher-spec hardware to propose a more "professional" solution in a formal role.

I promised myself I would write a "learning enterprise networking via the RPi" set of wiki pages, but job & life got in the way. This thread has re-stimulated my interest. Hopefully over the holiday break I can find a day or two to make some effort on this topic as an escape from consuming endless food. :)