Natural Language Toolkit computational linguistics on a Pi

7 posts
by thekeywordgeek » Fri May 18, 2012 2:32 pm
I am setting up my Pi to run a Python port of my keyword analysis system, which currently runs in PHP on my Windows laptop. (I hope this is the right place for this post; getting this software right will probably become quite a lengthy project for me.)

This isn't a hardware project; in fact it could be done on any internet-connected computer capable of running Python. What the Pi brings me, though, is the ability to leave a computer on 24/7 processing feeds without using significant amounts of power.

It uses the Natural Language Toolkit, a very powerful set of computational linguistics libraries and a fascinating set of toys in their own right. I'm writing about it here because I'm sure other people will want to use NLTK on the Pi, and though the installation process isn't arduous, I couldn't find any FAQs or HOWTOs about it written with respect to the Pi.

So as a first post, here's how to install NLTK on your Pi. I'm using the currently downloadable Debian image; the process is unlikely to be vastly different for other distributions, beyond substituting your package manager for apt.

First port of call: The basic instructions are below, with my Pi-specific observations.

  1. Open a prompt and type python -V to find out what version of Python is installed. On my Pi, the version is Python 2.6.
  2. Install Setuptools: Point your browser (in my case Midori) at and download the corresponding version of Setuptools (scroll to the bottom, and pick the filename that contains the right version number and which has the extension .egg).

    In my case I downloaded the required file and saved it in my home directory (/home/pi).

    Install it by typing sudo sh Downloads/setuptools-...egg, giving the location of the downloaded file.
  3. Install Pip: run sudo easy_install pip
  4. Install Numpy: Now the NLTK page suggests running sudo pip install numpy --upgrade, but sadly that failed on my distribution, citing missing libraries. Fortunately NumPy is available ready-compiled via apt, so I ran sudo apt-get install python-numpy instead.
  5. Install NLTK: run sudo pip install nltk --upgrade
  6. Test installation: run python then type import nltk
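The steps above can be sanity-checked in one go with a small script; this is just a convenience sketch of my own, not part of the official instructions:

```python
import importlib

# Try to import each package installed above and report whether it's present.
# A version string means the import worked; None means it's missing.
status = {}
for name in ("numpy", "nltk"):
    try:
        module = importlib.import_module(name)
        status[name] = getattr(module, "__version__", "unknown")
        print(name, "OK, version", status[name])
    except ImportError:
        status[name] = None
        print(name, "is NOT installed")
```

If both lines report OK, the install is good to go.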

After following the steps above I had a fully functional NLTK.
If you've followed these instructions and are wondering what you can do with NLTK, I suggest looking at and skipping to "Getting started with NLTK".
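To give a flavour of the kind of keyword analysis I mentioned at the top, here's a bare-bones word frequency count in plain stdlib Python; the sample text and the tiny stopword list are made up for illustration, and NLTK replaces this naive split-and-count with proper tokenizers, stemmers and stopword corpora:

```python
import re
from collections import Counter

# Naive keyword frequency count: lowercase, strip punctuation, drop a few
# common words, and count what's left.
text = "The Pi sips power, so the Pi can sit processing feeds all day."
words = re.findall(r"[a-z']+", text.lower())
stopwords = {"the", "so", "can", "all"}          # a tiny hand-rolled list
keywords = Counter(w for w in words if w not in stopwords)
print(keywords.most_common(3))                   # 'pi' comes out on top
```

NLTK's stopwords corpus and FreqDist class do the same job far more robustly, which is exactly why it's worth installing.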
I make and sell radio kits for the Raspberry Pi and more.
Posts: 105
Joined: Fri May 18, 2012 1:48 pm
by filmo » Wed Aug 08, 2012 8:26 am
Awesome. Worked flawlessly. I'm running 2012-07 Wheezy.
Posts: 7
Joined: Tue Aug 07, 2012 5:44 pm
by thekeywordgeek » Wed Aug 08, 2012 9:50 am
Good to know it works on Wheezy too. I'm still running my NLTK stuff on Squeeze, haven't updated that SD card yet.

What are you using it for, just out of curiosity?
by winwaed » Fri Aug 24, 2012 7:31 pm
Worked for me here with Wheezy too.

Note that you need to use a larger SD card and expand the Wheezy partition if you want to do anything more than a minimal install. E.g., a user will probably want to install some of the corpora, data models, etc.
Also I had a lot of trouble with - it wouldn't download all the data automatically. I had to manually pick which corpora etc. I wanted. It *seemed* like a timing issue with the larger downloads, but I don't know for sure.
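A rough sketch of the sort of workaround that picking packages manually amounts to: fetch the data packages individually, retrying each a few times. The package names and retry count here are only illustrative assumptions, not something from this thread:

```python
def fetch(package, attempts=3):
    """Try to download one NLTK data package, retrying a few times.

    Returns True on success, False if NLTK is missing or all tries fail.
    """
    try:
        import nltk
    except ImportError:
        return False
    for _ in range(attempts):
        try:
            if nltk.download(package, quiet=True):  # True on success
                return True
        except Exception:
            pass                                    # e.g. a network hiccup
    return False

# Fetch a couple of small example packages one at a time.
for pkg in ("stopwords", "names"):
    print(pkg, "ok" if fetch(pkg) else "not fetched")
```

Retrying small downloads individually sidesteps the all-in-one download stalling partway through.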
Posts: 19
Joined: Wed Mar 07, 2012 9:13 pm
by winwaed » Fri Aug 24, 2012 8:26 pm
I was going to start a new thread, but perhaps this is the best place to post...

When I was a kid there were a lot of cool or potentially cool things that could be done with a computer. Of course most were beyond my immediate abilities: things like photorealistic 3D, robotics, AI, etc.
(As an aside, my own dabbling with 3D wireframe graphics introduced me to matrix multiplication and trigonometry a few years before high school thought I needed to be taught such stuff!)

Obviously with the Pi's design, the various breakout boards, the GertBoard, etc., I think the robotics/peripheral side of things is covered. I wish I had such stuff as a kid. About all I did was a 1-bit (PWM) audio sampler for a 286. Playback was through the motherboard "beep" speaker and had surprisingly high fidelity!

Anyway, when it comes to AI, and especially natural language processing, things have progressed a lot in the last 30 years. Is there scope for some educational projects based around NLTK? Such projects would be considered advanced compared to Scratch or "introductory Python", but I think a wide range of interesting functionality could be provided. They should be interesting, not trivial, but also use a black-box approach where necessary (e.g. machine learning models are probably best left as black boxes, although advanced students could read up on how the simpler models work). They could bring programming (Python) and language (word types, phrases, syntax, semantics) together in a similar way to how I found computer graphics brought mathematics and programming together.
Perhaps some kids could mash projects together - say, using NLTK with their robotic creations?

What do people think?
I could help with the code side of things, but I would need assistance when it came to putting project classes together (they shouldn't be too complex/advanced, but they shouldn't talk down to kids or be trivial either), and with getting such projects in front of kids.
by thekeywordgeek » Mon Aug 27, 2012 9:34 pm
It's certainly an idea for a cool project. Is there a Python speech recognition library too?

It's outside my experience though, I'm afraid; my use of NLTK is aimed very much at humans.
by winwaed » Tue Aug 28, 2012 2:08 am
I don't know of one, and if there is, it is almost certainly a set of wrappers for a library coded in C++ or something like that.

Yes, only used it with text and humans.