Best Database


by gjs » Tue May 22, 2012 10:08 am
What would be a good database to use for:

Application to be written in python
Simple structure (one table) but large, up to several GB
Runs locally on the RPi as part of the application, not on a server

The programmer is:

Fairly new to Linux
Very new to Python
Wants to get up and running quickly rather than have the most elegant/efficient solution.

Thanks in advance for any suggestions.
Posts: 148
Joined: Fri Dec 02, 2011 8:04 pm
by SN » Tue May 22, 2012 10:25 am
MySQL would be the obvious choice
Steve N – binatone mk4->intellivision->zx81->spectrum->cbm64->cpc6128->520stfm->pc->raspi ?
Posts: 1009
Joined: Mon Feb 13, 2012 8:06 pm
Location: Romiley, UK
by walney » Tue May 22, 2012 10:44 am
Any option would require compilation from source.
MySQL should work OK - I was happily running a v4 database and tools last year on an 800MHz Pentium 3 with 256MB RAM on Windows 2K. But you might run into a couple of difficulties. The toolset has now moved into an all-in-one, MySQL Workbench, which is a lot more of a memory hog. Since you have only one table, you could probably do without this and run from the command line, however. The other issue depends on how much data you need to keep in memory.

As long as it's a single-user database, a second option might be to look at SQLite (http://sqlite.org). A quick google brought up this tutorial: http://zetcode.com/db/sqlitepythontutorial/ which runs through how to interface with Python. Personally, I think ultimately SQLite might be the way to go. It's renowned for being light on resources, and it's already integrated into both Android and iOS (so much more likely to have any ARM issues sorted out). (You will still need to bear in mind the in-memory dataset caveats.)
Posts: 233
Joined: Wed Nov 30, 2011 6:57 pm
by tufty » Tue May 22, 2012 10:56 am
SN wrote:MySQL would be the obvious choice

And PostgreSQL would be a better one if you care about your data :)

Seriously, MySQL comes with several fairly large caveats with regards to standardisation and moving your data, queries and code across to other platforms. Use with caution. Obviously, if you understand what the problems are, then it's worth looking at, and if you absolutely definitely require something that's (at least available under the) GPL, then MySQL is a decent enough choice. Although Ingres is also available under the GPL these days, so...

MySQL is owned by the evil Oracle, so if you're a Google fanboy, stay away.

A single, massive, table shouldn't be a problem for any of the open source databases out there (although, from a distance, it smells like a bad design), and only having a single table should mitigate the issues with MySQL's support for foreign key constraints.

Really, I'd go with PostgreSQL or Ingres if SQL support is necessary.

If you can live without SQL, Berkeley DB might well be a decent one to use. It's got a small overhead for your application, requires no "server", and can certainly handle the size of data you're talking about. But again, it's owned by Oracle :(

Be aware that *any* database system is going to hit your SD card pretty damned hard. Performance is going to be poor.

Simon
Posts: 1365
Joined: Sun Sep 11, 2011 2:32 pm
by Pirx-Danford » Tue May 22, 2012 1:06 pm
I would go for SQLite on the Pi, because getting Postgres or MySQL running would require a massive amount of learning and tweaking the Pi.

Look at that tutorial to get started
http://zetcode.com/db/sqlitepythontutorial/

Personally I have most often used MySQL, but for new projects I would go with Postgres these days, because it has a ton more features and a much better design.
But on the Pi - go for SQLite, otherwise the database system itself will take up far too many resources.
Posts: 62
Joined: Thu Oct 06, 2011 2:05 pm
by AlArenal » Tue May 22, 2012 1:11 pm
gjs wrote:Runs locally on the RPi as part of the application, not on a server


Depends a bit on what exactly you mean by that. The aforementioned MySQL is external software, a server, which can of course run on the Pi alongside your app. What people sometimes get wrong is the word "server", which just means a system that waits for requests and answers them. It is in fact always software that does the job, since a computer without software does nothing at all. Machines dedicated to running such software are commonly called servers too, but that does not mean you need a dedicated machine as a database server for your app. Most web server setups run web, database and often mail servers on one and the same machine.

A database system that does not run as a system service and can be used from within your software is SQLite. It's also pretty widely used. The system has to have the SQLite libraries installed; SQLite itself is not written in Python (it's written in C) but has bindings to a lot of languages, including Python.
Posts: 141
Joined: Sun Feb 26, 2012 6:58 pm
Location: Germany
by gjs » Tue May 22, 2012 8:36 pm
Thank you all for your replies. I have heard of SQLite, so it's interesting that it features highly in the replies.

To clarify a few points:

The application I am thinking of writing will run on the Pi and the database will reside on the Pi. Only the application will access it; there is no multi-user scenario. The app is a simple low-speed data logger for NMEA strings coming in through the serial port.

In the past I have used text files for logging but I thought a move to a database would be better because I do want to periodically run queries from within the application e.g. retrieve all data recorded between certain dates, or all records where field y has a value greater than x etc.

I was planning to write the data to a USB flash drive or SSD, while running the OS and program from the SD card so it should not get hammered. I'll have to work out how long a flash drive will last with the amount of data being logged.

I also want to log the data on a round-robin basis so that it never gets full i.e. oldest data is overwritten when a preset size is reached. Does SQLite or the Python bindings have any built in mechanism for this?

Thanks again.
Posts: 148
Joined: Fri Dec 02, 2011 8:04 pm
by AlArenal » Tue May 22, 2012 8:47 pm
gjs wrote:I also want to log the data on a round-robin basis so that it never gets full i.e. oldest data is overwritten when a preset size is reached. Does SQLite or the Python bindings have any built in mechanism for this?


Just delete the oldest row in your table when you consider it is "full" and you are about to insert a new row. You could use a serial number or a timestamp on your rows for that.
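
For illustration, something along these lines with Python's built-in sqlite3 module would do it (the table name, columns and row cap are invented for the example):
Code:
import sqlite3
import time

MAX_ROWS = 100000   # whatever "full" means for your application

def log_sentence(db, sentence):
    # count the rows; if we are at the cap, drop the oldest one first
    (count,) = db.execute("SELECT COUNT(*) FROM log").fetchone()
    if count >= MAX_ROWS:
        # the lowest rowid is the oldest insert; a timestamp column works too
        db.execute("DELETE FROM log WHERE rowid = (SELECT MIN(rowid) FROM log)")
    db.execute("INSERT INTO log (ts, sentence) VALUES (?, ?)", (time.time(), sentence))
    db.commit()

db = sqlite3.connect("nmea.db")
db.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, sentence TEXT)")
log_sentence(db, "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47")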
Posts: 141
Joined: Sun Feb 26, 2012 6:58 pm
Location: Germany
by error404 » Tue May 22, 2012 9:20 pm
Given your requirements I would definitely say SQLite is the right choice. You could implement a trigger to automatically prune records older than a certain date (if you store a timestamp), or the oldest ones when the count is more than you set or whatever. Or you could just do it in your application code.
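
A rough sketch of the trigger approach, assuming an INTEGER PRIMARY KEY id column and a cap of 100,000 rows (both invented for the example; a timestamp-based DELETE in the trigger body works the same way):
Code:
import sqlite3

db = sqlite3.connect("nmea.db")
db.execute("CREATE TABLE IF NOT EXISTS log (id INTEGER PRIMARY KEY, ts REAL, sentence TEXT)")
# after every insert, throw away everything more than 100000 rows older than the new row
db.execute("""
    CREATE TRIGGER IF NOT EXISTS prune_log AFTER INSERT ON log
    BEGIN
        DELETE FROM log WHERE id <= NEW.id - 100000;
    END
""")
db.commit()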

Just beware that SQLite is even looser than MySQL when it comes to data integrity and consistency, so you'll have to be a little more careful in your application code.

Also glad to see MySQL getting the smack down. :)
Posts: 351
Joined: Wed Dec 21, 2011 11:49 pm
by fromagique » Tue May 22, 2012 10:49 pm
I would concur on using SQLite, but I would add that personally I would use Django to interface with the database. Django is a web toolkit, but you can use its Models to work with a database without using any of the other parts. I would use it because I already know Django from web work and I find it very amenable, but if you're just starting out with object-relational mappers (ORMs) you could also look at SQLAlchemy, which I understand is quite good as well. Using an ORM like Django Models or SQLAlchemy makes things much smoother and more Pythonic.

You can find documentation on the models part of Django here: https://docs.djangoproject.com/en/1.4/topics/db/models/

I just wanted to give you another option that might prove easier to work with than raw SQL.
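
To give a flavour of the ORM route, here is a minimal SQLAlchemy sketch (the NmeaRecord model, column names and file name are invented for the example; a Django model would look broadly similar):
Code:
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class NmeaRecord(Base):                  # hypothetical model for the logger
    __tablename__ = "nmea_log"
    id = Column(Integer, primary_key=True)
    received = Column(DateTime, default=datetime.utcnow)
    sentence = Column(String)

engine = create_engine("sqlite:///nmea.db")
Base.metadata.create_all(engine)         # creates the table if it doesn't exist
Session = sessionmaker(bind=engine)

session = Session()
session.add(NmeaRecord(sentence="$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A"))
session.commit()

# "all data recorded between certain dates", with no hand-written SQL
rows = (session.query(NmeaRecord)
        .filter(NmeaRecord.received.between(datetime(2012, 5, 1), datetime(2012, 6, 1)))
        .all())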
Posts: 31
Joined: Tue May 22, 2012 4:41 am
by fromagique » Tue May 22, 2012 11:10 pm
One thing to note about SQLite in regard to your round-robin requirement is that by default the database file never shrinks: deleted rows just leave free pages inside the file for reuse. (You can still copy the file as a backup, as long as no write is in progress while you copy.) You need to run VACUUM to actually reclaim the space, and for that operation "as much as twice the size of the original database file is required in free disk space." (quoted from the SQLite docs, found here: http://www.sqlite.org/lang_vacuum.html and here: http://www.sqlite.org/pragma.html#pragma_auto_vacuum)
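
If you want to experiment with that, a small sketch of both options from Python (note that auto_vacuum only takes full effect on a database created with it, or after a subsequent VACUUM; the file name is made up):
Code:
import sqlite3

conn = sqlite3.connect("nmea.db")
conn.isolation_level = None     # autocommit mode; VACUUM cannot run inside a transaction

# auto_vacuum should be set before the first table is created;
# on an existing database it only takes effect after a full VACUUM
conn.execute("PRAGMA auto_vacuum = FULL")

# ...or reclaim free pages manually, e.g. after a large batch of deletes
conn.execute("VACUUM")
conn.close()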
Posts: 31
Joined: Tue May 22, 2012 4:41 am
by diarmuidwrenne » Wed May 23, 2012 7:49 pm
Hi,
SQLite is a great small database and is really the only database solution you need here. It has tight integration with Python and a low footprint. It produces a single file and can be easily moved across platforms. I have used it for years in an embedded solution with very few issues. That said, I would say a few things about it:
It is not thread safe so two separate python programs cannot use the one database safely (you can use threads from within the one app).
Writing loads (you said a GB) of data to the SD card will kill it quite quickly (don't forget they are designed for use in cameras, with say 100,000 writes over their life). One solution that I use is to create the database on /tmp and copy it over to the card on a schedule (see the sketch after these points). I do mine daily; it is not mission-critical data. That way the card is only written once per copy.
You might be better off logging to a file and copying that over to the SD card on a schedule. You could use a clever directory structure for the log files so that they could easily be looked up.
You might want to check the journalling of the SD card's filesystem to make sure that it can take the many writes you are proposing.
Alternatively, if you are online, log to a web server. I do that as well, so all my data is safe and the SQLite db is only used for local business-logic decisions.
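
A very rough sketch of the copy-on-a-schedule idea (the paths are made up, and it assumes no write happens while the file is being copied):
Code:
import shutil
import sqlite3

WORK_DB = "/tmp/nmea.db"          # working copy (ideally on a tmpfs)
SAFE_DB = "/media/usb/nmea.db"    # hypothetical mount point of the flash drive

def snapshot(db):
    """Copy the working database out to flash; call this from a daily scheduler/cron job."""
    db.commit()                   # make sure nothing is half-written
    # simplistic: no writes must happen while the file is being copied
    shutil.copy2(WORK_DB, SAFE_DB)

db = sqlite3.connect(WORK_DB)
# ... normal logging against db ...
snapshot(db)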
Cheers

Diarmuid
Posts: 1
Joined: Wed May 23, 2012 7:37 pm
by xkxx » Wed May 23, 2012 10:08 pm
Should you decide to stick with SQL, I'd recommend SQLite as well.
Otherwise, I personally use Redis for REST API development. The 2.2 version of MongoDB looks promising, too.
Posts: 39
Joined: Sat Dec 10, 2011 5:26 am
by spurious » Wed May 23, 2012 10:19 pm
Linked-list data seems to fit your requirement. Are you sure you need the overhead of a database?
Posts: 343
Joined: Mon Nov 21, 2011 9:29 pm
by gjs » Wed May 23, 2012 10:41 pm
Thanks for your replies.

I'm planning to log to a USB flash drive, not the SD card. I've done some calculations today and the life, surprisingly, is not an issue. It's many years.

spurious: Can you explain what you mean by linked-list data?

Also, VACUUM is the only way to reclaim space and that uses twice the space of the whole database. Is that right? If so, that will preclude a round-robin arrangement completely.

And I can't access the data from a separate program while still logging? Hmm... I can do better than this with text files!
Posts: 148
Joined: Fri Dec 02, 2011 8:04 pm
by error404 » Wed May 23, 2012 11:33 pm
VACUUM works basically by copying the data to a new file and then deleting the old file, so if the file is all empty space it won't require twice the database size, just twice the data size. It's still a pretty inefficient operation obviously. auto_vacuum is probably a good solution for your case. You should have very little fragmentation due to your round robin approach, all your data should be ordered and your deletes will leave large blocks empty to be reclaimed.

There's a difference between thread safety and concurrency. The library being thread safe means that its internal data structures won't be corrupted if multiple threads (of the same process - different processes don't share state so can't break libraries) access the library simultaneously. Concurrency is about how many concurrent transactions can be running against the database at a time. SQLite3 is both thread safe and concurrent, but its locking is quite a bit less granular than most true RDBMSs', so a writing transaction is more likely to stall readers. Locking is implemented such that the database will remain consistent, though race conditions are possible and writers can be blocked until the locks from other transactions are freed. Generally multiple readers and a single writer will never cause locking problems. You can read more in the SQLite documentation on file locking and concurrency.
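
One practical knob worth knowing: sqlite3.connect() takes a timeout argument, so a second process reading the database while the logger is writing will wait for the lock instead of immediately failing with "database is locked". A small sketch (file and table names assumed from the earlier examples):
Code:
import sqlite3

# generous timeout: sqlite3 retries for up to 30 s while the writer holds the lock
reader = sqlite3.connect("nmea.db", timeout=30.0)
for row in reader.execute("SELECT * FROM log ORDER BY rowid DESC LIMIT 10"):
    print(row)
reader.close()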

I don't think SQLite 2 is still in common use anywhere, but much of this probably does not apply to that version.
Posts: 351
Joined: Wed Dec 21, 2011 11:49 pm
by FeersumEndjinn » Mon May 28, 2012 1:30 pm
Pirx-Danford wrote:I would go for SQLite on the Pi, because getting Postgres or MySQL running would require a massive amount of learning and tweaking the Pi.



Seconded - SQLite is a single, easy-to-work-with file, which is ideal for someone new to databases and Linux: they then have only one file to back up (outside of their Python code).
Morrolan / FeersumEndjinn

"And the lord God said unto John, 'Come forth and receive eternal life', but John came fifth and won a toaster."
Posts: 144
Joined: Mon Nov 21, 2011 4:02 pm
by tickett » Mon May 28, 2012 3:59 pm
This is an interesting thread - as I intend to do something similar as part of a home automation project.
Posts: 15
Joined: Sun Feb 12, 2012 6:09 pm
by mkeeley » Tue May 29, 2012 5:41 pm
For your requirements SQLite would be fine; MySQL, Postgres etc. are simply overkill.
Posts: 29
Joined: Sun Feb 19, 2012 12:44 am
by Sutty » Tue Jun 05, 2012 2:38 pm
Hi,

I'm also very new to Debian and Python and would like to get SQLite working, so any help on getting it onto the Pi would be useful. As for a round-robin, I would have thought it's very simple if you use two tables: one pre-populated with dummy rows and another which just stores the last row written. You then only need to update rows in the main table (and increment the count) until the count reaches a maximum, then start from 1 again. You can always add more sophistication by storing additional analytical info in the count/control table.

Good luck

Dave
Posts: 12
Joined: Tue May 29, 2012 10:20 am
by error404 » Tue Jun 05, 2012 9:00 pm
Not sure why I didn't mention it before, but for round-robin data, depending on the type of data of course, you may just want to use rrdtool. It's kind of limited in the types of data it can hold and how you can query it, but for lots of logging-type datasets it works very well, and it can generate fancy graphs for you and so on.
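
For a flavour of it, a tiny sketch assuming the python-rrdtool bindings are installed (the data source name and intervals are made up for the example):
Code:
import rrdtool   # Debian/Raspbian package "python-rrdtool"

# one gauge sampled every 60 s, keeping a week of raw values
rrdtool.create("boatlog.rrd", "--step", "60",
               "DS:speed:GAUGE:120:U:U",
               "RRA:AVERAGE:0.5:1:10080")

rrdtool.update("boatlog.rrd", "N:5.4")    # "N" means "now"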

SQLite has been included with Python since 2.5. You don't need to install anything extra to use it (though the sqlite3 command line tools would probably be useful to look at your data, create the schema and so on). See http://docs.python.org/library/sqlite3.html
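
A bare-bones sketch using nothing but the bundled module, including the kind of date-range query mentioned earlier (table and column names invented for the example):
Code:
import sqlite3

db = sqlite3.connect("nmea.db")
db.execute("CREATE TABLE IF NOT EXISTS log (ts TEXT, sentence TEXT)")
db.execute("INSERT INTO log VALUES (datetime('now'), ?)", ("$GPRMC,...",))
db.commit()

# all data recorded between two dates
for ts, sentence in db.execute(
        "SELECT ts, sentence FROM log WHERE ts BETWEEN ? AND ?",
        ("2012-05-01 00:00:00", "2012-06-01 00:00:00")):
    print(ts, sentence)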
Posts: 351
Joined: Wed Dec 21, 2011 11:49 pm
by HansH » Tue Jun 05, 2012 9:32 pm
SQLite if you require SQL... but that also means some overhead.
But if you want a really light and fast DB for one table, you should take Berkeley DB.

Both are used for embedded systems and both have support in Python...
Depends a bit on how you will access the data.

BDB is twice the size of SQLite. A comparison between BDB and SQLite is similar to a comparison between assembly language and a dynamic language like Python or Tcl. BDB is probably much faster if you code it carefully. But it is much more difficult to use and considerably less flexible.


So it depends on your usage ....
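
For what it's worth, a tiny Berkeley DB sketch assuming Python 2's bundled bsddb module (on Python 3 the third-party bsddb3 package provides a similar interface; keys and file name are made up):
Code:
import bsddb    # bundled with Python 2; use the bsddb3 package on Python 3

db = bsddb.btopen("nmea.bdb", "c")   # B-tree file, "c" = create if missing
db["2012-05-23 19:49:00"] = "$GPGGA,123519,4807.038,N,..."   # keys and values are strings
db.sync()

# B-tree keys come back in sorted order, so a crude date-range scan is easy
for key in db.keys():
    if "2012-05-01" <= key <= "2012-06-01":
        print(key, db[key])
db.close()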
Posts: 212
Joined: Mon Sep 05, 2011 7:49 am
by John Cooke » Thu Jun 07, 2012 8:42 pm
SQLite is the #1 choice. Python integration is a doddle, especially since the cursor returned by execute() is iterable, so you can loop straight over query results.
To do round-robin you can use an INSERT OR REPLACE statement, with the slot column (the primary key) computed as sequenceNo % rowmax;
(or you can do it in other ways!)

:)
Posts: 3
Joined: Thu Jun 07, 2012 8:24 pm
by John Cooke » Sat Jun 09, 2012 12:09 pm
Here is one way of doing it in Python using SQLite.
The log table has a given number of "slots" (NSLOTS, which in my test I only set to 10, but you'd want to set it to some sensibly large number). The next slot number is stored in the metadata table and is updated on each insert. So every time you want to log a message, you can either:
  • retrieve the next slot number from the metadata table
  • keep track of the next slot number (it is returned by the add/append functions)
Obviously you'd want a richer schema than just slot+message (timestamp, severity, etc) but that should be straightforward.

Enjoy
Code:
import sqlite3

NSLOTS=10

def create_tables(dbname):
    db=sqlite3.connect(dbname)
    db.execute("CREATE TABLE log (slot INT PRIMARY KEY, data TEXT)")
    db.execute("CREATE TABLE metadata (metakey TEXT, metavalue INT)")
    db.execute("INSERT INTO metadata VALUES('nextslot',0)")
    db.commit()
    db.close()

def addlogmessage(db, slot, msg):
    'Add a message to the log in the given slot.'
    # slot is a value between 0 and NSLOTS-1
    db.execute('BEGIN TRANSACTION')
    db.execute('INSERT OR REPLACE INTO log VALUES(?,?)', (slot,msg))
    nextslot = (slot+1) % NSLOTS
    db.execute("UPDATE metadata SET metavalue=? WHERE metakey='nextslot'", (nextslot,))
    db.commit()
    return nextslot

def appendlogmessage(db, msg):
    'Add a message to the log in the next slot.'
    # get next slot
    cur = db.cursor()
    cur.execute("SELECT metavalue FROM metadata WHERE metakey='nextslot'")
    nextslot = cur.fetchone()[0]
    # add the message at that slot location
    return addlogmessage(db, nextslot, msg)
Posts: 3
Joined: Thu Jun 07, 2012 8:24 pm
by timhoffman » Mon Jun 11, 2012 3:47 am
Hi

If you don't care about SQL, then a native Python object datastore called ZODB is available.

http://www.zodb.org/

It has single-file database support and can also transparently support network access to a remote store by using ZEO.
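
A minimal sketch of what that looks like (the file name and keys are invented; it assumes the ZODB package is installed):
Code:
import transaction                    # ZODB's transaction manager
from ZODB import FileStorage, DB

storage = FileStorage.FileStorage("nmea.fs")   # the single database file
db = DB(storage)
conn = db.open()
root = conn.root()                    # behaves like a persistent dict

root["2012-06-11 03:47:00"] = "$GPRMC,..."     # store plain Python objects
transaction.commit()

db.close()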

Cheers

Tim
Posts: 85
Joined: Sat Nov 05, 2011 11:31 pm