This post is part 1 in a little how-to series on how to use the IBM Informix database on the Raspberry Pi to build an efficient sensor data data database for local sensor data analytics/processing/caching.
The other posts can be found here:
viewtopic.php?f=37&t=97199&p=674959 (Informix Intro and Installation)
viewtopic.php?f=37&t=100029&p=693935 (An Informix Sensor DB - Part 2)
viewtopic.php?f=37&t=137392&p=912405 (The Informix REST API - Part 3)
viewtopic.php?uid=131887&f=37&t=140398 (Round robin sensor data storage with IBM Informix - Part 4)
Hi,
since I posted a quick How-To on how to create a simple Informix database instance on Raspian a few days ago, I wanted to show you how easy it is to create an Informix based sensor / time series database.
Before I start, just a little bit of technology background on how Informix is handling sensor data:
Informix is an (object-) relational database and hence its usage and its SQL language very similar to other relational databases which are available for the RPi. BUT, it has built-in enhancements to very efficiently handle time stamped (e.g. sensor-) data plus JSON documents.
Let's have a look on how one would typically store sensor data in any relational database. One would create a table, similar to the following:
Code: Select all
create table sensor_data
(
sensor_id char(40) not null ,
sensor_type char(20),
sensor_unit char(6),
timestamp datetime year to fraction(5),
value decimal(15,2),
primary key (sensor_id)
);
Code: Select all
create unique index sensor_data_ix1 on sensor_data(timestamp, sensor_id);
The IBM Informix approach on how to handle sensor / time series data
Informix supports an extended SQL data type called 'TIMESERIES'. An Informix database table can have one or more TIMESERIES columns. A TIMESERIES column can store time series data with regular or irregular time stamp intervals.
An example for an 'irregular time series' would be some kind of event log or a sensor which generates its data in changing measurement intervals. A 'regular time series' could be a sensor which either produces data in equal time intervals (e.g. every 60 seconds) or is being read out in equal intervals (e.g. once every 15 minutes).
Informix is storing each time series type in a database object called 'container'. The storage behavior is optimized for each type ('regular', 'irregular').
In the case of a regular time series Informix only stores the actual 'payload' of a time series (e.g. a sensor's measurement value). Time stamp information are ignored and meta data are only stored once for each unique sensor id. For a regular time series time stamp information can be ignored, since Informix knows when a time series started ('origin') and in what kind of interval the data is provided (through a so called 'calendar' and 'calendar pattern' concept).
For an irregular time series the time stamps are stored together with the payload, but the meta information is again only stored once.
Compared to a traditional relational approach you will notice a significant (e.g. up to 60- 80%) less storage usage by using the Informix TIMESERIES data type. In addition all of the time series data will be nicely clustered on disk. The combination of optimal clustering and less disk usage leads to a much better I/O performance compared to a pure relational approach.
To work with the time series data, Informix provides a built-in set of more than 100 SQL time series focused functions which support an application developer in accessing the data on the database w/o the requirement to retrieve them into a client application first. So instead of e.g. reading thousands of records into an application for validation or aggregation, one simply does those steps in the database and only returns the results back to the client. If there is a requirement for a specialized function, one can write a server side function either in C or in Java (that Informix functionality had been inherited from Illustra, an early commercial version of PostgreSQL).
I don't want to spend too much time now on the additional technical details, but I should at least mention that Informix has a built-in time series data high performance loader SQL API which can achieve e.g. on a current Intel CPU (I guess I need to run a benchmark on an RPi soon) about 60k sensor values per CPU(core) per second with a near linear scalability.
Ok, enough theory, let's get started.
In my How-To guide I will build upon the the relational example above, but by using the Informix TIMESERIES data type instead.
Create a basic sensor database with Informix
In my example I will create a sensor database with with a sensor data table to hold the data for sensors which are producing measurements in 1-minute intervals. So we will be dealing with regular time series.
In the very first step let's create an empty Informix database which will become the 'playground' for all my future how-tos. I am using the Informix instance which I created in my other How-To guide on creating an Informix instance.
Code: Select all
echo "create database sensor_db with log" | dbaccess - -Code: Select all
create row type sensor_t
(
timestamp datetime year to fraction(5),
value decimal(15,2)
);
The 'value' element will later store our measurement value.
In the next step we need to create a 'container' which will store the actual time series data. Each container is being optimized for a specific time series type. Let's create a container named 'sensor_cont' for the time series row type 'sensor_t' in the Informix storage area 'rootdbs' with an initial container extend size of 2048 kB and a next extend size of 2048 kB:
Code: Select all
execute procedure TSContainerCreate (
'sensor_cont',
'rootdbs',
'sensor_t',
2048,
2048);
Code: Select all
create table sensor_ts (
sensor_id char(40),
sensor_type char(20),
sensor_unit char(6),
sensor_values timeseries(sensor_t),
primary key (sensor_id)
) lock mode row;
To make it easy for us to use that table in a way that the most of you are used to it, I am now creating a virtual, relational view/table on the time series data. Those virtual tables are called 'Virtual Table Interface' tables or simply VTI tables. Those VTI tables allow an easy relational-like handling of Informix time series data.
I am creating a VTI table with the name 'sensor_data' (like the relational table in the beginning of my post) based on the time series base table 'sensor_ts'.
Code: Select all
execute procedure TSCreateVirtualTab (
'sensor_data',
'sensor_ts',
'origin(2015-01-26 00:00:00.00000),calendar(ts_1min),container(sensor_cont),threshold(0),regular',0,'sensor_values');
After executing that stored procedure, you will have a virtual table called 'sensor_data' with the following schema:
Code: Select all
create table sensor_data
(
sensor_id char(40) not null ,
sensor_type char(20),
sensor_unit char(6),
timestamp datetime year to fraction(5),
value decimal(15,2)
);
Code: Select all
insert into sensor_data values ("Sensor01", "Temp", "C", "2015-01-26 08:00"::datetime year to minute, 21.5);
insert into sensor_data values ("Sensor01", "Temp", "C", "2015-01-26 08:01"::datetime year to minute, 21.6);
insert into sensor_data values ("Sensor02", "Temp", "C", "2015-01-26 15:45"::datetime year to minute, 35.9);
insert into sensor_data values ("Sensor01", "Temp", "C", "2015-01-26 08:02"::datetime year to minute, 22.1);
insert into sensor_data values ("Sensor02", "Temp", "C", "2015-01-26 15:46"::datetime year to minute, 35.2);
insert into sensor_data values ("Sensor02", "Temp", "C", "2015-01-26 15:47"::datetime year to minute, 33.5);
Each time you insert a row with an existing 'sensor_id' into the VTI table 'sensor_data' the provided value is either added or updated (depending on its time stamp) in the 'sensor_values' column.
With those few example rows inserted into the database, now let's work bit with them. Do a 'SELECT *' on the sensor_data table first:
Code: Select all
echo "select * from sensor_data" | dbaccess sensor_db -
Database selected.
sensor_id Sensor01
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 08:00:00.00000
value 21.50
sensor_id Sensor01
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 08:01:00.00000
value 21.60
sensor_id Sensor01
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 08:02:00.00000
value 22.10
sensor_id Sensor02
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 15:45:00.00000
value 35.90
sensor_id Sensor02
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 15:46:00.00000
value 35.20
sensor_id Sensor02
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 15:47:00.00000
value 33.50
6 row(s) retrieved.
Database closed.
Code: Select all
echo "SELECT sensor_id, sensor_type, sensor_unit FROM sensor_ts" | dbaccess sensor_db -
Database selected.
sensor_id sensor_type sensor_unit
Sensor01 Temp C
Sensor02 Temp C
2 row(s) retrieved.
Database closed.
Before I finish that first and very basic part on how to use the Informix time series data type, I would like to give you a glimpse on what you can do with the stored data. Before I do that let's add a few additional example rows through our 'sensor_data' VTI table:
Code: Select all
insert into sensor_data values ("Sensor01", "Temp", "C", "2015-01-26 07:58"::datetime year to minute, 20.2);
insert into sensor_data values ("Sensor01", "Temp", "C", "2015-01-26 07:59"::datetime year to minute, 20.7);
insert into sensor_data values ("Sensor01", "Temp", "C", "2015-01-27 10:11"::datetime year to minute, 26.3);
insert into sensor_data values ("Sensor01", "Temp", "C", "2015-01-27 10:45"::datetime year to minute, 26.9);
Such an aggregation can be done in Informix either on the VTI table level and by using traditional SQL statements, or more efficiently by using some of the built-in time series functions on the time series base table.
In the following SQL example we are aggregating the minute interval based data for 'sensor_id' 'Sensor01' to an hourly granularity:
Code: Select all
SELECT AggregateBy(
'SUM($value)',
'ts_1hour',
sensor_values,
0,
'2015-01-26 00:00'::datetime year to minute,
'2015-01-31 23:59'::datetime year to minute)
FROM sensor_ts
WHERE sensor_id = "Sensor01";
Code: Select all
(expression) origin(2015-01-26 00:00:00.00000), calendar(ts_1hour), container(sensor_cont), threshold(0), regular, [NULL, NULL, NULL, NULL, NULL, NULL, NULL, (40.90), (65.20), NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, (53.20)]
1 row(s) retrieved.
If you need a different aggregation granularity, simply supply a different calendar to the 'AggregateBy()' function. In the following example I am using exactly the same SELECT statement like before, except for a different calendar ('ts_1day'):
Code: Select all
SELECT AggregateBy(
'SUM($value)',
'ts_1day',
sensor_values,
0,
'2015-01-26 00:00'::datetime year to minute,
'2015-01-31 23:59'::datetime year to minute)
FROM sensor_ts
WHERE sensor_id = "Sensor01";
Code: Select all
(expression) origin(2015-01-26 00:00:00.00000), calendar(ts_1day), container(sensor_cont), threshold(0), regular, [(106.10), (53.20)]
1 row(s) retrieved.
Code: Select all
SELECT sensor.timestamp::datetime year to hour, sensor.value FROM TABLE (
TRANSPOSE ((
SELECT AggregateBy(
'SUM($value)',
'ts_1hour',
sensor_values,
0,
'2015-01-26 00:00'::datetime year to minute,
'2015-01-31 23:59'::datetime year to minute)
FROM sensor_ts
WHERE sensor_id = "Sensor01"
))::sensor_t
) AS TAB (sensor);
Code: Select all
timestamp value
2015-01-26 07 40.90
2015-01-26 08 65.20
2015-01-27 10 53.20
3 row(s) retrieved.
Finally re-run the daily aggregation in a similar way:
Code: Select all
SELECT sensor.timestamp::datetime year to day timestamp, sensor.value FROM TABLE (
TRANSPOSE ((
SELECT AggregateBy(
'SUM($value)',
'ts_1day',
sensor_values,
0,
'2015-01-26 00:00'::datetime year to minute,
'2015-01-31 23:59'::datetime year to minute)
FROM sensor_ts
WHERE sensor_id = "Sensor01"
))::sensor_t
) AS TAB (sensor);
Code: Select all
timestamp value
2015-01-26 106.10
2015-01-27 53.20
2 row(s) retrieved.
Now that you have learned the basics on how to create an Informix sensor data table based on structured data types so far, let's have a brief look on how to create a JSON based sensor data table.
The following steps are more or less the same like before, except that the required Informix row type looks differently:
Code: Select all
create row type sensor_json_t
(
timestamp datetime year to fraction(5),
values bson
);
In the following SQL script we are creating a new time series container for the JSON time series, a new base table with the TIMESERIES column and a new VTI table to easily access the JSON based sensor data:
Code: Select all
execute procedure TSContainerCreate (
'sensor_json_cont',
'rootdbs',
'sensor_json_t',
2048,
2048);
create table sensor_json_ts (
sensor_id char(40),
sensor_type char(20),
sensor_unit char(6),
sensor_values timeseries(sensor_json_t),
primary key (sensor_id)
) lock mode row;
execute procedure TSCreateVirtualTab (
'sensor_json_data',
'sensor_json_ts',
'origin(2015-01-26 00:00:00.00000),calendar(ts_1min),container(sensor_json_cont),threshold(0),regular',0,'sensor_values');
Code: Select all
insert into sensor_json_data values ("Sensor03", "Temp", "C", "2015-01-26 08:00"::datetime year to minute, '{ "temp1":21.5 }'::json);
insert into sensor_json_data values ("Sensor03", "Temp", "C", "2015-01-26 08:01"::datetime year to minute, '{ "temp1":23.1 }'::json);
insert into sensor_json_data values ("Sensor03", "Temp", "C", "2015-01-27 10:45"::datetime year to minute, '{ "temp1":22.8 }'::json);
insert into sensor_json_data values ("Sensor04", "Temp", "C", "2015-01-26 12:02"::datetime year to minute, '{ "temp1":30.2, "temp2":10.3}'::json);
insert into sensor_json_data values ("Sensor04", "Temp", "C", "2015-01-27 15:46"::datetime year to minute, '{ "temp1":34.0, "temp2":9.7}'::json);
insert into sensor_json_data values ("Sensor04", "Temp", "C", "2015-01-27 15:47"::datetime year to minute, '{ "temp1":33.9, "temp2":9.2}'::json);
Code: Select all
select sensor_id, sensor_type, sensor_unit, timestamp, values::json values from sensor_json_data;
Code: Select all
sensor_id Sensor03
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 08:00:00.00000
values {"temp1":21.500000}
sensor_id Sensor03
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 08:01:00.00000
values {"temp1":23.100000}
sensor_id Sensor03
sensor_type Temp
sensor_unit C
timestamp 2015-01-27 10:45:00.00000
values {"temp1":22.800000}
sensor_id Sensor04
sensor_type Temp
sensor_unit C
timestamp 2015-01-26 12:02:00.00000
values {"temp1":30.200000,"temp2":10.300000}
sensor_id Sensor04
sensor_type Temp
sensor_unit C
timestamp 2015-01-27 15:46:00.00000
values {"temp1":34.000000,"temp2":9.700000}
sensor_id Sensor04
sensor_type Temp
sensor_unit C
timestamp 2015-01-27 15:47:00.00000
values {"temp1":33.900000,"temp2":9.200000}
6 row(s) retrieved.
Code: Select all
SELECT sensor.timestamp::datetime year to hour timestamp, sensor.value temp1 FROM TABLE (
TRANSPOSE ((
SELECT AggregateBy(
'SUM($temp1)',
'ts_1hour',
sensor_values,
0,
'2015-01-26 00:00'::datetime year to minute,
'2015-01-31 23:59'::datetime year to minute)::TimeSeries(sensor_t)
FROM sensor_json_ts
WHERE sensor_id = "Sensor03"
))
) AS TAB (sensor);
Code: Select all
timestamp temp1
2015-01-26 08 44.60
2015-01-27 10 22.80
2 row(s) retrieved.
Code: Select all
create row type sensor_twovals_t
(
timestamp datetime year to fraction(5),
value1 decimal(15,2),
value2 decimal(15,2)
);
Code: Select all
SELECT sensor.timestamp::datetime year to hour timestamp,
sensor.value1 temp1, sensor.value2 temp2 FROM TABLE (
TRANSPOSE ((
SELECT AggregateBy(
'SUM($temp1),SUM($temp2)',
'ts_1hour',
sensor_values,
0,
'2015-01-26 00:00'::datetime year to minute,
'2015-01-31 23:59'::datetime year to minute)::TimeSeries(sensor_twovals_t)
FROM sensor_json_ts
WHERE sensor_id = "Sensor04"
))
) AS TAB (sensor);
Code: Select all
timestamp temp1 temp2
2015-01-26 12 30.20 10.30
2015-01-27 15 67.90 18.90
2 row(s) retrieved.
- Alexander