DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

4273pi: Bioinformatics education on low cost ARM hardware

Mon Aug 12, 2013 10:27 am

I would like to announce our paper on 4273π:

Barker, D., Ferrier, D.E.K., Holland, P.W.H., Mitchell, J.B.O., Plaisier, H., Ritchie, M.G. and Smart, S.D. (2013) 4273π: bioinformatics education on low cost ARM hardware. BMC Bioinformatics 14:243. http://dx.doi.org/10.1186/1471-2105-14-243

4273π is for those wishing to teach, learn or use bioinformatics (computational biology) on the Raspberry Pi. 4273π includes 4273π Bioinformatics for Biologists, an Open Access bioinformatics course.

This is, perhaps, the first paper about the Raspberry Pi in the peer-reviewed life sciences literature. (If I'm wrong about this, please let me know!)

There is an associated question-and-answer article in BMC's Biome magazine: http://bit.ly/14GILtg

Best regards,

Daniel Barker
http://biology.st-andrews.ac.uk/staff/db60

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Aug 22, 2013 10:57 am

Daniel:

I certainly know of no other scientific paper that mentions the Raspberry Pi.

Yours has certainly rekindled my interest in cell biology and Perl!

However, it would be good to point out that the downloaded image file very nearly fills a true 32 Gbyte SD card. Some nominal 32 Gbyte cards deliver only about 30 Gbytes when formatted and cannot hold the image.

Regards,
Alan.
IT Background: Honeywell H2000 ... CA Naked Mini ... Sinclair QL ... WinTel ... Linux ... Raspberry Pi.

DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Aug 22, 2013 11:28 am

Dear Alan,

Thank you for your interest in 4273pi, and for bringing up the SD card size issue. I have just now edited the 4273pi Web site, making the SD card size requirement more prominent.

If you know of specific SD cards that do not work, that would be extremely useful to hear. Please reply either here or by email to me at: db60@st-andrews.ac.uk

Thanks again,

Daniel Barker

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Aug 22, 2013 12:15 pm

Daniel,

The "Intenso" brand SDHC card, nominally 32 GB, that I bought from Maplin last week formats to just under 30 GB.

I don't think that the formatted size is manufacturer-related. The shortfall is probably within the manufacturing spread of each card. (If you were using the card for its intended purpose in a video camera, you probably wouldn't notice.)
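Part of the gap may be down to units rather than the card itself: card makers usually count a gigabyte as 10^9 bytes, while many tools report sizes in units of 2^30 bytes. A minimal sketch of the conversion, assuming that is what is happening here (the function name is made up for illustration and is not part of 4273pi):

```shell
# Convert a nominal (decimal) gigabyte figure into binary gibibytes.
# Hypothetical helper for illustration only.
nominal_gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1000000000 / (1024 * 1024 * 1024) }'
}

nominal_gb_to_gib 32   # prints 29.80
```

On this reading, a nominal 32 GB card would show as just under 30 in binary units even before any card-to-card variation is taken into account.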

Could you trim the image file a wee bit?

Alan.

DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Aug 22, 2013 12:35 pm

Dear Alan,

I've added this to the list of known problems in 4273pi:

http://eggg.st-andrews.ac.uk/files/2013 ... 4273pi.txt

It will be fixed in a later release.

I have also ordered an Intenso SD card so I can test this myself.

I think your options are: wait for the fix; use another SD card; or (more painful) go through the 'work instruction' to create something similar to 4273pi yourself.

Thank you again for your help.

Daniel Barker

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Fri Aug 23, 2013 8:56 am

Glad it's now flagged as a known problem!

Regarding your options:

1) Trying different cards would be an expensive option, if it's pot luck seeing what size each formats to!

2) I don't mind waiting for a fix, since I have a lot of catching up in bioinformatics to do.

3) I've only skimmed your "work instructions". Is it possible to go through them without having access to the University's network?

Alan.

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Mon Aug 26, 2013 2:21 pm

Daniel:

Success!

I followed your "Work Instruction" (more or less), but downloaded the course material directly to my freshly-formatted 32 GB SD Card.

See the attached fragment of output from blastall.

Code:

ENSLAFP00000000422	gi|62899704|sp|Q5NU32.1|AOFA_HORSE	90.93	518	47	0	1	518	5	522	0.0	 931
ENSLAFP00000000422	gi|13878320|sp|P58027.1|AOFA_CANFA	89.00	518	57	0	1	518	5	522	0.0	 917

Curiously, the file manager tells me that only about 50% of the SD card is used!

I'm going to continue with the course materials - after backing up my SD card, of course!

Regards,
Alan.

inder
Posts: 23
Joined: Wed Aug 22, 2012 7:44 pm

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Mon Aug 26, 2013 5:04 pm

The card below also comes in at 29.2 GB and is thus not usable. Count me as another user waiting in line for the slimmed-down update.
Sony 32 GB Secure Digital High Capacity (SDHC/SDXC)- Class 10 /UHS-I 40 MBps Read - SF32UY/TQMN

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Tue Aug 27, 2013 10:05 am

inder:

Why don't you create your own image by following Daniel's "Work Instruction", as I did? It fits easily on to a crippled 32 GB card.

With a freshly-installed copy of wheezy-raspbian, start at "Obtain BLAST database" at the foot of page 3 of the Instruction.

Alan.

inder
Posts: 23
Joined: Wed Aug 22, 2012 7:44 pm

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Tue Aug 27, 2013 6:31 pm

@jardino: Thanks Alan. The idea had occurred to me; however the output from blastall that you showed seemed to indicate that it could be quite complicated. I shall take a look and see if I can figure it out. If not, I might be back asking for detailed instructions.
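The two lines Alan posted look like BLAST's tab-separated tabular output (blastall's -m 8 format, assuming that option was used). Each hit has twelve columns: query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value and bit score. A small sketch that pulls out three of them; summarise_hits is a name invented here:

```shell
# Read tabular BLAST hits on standard input and print, for each hit,
# the subject identifier, percent identity and bit score.
# Hypothetical helper; assumes twelve tab-separated columns per line.
summarise_hits() {
    awk -F'\t' '{
        gsub(/^ +/, "", $12)          # the bit score may be padded with spaces
        printf "%s identity=%s%% bits=%s\n", $2, $3, $12
    }'
}

# Example: summarise_hits < hits.txt   (hits.txt is a placeholder file name)
```

Read this way, the first hit in Alan's fragment is a horse protein that is about 91% identical to the query over 518 residues.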

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Wed Aug 28, 2013 9:54 am

inder:

It's not really difficult, but does take a long time. Much of the apparent complexity is due to Daniel's need to connect to the University's network, which the home user doesn't need.

I'll put some details up later, but have some other work to do first.

Alan.

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Aug 29, 2013 3:51 pm

I spent many hours today re-building my 32 GB card from scratch - while carefully documenting the steps - only to have script 6 fail after 2 hours 12 minutes because of no space on disk! :cry:

Further analysis is needed.
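One way to avoid losing two hours to a full disk is to check the free space before starting a long unpack. A rough sketch, not taken from the 4273pi scripts; the function name and the 1 GB threshold are placeholders to adjust:

```shell
# Succeed only if the file system holding DIR has at least NEED_KB kilobytes free.
# Hypothetical helper for illustration.
have_free_kb() {
    dir=$1
    need_kb=$2
    # df -Pk prints a header plus one line; column 4 is the space available in 1K blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: insist on roughly 1 GB free in the current directory before continuing.
if have_free_kb . 1000000; then
    echo "enough space to continue"
else
    echo "not enough free space; stopping" >&2
fi
```

The real threshold would have to be the unpacked size of whatever the script is about to extract, which is not stated in this thread.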

Fortunately, I'd backed up the previous version.

Alan.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 17887
Joined: Sat Jul 30, 2011 7:41 pm

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Aug 29, 2013 4:09 pm

Just out of interest, what takes up 32GB? Lots of video? That's a hell of a lot of data....!
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Please direct all questions to the forum, I do not do support via PM.

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Aug 29, 2013 8:37 pm

James:

No video - just "big data".

It is data about biological cells, proteins, amino acids and genomes - human and other. The developing science of bioinformatics analyses this data for medical and other purposes.

The University of St. Andrews is using RPis rather than traditional computer networks for their bioinformatics course (see original post).

While you're in here, I have an RPI question:

Is there any possible way to use raspi-config (or something similar) to expand the root file system to some proportion of the size of the SD card?

Our problem herein is that the dataset we need seems to be bigger than 16 GB, but somewhat less than 30 GB (I've not analysed it in detail yet). Some nominal 32 GB cards seem to format to as little as 28 or 29 GB, so an image created on a "big 'un" won't fit onto a "little 'un".

So it would be good if the root file system on a new installation on a 32 GB card could be expanded to, say, 25 GB - or whatever the dataset needs, plus a margin.
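Whether a given image will fit can be checked before writing it, by comparing the image file's size with the card's size in bytes. A minimal sketch, assuming a Linux machine; image_fits is an invented name, and the device and file names in the comment are placeholders:

```shell
# Succeed if the file IMAGE is no larger than CARD_BYTES.
# Hypothetical helper for illustration.
image_fits() {
    image=$1
    card_bytes=$2
    image_bytes=$(($(wc -c < "$image")))
    [ "$image_bytes" -le "$card_bytes" ]
}

# Example (run as root; /dev/sdX and 4273pi.img are placeholders):
#   card_bytes=$(blockdev --getsize64 /dev/sdX)
#   image_fits 4273pi.img "$card_bytes" && echo "image fits" || echo "image too big"
```

This only answers the "will it fit" question; it does not resize anything.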

Any help offered would be appreciated.

(By the way, I'm not associated with the University - I just like tasking RPis!)

Cheers,
Alan.

inder
Posts: 23
Joined: Wed Aug 22, 2012 7:44 pm

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Tue Sep 03, 2013 8:29 pm

Alan, Wanted to thank you for the effort you're putting into this. Let me know if I can help in any feeble way.

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Wed Sep 04, 2013 7:21 am

Thanks, inder.

I've been away from the problem for a few days but hope to get back to it tomorrow (5th).

Alan.

DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Sep 05, 2013 3:27 pm

A big use of space in 4273pi is the entire publicly available protein sequence database ('nr' from the NCBI, in BLAST format).

Does anyone actually want the entire protein sequence database in 4273pi?

It's fun, and mildly amazing, that it fits on an SD card at all. In practice, due to its size, it really takes too long to search on the Pi. The download also gets larger as more sequencing gets done. (SwissProt - a high-quality subset of the protein sequence database - will remain in 4273pi and is no problem.)

I've ordered the 'problematic' Sony SD card inder mentioned. I will test the next release of 4273pi on that; on the SanDisk card which we use extensively and is known to work (http://eggg.st-andrews.ac.uk/files/2013 ... 4273pi.pdf); and the 'problematic' Intenso SD card. I put 'problematic' in quotes because I'm sure these are fine SD cards in general.

Finally, the work instruction ... I'm glad this is proving of some use.

- As Alan said, it includes some hoops so I can use a static IP suited to my part of the University of St Andrews, yet set the card up to also work with DHCP. If you only want DHCP, the static IP stuff is not useful or important.

- I would suggest using a large swapfile (script_3.sh). Some sources suggest this can eventually damage the SD card, but I'm not sure this is likely. More importantly, it's not nice to run out of address space.

- The entire protein sequence database gets uncompressed by script_6.sh. If it's now too big, try commenting out or deleting this part:

Code:

echo Verifying and untarring nr BLAST database ...
echo
cd /home/pi/4273pi/blastdb || exit 1
for i in nr.*.gz
do
        md5sum $i > $tmp || exit 1
        cmp ${i}.md5 $tmp || exit 1
        rm ${i}.md5 || exit 1
        tar xzf $i || exit 1
        rm $i || exit 1
        echo $i - OK
done
echo ... OK

Thank you,

Daniel Barker
http://biology.st-andrews.ac.uk/staff/db60

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Fri Sep 06, 2013 9:20 am

Here is where I've got to so far:

1) My first attempt at using your work instruction, when I thought I'd only filled about half of the card, was an erroneous one - in that I hadn't let script_6 run to completion.

2) My second attempt, using my modifications to your work instruction (which I've documented), failed when the card became full as script_6 was trying to unpack nr.10.

3) I now propose to remove nr.08 through nr.12 (say) from the card before running script_6.

Which brings me to the question: how many of the nr data sets are needed to complete your course?

Alan.

DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Fri Sep 06, 2013 9:58 am

Dear Alan,

The NCBI's 'nr' BLAST database is split into several physical files to make it more manageable. But it is still one single BLAST database, as far as the user is concerned. When searching 'nr', BLAST 'knows' to look in all of these files.

I don't think there's any biological rationale behind the assignment of sequences to the component files. Omitting some of the files is only OK if you never use the 'nr' database. It doesn't make sense to omit some files if you do use it: an arbitrary set of sequences will be unavailable and no warning will be issued. To avoid confusion, it might be safer to omit the 'nr' database entirely. I think that if you leave all the 'nr' files in gzipped form, BLAST will not use them. (This would be worth checking. If blastall doesn't find 'nr', it should issue an error for any search with '-d nr'.)
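A quick way to see which state a card is in, sketched under the assumption that an unpacked 'nr' is recognisable by its alias file nr.pal (or, for a single-volume database, nr.pin) in the database directory. The function name is invented; the directory is the one used in script_6.sh:

```shell
# Succeed if DBDIR appears to contain an unpacked 'nr' protein database.
# Hypothetical helper; the file names checked are an assumption about the
# layout of the NCBI download, not something taken from the 4273pi scripts.
nr_is_unpacked() {
    dbdir=$1
    [ -e "$dbdir/nr.pal" ] || [ -e "$dbdir/nr.pin" ]
}

if nr_is_unpacked /home/pi/4273pi/blastdb; then
    echo "nr appears to be unpacked; a search with -d nr should find it"
else
    echo "nr is not unpacked; expect an error from any search with -d nr"
fi
```

This does not tell you whether every volume is present, so it is no substitute for leaving 'nr' out entirely when space is short.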

For the 4273 Bioinformatics for Biologists course as it currently stands, all of the 'nr' files are required, just because students are asked to search 'nr' in an 'own-time' exercise in the Week 1 practical (practical_linux_perl.pdf, p. 5). However, this task isn't very central to the course and could be omitted. This would be my suggestion.

An alternative, but more difficult and time-consuming, exercise would be to install 'nr' on a USB stick before performing this search. This would involve unzipping the 'nr' files into a directory on the USB stick and making sure BLAST can find them, for example by modifying the BLASTDB environment variable.
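The BLASTDB step might look something like the following. This is only a sketch: the mount point, query file and output file are placeholders, and use_blastdb_dir is a name invented for the example.

```shell
# Point BLAST at a database directory, but only if the directory exists.
# Hypothetical helper for illustration.
use_blastdb_dir() {
    if [ -d "$1" ]; then
        BLASTDB=$1
        export BLASTDB
    else
        echo "no such directory: $1" >&2
        return 1
    fi
}

# Example (all paths and file names are placeholders):
#   use_blastdb_dir /media/usb/blastdb
#   blastall -p blastp -d nr -i query.fasta -o hits.txt
```

With BLASTDB pointing at the stick, databases kept elsewhere (SwissProt on the SD card, say) may need to be given to -d as a full path.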

Best wishes,

Daniel Barker

DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Mon Sep 16, 2013 6:30 pm

Dear Alan and inder,

The SD card size problems should be fixed in 4273pi version 1.1:

http://eggg.st-andrews.ac.uk/4273pi

I have tried this release of 4273pi with the Intenso and Sony SD cards you mentioned, and it worked fine.

It no longer includes the entire publicly available protein sequence database. (So far as I know, no-one was using this.)

If there are any problems, please let me know.

Thank you for your help,

Daniel Barker db60@st-andrews.ac.uk

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Tue Sep 17, 2013 9:05 am

Daniel:

That's good news!

I'll check it out later.

Using the "Work Instruction" that I adapted from yours, I managed to get eight of the BLAST modules, together with SwissProt, on to my card with 5 GB to spare.

Cheers,
Alan.

jardino
Posts: 87
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Sep 19, 2013 2:21 pm

Works for me! And my card has nearly 20 GB to spare.

Nice home screen, too.

Alan.

DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Thu Sep 18, 2014 5:21 pm

For anyone interested in teaching and learning computational biology on the Raspberry Pi: we are delighted to announce release 1.2 of 4273pi. Please download it at:

http://eggg.st-andrews.ac.uk/4273pi

New in this version:

- The Open Access course, 4273pi Bioinformatics for Biologists, is now arranged into separate 'components'. This makes it far easier to create your own short course or integrate components with other teaching material.

- Latest Raspbian, Swissprot protein database, lectures and practical classes.

4273pi Bioinformatics for Biologists is based on undergraduate-level teaching material at the University of St Andrews. We have also used two components in extremely successful events with schools: see http://synergy.st-andrews.ac.uk/biooutr ... graston-pi and http://synergy.st-andrews.ac.uk/biooutr ... academy-pi .

Thank you,

Daniel Barker http://biology.st-andrews.ac.uk/staff/db60

yohananw
Posts: 1
Joined: Wed Jul 08, 2015 2:43 pm

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Wed Jul 08, 2015 2:45 pm

"The current version is 4273π version 1.31, released on 19 June 2015."
http://eggg.st-andrews.ac.uk/4273pi/

DanielBarker
Posts: 72
Joined: Tue May 29, 2012 7:53 am

Re: 4273pi: Bioinformatics education on low cost ARM hardwar

Mon Oct 19, 2015 10:27 am

The 4273pi Web site may be temporarily unavailable whilst it moves to a new domain. The new site is:

http://4273pi.org

By this time tomorrow (20 October 2015), it will work for everyone. It already works in some places.

Apologies for any inconvenience.

Daniel Barker <db60@st-andrews.ac.uk>
