The first phylogeny software for Rasbperry Pi?


11 posts
by DanielBarker » Tue May 29, 2012 8:02 am
We are happy to announce that LVB 2.31 for Raspberry Pi was released yesterday. Download it here:

http://biology.st-andrews.ac.uk/cegg/lvb.aspx

LVB reconstructs phylogeny from a nucleotide multiple alignment, using parsimony and simulated annealing.

Daniel Barker
http://biology.st-andrews.ac.uk/staff/db60
Posts: 53
Joined: Tue May 29, 2012 7:53 am
by DanielBarker » Tue May 29, 2012 2:53 pm
To answer my own question: LVB is one of the first pieces of phylogeny software for Raspberry Pi, but not the very first. E.g. there is a Debian squeeze phylip package.

Daniel Barker
http://biology.st-andrews.ac.uk/staff/db60
Posts: 53
Joined: Tue May 29, 2012 7:53 am
by AndrewS » Tue May 29, 2012 3:14 pm
User avatar
Posts: 3626
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK
by nick.mccloud » Tue May 29, 2012 4:46 pm
Can you tell us what it is/does.

And type slowly, we're not bad at computers but can't vouch for us keeping up with anything else!
User avatar
Posts: 795
Joined: Sat Feb 04, 2012 4:18 pm
by DanielBarker » Tue May 29, 2012 5:48 pm
nick.mccloud wrote:Can you tell us what it is/does.

And type slowly, we're not bad at computers but can't vouch for us keeping up with anything else!


Given a bunch of DNA sequences and some pre-processing, LVB gives you an idea how they are related. If you give it one sequence per species, it gives you an idea of how the species are related. LVB combines an optimality criterion that's rapid to evaluate and a subtle heuristic search - with the intention of working fast, and reasonably well, with large input.

For a really huge study you'd still want rapid hardware. But plenty of studies would work fast enough on the Raspberry Pi.

Come over to the biological side! Keen readers please see:

Fitch WM (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Zoology, 20, 406-416.

Kirkpatrick S, Gelatt CD, Vecci MP (1983) Optimization by simulated annealing. Science, 220, 671-680.

White WTJ, Holland BR (2010) Faster exact maximum parsimony search with XMP. Bioinformatics, 27, 1359-1367. [Presents very clever bitwise operations not yet in LVB.]
Posts: 53
Joined: Tue May 29, 2012 7:53 am
by rasbeer » Tue May 29, 2012 8:20 pm
Attempting to put this into words of one syllable after googling some of these terms...

So it's a way of deriving a phylogenetic tree from a bunch of DNA sequences (aka a ' a nucleotide multiple alignment')?

& simulated annealing is a kind of local search that helps figure out how close pairs of sequences are by making small random changes to them (much like genetic mutations) until one observed sequence changes into another?

& parsimony is assuming the chain of mutations linking two sequences is as short as possible?

As a matter of interest, why did you do a raspberry pi version?
Posts: 242
Joined: Wed Mar 07, 2012 8:35 am
by DanielBarker » Wed May 30, 2012 10:55 am
So it's a way of deriving a phylogenetic tree from a bunch of DNA sequences (aka a ' a nucleotide multiple alignment')?


Yes.

& simulated annealing is a kind of local search that helps figure out how close pairs of sequences are by making small random changes to them (much like genetic mutations) until one observed sequence changes into another?


A hypothetical phylogenetic tree is evaluated by some quantity. For parsimony, this quantity is the smallest number of mutations required, for the particular tree to lead to the particular sequences we have. This is known as the 'length' of the tree. For a particular tree, exact length is easily discovered e.g. by the Fitch algorithm. One can do this by hand for small cases and it's certainly no problem for a computer.

It is the trees which are subject to a local search. The number of tree topologies increases factorially with the number of sequences in the study. So typically, there are more hypothetical trees than atoms in the universe and it's not feasible to calculate the length of each of them (even though each individual length calculation is rapid). The local search aims to find the one hypothetical tree whose length is minimal, compared to the other hypothetical trees, in a feasible amount of CPU time. Of course, we don't know if it really gets there.

LVB's approach is simulated annealing. Begin with a random tree topology; perturb it; if this decreases tree length, accept the change; if it increases tree length, accept the change with a probability that decreases as the search progresses (and decreases with the magnitude of the change in length). Finally, to make sure it's at least found a local optimum, LVB does a round of hill-climbing, in which the 'neighbourhood' of the current tree is thoroughly explored and changes increasing length are never accepted.

As a matter of interest, why did you do a raspberry pi version?


I read online somewhere that the Raspberry Pi does for computers what biros (or Bíró?) did for pens. I agree!

Daniel Barker
Posts: 53
Joined: Tue May 29, 2012 7:53 am
by rasbeer » Wed May 30, 2012 11:11 am
Thanks - I've seen phylogenetic trees now and wondered how they were derived, and that's a shorter clearer explanation than I've previously come across. I may even have understood it! ;-)
Posts: 242
Joined: Wed Mar 07, 2012 8:35 am
by rew » Tue Jun 12, 2012 6:51 am
Let me start by saying that I still don't understand the term "phylogenetic tree".

But I was close to the labs/people who did "simulated annealing" for the first time. I can (try to) explain the basics.

You have a complex optimisation problem. Say for instance a salesman has to travel along a bunch of cities, what is the shortest route to do this? If there are say only4 cities you can do this by trying all possibilties. Call the cities ABC and D. First try all routes that start with A. ABCD, ABDC, ACBD, ACDB, ADBC, ADCB. There are six more that start with B, and 12 more that start with C or D. You can do this by hand calculate all the route lengths and pick the best route. But the total number of possible routes becomes enormous as the number of cities grows. 4 cities you dan do by hand in 15 minutes. 10 you can do by computer. 20 you'll have to get some smarts in the computer program because it would otherwise take too long.

So you will have to find a trick not to have to try ALL possibilities. One of the tricks is to start with just a random arrangement, and then see if you can improve on it. For the travelling salesman problem take one city out of the current route, and try if the route is shorter if you put it in another place in the route. So if you have ABC....DE..., try AC....DBE...., if it becomes shorter, you continue with this shorter route. This is called hill-climbing. You improve on the solution until you find a solution that cannot be improved with any simple modification.

It turns out that with this strategy you will not always find the best solution: It is quite possible that ALL simple modifications of a solution lead to a worse solution, while in fact two or more modifications would lead to an even better solution.

To overcome this, there are several options. One thing you can try is to start with several different random initial solutions and see what this algorithm comes up with. Chose the best option. This works slightly better.

What, when properly implemented, works a LOT better is simulated annealing. Instead of rejecting ALL solutions that are worse, and accepting ALL solutions that are better, you make it a random process: worse solutions are accepted with a smaller chance than better solutions. In the beginning you start out with the chances reasonably equal. It doesn't make much difference if a solution is worse or better. But as the optimisation continues, you start changing the chances. The further you go along, the chances of a better solution to be accepted get better. And the chances of a worse solution to be accepted get worse. In the end you get a 100/0 chances distribution and you are left with a standard "hill climbing" step.

THAT my friends is simulated annealing.
Check out our raspberry pi addons: http://www.bitwizard.nl/catalog/
User avatar
Posts: 396
Joined: Fri Aug 26, 2011 3:25 pm
by rasbeer » Tue Jun 12, 2012 10:01 am
rew wrote:What, when properly implemented, works a LOT better is simulated annealing. Instead of rejecting ALL solutions that are worse, and accepting ALL solutions that are better, you make it a random process: worse solutions are accepted with a smaller chance than better solutions. In the beginning you start out with the chances reasonably equal. It doesn't make much difference if a solution is worse or better. But as the optimisation continues, you start changing the chances. The further you go along, the chances of a better solution to be accepted get better. And the chances of a worse solution to be accepted get worse. In the end you get a 100/0 chances distribution and you are left with a standard "hill climbing" step.

That's also a nice clear explanation - thanks. :D
Seems like evolution might be quite similar to simulated annealing. (But building a phylogenetic tree isn't modelling evolution, despite that superficial resemblance of the algorithms ; the selection criteria are quite different.)

My gonzo understanding of a phylogenetic tree is 'a way of organising genetic sequences that shows how the organisms they come from might have evolved into one another'.
Posts: 242
Joined: Wed Mar 07, 2012 8:35 am
by DanielBarker » Wed Dec 04, 2013 4:26 pm
We are happy to announce the release of LVB 3.0 Beta. Download it here:

http://eggg.st-andrews.ac.uk/lvb

LVB reconstructs phylogeny from a nucleotide multiple alignment.

With this release, the ready-to-run Raspberry Pi version is now for Raspbian (armhf).

There are also major algorithmic improvements which benefit all users.

[ In answer to the subject: not quite. The honour should go to PHYLIP and perhaps other phylogeny software, available as Debian packages for Raspberry Pi at a very early stage. viewtopic.php?p=86636#p86636 .]

Daniel Barker
http://biology.st-andrews.ac.uk/staff/db60
Posts: 53
Joined: Tue May 29, 2012 7:53 am