Here's a useful explanation of the set of 30 H5N1 strains that was generated by Rudi Cilibrasi and blogged earlier in the week. Dr. Paul Vitanyi, a Professor of Computer Science at the University of Amsterdam, clarifies what exactly Rudi meant by the 'overall S(T) score of 0.990241' when he mentioned that 'the computer believed it had figured out the structure nearly perfectly.' In a comment left at Global Voices Online, he states:
'the tree represents in a qualitative sense the relative distances between the virii genomes in the supplied distance matrix almost perfectly. This translates into true relatedness among the virii insofar as the distances have captured it. The used distance is NCD which has previously successfully determined phylogeny trees from the full mtDNA of species like mammals and fungi. So there is reason to believe its output is dependable but it is certainly not infallible. In that sense the NCD distance is comparable to alignment distance, but it is less sensitive to position of substring sequences in the sequence.'
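
For readers wondering what an NCD (normalized compression distance) actually computes, here is a minimal sketch in Python. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of zlib as the compressor and the toy sequences are assumptions made for illustration only; they are not Rudi's actual setup or the real H5N1 data.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up toy "genomes": a random base sequence, a lightly mutated
# copy of it, and an independent random sequence.
rng = random.Random(0)
base = bytes(rng.choice(b"ACGT") for _ in range(2000))

mutated = bytearray(base)
for i in range(0, len(mutated), 100):
    # Force a point change at every 100th position.
    mutated[i] = ord("C") if mutated[i] == ord("A") else ord("A")
variant = bytes(mutated)

unrelated = bytes(rng.choice(b"ACGT") for _ in range(2000))

print("base vs variant:  ", ncd(base, variant))
print("base vs unrelated:", ncd(base, unrelated))
```

The mutated copy should come out much closer to the base sequence than the unrelated one does. Computing this distance for every pair of sequences gives the kind of distance matrix that the tree is then fitted to.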