Move HMMER related modules, tests, and programs to new distribution.
[bioperl-live.git] / Bio / Tree / Statistics.pm
blob784b782da9dbc74ac37103d68940c43b05fd7a23
2 # BioPerl module for Bio::Tree::Statistics
4 # Please direct questions and support issues to <bioperl-l@bioperl.org>
6 # Cared for by Jason Stajich <jason@bioperl.org>
8 # Copyright Jason Stajich
10 # You may distribute this module under the same terms as perl itself
12 # POD documentation - main docs before the code
14 =head1 NAME
16 Bio::Tree::Statistics - Calculate certain statistics for a Tree
18 =head1 SYNOPSIS
20 use Bio::Tree::Statistics;
22 =head1 DESCRIPTION
24 This should be where Tree statistics are calculated. It was
25 previously where statistics from a Coalescent simulation.
27 It now contains several methods for calculating L<Tree-Trait
28 statistics>.
30 =head1 FEEDBACK
32 =head2 Mailing Lists
34 User feedback is an integral part of the evolution of this and other
35 Bioperl modules. Send your comments and suggestions preferably to
36 the Bioperl mailing list. Your participation is much appreciated.
38 bioperl-l@bioperl.org - General discussion
39 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
41 =head2 Support
43 Please direct usage questions or support issues to the mailing list:
45 I<bioperl-l@bioperl.org>
47 rather than to the module maintainer directly. Many experienced and
48 reponsive experts will be able look at the problem and quickly
49 address it. Please include a thorough description of the problem
50 with code and data examples if at all possible.
52 =head2 Reporting Bugs
54 Report bugs to the Bioperl bug tracking system to help us keep track
55 of the bugs and their resolution. Bug reports can be submitted via
56 the web:
58 https://github.com/bioperl/bioperl-live/issues
60 =head1 AUTHOR - Jason Stajich
62 Email jason AT bioperl.org
64 =head1 CONTRIBUTORS
66 Heikki Lehvaslaiho, heikki at bioperl dot org
68 =head1 APPENDIX
70 The rest of the documentation details each of the object methods.
71 Internal methods are usually preceded with a _
73 =cut
76 # Let the code begin...
79 package Bio::Tree::Statistics;
80 use strict;
83 use base qw(Bio::Root::Root);
85 =head2 new
87 Title : new
88 Usage : my $obj = Bio::Tree::Statistics->new();
89 Function: Builds a new Bio::Tree::Statistics object
90 Returns : Bio::Tree::Statistics
91 Args :
93 =head2 assess_bootstrap
95 Title : assess_bootstrap
96 Usage : my $tree_with_bs = $stats->assess_bootstrap(\@bs_trees,$guide_tree);
97 Function: Calculates the bootstrap for internal nodes based on the percentage
98 of times \@bs_trees agree with each internal node
99 Returns : L<Bio::Tree::TreeI>
100 Args : Arrayref of L<Bio::Tree::TreeI>s
101 Guide tree, L<Bio::Tree::TreeI>s
103 =cut
105 sub assess_bootstrap{
106 my ($self,$bs_trees,$guide_tree) = @_;
107 my @consensus;
109 # internal nodes are defined by their children
111 my (%lookup,%internal);
112 my $i = 0;
113 for my $tree ( $guide_tree, @$bs_trees ) {
114 # Do this as a top down approach, can probably be
115 # improved by caching internal node states, but not going
116 # to worry about it right now.
118 my @allnodes = $tree->get_nodes;
119 my @internalnodes = grep { ! $_->is_Leaf } @allnodes;
120 for my $node ( @internalnodes ) {
121 my @tips = sort map { $_->id }
122 grep { $_->is_Leaf() } $node->get_all_Descendents;
123 my $id = "(".join(",", @tips).")";
124 if( $i == 0 ) {
125 $internal{$id} = $node->internal_id;
126 } else {
127 $lookup{$id}++;
130 $i++;
132 my @save;
133 for my $l ( keys %lookup ) {
134 if( defined $internal{$l} ) {#&& $lookup{$l} > $min_seen ) {
135 my $intnode = $guide_tree->find_node(-internal_id => $internal{$l});
136 $intnode->bootstrap(sprintf("%d",100 * $lookup{$l} / $i));
139 return $guide_tree;
143 =head2 cherries
145 Example : cherries($tree, $node);
146 Description: Count number of paired leaf nodes
147 in a binary tree
148 Returns : integer
149 Exceptions :
150 Args : 1. Bio::Tree::TreeI object
151 2. Bio::Tree::NodeI object within the tree, optional
153 Commonly used statistics assume a binary tree, but this methods
154 returns a value even for trees with polytomies.
156 =cut
158 sub cherries ($;$) {
159 my $self = shift;
160 my $tree = shift;
161 my $node = shift || $tree->get_root_node;
163 my $cherries = 0;
164 my @descs = $node->each_Descendent;
166 if ($descs[0]->is_Leaf and $descs[1]->is_Leaf) {
167 if ($descs[3]) { #polytomy at leaf level
168 $cherries = 0;
169 } else {
170 $cherries = 1;
172 } else {
173 # recurse
174 foreach my $desc (@descs) {
175 $cherries += $self->cherries($tree, $desc);
178 return $cherries;
182 =head2 Tree-Trait statistics
184 The following methods produce descriptors of trait distribution among
185 leaf nodes within the trees. They require that a trait has been set
186 for each leaf node. The tag methods of Bio::Tree::Node are used to
187 store them as key/value pairs. In this way, one tree can store more
188 than one trait.
190 Trees have method add_traits() to set trait values from a file. See the
191 add_trait() method in L<Bio::Tree::TreeFunctionsI>.
193 =head2 fitch
195 Example : fitch($tree, $key, $node);
196 Description: Calculates Parsimony Score (PS) and internal trait
197 values using the Fitch 1971 parsimony algorithm for
198 the subtree a defined by the (internal) node.
199 Node defaults to the root.
200 Returns : true on success
201 Exceptions : leaf nodes have to have the trait defined
202 Args : 1. Bio::Tree::TreeI object
203 2. trait name string
204 3. Bio::Tree::NodeI object within the tree, optional
206 Runs first L<fitch_up> that calculates parsimony scores and then
207 L<fitch_down> that should resolve most of the trait/character state
208 ambiguities.
210 Fitch, W.M., 1971. Toward defining the course of evolution: minimal
211 change for a specific tree topology. Syst. Zool. 20, 406-416.
213 You can access calculated parsimony values using:
215 $score = $node->->get_tag_values('ps_score');
217 and the trait value with:
219 $traitvalue = $node->->get_tag_values('ps_trait'); # only the first
220 @traitvalues = $node->->get_tag_values('ps_trait');
222 Note that there can be more that one trait value, especially for the
223 root node.
225 =cut
227 sub fitch {
228 my $self = shift;
229 my $tree = shift;
230 my $key = shift || $self->throw("Trait name is needed");
231 my $node = shift || $tree->get_root_node;
233 $self->fitch_up($tree, $key, $node);
234 $self->fitch_down($tree, $node);
238 =head2 ps
240 Example : ps($tree, $key, $node);
241 Description: Calculates Parsimony Score (PS) from Fitch 1971
242 parsimony algorithm for the subtree as defined
243 by the (internal) node.
244 Node defaults to the root.
245 Returns : integer, 1 < PS < n, where n is number of branches
246 Exceptions : leaf nodes have to have the trait defined
247 Args : 1. Bio::Tree::TreeI object
248 2. trait name string
249 3. Bio::Tree::NodeI object within the tree, optional
251 This is the first half of the Fitch algorithm that is enough for
252 calculating the resolved parsimony values. The trait/chararacter
253 states are commonly left in ambiguous state. To resolve them, run
254 L<fitch_down>.
256 =cut
258 sub ps { shift->fitch_up(@_) }
261 =head2 fitch_up
263 Example : fitch_up($tree, $key, $node);
264 Description: Calculates Parsimony Score (PS) from the Fitch 1971
265 parsimony algorithm for the subtree as defined
266 by the (internal) node.
267 Node defaults to the root.
268 Returns : integer, 1< PS < n, where n is number of branches
269 Exceptions : leaf nodes have to have the trait defined
270 Args : 1. Bio::Tree::TreeI object
271 2. trait name string
272 3. Bio::Tree::NodeI object within the tree, optional
274 This is a more generic name for L<ps> and indicates that it performs
275 the first bottom-up tree traversal that calculates the parsimony score
276 but usually leaves trait/character states ambiguous. If you are
277 interested in internal trait states, running L<fitch_down> should
278 resolve most of the ambiguities.
280 =cut
282 sub fitch_up {
283 my $self = shift;
284 my $tree = shift;
285 my $key = shift || $self->throw("Trait name is needed");
286 my $node = shift || $tree->get_root_node;
288 if ($node->is_Leaf) {
289 $self->throw ("ERROR: ". $node->internal_id. " needs a value for trait $key")
290 unless $node->has_tag($key);
291 $node->set_tag_value('ps_trait', $node->get_tag_values($key) );
292 $node->set_tag_value('ps_score', 0 );
293 return; # end of recursion
296 foreach my $child ($node->each_Descendent) {
297 $self->fitch_up($tree, $key, $child);
300 my %intersection;
301 my %union;
302 my $score;
304 foreach my $child ($node->each_Descendent) {
305 foreach my $trait ($child->get_tag_values('ps_trait') ) {
306 $intersection{$trait}++ if $union{$trait};
307 $union{$trait}++;
309 $score += $child->get_tag_values('ps_score');
312 if (keys %intersection) {
313 $node->set_tag_value('ps_trait', keys %intersection);
314 $node->set_tag_value('ps_score', $score);
315 } else {
316 $node->set_tag_value('ps_trait', keys %union);
317 $node->set_tag_value('ps_score', $score+1);
320 if ($self->verbose) {
321 print "-- node --------------------------\n";
322 print "iID: ", $node->internal_id, " (", $node->id, ")\n";
323 print "Trait: ", join (', ', $node->get_tag_values('ps_trait') ), "\n";
324 print "length :", scalar($node->get_tag_values('ps_score')) , "\n";
326 return scalar $node->get_tag_values('ps_score');
330 =head2 fitch_down
332 Example : fitch_down($tree, $node);
333 Description: Runs the second pass from Fitch 1971
334 parsimony algorithm to resolve ambiguous
335 trait states left by first pass.
336 by the (internal) node.
337 Node defaults to the root.
338 Returns : true
339 Exceptions : dies unless the trait is defined in all nodes
340 Args : 1. Bio::Tree::TreeI object
341 2. Bio::Tree::NodeI object within the tree, optional
343 Before running this method you should have ran L<fitch_up> (alias to
344 L<ps> ). Note that it is not guaranteed that all states are completely
345 resolved.
347 =cut
349 sub fitch_down {
350 my $self = shift;
351 my $tree = shift;
352 my $node = shift || $tree->get_root_node;
354 my $key = 'ps_trait';
355 $self->throw ("ERROR: ". $node->internal_id. " needs a value for $key")
356 unless $node->has_tag($key);
358 my $nodev;
359 foreach my $trait ($node->get_tag_values($key) ) {
360 $nodev->{$trait}++;
363 foreach my $child ($node->each_Descendent) {
364 next if $child->is_Leaf; # end of recursion
366 my $intersection;
367 foreach my $trait ($child->get_tag_values($key) ) {
368 $intersection->{$trait}++ if $nodev->{$trait};
371 $self->fitch_down($tree, $child);
372 $child->set_tag_value($key, keys %$intersection);
374 return 1; # success
378 =head2 persistence
380 Example : persistence($tree, $node);
381 Description: Calculates the persistence
382 for node in the subtree defined by the (internal)
383 node. Node defaults to the root.
384 Returns : int, number of generations trait value has to remain same
385 Exceptions : all the nodes need to have the trait defined
386 Args : 1. Bio::Tree::TreeI object
387 2. Bio::Tree::NodeI object within the tree, optional
389 Persistence measures the stability that the trait value has in a
390 tree. It expresses the number of generations the trait value remains
391 the same. All the decendants of the root in the same generation have
392 to share the same value.
394 Depends on Fitch's parsimony score (PS).
396 =cut
398 sub _persistence {
399 my $self = shift;
400 my $tree = shift;
401 my $node = shift;
402 my $value = shift || $self->throw("Value is needed");
405 my $key = 'ps_trait';
407 $self->throw("Node is needed") unless $node->isa('Bio::Tree::NodeI');
409 return 0 unless $node->get_tag_values($key) eq $value; # wrong value
410 return 1 if $node->is_Leaf; # end of recursion
412 my $persistence = 10000000; # an arbitrarily large number
413 foreach my $child ($node->each_Descendent) {
414 my $pers = $self->_persistence($tree, $child, $value);
415 $persistence = $pers if $pers < $persistence;
417 return $persistence + 1;
420 sub persistence {
421 my $self = shift;
422 my $tree = shift;
423 my $node = shift || $tree->get_root_node;
424 $self->throw("Node is needed") unless $node->isa('Bio::Tree::NodeI');
426 my $key = 'ps_trait';
427 my $value = $node->get_tag_values($key);
429 #calculate
430 my $persistence = $self->_persistence($tree, $node, $value);
431 $node->set_tag_value('persistance', $persistence);
432 return $persistence;
436 =head2 count_subclusters
438 Example : count_clusters($tree, $node);
439 Description: Calculates the number of sub-clusters
440 in the subtree defined by the (internal)
441 node. Node defaults to the root.
442 Returns : int, count
443 Exceptions : all the nodes need to have the trait defined
444 Args : 1. Bio::Tree::TreeI object
445 2. Bio::Tree::NodeI object within the tree, optional
447 Depends on Fitch's parsimony score (PS).
449 =cut
451 sub _count_subclusters {
452 my $self = shift;
453 my $tree = shift;
454 my $node = shift;
455 my $value = shift || $self->throw("Value is needed");
457 my $key = 'ps_trait';
459 $self->throw ("ERROR: ". $node->internal_id. " needs a value for trait $key")
460 unless $node->has_tag($key);
462 if ($node->get_tag_values($key) eq $value) {
463 if ($node->get_tag_values('ps_score') == 0) {
464 return 0;
465 } else {
466 my $count = 0;
467 foreach my $child ($node->each_Descendent) {
468 $count += $self->_count_subclusters($tree, $child, $value);
470 return $count;
473 return 1;
476 sub count_subclusters {
477 my $self = shift;
478 my $tree = shift;
479 my $node = shift || $tree->get_root_node;
480 $self->throw("Node is needed") unless $node->isa('Bio::Tree::NodeI');
482 my $key = 'ps_trait';
483 my $value = $node->get_tag_values($key);
485 return $self->_count_subclusters($tree, $node, $value);
489 =head2 count_leaves
491 Example : count_leaves($tree, $node);
492 Description: Calculates the number of leaves with same trait
493 value as root in the subtree defined by the (internal)
494 node. Requires an unbroken line of identical trait values.
495 Node defaults to the root.
496 Returns : int, number of leaves with this trait value
497 Exceptions : all the nodes need to have the trait defined
498 Args : 1. Bio::Tree::TreeI object
499 2. Bio::Tree::NodeI object within the tree, optional
501 Depends on Fitch's parsimony score (PS).
503 =cut
505 sub _count_leaves {
506 my $self = shift;
507 my $tree = shift;
508 my $node = shift || $tree->get_root_node;
509 my $value = shift;
511 my $key = 'ps_trait';
513 $self->throw ("ERROR: ". $node->internal_id. " needs a value for trait $key")
514 unless $node->has_tag($key);
516 if ($node->get_tag_values($key) eq $value) {
517 #print $node->id, ": ", $node->get_tag_values($key), "\n";
518 return 1 if $node->is_Leaf; # end of recursion
520 my $count = 0;
521 foreach my $child ($node->each_Descendent) {
522 $count += $self->_count_leaves($tree, $child, $value);
524 return $count;
526 return 0;
529 sub count_leaves {
530 my $self = shift;
531 my $tree = shift;
532 my $node = shift || $tree->get_root_node;
533 $self->throw("Node is needed") unless $node->isa('Bio::Tree::NodeI');
535 my $key = 'ps_trait';
536 my $value = $node->get_tag_values($key);
538 return $self->_count_leaves($tree, $node, $value);
542 =head2 phylotype_length
544 Example : phylotype_length($tree, $node);
545 Description: Sums up the branch lengths within phylotype
546 exluding the subclusters where the trait values
547 are different
548 Returns : float, length
549 Exceptions : all the nodes need to have the trait defined
550 Args : 1. Bio::Tree::TreeI object
551 2. Bio::Tree::NodeI object within the tree, optional
553 Depends on Fitch's parsimony score (PS).
555 =cut
557 sub _phylotype_length {
558 my $self = shift;
559 my $tree = shift;
560 my $node = shift;
561 my $value = shift;
563 my $key = 'ps_trait';
565 $self->throw ("ERROR: ". $node->internal_id. " needs a value for trait $key")
566 unless $node->has_tag($key);
568 return 0 if $node->get_tag_values($key) ne $value;
569 return $node->branch_length if $node->is_Leaf; # end of recursion
571 my $length = 0;
572 foreach my $child ($node->each_Descendent) {
573 my $sub_len = $self->_phylotype_length($tree, $child, $value);
574 $length += $sub_len;
575 $length += $child->branch_length if not $child->is_Leaf and $sub_len;
577 return $length;
580 sub phylotype_length {
581 my $self = shift;
582 my $tree = shift;
583 my $node = shift || $tree->get_root_node;
585 my $key = 'ps_trait';
586 my $value = $node->get_tag_values($key);
588 return $self->_phylotype_length($tree, $node, $value);
592 =head2 sum_of_leaf_distances
594 Example : sum_of_leaf_distances($tree, $node);
595 Description: Sums up the branch lengths from root to leaf
596 exluding the subclusters where the trait values
597 are different
598 Returns : float, length
599 Exceptions : all the nodes need to have the trait defined
600 Args : 1. Bio::Tree::TreeI object
601 2. Bio::Tree::NodeI object within the tree, optional
603 Depends on Fitch's parsimony score (PS).
605 =cut
607 sub _sum_of_leaf_distances {
608 my $self = shift;
609 my $tree = shift;
610 my $node = shift;
611 my $value = shift;
613 my $key = 'ps_trait';
615 $self->throw ("ERROR: ". $node->internal_id. " needs a value for trait $key")
616 unless $node->has_tag($key);
617 return 0 if $node->get_tag_values($key) ne $value;
618 #return $node->branch_length if $node->is_Leaf; # end of recursion
619 return 0 if $node->is_Leaf; # end of recursion
621 my $length = 0;
622 foreach my $child ($node->each_Descendent) {
623 $length += $self->_count_leaves($tree, $child, $value) * $child->branch_length +
624 $self->_sum_of_leaf_distances($tree, $child, $value);
626 return $length;
629 sub sum_of_leaf_distances {
630 my $self = shift;
631 my $tree = shift;
632 my $node = shift || $tree->get_root_node;
634 my $key = 'ps_trait';
635 my $value = $node->get_tag_values($key);
637 return $self->_sum_of_leaf_distances($tree, $node, $value);
641 =head2 genetic_diversity
643 Example : genetic_diversity($tree, $node);
644 Description: Diversity is the sum of root to leaf distances
645 within the phylotype normalised by number of leaf
646 nodes
647 Returns : float, value of genetic diversity
648 Exceptions : all the nodes need to have the trait defined
649 Args : 1. Bio::Tree::TreeI object
650 2. Bio::Tree::NodeI object within the tree, optional
652 Depends on Fitch's parsimony score (PS).
654 =cut
656 sub genetic_diversity {
657 my $self = shift;
658 my $tree = shift;
659 my $node = shift || $tree->get_root_node;
661 return $self->sum_of_leaf_distances($tree, $node) /
662 $self->count_leaves($tree, $node);
666 =head2 statratio
668 Example : statratio($tree, $node);
669 Description: Ratio of the stem length and the genetic diversity of the
670 phylotype L<genetic_diversity>
671 Returns : float, separation score
672 Exceptions : all the nodes need to have the trait defined
673 Args : 1. Bio::Tree::TreeI object
674 2. Bio::Tree::NodeI object within the tree, optional
676 Statratio gives a measure of separation and variability within the phylotype.
677 Larger values identify more rapidly evolving and recent phylotypes.
679 Depends on Fitch's parsimony score (PS).
681 =cut
683 sub statratio {
684 my $self = shift;
685 my $tree = shift;
686 my $node = shift || $tree->get_root_node;
688 my $div = $self->genetic_diversity($tree, $node);
689 return 0 if $div == 0;
690 return $node->branch_length / $div;
695 =head2 ai
697 Example : ai($tree, $key, $node);
698 Description: Calculates the Association Index (AI) of Whang et
699 al. 2001 for the subtree defined by the (internal)
700 node. Node defaults to the root.
701 Returns : real
702 Exceptions : leaf nodes have to have the trait defined
703 Args : 1. Bio::Tree::TreeI object
704 2. trait name string
705 3. Bio::Tree::NodeI object within the tree, optional
707 Association index (AI) gives a more fine grained results than PS since
708 the result is a real number. ~0 E<lt>= AI.
710 Wang, T.H., Donaldson, Y.K., Brettle, R.P., Bell, J.E., Simmonds, P.,
711 2001. Identification of shared populations of human immunodeficiency
712 Virus Type 1 infecting microglia and tissue macrophages outside the
713 central nervous system. J. Virol. 75 (23), 11686-11699.
715 =cut
717 sub _node_ai {
718 my $self = shift;
719 my $node = shift;
720 my $key = shift;
722 my $traits;
723 my $leaf_count = 0;
724 for my $desc ( $node->get_all_Descendents ) {
725 next unless $desc->is_Leaf;
726 $leaf_count++;
727 $self->throw ("Node ". $desc->id. " needs a value for trait [$key]")
728 unless $desc->has_tag($key);
729 my $trait = $desc->get_tag_values($key);
730 $traits->{$trait}++;
732 my $most_common = 0;
733 foreach ( keys %$traits) {
734 $most_common = $traits->{$_} if $traits->{$_} > $most_common;
736 return sprintf "%1.6f", (1 - ($most_common/$leaf_count) ) / (2**($leaf_count-1) );
739 sub ai {
740 my $self = shift;
741 my $tree = shift;
742 my $key = shift || $self->throw("Trait name is needed");
743 my $start_node = shift || $tree->get_root_node;
744 return unless $start_node;
746 my $sum = 0;
747 for my $node ( $start_node->get_all_Descendents ) {
748 next if $node->is_Leaf;
749 $sum += $self->_node_ai($node, $key);
751 return $sum;
755 =head2 mc
757 Example : mc($tree, $key, $node);
758 Description: Calculates the Monophyletic Clade (MC) size statistics
759 for the subtree a defined by the (internal) node.
760 Node defaults to the root;
761 Returns : hashref with trait values as keys
762 Exceptions : leaf nodes have to have the trait defined
763 Args : 1. Bio::Tree::TreeI object
764 2. trait name string
765 3. Bio::Tree::NodeI object within the tree, optional
767 Monophyletic Clade (MC) size statistics by Salemi at al 2005. It is
768 calculated for each trait value. 1 E<lt>= MC E<lt>= nx, where nx is the
769 number of tips with value x:
771 pick the internal node with maximim value for
772 number of of tips with only trait x
774 MC was defined by Parker et al 2008.
776 Salemi, M., Lamers, S.L., Yu, S., de Oliveira, T., Fitch, W.M., McGrath, M.S.,
777 2005. Phylodynamic analysis of Human Immunodeficiency Virus Type 1 in
778 distinct brain compartments provides a model for the neuropathogenesis of
779 AIDS. J. Virol. 79 (17), 11343-11352.
781 Parker, J., Rambaut A., Pybus O., 2008. Correlating viral phenotypes
782 with phylogeny: Accounting for phylogenetic uncertainty Infection,
783 Genetics and Evolution 8 (2008), 239-246.
785 =cut
787 sub _node_mc {
788 my $self = shift;
789 my $node = shift;
790 my $key = shift;
792 my $traits;
793 my $leaf_count = 0;
794 for my $node2 ( $node->get_all_Descendents ) {
795 next unless $node2->is_Leaf;
796 $leaf_count++;
797 my $trait = $node2->get_tag_values($key);
798 $traits->{$trait}++;
800 return $traits;
803 sub mc {
804 my $self = shift;
805 my $tree = shift;
806 my $key = shift || die "Trait name is needed";
807 my $start_node = shift || $tree->get_root_node;
808 return unless $start_node;
810 my $sum; # hashref, keys are trait values
811 my $keys; # hashref, keys are trait values
812 foreach my $node ( $start_node->get_all_Descendents ) {
813 next if $node->is_Leaf;
814 my $traits = $self->_node_mc($node, $key);
815 if (scalar keys %$traits == 1) {
816 my ($value) = keys %$traits;
817 no warnings;
818 $sum->{$value} = $traits->{$value}
819 if $sum->{$value} < $traits->{$value};
820 } else {
821 map { $keys->{$_} = 1 } keys %$traits;
824 # check for cases where there are no clusters
825 foreach my $value (keys %$keys) {
826 $sum->{$value} = 1 unless defined $sum->{$value};
828 return $sum;