SGN::Genefamily - a class for working with (currently disk-based) gene families for tomato annotation purposes
The gene families are defined by alignment files in a subdirectory, which makes it easy to update the family definitions; this will happen frequently over the next two months. After that, the gene families will be moved to the database, so this code is only temporary.
Lukas Mueller <lam87@cornell.edu>
Methods in this class include:
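package SGN::Genefamily;

use Moose;                           # provides 'has' for the attributes below
use namespace::autoclean;
use File::Slurp qw/slurp/;
use File::Spec::Functions;           # catfile() for portable path building
use File::Basename qw/basename/;
use Bio::SeqIO;                      # FASTA parsing in get_seqs()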
=head2 accessors name()

Property: the name of the gene family
Side Effects: used to build the names of the family's files
    (for example, <name>.fa, <name>.fa.align and <name>.tree)
Usage: my @members = @{ $gf->members() }
Desc: returns the members of the gene family as an array
    reference. Read-only.
Property: the members of the gene family
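Because `members` is declared as an `ArrayRef` attribute, the accessor returns an array reference rather than a flat list, so callers have to dereference it. The sketch below illustrates this in plain core Perl; `Demo::Family` is a hypothetical stand-in that mimics the read-only Moose accessor and its empty-array default, and the member names are made up.

```perl
use strict;
use warnings;

package Demo::Family;    # hypothetical stand-in for SGN::Genefamily (no Moose)

sub new {
    my ( $class, %args ) = @_;
    # mimic "default => sub { [] }": members defaults to an empty array ref
    my $self = { members => $args{members} // [] };
    return bless $self, $class;
}

# read-only accessor: returns the array reference itself, not a list
sub members { return $_[0]{members} }

package main;

my $gf = Demo::Family->new( members => [ 'geneA', 'geneB' ] );

my @wrong   = $gf->members();         # one element: the array reference
my @members = @{ $gf->members() };    # two elements: the member names

print scalar(@wrong), " ", scalar(@members), "\n";    # prints "1 2"
```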
has 'members' => ( is => 'ro', isa => 'ArrayRef', default => sub { [] } );
Usage: my $dir = $gf->files_dir()
Desc: the top-level directory in which the gene family
    datasets are located.
Side Effects: used to build the paths to the gene family files
Usage: my $d = $gf->dataset()
Desc: under the gene family directory (files_dir), one or more
    sub-directories should be present, each of which represents
    a separate gene family clustering (for example, based on
    different species or different clustering parameters).
Property: the dataset name [string]
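Taken together, `files_dir`, `dataset` and `name` determine where a family's files live: the class joins `files_dir` and `dataset` with `catfile`, then looks in the `alignments`, `fasta` and `trees` sub-directories for `<name>.fa.align`, `<name>.fa` and `<name>.tree`. A minimal sketch of that layout follows; `family_file` and `%layout` are hypothetical helpers, not part of the class, and the example paths are illustrative.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# Hypothetical mapping: file kind => [ sub-directory, file-name suffix ]
my %layout = (
    alignment => [ 'alignments', '.fa.align' ],
    fasta     => [ 'fasta',      '.fa' ],
    tree      => [ 'trees',      '.tree' ],
);

# Hypothetical helper: files_dir/dataset/<subdir>/<name><suffix>
sub family_file {
    my ( $files_dir, $dataset, $name, $kind ) = @_;
    my $spec = $layout{$kind} or die "unknown file kind '$kind'\n";
    my ( $subdir, $suffix ) = @$spec;
    my $path = catfile( $files_dir, $dataset );    # what get_path() returns
    return catfile( $path, $subdir, $name . $suffix );
}

print family_file( '/data/genefamilies', 'set1', 'family_42', 'tree' ), "\n";
# on Unix: /data/genefamilies/set1/trees/family_42.tree
```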
Usage: my $alignment = $gf->get_alignment()
Desc: returns the alignment as a string
Side Effects: dies if no alignment file is available for the family
    (that is, the alignment has not yet been calculated).
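The "die unless the file exists, otherwise return its contents" pattern used for the alignment and FASTA files can be sketched with core Perl alone. This assumes a hypothetical `read_or_die` helper and reads the whole file with a localized `$/` instead of `File::Slurp`, which the module imports; the alignment content is made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec::Functions qw(catfile);

# Hypothetical helper: return a file's contents as one string,
# dying with a descriptive message if the file does not exist.
sub read_or_die {
    my ( $file, $what ) = @_;
    die "No $what file available: $file\n" unless -e $file;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    local $/;                 # slurp mode: read the whole file at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Create a small example alignment file in a temporary directory
my $dir  = tempdir( CLEANUP => 1 );
my $file = catfile( $dir, 'family_1.fa.align' );
open my $out, '>', $file or die $!;
print {$out} ">seqA\nAC-GT\n>seqB\nACCGT\n";
close $out;

my $aln = read_or_die( $file, 'alignment' );
my $ok  = eval { read_or_die( catfile( $dir, 'missing.fa.align' ), 'alignment' ); 1 };
print "read ", length($aln), " bytes; missing file ",
    ( $ok ? "was found" : "raised an error" ), "\n";
```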
      catfile( $self->get_path(), "alignments", $self->name() . ".fa.align" );
        die "No alignment file available for family " . $self->name();
Usage: my $fasta = $gf->get_fasta()
Desc: returns the sequences of the gene family as a
    FASTA-formatted string
Side Effects: dies if the FASTA file for the family cannot be found.
    my $file = catfile( $self->get_path(), "fasta", $self->name() . ".fa" );
        die "The fasta information for family "
          . " cannot be found";
Usage: my @seqs = $gf->get_seqs()
Desc: returns the sequences of a gene family as a list of
Side Effects: dies if the FASTA information is not available.
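`get_seqs` uses BioPerl's `Bio::SeqIO` to iterate over the records in the family's FASTA file. To show what that iteration walks over without depending on BioPerl, here is a minimal core-Perl FASTA reader. It is a simplified stand-in, not the module's method: `parse_fasta` is a hypothetical name, it returns plain `[id, sequence]` pairs rather than `Bio::Seq` objects, and it ignores header descriptions.

```perl
use strict;
use warnings;

# Hypothetical simplified FASTA reader: returns a list of [id, seq] pairs.
# Not a replacement for Bio::SeqIO (no descriptions, no validation).
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line ( split /\r?\n/, $text ) {
        next if $line =~ /^\s*$/;           # skip blank lines
        if ( $line =~ /^>(\S*)/ ) {
            push @records, [ $1, '' ];      # header: start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;       # sequence line: append to current record
        }
    }
    return @records;
}

my @seqs = parse_fasta(">geneA some description\nATGC\nTTAA\n>geneB\nGGCC\n");
printf "%s %d\n", $_->[0], length $_->[1] for @seqs;
# prints "geneA 8" then "geneB 4"
```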
    my $file = catfile( $self->get_path(), "fasta", $self->name() . ".fa" );
        die "The fasta information for family "
          . " cannot be found";
    my $io = Bio::SeqIO->new( -format => 'fasta', -file => $file );
    while ( my $seq = $io->next_seq() ) {
      catfile( $self->get_path(), "trees", $self->name() . ".tree" );
        die "The tree information for family "
          . " cannot be found";
=head2 get_available_datasets

Usage: my @ds = SGN::Genefamily->get_available_datasets($DIR)
Desc: a class method that returns the names of the datasets
    (sub-directories) available in $DIR
Ret: a list of dataset names
Args: the $DIR where the datasets are located.
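`get_available_datasets` treats every sub-directory of `$DIR` as a dataset and returns the sub-directories' base names. The same `glob`/`-d`/`basename` approach can be tried in isolation against a temporary directory; `available_datasets` is a hypothetical free-standing version, and the names `set_a`, `set_b` and `README` are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);
use File::Spec::Functions qw(catdir catfile);

# Same approach as get_available_datasets: keep only directories,
# then strip the leading path so that just the dataset names remain.
sub available_datasets {
    my ($path) = @_;
    return map { basename($_) } grep { -d } glob("$path/*");
}

my $root = tempdir( CLEANUP => 1 );
mkdir catdir( $root, $_ ) or die $! for qw(set_a set_b);
open my $fh, '>', catfile( $root, 'README' ) or die $!;    # plain file: ignored
close $fh;

my @ds = sort { $a cmp $b } available_datasets($root);
print "@ds\n";    # prints "set_a set_b"
```

Note that `glob` splits its pattern on whitespace, so a `$DIR` containing spaces would need quoting or `File::Glob::bsd_glob` instead.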
sub get_available_datasets {
    my @dirs = map { basename($_) } grep { -d } glob "$path/*";
    return catfile( $self->files_dir(), $self->dataset() );