#
# BioPerl module for Bio::Variation::VariantI
#
# Please direct questions and support issues to <bioperl-l@bioperl.org>
#
# Cared for by Heikki Lehvaslaiho <heikki-at-bioperl-dot-org>
#
# Copyright Heikki Lehvaslaiho
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::Variation::VariantI - Sequence Change SeqFeature abstract class

=head1 SYNOPSIS

  # get a Bio::Variation::VariantI object somehow
  print $var->restriction_changes, "\n";
  foreach my $allele ($var->each_Allele) {
      # work on Bio::Variation::Allele objects
  }

=head1 DESCRIPTION

This superclass defines methods common to basic sequence changes. The
instantiable classes Bio::Variation::DNAMutation,
Bio::Variation::RNAChange and Bio::Variation::AAChange use them.
See L<Bio::Variation::DNAMutation>, L<Bio::Variation::RNAChange>,
and L<Bio::Variation::AAChange> for more information.

These classes only store information; the heavy computation needed to
determine allele sequences is done elsewhere.

The database cross-references are implemented as
Bio::Annotation::DBLink objects. The methods to access them are
defined in Bio::DBLinkContainerI. See L<Bio::Annotation::DBLink>
and L<Bio::DBLinkContainerI> for details.

Bio::Variation::VariantI redefines and extends
Bio::SeqFeature::Generic for sequence variations. This class
describes specific sequence change events. These events are always
from a specific reference sequence to something different. See
L<Bio::SeqFeature::Generic> for more information.

IMPORTANT: The notion of a reference sequence permeates all
Bio::Variation classes. This is especially important to remember when
dealing with Alleles. In a polymorphic site, there can be a large
number of alleles. One of them has to be selected to be the reference
allele (allele_ori). ALL the rest have to be passed to the Variant
using the method add_Allele, including the mutated allele in a
canonical mutation. The IO modules and generated attributes depend on
it. They ignore the allele linked to using allele_mut, place each
Allele returned by each_Allele into allele_mut in turn, and calculate
the changes between that and allele_ori.

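The reference-allele convention can be illustrated with a minimal
sketch. It uses a plain hash as a stand-in for a variant, not the
BioPerl classes; the hash keys and the helper name describe_changes
are assumptions made only for this example.

```perl
use strict;
use warnings;

# Hypothetical plain-hash stand-in for a variant (not a BioPerl object).
my %variant = (
    allele_ori => 'a',            # the allele chosen as the reference
    alleles    => [ 'g', 't' ],   # every other allele, as given to add_Allele
);

# Compare each non-reference allele against the reference in turn,
# mirroring how each allele is cycled through allele_mut.
sub describe_changes {
    my ($var) = @_;
    my @changes;
    foreach my $mut ( @{ $var->{alleles} } ) {
        push @changes, "$var->{allele_ori}>$mut";
    }
    return @changes;
}

print join( ', ', describe_changes( \%variant ) ), "\n";
```

Each non-reference allele yields one comparison against allele_ori,
which is why the reference has to be chosen before any changes are
calculated.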
=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to the
Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  https://github.com/bioperl/bioperl-live/issues

=head1 AUTHOR - Heikki Lehvaslaiho

Email: heikki-at-bioperl-dot-org

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::Variation::VariantI;
use strict;

# Object preamble - inheritance

use base qw(Bio::Root::Root Bio::SeqFeature::Generic Bio::DBLinkContainerI);

=head2 id

 Title   : id
 Usage   : $obj->id
 Function:

           Read-only method. Returns the id of the variation object.
           The id is built from the first DBLink object attached to
           this object, in the form "database::primary_id".

 Example :
 Returns : scalar
 Args    : none

=cut

sub id {
    my ($self) = @_;
    my @ids = $self->each_DBLink;
    # avoid "my $x = ... if COND": its behaviour is undefined in Perl
    my $id = @ids ? $ids[0] : undef;
    return $id->database. "::". $id->primary_id if $id;
    return;
}

=head2 add_Allele

 Title   : add_Allele
 Usage   : $self->add_Allele($allele)
 Function:

           Adds one Bio::Variation::Allele into the list of alleles.
           Note that the method enforces the convention that
           nucleotide sequences are in lower case and amino acids
           are in upper case.

 Example :
 Returns : 1 on success, 0 on failure.
 Args    : Allele object

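The case convention can be sketched on plain strings. The helper name
normalize_allele_seq and its boolean flag are hypothetical and exist
only in this example; the method itself works on
Bio::Variation::Allele objects.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: apply the case convention
# to a raw sequence string.
sub normalize_allele_seq {
    my ($seq, $is_amino_acid) = @_;
    return $seq unless $seq;                     # leave unset or empty values alone
    return $is_amino_acid ? uc $seq : lc $seq;   # AA upper case, the rest lower case
}

print normalize_allele_seq('ACGT', 0), "\n";   # nucleotide sequence
print normalize_allele_seq('mkv',  1), "\n";   # amino acid sequence
```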

=cut

sub add_Allele {
    my ($self,$value) = @_;
    if (defined $value) {
        if( ! $value->isa('Bio::Variation::Allele') ) {
            my $com = ref $value;
            $self->throw("Is not a Allele object but a [$com]");
            return 0;
        } else {
            if ( $self->isa('Bio::Variation::AAChange') ) {
                $value->seq( uc $value->seq) if $value->seq;
            } else {
                $value->seq( lc $value->seq) if $value->seq;
            }
            push(@{$self->{'alleles'}},$value);
            $self->allele_mut($value); #????
            return 1;
        }
    } else {
        return 0;
    }
}

=head2 each_Allele

 Title   : each_Allele
 Usage   : $obj->each_Allele();
 Function:

           Returns a list of Bio::Variation::Allele objects

 Example :
 Returns : list of Alleles
 Args    : none

=cut

sub each_Allele{
    my ($self,@args) = @_;
    # return an empty list when no allele has been added yet
    return @{ $self->{'alleles'} || [] };
}

=head2 isMutation

 Title   : isMutation
 Usage   : print join('/', $obj->each_Allele) if not $obj->isMutation;
 Function:

           Returns or sets the boolean value indicating that the
           variant described is a canonical mutation with two alleles
           assigned to be the original (wild type) allele and the
           mutated allele, respectively. If this value is not set, it
           is assumed that the Variant describes polymorphisms.

 Returns : a boolean

=cut

sub isMutation {
    my ($self,$value) = @_;
    if (defined $value) {
        if ($value ) {
            $self->{'isMutation'} = 1;
        } else {
            $self->{'isMutation'} = 0;
        }
    }
    return $self->{'isMutation'};
}

=head2 allele_ori

 Title   : allele_ori
 Usage   : $obj->allele_ori();
 Function:

           Links to and returns the Bio::Variation::Allele object
           used as the original (reference) allele. If value is not
           set, returns false. All other Alleles are compared to this.

           Amino acid sequences are stored in upper case characters,
           others in lower case.

 Example :
 Returns : Bio::Variation::Allele object, or false if not set
 Args    : Bio::Variation::Allele object (optional, for setting)

See L<Bio::Variation::Allele> for more.

=cut

sub allele_ori {
    my ($self,$value) = @_;
    if( defined $value) {
        if ( ! ref $value || ! $value->isa('Bio::Variation::Allele')) {
            $self->throw("Value is not Bio::Variation::Allele but [$value]");
        } else {
            if ( $self->isa('Bio::Variation::AAChange') ) {
                $value->seq( uc $value->seq) if $value->seq;
            } else {
                $value->seq( lc $value->seq) if $value->seq;
            }
            $self->{'allele_ori'} = $value;
        }
    }
    return $self->{'allele_ori'};
}

=head2 allele_mut

 Title   : allele_mut
 Usage   : $obj->allele_mut();
 Function:

           Links to and returns the Bio::Variation::Allele object
           used as the mutated allele. If value is not set, returns
           false.

           Amino acid sequences are stored in upper case characters,
           others in lower case.

 Example :
 Returns : Bio::Variation::Allele object, or false if not set
 Args    : Bio::Variation::Allele object (optional, for setting)

See L<Bio::Variation::Allele> for more.

=cut

sub allele_mut {
    my ($self,$value) = @_;
    if( defined $value) {
        if ( ! ref $value || ! $value->isa('Bio::Variation::Allele')) {
            $self->throw("Value is not Bio::Variation::Allele but [$value]");
        } else {
            if ( $self->isa('Bio::Variation::AAChange') ) {
                $value->seq( uc $value->seq) if $value->seq;
            } else {
                $value->seq( lc $value->seq) if $value->seq;
            }
            $self->{'allele_mut'} = $value;
        }
    }
    return $self->{'allele_mut'};
}

=head2 length

 Title   : length
 Usage   : $obj->length();
 Function:

           Sets and returns the length of the affected original
           allele sequence. If value is not set, returns false == 0.

           Value 0 means that the variant position is before the
           start=end sequence position. (Value 1 would denote a point
           mutation.) This follows the convention of reporting an
           insertion (2insT) in a way equivalent to the corresponding
           deletion (2delT). (Think about the indel polymorphism
           ATC <=> AC where the original state is not known.)

 Example :
 Returns : string
 Args    : string

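One way to read this convention is that the length counts the residues
of the original allele that are affected. The sketch below follows
that reading; the helper name affected_length is hypothetical and the
allele strings are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: number of residues in the original allele string.
sub affected_length {
    my ($ori_seq) = @_;
    return 0 unless defined $ori_seq;   # unset behaves like length 0
    return length $ori_seq;
}

printf "%-14s %d\n", 'insertion',      affected_length('');    # nothing replaced
printf "%-14s %d\n", 'point mutation', affected_length('a');   # one residue changed
printf "%-14s %d\n", 'deletion',       affected_length('t');   # one residue removed
```

Under this reading an insertion affects no original residue, so its
length is 0, while a single-residue substitution or deletion has
length 1.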

=cut

sub length {
    my ($self,$value) = @_;
    if ( defined $value) {
        $self->{'length'} = $value;
    }
    if ( ! exists $self->{'length'} ) {
        return 0;
    }
    return $self->{'length'};
}

=head2 upStreamSeq

 Title   : upStreamSeq
 Usage   : $obj->upStreamSeq();
 Function:

           Sets and returns the upstream flanking sequence string. If
           value is not set, returns false. The sequence should be
           >=25 characters long, if possible.

 Example :
 Returns : string or false
 Args    : string

=cut

sub upStreamSeq {
    my ($self,$value) = @_;
    if( defined $value) {
        $self->{'upstreamseq'} = $value;
    }
    return $self->{'upstreamseq'};
}

=head2 dnStreamSeq

 Title   : dnStreamSeq
 Usage   : $obj->dnStreamSeq();
 Function:

           Sets and returns the downstream flanking sequence string.
           If value is not set, returns false. The sequence should be
           >=25 characters long, if possible.

 Example :
 Returns : string or false
 Args    : string

=cut

sub dnStreamSeq {
    my ($self,$value) = @_;
    if( defined $value) {
        $self->{'dnstreamseq'} = $value;
    }
    return $self->{'dnstreamseq'};
}

=head2 label

 Title   : label
 Usage   : $obj->label();
 Function:

           Sets and returns mutation event label(s). If value is not
           set, or no argument is given, returns false. Each
           instantiable class needs to implement this method. Valid
           values are listed in 'Mutation event controlled vocabulary' in
           http://www.ebi.ac.uk/mutations/recommendations/mutevent.html.

 Example :
 Returns : string
 Args    : string

=cut

sub label {
    my ($self,$value) = @_;
    $self->throw_not_implemented();
}

=head2 status

 Title   : status
 Usage   : $obj->status()
 Function:

           Sets and returns the status of the sequence change object.
           Valid values are: 'suspected' and 'proven'

 Example : $obj->status('proven');
 Returns : scalar
 Args    : valid string (optional, for setting)

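The same controlled-vocabulary check is used by status, proof and
numbering. A stand-alone sketch of the pattern follows; the helper
name checked_status is hypothetical, and die stands in for the
object's throw method.

```perl
use strict;
use warnings;

# The two values the documentation lists as valid.
my %VALID_STATUS = map { $_ => 1 } qw(suspected proven);

# Hypothetical validator: lower-case the input, then look it up.
sub checked_status {
    my ($value) = @_;
    $value = lc $value;
    die "$value is not valid status value!\n" unless $VALID_STATUS{$value};
    return $value;
}

print checked_status('Proven'), "\n";            # accepted regardless of case
my $ok = eval { checked_status('maybe'); 1 };    # rejected with an exception
print "rejected: $@" unless $ok;
```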

=cut

sub status {
    my ($self,$value) = @_;
    my %status = (suspected => 1,
                  proven => 1
                 );

    if( defined $value) {
        $value = lc $value;
        if ($status{$value}) {
            $self->{'status'} = $value;
        }
        else {
            $self->throw("$value is not valid status value!");
        }
    }
    # when no status has been set, the stringified object is returned
    if( ! exists $self->{'status'} ) {
        return "$self";
    }
    return $self->{'status'};
}

=head2 proof

 Title   : proof
 Usage   : $obj->proof()
 Function:

           Returns the proof of the sequence change object.
           Valid values are: 'computed' and 'experimental'.

 Example : $obj->proof('computed');
 Returns : scalar
 Args    : valid string (optional, for setting)

=cut

sub proof {
    my ($self,$value) = @_;
    my %proof = (computed => 1,
                 experimental => 1
                );

    if( defined $value) {
        $value = lc $value;
        if ($proof{$value}) {
            $self->{'proof'} = $value;
        } else {
            $self->throw("$value is not valid proof value!");
        }
    }
    return $self->{'proof'};
}

=head2 region

 Title   : region
 Usage   : $obj->region();
 Function:

           Sets and returns the name of the sequence region type or
           protein domain at this location. If value is not set,
           returns false.

 Example :
 Returns : string
 Args    : string

=cut

sub region {
    my ($self,$value) = @_;
    if( defined $value) {
        $self->{'region'} = $value;
    }
    return $self->{'region'};
}

=head2 region_value

 Title   : region_value
 Usage   : $obj->region_value();
 Function:

           Sets and returns the name of the sequence region_value or
           protein domain at this location. If value is not set,
           returns false.

 Example :
 Returns : string
 Args    : string

=cut

sub region_value {
    my ($self,$value) = @_;
    if( defined $value) {
        $self->{'region_value'} = $value;
    }
    return $self->{'region_value'};
}

=head2 region_dist

 Title   : region_dist
 Usage   : $obj->region_dist();
 Function:

           Sets and returns the distance to the closest region
           (i.e. intron/exon or domain) boundary. If distance is not
           set, returns false.

 Example :
 Returns : integer
 Args    : integer

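The setter accepts only optionally signed integers. A stand-alone
sketch of that check, using the same pattern as the method, is shown
here; the helper name is_signed_integer is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true for values such as 12, -12 or +3.
sub is_signed_integer {
    my ($value) = @_;
    return 0 unless defined $value;
    return ( $value =~ /^[+-]?\d+$/ ) ? 1 : 0;
}

foreach my $candidate ( '12', '-12', '+3', '3.5', 'abc' ) {
    printf "%-4s %s\n", $candidate,
        is_signed_integer($candidate) ? 'accepted' : 'rejected';
}
```

A negative distance is accepted as well, so the sign can be used to
record on which side of the boundary the variant lies; that
interpretation is an assumption, since the documentation does not
spell it out.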

=cut

sub region_dist {
    my ($self,$value) = @_;
    if( defined $value) {
        if ( not $value =~ /^[+-]?\d+$/ ) {
            $self->throw("[$value] for region_dist has to be an integer\n");
        } else {
            $self->{'region_dist'} = $value;
        }
    }
    return $self->{'region_dist'};
}

=head2 numbering

 Title   : numbering
 Usage   : $obj->numbering()
 Function:

           Returns the numbering schema used for locating sequence
           features. Valid values are: 'entry' and 'coding'

 Example : $obj->numbering('coding');
 Returns : scalar
 Args    : valid string (optional, for setting)

=cut

sub numbering {
    my ($self,$value) = @_;
    my %numbering = (entry => 1,
                     coding => 1
                    );

    if( defined $value) {
        $value = lc $value;
        if ($numbering{$value}) {
            $self->{'numbering'} = $value;
        }
        else {
            $self->throw("'$value' is not a valid for numbering!");
        }
    }
    # when no numbering has been set, the stringified object is returned
    if( ! exists $self->{'numbering'} ) {
        return "$self";
    }
    return $self->{'numbering'};
}

=head2 mut_number

 Title   : mut_number
 Usage   : $num = $obj->mut_number;
         : $num = $obj->mut_number($number);
 Function:

           Returns or sets the number identifying the order in which the
           mutation has been issued. Numbers should start from 1.
           If the number has never been set, the method will return ''

           If you want the output from IO modules to look nice and,
           for multivariant/allele variations, to make sense, you
           should set this attribute.

 Returns : an integer

=cut

sub mut_number {
    my ($self,$value) = @_;
    if (defined $value) {
        $self->{'mut_number'} = $value;
    }
    unless (exists $self->{'mut_number'}) {
        return ('');
    } else {
        return $self->{'mut_number'};
    }
}

=head2 SeqDiff

 Title   : SeqDiff
 Usage   : $mutobj = $obj->SeqDiff;
         : $mutobj = $obj->SeqDiff($objref);
 Function:

           Returns or sets the link-reference to the umbrella
           Bio::Variation::SeqDiff object. If there is no link,
           it will return undef

           Note: Adding a variant into a SeqDiff object will
           automatically set this value.

 Returns : an obj_ref or undef

See L<Bio::Variation::SeqDiff> for more information.

=cut

sub SeqDiff {
    my ($self,$value) = @_;
    if (defined $value) {
        if( ! $value->isa('Bio::Variation::SeqDiff') ) {
            $self->throw("Is not a Bio::Variation::SeqDiff object but a [$value]");
            return;
        }
        else {
            $self->{'seqDiff'} = $value;
        }
    }
    unless (exists $self->{'seqDiff'}) {
        return;
    } else {
        return $self->{'seqDiff'};
    }
}

=head2 add_DBLink

 Title   : add_DBLink
 Usage   : $self->add_DBLink($ref)
 Function: adds a link object
 Example :
 Returns :
 Args    :

=cut

sub add_DBLink{
    my ($self,$com) = @_;
    if( $com && ! $com->isa('Bio::Annotation::DBLink') ) {
        $self->throw("Is not a link object but a [$com]");
    }
    $com && push(@{$self->{'link'}},$com);
}

=head2 each_DBLink

 Title   : each_DBLink
 Usage   : foreach $ref ( $self->each_DBLink() )
 Function: gets an array of DBLink objects
 Example :
 Returns :
 Args    :

=cut

sub each_DBLink{
    my ($self) = @_;

    # return an empty list when no link has been added yet
    return @{ $self->{'link'} || [] };
}

=head2 restriction_changes

 Title   : restriction_changes
 Usage   : $obj->restriction_changes();
 Function:

           Returns a string containing a list of restriction
           enzyme changes of the form +EcoRI, separated by
           commas. Strings need to be valid restriction enzyme names
           as stored in REBASE. allele_ori and allele_mut need to be
           assigned.

 Example :
 Returns : string
 Args    : none

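The comparison behind this method can be sketched on plain strings:
build the original and the mutated sequence from the flanks, count
recognition sites on both strands, and report gains and losses. The
helper names revcompl and site_changes and the two-enzyme table are
assumptions made for this example; the module uses its own full
REBASE table and object accessors.

```perl
use strict;
use warnings;

# Illustrative two-enzyme subset of recognition sites.
my %SITES = ( EcoRI => 'gaattc', BamHI => 'ggatcc' );

# Reverse complement of an unambiguous DNA string.
sub revcompl {
    my ($seq) = @_;
    $seq = lc $seq;
    $seq =~ tr/acgt/tgca/;
    return scalar reverse $seq;
}

# Hypothetical helper: report sites gained (+) or lost (-) by the mutation.
sub site_changes {
    my ($up, $ori, $mut, $dn) = @_;
    my $oriseq = lc "$up$ori$dn";
    my $mutseq = lc "$up$mut$dn";
    my ($ori_rev, $mut_rev) = ( revcompl($oriseq), revcompl($mutseq) );
    my @changes;
    foreach my $enz ( sort keys %SITES ) {
        my $site = $SITES{$enz};
        # count matches on the forward and the reverse strand
        my $n_ori   = () = $oriseq  =~ /$site/g;
        my $n_mut   = () = $mutseq  =~ /$site/g;
        my $n_ori_r = () = $ori_rev =~ /$site/g;
        my $n_mut_r = () = $mut_rev =~ /$site/g;
        push @changes, "+$enz" if ( $n_ori < $n_mut ) || ( $n_ori_r < $n_mut_r );
        push @changes, "-$enz" if ( $n_ori > $n_mut ) || ( $n_ori_r > $n_mut_r );
    }
    return join ', ', @changes;
}

# a>t between the flanks ccgaa and tcgg creates the site gaattc
print site_changes( 'ccgaa', 'a', 't', 'tcgg' ), "\n";
```

Both strands are checked because a non-palindromic site can occur on
either strand of the double-stranded sequence.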

=cut

sub restriction_changes {
    my ($self) = @_;

    if (not $self->{'re_changes'}) {
        my %re = &_enzymes;

        # complain if used on AA data
        if ($self->isa('Bio::Variation::AAChange')) {
            $self->throw('Restriction enzymes do not bite polypeptides!');
        }

        #sanity checks
        $self->warn('Upstream sequence is empty!')
            if $self->upStreamSeq eq '';
        $self->warn('Downstream sequence is empty!')
            if $self->dnStreamSeq eq '';
        #         $self->warn('Original allele sequence is empty!')
        #             if $self->allele_ori eq '';
        #         $self->warn('Mutated allele sequence is empty!')
        #             if $self->allele_mut eq '';

        #reuse the non empty DNA level list at RNA level if the flanks are identical
        #Hint: Check DNAMutation object first
        if ($self->isa('Bio::Variation::RNAChange') and $self->DNAMutation and
            $self->upStreamSeq eq $self->DNAMutation->upStreamSeq and
            $self->dnStreamSeq eq $self->DNAMutation->dnStreamSeq and
            $self->DNAMutation->restriction_changes ne '' ) {
            $self->{'re_changes'} = $self->DNAMutation->restriction_changes;
        } else {

            #maximum length of a type II restriction site in the current REBASE
            my ($le_dn) = 15;
            my ($le_up) = $le_dn;

            #reduce the flank lengths if the desired length is not available
            $le_dn = CORE::length ($self->dnStreamSeq) if $le_dn > CORE::length ($self->dnStreamSeq);
            $le_up = CORE::length ($self->upStreamSeq) if $le_up > CORE::length ($self->upStreamSeq);

            #Build sequence strings to compare
            my ($oriseq, $mutseq);
            $oriseq = $mutseq = substr($self->upStreamSeq, -$le_up, $le_up);
            $oriseq .= $self->allele_ori->seq if $self->allele_ori->seq;
            $mutseq .= $self->allele_mut->seq if $self->allele_mut->seq;
            $oriseq .= substr($self->dnStreamSeq, 0, $le_dn);
            $mutseq .= substr($self->dnStreamSeq, 0, $le_dn);

            # ... and their reverse complements
            my $oriseq_rev = _revcompl ($oriseq);
            my $mutseq_rev = _revcompl ($mutseq);

            # collect results into a string
            my $rec = '';
            foreach my $enz (sort keys (%re)) {
                my $site = $re{$enz};
                my @ori = ($oriseq=~ /$site/g);
                my @mut = ($mutseq=~ /$site/g);
                my @ori_r = ($oriseq_rev =~ /$site/g);
                my @mut_r = ($mutseq_rev =~ /$site/g);

                $rec .= '+'. $enz. ", "
                    if (scalar @ori < scalar @mut) or (scalar @ori_r < scalar @mut_r);
                $rec .= '-'. $enz. ", "
                    if (scalar @ori > scalar @mut) or (scalar @ori_r > scalar @mut_r);
            }

            # strip the trailing ", " separator
            $rec = substr($rec, 0, CORE::length($rec) - 2) if $rec ne '';
            $self->{'re_changes'} = $rec;
        }
    }
    return $self->{'re_changes'}
}

sub _revcompl {
    # side effect: lower case letters
    my ($seq) = shift;

    $seq = lc $seq;
    $seq =~ tr/acgtrymkswhbvdnx/tgcayrkmswdvbhnx/;
    # force scalar context so the string is reversed even in a list-context call
    return scalar CORE::reverse $seq;
}

sub _enzymes {
    #REBASE version 005 type2.005
    my %enzymes = (
        'AarI' => 'cacctgc',
        'AatII' => 'gacgtc',
        'AccI' => 'gt[ac][gt]ac',
        'AceIII' => 'cagctc',
        'AciI' => 'ccgc',
        'AclI' => 'aacgtt',
        'AcyI' => 'g[ag]cg[ct]c',
        'AflII' => 'cttaag',
        'AflIII' => 'ac[ag][ct]gt',
        'AgeI' => 'accggt',
        'AhaIII' => 'tttaaa',
        'AloI' => 'gaac[acgt][acgt][acgt][acgt][acgt][acgt]tcc',
        'AluI' => 'agct',
        'AlwNI' => 'cag[acgt][acgt][acgt]ctg',
        'ApaBI' => 'gca[acgt][acgt][acgt][acgt][acgt]tgc',
        'ApaI' => 'gggccc',
        'ApaLI' => 'gtgcac',
        'ApoI' => '[ag]aatt[ct]',
        'AscI' => 'ggcgcgcc',
        'AsuI' => 'gg[acgt]cc',
        'AsuII' => 'ttcgaa',
        'AvaI' => 'c[ct]cg[ag]g',
        'AvaII' => 'gg[at]cc',
        'AvaIII' => 'atgcat',
        'AvrII' => 'cctagg',
        'BaeI' => 'ac[acgt][acgt][acgt][acgt]gta[ct]c',
        'BalI' => 'tggcca',
        'BamHI' => 'ggatcc',
        'BbvCI' => 'cctcagc',
        'BbvI' => 'gcagc',
        'BbvII' => 'gaagac',
        'BccI' => 'ccatc',
        'Bce83I' => 'cttgag',
        'BcefI' => 'acggc',
        'BcgI' => 'cga[acgt][acgt][acgt][acgt][acgt][acgt]tgc',
        'BciVI' => 'gtatcc',
        'BclI' => 'tgatca',
        'BetI' => '[at]ccgg[at]',
        'BfiI' => 'actggg',
        'BglI' => 'gcc[acgt][acgt][acgt][acgt][acgt]ggc',
        'BglII' => 'agatct',
        'BinI' => 'ggatc',
        'BmgI' => 'g[gt]gccc',
        'BplI' => 'gag[acgt][acgt][acgt][acgt][acgt]ctc',
        'Bpu10I' => 'cct[acgt]agc',
        'BsaAI' => '[ct]acgt[ag]',
        'BsaBI' => 'gat[acgt][acgt][acgt][acgt]atc',
        'BsaXI' => 'ac[acgt][acgt][acgt][acgt][acgt]ctcc',
        'BsbI' => 'caacac',
        'BscGI' => 'cccgt',
        'BseMII' => 'ctcag',
        'BsePI' => 'gcgcgc',
        'BseRI' => 'gaggag',
        'BseSI' => 'g[gt]gc[ac]c',
        'BsgI' => 'gtgcag',
        'BsiI' => 'cacgag',
        'BsiYI' => 'cc[acgt][acgt][acgt][acgt][acgt][acgt][acgt]gg',
        'BsmAI' => 'gtctc',
        'BsmI' => 'gaatgc',
        'Bsp1407I' => 'tgtaca',
        'Bsp24I' => 'gac[acgt][acgt][acgt][acgt][acgt][acgt]tgg',
        'BspGI' => 'ctggac',
        'BspHI' => 'tcatga',
        'BspLU11I' => 'acatgt',
        'BspMI' => 'acctgc',
        'BspMII' => 'tccgga',
        'BsrBI' => 'ccgctc',
        'BsrDI' => 'gcaatg',
        'BsrI' => 'actgg',
        'BstEII' => 'ggt[acgt]acc',
        'BstXI' => 'cca[acgt][acgt][acgt][acgt][acgt][acgt]tgg',
        'BtrI' => 'cacgtc',
        'BtsI' => 'gcagtg',
        'Cac8I' => 'gc[acgt][acgt]gc',
        'CauII' => 'cc[cg]gg',
        'Cfr10I' => '[ag]ccgg[ct]',
        'CfrI' => '[ct]ggcc[ag]',
        'CjeI' => 'cca[acgt][acgt][acgt][acgt][acgt][acgt]gt',
        'CjePI' => 'cca[acgt][acgt][acgt][acgt][acgt][acgt][acgt]tc',
        'ClaI' => 'atcgat',
        'CviJI' => '[ag]gc[ct]',
        'CviRI' => 'tgca',
        'DdeI' => 'ct[acgt]ag',
        'DpnI' => 'gatc',
        'DraII' => '[ag]gg[acgt]cc[ct]',
        'DraIII' => 'cac[acgt][acgt][acgt]gtg',
        'DrdI' => 'gac[acgt][acgt][acgt][acgt][acgt][acgt]gtc',
        'DrdII' => 'gaacca',
        'DsaI' => 'cc[ag][ct]gg',
        'Eam1105I' => 'gac[acgt][acgt][acgt][acgt][acgt]gtc',
        'EciI' => 'ggcgga',
        'Eco31I' => 'ggtctc',
        'Eco47III' => 'agcgct',
        'Eco57I' => 'ctgaag',
        'EcoNI' => 'cct[acgt][acgt][acgt][acgt][acgt]agg',
        'EcoRI' => 'gaattc',
        'EcoRII' => 'cc[at]gg',
        'EcoRV' => 'gatatc',
        'Esp3I' => 'cgtctc',
        'EspI' => 'gct[acgt]agc',
        'FauI' => 'cccgc',
        'FinI' => 'gggac',
        'Fnu4HI' => 'gc[acgt]gc',
        'FnuDII' => 'cgcg',
        'FokI' => 'ggatg',
        'FseI' => 'ggccggcc',
        'GdiII' => 'cggcc[ag]',
        'GsuI' => 'ctggag',
        'HaeI' => '[at]ggcc[at]',
        'HaeII' => '[ag]gcgc[ct]',
        'HaeIII' => 'ggcc',
        'HaeIV' => 'ga[ct][acgt][acgt][acgt][acgt][acgt][ag]tc',
        'HgaI' => 'gacgc',
        'HgiAI' => 'g[at]gc[at]c',
        'HgiCI' => 'gg[ct][ag]cc',
        'HgiEII' => 'acc[acgt][acgt][acgt][acgt][acgt][acgt]ggt',
        'HgiJII' => 'g[ag]gc[ct]c',
        'HhaI' => 'gcgc',
        'Hin4I' => 'ga[cgt][acgt][acgt][acgt][acgt][acgt][acg]tc',
        'HindII' => 'gt[ct][ag]ac',
        'HindIII' => 'aagctt',
        'HinfI' => 'ga[acgt]tc',
        'HpaI' => 'gttaac',
        'HpaII' => 'ccgg',
        'HphI' => 'ggtga',
        'Hpy178III' => 'tc[acgt][acgt]ga',
        'Hpy188I' => 'tc[acgt]ga',
        'Hpy99I' => 'cg[at]cg',
        'KpnI' => 'ggtacc',
        'Ksp632I' => 'ctcttc',
        'MaeI' => 'ctag',
        'MaeII' => 'acgt',
        'MaeIII' => 'gt[acgt]ac',
        'MboI' => 'gatc',
        'MboII' => 'gaaga',
        'McrI' => 'cg[ag][ct]cg',
        'MfeI' => 'caattg',
        'MjaIV' => 'gt[acgt][acgt]ac',
        'MluI' => 'acgcgt',
        'MmeI' => 'tcc[ag]ac',
        'MnlI' => 'cctc',
        'MseI' => 'ttaa',
        'MslI' => 'ca[ct][acgt][acgt][acgt][acgt][ag]tg',
        'MstI' => 'tgcgca',
        'MwoI' => 'gc[acgt][acgt][acgt][acgt][acgt][acgt][acgt]gc',
        'NaeI' => 'gccggc',
        'NarI' => 'ggcgcc',
        'NcoI' => 'ccatgg',
        'NdeI' => 'catatg',
        'NheI' => 'gctagc',
        'NlaIII' => 'catg',
        'NlaIV' => 'gg[acgt][acgt]cc',
        'NotI' => 'gcggccgc',
        'NruI' => 'tcgcga',
        'NspBII' => 'c[ac]gc[gt]g',
        'NspI' => '[ag]catg[ct]',
        'PacI' => 'ttaattaa',
        'Pfl1108I' => 'tcgtag',
        'PflMI' => 'cca[acgt][acgt][acgt][acgt][acgt]tgg',
        'PleI' => 'gagtc',
        'PmaCI' => 'cacgtg',
        'PmeI' => 'gtttaaac',
        'PpiI' => 'gaac[acgt][acgt][acgt][acgt][acgt]ctc',
        'PpuMI' => '[ag]gg[at]cc[ct]',
        'PshAI' => 'gac[acgt][acgt][acgt][acgt]gtc',
        'PsiI' => 'ttataa',
        'PstI' => 'ctgcag',
        'PvuI' => 'cgatcg',
        'PvuII' => 'cagctg',
        'RleAI' => 'cccaca',
        'RsaI' => 'gtac',
        'RsrII' => 'cgg[at]ccg',
        'SacI' => 'gagctc',
        'SacII' => 'ccgcgg',
        'SalI' => 'gtcgac',
        'SanDI' => 'ggg[at]ccc',
        'SapI' => 'gctcttc',
        'SauI' => 'cct[acgt]agg',
        'ScaI' => 'agtact',
        'ScrFI' => 'cc[acgt]gg',
        'SduI' => 'g[agt]gc[act]c',
        'SecI' => 'cc[acgt][acgt]gg',
        'SexAI' => 'acc[at]ggt',
        'SfaNI' => 'gcatc',
        'SfeI' => 'ct[ag][ct]ag',
        'SfiI' => 'ggcc[acgt][acgt][acgt][acgt][acgt]ggcc',
        'SgfI' => 'gcgatcgc',
        'SgrAI' => 'c[ag]ccgg[ct]g',
        'SimI' => 'gggtc',
        'SmaI' => 'cccggg',
        'SmlI' => 'ct[ct][ag]ag',
        'SnaBI' => 'tacgta',
        'SnaI' => 'gtatac',
        'SpeI' => 'actagt',
        'SphI' => 'gcatgc',
        'SplI' => 'cgtacg',
        'SrfI' => 'gcccgggc',
        'Sse232I' => 'cgccggcg',
        'Sse8387I' => 'cctgcagg',
        'Sse8647I' => 'agg[at]cct',
        'SspI' => 'aatatt',
        'Sth132I' => 'cccg',
        'StuI' => 'aggcct',
        'StyI' => 'cc[at][at]gg',
        'SwaI' => 'atttaaat',
        'TaqI' => 'tcga',
        'TaqII' => 'gaccga',
        'TatI' => '[at]gtac[at]',
        'TauI' => 'gc[cg]gc',
        'TfiI' => 'ga[at]tc',
        'TseI' => 'gc[at]gc',
        'Tsp45I' => 'gt[cg]ac',
        'Tsp4CI' => 'ac[acgt]gt',
        'TspEI' => 'aatt',
        'TspRI' => 'ca[cg]tg[acgt][acgt]',
        'Tth111I' => 'gac[acgt][acgt][acgt]gtc',
        'Tth111II' => 'caa[ag]ca',
        'UbaGI' => 'cac[acgt][acgt][acgt][acgt]gtg',
        'UbaPI' => 'cgaacg',
        'VspI' => 'attaat',
        'XbaI' => 'tctaga',
        'XcmI' => 'cca[acgt][acgt][acgt][acgt][acgt][acgt][acgt][acgt][acgt]tgg',
        'XhoI' => 'ctcgag',
        'XhoII' => '[ag]gatc[ct]',
        'XmaIII' => 'cggccg',
        'XmnI' => 'gaa[acgt][acgt][acgt][acgt]ttc'
    );

    return %enzymes;
}

1;