注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

wangyufeng的博客

祝愿BB 健康开心快乐每一天

 
 
 

日志

 
 

如何从GFF文件中提取CDS序列:How to get the CDS sequences from gff3 file  

2011-11-03 10:07:58|  分类: Perl & bioperl |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

Here is the perl script to get the cds sequences from gff3 file.

If you have any questions about the script or parsing gff3 file, pls link to:http://bioops.info/2011/03/parse-gff3-file/


#!/usr/bin/perl
# parse_gff3.pl
# get the cds sequences from gff3 file.
# usage: perl parse_gff3.pl gff3_input_file genome_sequence_directory >output.txt
 
use strict;
use warnings;
 
use Bio::Tools::GFF;
use Bio::DB::Fasta;
 
my $gff3_file=$ARGV[0];  #gff3 input file
my $genome=$ARGV[1]; #genome sequence directory
my $gffio = Bio::Tools::GFF -> new(-file =>$gff3_file , -gff_version => 3);
my $db    = Bio::DB::Fasta  -> new($genome);
 
my $i=0;
my $first=0;
my $strand=0;
my $seq='';
while(my $feature = $gffio->next_feature()) {
# Sometimes, the gff3 file format is a little different with the standard format,
# So that the keys of the hashes are different.
# Use the following lines of masked code to see the keys of the hashes. Some Change may be needed.
#  $i++;
#  if ($i==2){
#    my @a=keys %{$feature};
#    print "@a\n";
#    my @b=keys %{$feature->{_location}};
#    print "@b\n";
#    print "$feature->{_primary_tag}\n";
#  }
  if($feature->{_primary_tag}=~/mrna/i){
    $first++;
    if($first==1){
      $seq='';
    }else{
      print_sequence($seq,80,$strand);
      $seq='';
    }
    print ">$feature->{_gsf_tag_hash}->{Name}->[0]|$feature->{_gsf_tag_hash}->{ID}->[0]\n";
  }elsif($feature->{_primary_tag}=~/cds/i){
    my $seq_temp=$db->seq($feature->{_gsf_seq_id}, $feature->{_location}->{_start}=>$feature->{_location}->{_end});
    if($feature->{_location}->{_strand}=~/-/){  # to know the strand
      $seq=$seq_temp.$seq;
      $strand=0;
    }else{
      $seq.=$seq_temp;
      $strand=1;
    }
  }
}
print_sequence($seq,80,$strand);
 
$gffio->close();
 
sub print_sequence {
  my($sequence, $length,$strand) = @_;
  if($strand==0){  # if the sequence is on the minus strand, get its reverse-complement counterpart
    $sequence=~tr/ACGTacgt/TGCAtgca/;
    $sequence=reverse $sequence;
  }
  for ( my $pos = 0 ; $pos < length($sequence) ; $pos += $length ) {
    print substr($sequence, $pos, $length),"\n";
  }
}
  评论这张
 
阅读(3233)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017