
wangyufeng's blog

Readscan: separate host and pathogen read sequences

2012-12-05 09:58:48 | Category: Resistance genes

Readscan is a program to identify pathogen/contaminant sequences in whole genome shotgun sequencing datasets.

DOWNLOAD, INSTALL and TEST RUN on a bash shell

  1. wget ftp://pub-ftp.kaust.edu.sa/pub/clse/pathogen_genomics/clse_ext1/readscan/readscan-0.5.tar.gz

  2. tar -zxvf readscan-0.5.tar.gz

  3. Follow the instructions in the README inside.

  4. The output consists of a venn_stats file and microbes_stats.txt. If the run fails, refer to TROUBLESHOOTING for solutions, and keep the readscan_search.log and readscan.log files, which help in debugging any problems.

MANUAL

For the updated manual, see

  • perldoc readscan_lsf.pl

  • perldoc readscan_makeflow.pl

  • perldoc readscan_sge.pl

  • perldoc readscan.pl

TROUBLESHOOTING

  1. What are the prerequisites for installing readscan?

    readscan depends on

    • perl

    • smalt v 0.6.3

    • Unix utilities: make, sort, split, cat, etc.

    • To run on Platform LSF or Sun Grid Engine, no additional tools are needed.

    • To run on job schedulers other than Platform LSF or Sun Grid Engine, Makeflow is required.

  2. normal: User cannot use the queue. Job not submitted.

    On LSF clusters, jobs are submitted to the default queue. Try changing the queue by passing

    --lsf q=anotherqueuename

  3. Error in rusage section: Job-level resource requirement values must satisfy limits set by the queue-level resource requirement values. Job not submitted.

    On some LSF clusters, jobs are not submitted because the rusage section does not satisfy the queue-level resource requirements. You can override the default LSF rusage, i.e. the -R string, by passing

    --lsf RMh='span[ptile=8]' --lsf RMp='span[ptile=8]'

    or alternatively

    --lsf R=-1

    which suppresses the resource string passed to bsub entirely.
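    The queue and rusage overrides can be combined on one command line. The following is only a dry-run sketch: it reuses the index subcommand, the -k/-s options and the bacterial_all.fasta file name from the memory-limit example in this post, anotherqueuename is a placeholder, and the command is printed rather than executed, so nothing is submitted.

```shell
#!/usr/bin/env bash
# Build the command as a bash array so every argument keeps its own quoting.
cmd=(
  readscan_lsf.pl index -k 13 -s 6
  --lsf q=anotherqueuename   # placeholder: replace with a queue you may use
  --lsf R=-1                 # suppress the resource string passed to bsub
  bacterial_all.fasta
)

# Dry run: print one argument per line instead of running the command.
printf '%s\n' "${cmd[@]}"
```

    To use RMh='span[ptile=8]' instead, keep the quotes: unquoted square brackets are glob characters in the shell.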

  4. TERM_MEMLIMIT: job killed after reaching LSF memory usage limit. Exited with signal termination: Killed.

    [0] smalt.c:330 ERROR: memory allocation failed

    On LSF, try increasing the memory limit with

    readscan_lsf.pl index -k 13 -s 6 --lsf R='select[mem>6291] rusage[mem=6291]' --lsf M=6291000 bacterial_all.fasta

    On SGE, try increasing the memory limit with

    readscan_sge.pl index -k 13 --sampling_step_size 6 --sge l='h_vmem=6291M,virtual_free=2645.5M' bacterial_all.fasta

  5. cannot create <outputdir> at readscan.pl line <line>.

    Try deleting the output directory if it already exists. readscan tries to create a new directory named after the input fastq file.
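    A cautious way to clear a stale output directory is to rename it instead of deleting it. This is a minimal sketch, not part of readscan: clear_outdir is a hypothetical helper, and the directory path is passed in explicitly because the exact name readscan derives from the fastq file is not spelled out here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move an existing output directory out of the way
# so that a fresh directory can be created under the same name.
clear_outdir() {
  outdir=$1
  if [ -e "$outdir" ]; then
    mv "$outdir" "$outdir.old.$$"   # $$ = current shell PID, keeps the backup name distinct
    echo "moved existing $outdir aside"
  fi
}

# Demo on a throwaway directory standing in for a previous run's output.
demo_root=$(mktemp -d)
mkdir "$demo_root/sample_output"
clear_outdir "$demo_root/sample_output"
```

    Replace the mv with rm -r if the old results are no longer needed.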

  6. How to interpret the results stats file?

    sample stats file

    The stats file has the following sections: species, genus, family, order, class, phylum and sequence. The taxonomic sections (all but the last) have the columns rank, parent taxon_id, taxon_id, name and genome relative abundance (GRA).
    Each section is sorted from the most abundant to the least abundant taxon by GRA value.
    The last section, sequence, has these additional columns:

    NO_OF_ALIGNS - number of alignments on a particular reference
    BASES_COVERED - number of bases covered on a particular reference
    REF_LENTH - length of the reference
    PERC_COVERAGE - percentage of the reference covered by at least 1 base
    MEAN_CONTIG_LENGTH = sum(contig_length X number_of_reads_supporting_the_contig)/sum(number_of_reads)
    REF_NAME - name of the reference sequence
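    To make the MEAN_CONTIG_LENGTH formula concrete, here is a small awk sketch of the read-weighted mean. The two-column input (contig length, number of supporting reads), the numbers and the function name mean_contig_length are made up for illustration; this is not readscan's own code or file format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read-weighted mean contig length.
# Input on stdin, one contig per line: "<contig_length> <supporting_reads>".
mean_contig_length() {
  awk '{ weighted += $1 * $2; reads += $2 }
       END { if (reads > 0) printf "%.2f\n", weighted / reads; else print "NA" }'
}

# Three made-up contigs of 100, 200 and 400 bases supported by 2, 3 and 5 reads:
# (100*2 + 200*3 + 400*5) / (2 + 3 + 5) = 2800 / 10
printf '100 2\n200 3\n400 5\n' | mean_contig_length   # prints 280.00
```

    Contigs supported by more reads pull the mean toward their length, so the value differs from the plain average of the contig lengths.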

  7. Very low percentage of reads map to the host and pathogen databases

    Try setting the minid parameter for smalt to 0.01 or less

    --smalt yMh=0.01 --smalt yMp=0.01

  8. How to compile an updated reference dataset?

    The reference datasets (bacterial, viral, fungal and human) are simply multiple FASTA sequences concatenated into a single multifasta file. The reference datasets provided on this page may not be up to date. Users who want up-to-date microbial and human reference datasets can download the sequences from the NCBI RefSeq FTP page.
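    Because a reference dataset is just concatenated FASTA, building one can be sketched with cat. The helper name build_multifasta, the file names and the tiny sequences below are hypothetical stand-ins for genomes downloaded from RefSeq.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: concatenate every *.fasta file in a directory
# into a single multifasta file.
build_multifasta() {
  src_dir=$1
  out_file=$2
  cat "$src_dir"/*.fasta > "$out_file"
}

# Made-up demo data in a temporary directory.
work=$(mktemp -d)
printf '>genome_A\nACGT\n' > "$work/a.fasta"
printf '>genome_B\nGGCC\n' > "$work/b.fasta"

# The output uses a different extension so it is not matched by *.fasta itself.
build_multifasta "$work" "$work/refs_all.fa"

# Count the sequences in the result by counting header lines.
grep -c '^>' "$work/refs_all.fa"   # prints 2
```

    Each input file should end with a newline; otherwise the next header line would be fused onto the last sequence line of the previous file.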

  9. How to compile an updated Taxon file?

    Up-to-date taxon files can be downloaded from the NCBI Taxonomy FTP page.


 


