
wangyufeng's blog

SMITH: a LIMS for handling next-generation sequencing workflows  

2014-12-24 17:24:33 | Category: Bioinformatics analysis

Abstract

BACKGROUND:

Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges.

METHODS:

SMITH is a web application with a MySQL server at the backend. It was developed by wet-lab scientists of the Centre for Genomic Science together with database experts from the Politecnico of Milan, in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and with all the data produced afterwards. This approach permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses.
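The attribute-value idea described above can be illustrated with a minimal sketch. It is not SMITH's actual schema: the table names (`sample`, `sample_attribute`), the column names, the sample names, and the example keys are all hypothetical, and SQLite (from the Python standard library) stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical attribute-value layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            sample_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        -- One row per (sample, key, value): keys are free text, not fixed columns.
        CREATE TABLE sample_attribute (
            sample_id  INTEGER NOT NULL REFERENCES sample(sample_id),
            attr_key   TEXT NOT NULL,
            attr_value TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO sample (sample_id, name) VALUES (?, ?)",
        [(1, "liver_rep1"), (2, "liver_chip"), (3, "brain_rep1")],
    )
    conn.executemany(
        "INSERT INTO sample_attribute (sample_id, attr_key, attr_value) VALUES (?, ?, ?)",
        [
            (1, "assay", "RNA-Seq"), (1, "tissue", "liver"),
            (2, "assay", "ChIP-Seq"), (2, "antibody", "H3K4me3"),
            (3, "assay", "RNA-Seq"), (3, "tissue", "brain"),
        ],
    )
    return conn


def find_samples(conn, key, value):
    """Return the names of samples annotated with the given key-value pair."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM sample AS s
        JOIN sample_attribute AS a ON a.sample_id = s.sample_id
        WHERE a.attr_key = ? AND a.attr_value = ?
        ORDER BY s.name
        """,
        (key, value),
    ).fetchall()
    return [row[0] for row in rows]


def count_by_value(conn, key):
    """Count samples per value of one key, e.g. for a simple summary statistic."""
    rows = conn.execute(
        """
        SELECT attr_value, COUNT(DISTINCT sample_id)
        FROM sample_attribute
        WHERE attr_key = ?
        GROUP BY attr_value
        """,
        (key,),
    ).fetchall()
    return dict(rows)
```

Because the keys live in rows rather than in columns, a new kind of annotation (here `antibody`) needs no schema change, which is the property the abstract attributes to the attribute-value table.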

RESULTS:

SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. Parameters and input data are passed to the workflow engine, which performs de-multiplexing, quality control, alignments, etc.

CONCLUSIONS:

SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.

Figure: Infrastructure, main tasks, and architecture.

A) Infrastructure: Sequencing is performed on an Illumina HiSeq2000 instrument. Data are stored on an Isilon mass storage device. Data are processed on a Sun Grid Engine High Performance Computing cluster (SGE-HPC). Application servers run web applications for genome browsing, data listings, and the SMITH LIMS, and host the MySQL information tier. The user data directories are organized by group leader name, user login name, file type, and run date.

B) Sample tracking in SMITH: A sample passes through four states ("requested", "queued", "confirmed", "analysed"). Submitted samples have status "requested". When a sample is added to the virtual flow cell, its status changes to "queued". Upon confirmation by the group leader, the status changes to "confirmed". The sample is then run and analysed by the workflow engine and assumes the final status "analysed". HPC refers to a high performance computing cluster.

C) Architecture of the workflow unit: Generated commands invoke Galaxy workflows that subsequently call the un-pluggable core. Some of the tools can stay on the Galaxy side (proprietary tools and scripts), while the others (open-source tools) are moved to the core.
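The four sample states in part B can be sketched as a small state machine. This is only an illustration under stated assumptions: the names `SampleStatus` and `advance` are hypothetical, and the caption describes only the forward path, so the sketch allows no other transitions. Whether SMITH itself permits moving a sample backwards is not stated.

```python
from enum import Enum


class SampleStatus(Enum):
    """The four states named in the caption (class name is hypothetical)."""
    REQUESTED = "requested"   # sample has been submitted
    QUEUED = "queued"         # sample was added to the virtual flow cell
    CONFIRMED = "confirmed"   # the group leader confirmed the run
    ANALYSED = "analysed"     # the workflow engine has run and analysed the sample


# Forward-only transitions, as described in the caption.
_NEXT_STATUS = {
    SampleStatus.REQUESTED: SampleStatus.QUEUED,
    SampleStatus.QUEUED: SampleStatus.CONFIRMED,
    SampleStatus.CONFIRMED: SampleStatus.ANALYSED,
}


def advance(status):
    """Return the next status, or raise ValueError if the status is final."""
    if status not in _NEXT_STATUS:
        raise ValueError(f"{status.value!r} is a final status")
    return _NEXT_STATUS[status]


def lifecycle(start=SampleStatus.REQUESTED):
    """List every status a sample passes through, starting from `start`."""
    path = [start]
    while path[-1] in _NEXT_STATUS:
        path.append(_NEXT_STATUS[path[-1]])
    return path
```

For example, `[s.value for s in lifecycle()]` walks a newly submitted sample through all four states in order.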
Full text:
http://www.biomedcentral.com/1471-2105/15/S14/S3