wangyufeng's blog

How to Predict Motifs Locally with MEME

2012-04-23 10:20:28 | Category: Bioinformatics Analysis

Tip: meme.bin is in the src directory, not in the meme directory.
$ cd src
$ ./meme.bin

Installation documentation:

http://meme.nbcr.net/meme4_3_0/doc/meme-install.html

1. Type the following commands and then follow the instructions printed by the configure command.
$ tar zxf meme_VERSION.tar.gz
$ cd meme_VERSION
$ ./configure --prefix=$HOME/meme --with-url=http://meme.nbcr.net/meme

2. Edit your shell configuration file to add
$HOME/meme/bin to your shell's path.
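
After editing the shell configuration, it can be worth confirming that the installed programs are actually visible on the search path. A minimal Python sketch of such a check follows. The helper name `find_tool` is made up for illustration, and the program names `meme` and `mast` are assumed to be what the install placed in the bin directory.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    # Assumption: the install created programs named "meme" and "mast".
    for tool in ("meme", "mast"):
        path = find_tool(tool)
        print(tool, "->", path if path else "not found on PATH")
```

If either program is reported as not found, the path entry from step 2 is probably missing or the shell has not been restarted.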




First, of course, install MEME and MAST into the directory of your choice.

Download: http://meme.nbcr.net/meme4_6_1/meme-download.html

Next, run MAST to search for motifs.

a. Basic command:
$ mast <motif file> <sequence file> [options]
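
When the search has to be repeated over many files, the same command can be assembled from a script. The sketch below is one possible wrapper, not part of MAST itself: `build_mast_command` and `run_mast` are hypothetical helper names, the sequence file name is a placeholder, and no specific MAST options are assumed (whatever is passed in `options` is appended unchanged).

```python
import subprocess


def build_mast_command(motif_file, sequence_file, options=()):
    """Assemble the argument list: mast <motif file> <sequence file> [options]."""
    return ["mast", motif_file, sequence_file, *options]


def run_mast(motif_file, sequence_file, options=()):
    """Run MAST and return its exit status.

    Raises FileNotFoundError if no "mast" program is on PATH and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    command = build_mast_command(motif_file, sequence_file, options)
    completed = subprocess.run(command, check=True)
    return completed.returncode


if __name__ == "__main__":
    # Placeholder file names; only the command line is printed here.
    print(" ".join(build_mast_command("test.motif", "sequences.fasta")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.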

b. Motif file format
=======================motif file================
MEME version 4
ALPHABET= ACGT  

strands: + -

Background letter frequencies (from
A 0.303 C 0.183 G 0.209 T 0.306

MOTIF crp
log-odds matrix: alength= 4 w= 19
-1073    -5  -1073  143
-1073  -163    163   -6
-1073  -163  -1073  162
  -78 -1073    187 -237
  144  -163  -1073 -138 
   -4    -5    -24   21
   -4    95     17 -138
 -136    36     76   -6
   81 -1073    -24   -6
 -236    36    149 -138
  -78    36     49   -6
-1073  -163    -83  143
 -236   227  -1073 -237
  134 -1073    -24 -237
 -236   227  -1073 -237
  144  -163   -183 -237
  -78   117   -183   21
   44 -1073  -1073   94
   22  -163  -1073   94

MOTIF lexA
log-odds matrix: alength= 4 w= 18
  -47 -1045  -1045  147
  153 -1045  -157  -198
-1045   227 -1045 -1045
-1045 -1045 -1045   182
-1045 -1045   223 -1045
-1045 -1045 -1045   182
  153 -1045  -157  -198
-1045  -154 -1045   171
  153 -1045  -157  -198
 -105 -1045 -1045   160
   95 -154       -99
   -5   46  -1045    60
  175 -1045 -1045 -1045
   -5     5 -1045    82
   53   127 -1045  -198
-1045   227 -1045 -1045
  175 -1045 -1045 -1045
-1045 -1045   188   -40
================================
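
MAST's output, shown later in this post, lists a "best possible match" for each motif. One way to read the matrices above is that every row scores the four letters at one motif position, so taking the highest-scoring letter in each row gives that best match. The sketch below assumes the columns follow the order of the ALPHABET line (A, C, G, T). The function name `best_possible_match` is made up for illustration, and the scores are copied from the crp matrix above.

```python
# Scores copied from the crp log-odds matrix in the post, one row per position.
CRP_LOG_ODDS = [
    [-1073, -5, -1073, 143],
    [-1073, -163, 163, -6],
    [-1073, -163, -1073, 162],
    [-78, -1073, 187, -237],
    [144, -163, -1073, -138],
    [-4, -5, -24, 21],
    [-4, 95, 17, -138],
    [-136, 36, 76, -6],
    [81, -1073, -24, -6],
    [-236, 36, 149, -138],
    [-78, 36, 49, -6],
    [-1073, -163, -83, 143],
    [-236, 227, -1073, -237],
    [134, -1073, -24, -237],
    [-236, 227, -1073, -237],
    [144, -163, -183, -237],
    [-78, 117, -183, 21],
    [44, -1073, -1073, 94],
    [22, -163, -1073, 94],
]


def best_possible_match(matrix, alphabet="ACGT"):
    """Pick the highest-scoring letter in each row of a log-odds matrix."""
    # Assumption: columns are ordered as in the ALPHABET line (A, C, G, T).
    return "".join(alphabet[row.index(max(row))] for row in matrix)


if __name__ == "__main__":
    print(best_possible_match(CRP_LOG_ODDS))
```

For the crp matrix this gives TGTGATCGAGGTCACACTT, the same string that the MAST output further down reports as the best possible match for the width-19 motif.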
First, a note on this file. It contains the following parts:

1. The MEME version number line.

2. The alphabet line.

  For DNA motif files the line
    ALPHABET= ACGT
  or for protein motif files
    ALPHABET= ACDEFGHIKLMNPQRSTVWY
  must be present.
3. Strand information line. (DNA motif files only.)
 
  If both DNA strands are included in the motif:
    strands: + - 
    or if only one strand is included:
    strands: +

4. The background distribution lines.
The background must start on a new line with the string "Background letter frequencies (from". This is followed, on the next line(s), by a list of characters and their associated frequencies, delimited by white space.

5. The motifs.
There may be one or more motifs. Each motif starts with a "MOTIF" line, followed by a "log-odds matrix" and/or a "letter-probability matrix" section.

MAST requires each motif to be represented as a "log-odds matrix". For all other programs that accept MEME minimal motif format, each motif must be represented as a "letter-probability matrix". You may include both formats for a given motif, but only one will be used, depending on which program you are running. If you include both formats for a motif, you should put the log-odds matrix format first. Each of the sections has a header line followed by one line of letter scores/frequencies for each position in the motif, as detailed below. It is recommended, though not required, that you include a URL line listing the webpage where more information can be found on the motif.
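
The two matrix types carry the same information in different forms. As a rough illustration, and assuming (the post does not state this) that each log-odds score is 100 × log2(probability / background frequency), a row of scores can be turned back into approximate letter probabilities. The function name below is hypothetical. The background values and the example row are taken from the sample motif file above.

```python
# Background frequencies from the sample motif file in the post.
BACKGROUND = {"A": 0.303, "C": 0.183, "G": 0.209, "T": 0.306}


def log_odds_row_to_probabilities(scores, background=BACKGROUND, alphabet="ACGT"):
    """Convert one row of log-odds scores to approximate letter probabilities."""
    # Assumption: score = 100 * log2(p / background),
    # hence p = background * 2 ** (score / 100).
    return {
        letter: background[letter] * 2 ** (score / 100)
        for letter, score in zip(alphabet, scores)
    }


if __name__ == "__main__":
    second_crp_row = [-1073, -163, 163, -6]
    probabilities = log_odds_row_to_probabilities(second_crp_row)
    for letter, value in probabilities.items():
        print(letter, round(value, 3))
    print("sum:", round(sum(probabilities.values()), 3))
```

Under that assumption the recovered probabilities of the row add up to very nearly 1, which is at least consistent with the scores being scaled log2 ratios against the listed background.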

The motif format for MAST is:

MOTIF motif_name
log-odds matrix: alength= 4 w= 22 E= 0
...
... lines of log-odds scores; each line is a list of scores for each letter
...
URL website

The motif format for all other programs that accept MEME minimal motif format is:

MOTIF motif_name
letter-probability matrix: alength= 4 w= 22 nsites= 49 E= 0
...
... lines of probabilities; each line is a list of probabilities of each letter
...
URL website
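
To check a hand-written motif file before giving it to MAST, the five parts described above can be read back with a small script. The following is only a sketch of such a reader, not the parser MAST uses: `parse_minimal_meme` is a hypothetical name, only log-odds matrices are read, and the two-position motif `demo` is a made-up example built from the first two crp rows.

```python
import re


def parse_minimal_meme(text):
    """Parse version, alphabet, strands, background and log-odds motifs."""
    parsed = {"version": None, "alphabet": None, "strands": None,
              "background": {}, "motifs": {}}
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = None
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line.startswith("MEME version"):
            parsed["version"] = line.split()[2]
        elif line.startswith("ALPHABET="):
            parsed["alphabet"] = line.split("=", 1)[1].strip()
        elif line.startswith("strands:"):
            parsed["strands"] = line.split(":", 1)[1].split()
        elif line.startswith("Background letter frequencies"):
            # "letter frequency" pairs follow on the next line(s).
            wanted = len(parsed["alphabet"] or "")
            while len(parsed["background"]) < wanted and i < len(lines):
                tokens = lines[i].split()
                i += 1
                for letter, value in zip(tokens[0::2], tokens[1::2]):
                    parsed["background"][letter] = float(value)
        else:
            if line.startswith("MOTIF"):
                name = line.split()[1]
            # The matrix header may sit on the MOTIF line or on the next line.
            header = re.search(r"log-odds matrix:.*?alength=\s*(\d+)\s+w=\s*(\d+)", line)
            if header and name is not None:
                alength, width = int(header.group(1)), int(header.group(2))
                rows = [[float(x) for x in row.split()] for row in lines[i:i + width]]
                if len(rows) != width or any(len(r) != alength for r in rows):
                    raise ValueError(f"motif {name}: malformed log-odds matrix")
                parsed["motifs"][name] = rows
                i += width
    return parsed


# Made-up two-position motif, for illustration only.
SAMPLE = """\
MEME version 4
ALPHABET= ACGT
strands: + -
Background letter frequencies (from
A 0.303 C 0.183 G 0.209 T 0.306
MOTIF demo
log-odds matrix: alength= 4 w= 2
-1073 -5 -1073 143
-1073 -163 163 -6
"""

if __name__ == "__main__":
    result = parse_minimal_meme(SAMPLE)
    print(result["alphabet"], result["strands"], result["background"])
    print(result["motifs"]["demo"])
```

Because every row is checked against `alength`, a matrix line with a missing or extra value raises `ValueError` instead of being passed on silently.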

c. Sequence file
The sequence file can be any ordinary FASTA-format sequence file.
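
MAST reports how many sequences and residues it read from the sequence file, so counting them beforehand is a cheap sanity check. Below is a minimal FASTA reader written for this purpose. It is a sketch that assumes plain FASTA with `>` header lines, `read_fasta` is a hypothetical name, and the two sequences are made up.

```python
def read_fasta(text):
    """Return a dict mapping each sequence name to its sequence string."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after ">"; the rest is a description.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Made-up sequences, for illustration only.
EXAMPLE = """\
>seq1 first example
ACGTAC
GT
>seq2
TTGACA
"""

if __name__ == "__main__":
    sequences = read_fasta(EXAMPLE)
    residues = sum(len(seq) for seq in sequences.values())
    print(f"{len(sequences)} sequences, {residues} residues")
```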

d. Output file format

The output looks like this:

********************************************************************************
MAST - Motif Alignment and Search Tool
********************************************************************************
        MAST version 4.6.0 (Release date: Thu Jan 20 14:06:48 PST 2011)

        For further information on how to interpret these results or to get
        a copy of the MAST software please access http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
        If you use this program in your research, please cite:

        Timothy L. Bailey and Michael Gribskov,
        "Combining evidence using p-values: application to sequence homology
        searches", Bioinformatics, 14(48-54), 1998.
********************************************************************************


********************************************************************************
DATABASE AND MOTIFS
********************************************************************************
        DATABASE mac14_merge_GSM353640_peaks.bed.chr1.peakOut.200 (nucleotide)
        Last updated on Sat Mar 19 14:16:52 2011
        Database contains 200 sequences, 141677 residues

        Scores for positive and reverse complement strands are combined.

        MOTIFS test.motif (nucleotide)
        MOTIF WIDTH BEST POSSIBLE MATCH
        ----- ----- -------------------
            19   TGTGATCGAGGTCACACTT
            18   TACTGTATATATATCCAG

        PAIRWISE MOTIF CORRELATIONS:
        MOTIF     1
        ----- -----
            0.34
        No overly similar pairs (correlation > 0.60) found.

        Random model letter frequencies (from non-redundant database):
        A 0.274 C 0.225 G 0.225 T 0.274
********************************************************************************


********************************************************************************
SECTION I: HIGH-SCORING SEQUENCES
********************************************************************************
        - Each of the following 1 sequences has E-value less than 10.
        - The E-value of a sequence is the expected number of sequences
          in a random database of the same size that would match the motifs as
          well as the sequence does and is equal to the combined p-value of the
          sequence times the number of sequences in the database.
        - The combined p-value of a sequence measures the strength of the
          match of the sequence to all the motifs and is calculated by
            o finding the score of the single best match of each motif
              to the sequence (best matches may overlap),
            o calculating the sequence p-value of each score,
            o forming the product of the p-values,
            o taking the p-value of the product.
        - The sequence p-value of a score is defined as the
          probability of a random sequence of the same length containing
          some match with as good or better a score.
        - The score for the match of a position in a sequence to a motif
          is computed by by summing the appropriate entry from each column of
          the position-dependent scoring matrix that represents the motif.
        - Sequences shorter than one or more of the motifs are skipped.
        - The table is sorted by increasing E-value.
********************************************************************************
SEQUENCE NAME                      DESCRIPTION                   E-VALUE  LENGTH
-------------                      -----------                   -------- ------
                                                                    0.5    542

********************************************************************************



********************************************************************************
SECTION II: MOTIF DIAGRAMS
********************************************************************************
        - The ordering and spacing of all non-overlapping motif occurrences
          are shown for each high-scoring sequence listed in Section I.
        - A motif occurrence is defined as a position in the sequence whose
          match to the motif has POSITION p-value less than 0.0001.
        - The POSITION p-value of a match is the probability of
          a single random subsequence of the length of the motif
          scoring at least as well as the observed match.
        - For each sequence, all motif occurrences are shown unless there
          are overlaps.  In that case, a motif occurrence is shown only if its
          p-value is less than the product of the p-values of the other
          (lower-numbered) motif occurrences that it overlaps.
        - The table also shows the E-value of each sequence.
        - Spacers and motif occurences are indicated by
           o -d-    `d' residues separate the end of the preceding motif
                    occurrence and the start of the following motif occurrence
           o [sn]  occurrence of motif `n' with p-value less than 0.0001.
                    A minus sign indicates that the occurrence is on the
                    reverse complement strand.
********************************************************************************

SEQUENCE NAME                      E-VALUE   MOTIF DIAGRAM
-------------                      --------  -------------
                                      0.5  242_[+2]_282

********************************************************************************



********************************************************************************
SECTION III: ANNOTATED SEQUENCES
********************************************************************************
        - The positions and p-values of the non-overlapping motif occurrences
          are shown above the actual sequence for each of the high-scoring
          sequences from Section I.
        - A motif occurrence is defined as a position in the sequence whose
          match to the motif has POSITION p-value less than 0.0001 as
          defined in Section II.
        - For each sequence, the first line specifies the name of the sequence.
        - The second (and possibly more) lines give a description of the
          sequence.
        - Following the description line(s) is a line giving the length,
          combined p-value, and E-value of the sequence as defined in Section I.
        - The next line reproduces the motif diagram from Section II.
        - The entire sequence is printed on the following lines.
        - Motif occurrences are indicated directly above their positions in the
          sequence on lines showing
           o the motif number of the occurrence (a minus sign indicates that
          the occurrence is on the reverse complement strand),
           o the position p-value of the occurrence,
           o the best possible match to the motif (or its reverse complement), and
           o columns whose match to the motif has a positive score (indicated
             by a plus sign).
********************************************************************************


8

  LENGTH = 542  COMBINED P-VALUE = 2.50e-03  E-VALUE =      0.5
  DIAGRAM: 242_[+2]_282


                      [+2]
                      7.3e-07
                      TACTGTATATATATCCAG
                      +++++  +++++++++++
226  CGACCCAGTAATCCCACTACTGGGTATATACCCAGATGAATATAAACCATTCTACCATAAAGACACATGCATACA

********************************************************************************


CPU: fudan-vari
Time 0.070000 secs.

mast test.motif mac14_merge_GSM353640_pe
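
Two numbers in the output above can be checked by hand. Section I defines the E-value as the combined p-value times the number of sequences in the database, and the motif diagram should add up to the sequence length once each motif is replaced by its width. The sketch below redoes both calculations with the values printed above. The function names are hypothetical, the diagram parser only handles the underscore-separated form seen in this output, and numbering the motifs 1 and 2 in file order is an assumption.

```python
import re


def e_value(combined_p_value, n_sequences):
    """E-value as defined in Section I: combined p-value times database size."""
    return combined_p_value * n_sequences


def diagram_length(diagram, motif_widths):
    """Add up spacers and motif widths in a diagram such as 242_[+2]_282."""
    total = 0
    for part in diagram.split("_"):
        occurrence = re.fullmatch(r"\[([+-]?)(\d+)\]", part)
        if occurrence:
            # A motif occurrence contributes the width of that motif.
            total += motif_widths[int(occurrence.group(2))]
        else:
            # Anything else is a spacer length in residues.
            total += int(part)
    return total


if __name__ == "__main__":
    # Values from the output above: p = 2.50e-03, 200 sequences.
    print("E-value:", round(e_value(2.50e-03, 200), 3))
    # Assumption: motif 1 is the width-19 motif and motif 2 the width-18 motif.
    print("length:", diagram_length("242_[+2]_282", {1: 19, 2: 18}))
```

Both results agree with the report: the E-value comes out at 0.5, and 242 + 18 + 282 gives the stated length of 542.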