MADIBA Help

What is FASTA format?
What is AGI format?
What is PlasmoDB ID format?
What is TIGR Rice gene nomenclature?
I get an error when I try to submit my sequences
I get no results when finding related genes using BLAST

What is FASTA format?

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:

>PFA0005w
ATGGTGACGCAAAGTAGTGGTGGGGGTGCTGCTGGTAGTAGTGGTGAGGA
AGATGCCAAACATGTATTGGATGAATTTGGGCAACAAGTGTACAATGAAA
AAGTGGAAAAGTATGCTAATTCTAAAATATATAAAGAGGCGTTGAAAGGA
GATTTGTCACAAGCATCAATTTTGAGCGAATTAGCTGGCACCTATAAACC
ATGTGCCCTTGAATATGAATATTATAAGCATACTAATGGCGGTGGTAAGG
GTAAAAGGTATCCGTGTACAGAGTTAGGTGAAAAAGTAGAACCACGTTTT
TCGGATACACTTGGTGGTCAGTGTACTAACAAAAAAATAGAAGGTAATAA
ATATATTAAAGGTAAGGATGTTGGTGCTTGTGCACCATACCGACGTCTAC
ATCTATGTAGTCATAATTTGGAAAGTATACAAACAAATAATTATAATAGT
GGTAATGCTAAACATAATTTATTGGTAGATGTGTGTATGGCAGCCAAATA
CGAAGGGGACTCAATAAAAAACTATTATCCAAAGTATCAAAGAACATATC
...
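To show how the description line and the sequence lines fit together, here is a minimal sketch of a FASTA reader and writer using only the Python standard library. The function names read_fasta and write_fasta are hypothetical and are not part of MADIBA or Biopython. The writer wraps sequence data at 60 residues by default, an arbitrary choice that stays under the recommended 80 characters.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    description = None
    chunks: List[str] = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def write_fasta(description: str, sequence: str, width: int = 60) -> str:
    """Format one record, wrapping sequence lines at `width` residues."""
    lines = [">" + description]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


# The first 30 residues of the PFA0005w example, split over two lines.
example = ">PFA0005w\nATGGTGACGCAAAGT\nAGTGGTGGGGGTGCT\n"
records = list(read_fasta(io.StringIO(example)))
print(records[0][0], len(records[0][1]))  # PFA0005w 30
```

For real work, Biopython's own FASTA parser is the more robust choice; this sketch only illustrates the layout.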

Sequences are expected to use the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters. Before submitting a request, remove any numerical digits from the query sequence or replace them with appropriate letter codes (e.g., N for an unknown nucleic acid residue or X for an unknown amino acid residue).
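The clean-up rules above can be applied before pasting a nucleic acid sequence. The sketch below is a hypothetical helper, not something the server provides: it assumes the digits are position numbers (as in a GenBank-style listing) and simply removes them, along with whitespace, then upper-cases the result. If digits stand for unknown residues instead, replace them with N or X by hand. The allowed alphabet is the set of nucleic acid codes this page supports, plus "-" for a gap.

```python
import re

# Supported nucleic acid codes plus "-" for a gap of indeterminate length.
NUCLEIC_ALPHABET = frozenset("ACGTURYKMSWBDHVN-")


def clean_sequence(raw: str) -> str:
    """Remove digits and whitespace, then map lower-case letters to upper case."""
    return re.sub(r"[\d\s]+", "", raw).upper()


def unexpected_letters(sequence: str, alphabet: frozenset = NUCLEIC_ALPHABET) -> set:
    """Return any characters that are not in the allowed alphabet."""
    return set(sequence) - alphabet


pasted = "1 atggtgacgc aaagtagtgg\n21 tggggg"
seq = clean_sequence(pasted)
print(seq)                      # ATGGTGACGCAAAGTAGTGGTGGGGG
print(unexpected_letters(seq))  # set()
```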
The nucleic acid codes supported are:

        A --> adenosine           M --> A C (amino)
        C --> cytidine            S --> G C (strong)
        G --> guanine             W --> A T (weak)
        T --> thymidine           B --> G T C
        U --> uridine             D --> G A T
        R --> G A (purine)        H --> A C T
        Y --> T C (pyrimidine)    V --> G C A
        K --> G T (keto)          N --> A G C T (any)
                                  - --> gap of indeterminate length
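Each ambiguity code in this table stands for a set of bases, so a motif written with these codes can be turned into a regular expression. The following sketch (hypothetical helpers, not part of MADIBA) builds a character class for every code in the table and reports overlapping match positions; the gap symbol "-" is deliberately not handled.

```python
import re
from typing import List

# Each code from the table mapped to the bases it stands for.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC nucleotide motif into a regular expression."""
    parts = []
    for code in motif.upper():
        bases = IUPAC_NUCLEOTIDES[code]  # raises KeyError for unsupported letters
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(motif: str, sequence: str) -> List[int]:
    """Return 0-based start positions of (possibly overlapping) matches."""
    # A lookahead lets matches overlap instead of consuming the sequence.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(iupac_to_regex("GRN"))               # G[AG][ACGT]
print(find_motif("GRT", "GATGGTACGT"))     # [0, 3]
```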

What is AGI format?

Format:

AT (Arabidopsis thaliana)
1, 2, 3, 4 or 5 (chromosome number), or M for mitochondrial or C for chloroplast
g (gene)
00010 (five-digit code, numbered from top/north to bottom/south of chromosome)
Different gene products from one locus, e.g. the alternatively spliced forms of a gene, are denoted 00010.1, 00010.2, 00010.3, etc. (optional)
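The parts listed above can be checked with a regular expression. This is a sketch under stated assumptions: parse_agi and AgiId are hypothetical names, and upper-casing the input first (so "at1g08680" is also accepted) is this sketch's own choice, not necessarily how the server behaves.

```python
import re
from typing import NamedTuple, Optional

# AT + chromosome (1-5, M or C) + G + five digits, with an optional .N suffix.
AGI_PATTERN = re.compile(r"AT([1-5MC])G(\d{5})(?:\.(\d+))?")


class AgiId(NamedTuple):
    chromosome: str         # "1"-"5", "M" (mitochondrial) or "C" (chloroplast)
    gene_number: str        # five-digit code, kept as text to preserve leading zeros
    version: Optional[int]  # splice-form suffix, or None for a plain locus ID


def parse_agi(identifier: str) -> Optional[AgiId]:
    """Split an AGI identifier into its parts; return None if it does not match."""
    match = AGI_PATTERN.fullmatch(identifier.strip().upper())
    if match is None:
        return None
    chromosome, number, version = match.groups()
    return AgiId(chromosome, number, int(version) if version else None)


print(parse_agi("AT1G08680.2"))  # AgiId(chromosome='1', gene_number='08680', version=2)
print(parse_agi("ZIGA4"))        # None (a gene symbol, not an AGI code)
```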

Gene Symbols (usually based on function) are also acceptable.

Any of the following are acceptable formats as input:

Locus Identifier	Alternate Splice Forms	Gene Symbol
AT1G08680	AT1G08680.1	ZIGA4
	AT1G08680.2
	AT1G08680.3

Note that entering a Locus Identifier or Gene Symbol selects all of the corresponding Alternate Splice Form identifiers.
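The expansion rule in this note can be sketched as a lookup. The tables below are hypothetical and hold only the AT1G08680 / ZIGA4 example from this page; the server's real annotation source is not described here.

```python
from typing import Dict, List

# Hypothetical in-memory tables built only from the example on this page.
SPLICE_FORMS: Dict[str, List[str]] = {
    "AT1G08680": ["AT1G08680.1", "AT1G08680.2", "AT1G08680.3"],
}
GENE_SYMBOLS: Dict[str, str] = {"ZIGA4": "AT1G08680"}


def expand_identifier(query: str) -> List[str]:
    """Map a gene symbol, locus ID or splice-form ID to splice-form IDs."""
    key = query.strip().upper()
    locus = GENE_SYMBOLS.get(key, key)  # resolve a gene symbol to its locus
    if locus in SPLICE_FORMS:
        return list(SPLICE_FORMS[locus])  # locus or symbol: every splice form
    # A specific splice form is used on its own.
    parent = key.split(".")[0]
    return [key] if key in SPLICE_FORMS.get(parent, []) else []


print(expand_identifier("ZIGA4"))        # all three splice forms
print(expand_identifier("AT1G08680.2"))  # ['AT1G08680.2']
```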

What is PlasmoDB ID format?

For example:
PFB0256w
PFB0260w
PFB0280w
PFB0305c
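All four examples share one shape: "PF", a letter, four digits and a final "w" or "c". The sketch below checks only that shape. Reading the letter as a chromosome code and "w"/"c" as the Watson/Crick (forward/reverse) strand is an assumption, and PlasmoDB IDs written in other styles will not match.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, inferred only from the example IDs on this page:
# "PF" + one letter + four digits + "w" or "c".
PLASMODB_PATTERN = re.compile(r"PF([A-Z])(\d{4})([wc])")


class PlasmoDbId(NamedTuple):
    letter: str   # assumed to encode the chromosome
    number: str   # four-digit gene number, kept as text
    strand: str   # assumed: "w" = Watson (forward), "c" = Crick (reverse)


def parse_plasmodb(identifier: str) -> Optional[PlasmoDbId]:
    """Return the parts of an ID shaped like PFB0256w, or None."""
    match = PLASMODB_PATTERN.fullmatch(identifier.strip())
    return PlasmoDbId(*match.groups()) if match else None


print(parse_plasmodb("PFB0305c"))  # PlasmoDbId(letter='B', number='0305', strand='c')
```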

What is TIGR Rice gene nomenclature?

Format:

LOC_Os (Oryza sativa Locus)
01-12 (two-digit chromosome number)
g (gene)
00010 (five-digit code, numbered from top/north to bottom/south of chromosome)
Different gene products from one locus, e.g. the alternatively spliced forms of a gene, are denoted 00010.1, 00010.2, 00010.3, etc. (optional)
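The rice locus format can be validated the same way. This sketch assumes the chromosome number is written with two digits (01 to 12, as in LOC_Os01g00010); parse_rice_locus and RiceLocus are hypothetical names, and the example IDs are built from the format above rather than taken from the TIGR database.

```python
import re
from typing import NamedTuple, Optional

# LOC_Os + two-digit chromosome + g + five digits, with an optional .N suffix.
RICE_PATTERN = re.compile(r"LOC_Os(\d{2})g(\d{5})(?:\.(\d+))?")


class RiceLocus(NamedTuple):
    chromosome: int
    gene_number: str        # kept as text to preserve leading zeros
    version: Optional[int]  # splice-form suffix, or None


def parse_rice_locus(identifier: str) -> Optional[RiceLocus]:
    """Split a TIGR rice locus ID into its parts; return None if invalid."""
    match = RICE_PATTERN.fullmatch(identifier.strip())
    if match is None:
        return None
    chromosome, number, version = match.groups()
    if not 1 <= int(chromosome) <= 12:
        return None  # only chromosomes 1-12 are valid
    return RiceLocus(int(chromosome), number, int(version) if version else None)


print(parse_rice_locus("LOC_Os12g00010.2"))  # RiceLocus(chromosome=12, gene_number='00010', version=2)
```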

For more information, see the TIGR rice site.

I get an error when I try to submit my sequences

This server uses Biopython, which can be strict about how sequences are formatted.
Ensure that the sequences are properly formatted as a FASTA file.
Also, try replacing the end-of-line character at the end of the description (title) line: delete it and re-type it.
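The re-typing tip suggests the problem is a hidden line-ending character, for example a carriage return left by a file saved on Windows or classic Mac OS; that diagnosis is an assumption here. The sketch below (hypothetical helpers) normalizes line endings, drops a leading byte-order mark, and checks that the first non-blank line begins with ">" in the first column.

```python
def normalize_line_endings(text: str) -> str:
    """Convert Windows (CRLF) and classic Mac OS (CR) line endings to LF."""
    text = text.lstrip("\ufeff")  # drop a leading byte-order mark, if any
    return text.replace("\r\n", "\n").replace("\r", "\n")


def first_line_is_description(text: str) -> bool:
    """Check that the first non-blank line starts with '>' in column one."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith(">")
    return False


pasted = "\ufeff>seq1\r\nACGT\rGGCC\n"
fixed = normalize_line_endings(pasted)
print(repr(fixed))                       # '>seq1\nACGT\nGGCC\n'
print(first_line_is_description(fixed))  # True
```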

I get no results when finding related genes using BLAST

Try increasing the E-value cutoff, or decreasing the minimum % coverage and/or % identity.
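Each of these settings is a threshold that a BLAST hit must pass, so loosening any one of them can only keep more hits. The sketch below illustrates this with hypothetical hits and hypothetical default cutoffs. It assumes coverage means the percentage of the query covered by the alignment; MADIBA's exact definition is not given on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    subject: str
    evalue: float        # lower E-value = more significant
    identity: float      # percent identical positions in the alignment
    aligned_length: int  # query positions covered by the alignment


def filter_hits(hits: List[Hit], query_length: int, max_evalue: float = 1e-5,
                min_coverage: float = 50.0, min_identity: float = 30.0) -> List[Hit]:
    """Keep hits that pass all three (hypothetical default) thresholds."""
    kept = []
    for hit in hits:
        coverage = 100.0 * hit.aligned_length / query_length
        if (hit.evalue <= max_evalue and coverage >= min_coverage
                and hit.identity >= min_identity):
            kept.append(hit)
    return kept


hits = [Hit("hit_1", 1e-30, 85.0, 280), Hit("hit_2", 1e-3, 40.0, 120)]
strict = [h.subject for h in filter_hits(hits, query_length=300)]
relaxed = [h.subject for h in filter_hits(hits, query_length=300,
                                          max_evalue=1e-2, min_coverage=30.0)]
print(strict)   # ['hit_1']
print(relaxed)  # ['hit_1', 'hit_2']
```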