ENSEMBL ftp structure

From Augix' Wiki

Jump to: navigation, search

FTP Site Directories Ref Link

The FTP directory has the following basic structure, although not all information is neccessarily available for each species.

  |-- embl	      Gene predictions annotated on genomic DNA slices of 1 Mb in EMBL format.
  |   | 
  |   |-- species
  |
  |
  |-- emf     	      Alignment dumps in EMF format
  |   | 
  |   |-- pecan                  * Pecan whole genome multiple alignments
  |   |                            with conservation scores for selected sets
  |   |-- ensembl_compara        * protein trees and protein multiple alignments
  |   |                            underlying orthologue/paralogue predictions
  |   |--species_variation       * resequencing data
  |
  | 
  |-- fasta	      Gene predictions in FASTA datatabase format
  |   |
  |   |--species
  |      |
  |      |-- cdna         * Transcript (cDNA) predictions
  |      |-- dna          * Genomic DNA in assembled entities
  |      |-- pep          * Translation (peptide) predictions
  |      |-- rna          * Non-coding RNA predictions
  |
  |            
  |-- genbank	      Gene predictions annotated on genomic DNA slices of 1 Mb in GenBank format.
  |   | 
  |   |-- species
  |
  |
  |-- gtf	      Gene annotation in GTF format
  |
  |-- mysql          MySQL database table text dumps
      |
      |-- core       General genome annotation information
      |
      |                * Genome sequence assembly
      |                * Ensembl gene predictions
      |                * Ab initio gene predictions
      |                * Marker information
      |                * ...
      |
      |-- otherfeatures  Additional genome annotation
      |
      |                * Gene predictions based on EST information
      |                * ...
      |
      |-- variation  Genetic variation information
      |-- vega       Manually curated gene sets
      |-- cdna       cDNA to genome alignments based on the latest EMBL database
      |
      |-- ensembl_compara      Cross-species comparative genomics data:
      |
      |                          * Orthologue/paralogue predictions
      |                          * Protein families
      |                          * Whole genome alignments
      |                          * Synteny information
      |
      |-- ensembl_go           Gene Ontology database
      |
      |-- ensembl_web_user_db  SQL table defintion for server-side user config database
      |
      |-- ensembl_website        Ensembl web site database:
      |                            * Context-sensitive help articles
      |                            * News articles
      |                            * Mini-ads
      |-- compara_mart_homology     Orthologue/paralogue predictions
      |
      |-- compara_mart_multiple_ga  Constrained elements predictions
      |
      |-- compara_mart_pairwise_ga  Whole genome pairwise alignments
      | 
      |-- ensembl_mart              Cross-species data mining tables
      |
      |-- genomic_features_mart     Clone data sets
      |
      |-- sequence_mart             Genome sequences
      |
      |-- snp_mart                  Genetic variation information
      |
      |-- vega_mart                 Manually curated gene sets
Personal tools