######################################################################

README from ftp://ftp.ncbi.nih.gov/genomes/H_sapiens

updated: 7 March, 2013

######################################################################

=====================================
Directory Contents
=====================================

This directory includes sequence records and map data generated at
NCBI or used in NCBI resources.

Sequence data include chromosomes, contigs, RNAs, and proteins 
generated through the NCBI Reference Sequence and NCBI Genome 
Annotation projects. Map data presented in the Map Viewer resource are
also provided here. 

The NCBI Map Viewer provides graphical views of the human genome data.
See:
  http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606

The sections below include: 

    README_CURRENT_RELEASE file
    Scaffold and chromosome assembly & information files
      allcontig.agp.gz
      masking_coordinates.gz
      windowmasker_nmer.oascii.gz
      seq_contig.md.gz
      scaffold_names
    CHR_## - Chromosome directories
    Assembled_chromosomes directory & chr_NC_gi file
    chr_context_for_alt_loci
    pseudoautosomal_region
    RNA, protein and other directories
    GFF
    GFF_interim
    Gnomon
    mapview directory
    Mapping_data directory
    ARCHIVE directory
    FOSMIDS directory
    File extensions

Sequence data are in the Chromosome, RNA, and protein directories.


=====================================
README_CURRENT_RELEASE file
=====================================

This file provides information specific to the current annotation
release, including data freeze dates, release date and release number.
This file also indicates if updates are made to correct an error or to
provide updated information.


====================================================
Scaffold and chromosome assembly & information files
====================================================

allcontig.agp.gz file:
----------------------
This file provides detailed information about the scaffold assembly.

columns:

 1:   scaffold accession.version
 
 2:   beginning base on scaffold
 
 3:   ending base on scaffold
 
 4:   scaffold fragment number
 
 5:   fragment type (D=Draft, F=Finished, W=Whole genome shotgun
      (WGS), N=NN gap)
 
 6:   if sequence, value = accession.version of the component 
      sequence from which bases are derived
      if N-gap, value = number of N's
 
 7:   if sequence, value = beginning base of component sequence
      if N-gap, value = keyword "fragment"
      {fragment keyword indicates gap between fragments within a
      clone or between fragments of overlapping clones} 
 
 8:   if sequence, value = ending base of component sequence
      if N-gap, value = yes: the flanking fragments are ordered and
      oriented by mRNA, EST, or BAC end pair evidence
      if N-gap, value = no: no order and orientation between
      flanking fragments
 
 9:   + if accession is positive orientation to scaffold, 
      - otherwise
      (column 9 for sequence only)
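
As an illustration of the column layout above, the following Python
sketch splits one line of the file into named fields. It assumes the
file is tab-delimited (the delimiter is not stated above). The
function name, the field names, and the accessions in any sample input
are hypothetical.

```python
def parse_agp_line(line):
    """Split one line into a dict keyed by the columns described above."""
    cols = line.rstrip("\n").split("\t")  # assumption: tab-delimited
    rec = {
        "scaffold": cols[0],
        "start": int(cols[1]),
        "end": int(cols[2]),
        "fragment_number": int(cols[3]),
        "fragment_type": cols[4],
    }
    if rec["fragment_type"] == "N":
        # Gap line: column 6 is the number of N's, column 7 the keyword,
        # column 8 yes/no. Column 9 is used for sequence lines only.
        rec["gap_length"] = int(cols[5])
        rec["gap_keyword"] = cols[6]
        rec["linked"] = cols[7] == "yes"
    else:
        # Sequence line: component accession, its coordinates, orientation.
        rec["component"] = cols[5]
        rec["component_start"] = int(cols[6])
        rec["component_end"] = int(cols[7])
        rec["orientation"] = cols[8]
    return rec
```

Branching on column 5 keeps gap lines and sequence lines from being
confused, since columns 6 to 8 change meaning between the two.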


windowmasker_nmer.oascii.gz
---------------------------
The windowmasker_nmer.oascii.gz file gives Nmer counts generated by
running the first phase of WindowMasker (Morgulis A, Gertz EM,
Schaffer AA, Agarwala R. 2006. Bioinformatics 22:134-41) on the
genomic sequences of the reference assembly. These counts can be used
as input for the second phase of WindowMasker to mask any nucleotide
sequence for the genome. N and other default parameter settings are
computed within WindowMasker depending on the input genome sequence.
The windowmasker_nmer.oascii.gz file is in WindowMasker optimized
ASCII format and is not human readable. Alternate human readable
formats are supported and can be generated by running WindowMasker.
WindowMasker is available at:
ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/


masking_coordinates.gz:
-----------------------
The masking_coordinates.gz file lists locations for segments of
repetitive sequence in the genomic scaffolds (determined using
RepeatMasker http://www.repeatmasker.org/). These coordinates can be
used to mask the repetitive sequences in the scaffolds.

columns:

 1.   scaffold accession.version

 2.   beginning base on scaffold

 3.   ending base on scaffold

 4.   class of repetitive sequence, or list of classes when
      overlapping repeats have been merged into a single
      span.
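
A minimal Python sketch of how these coordinates might be applied to
a scaffold sequence is shown here. It makes two assumptions that are
not stated above: the file is tab-delimited, and the beginning and
ending bases are 1-based and inclusive. The function names are
hypothetical.

```python
def read_spans(lines, accession):
    """Collect (start, end) spans for one scaffold accession.version."""
    spans = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")  # assumption: tab-delimited
        if cols[0] == accession:
            spans.append((int(cols[1]), int(cols[2])))
    return spans


def soft_mask(sequence, spans):
    """Lower-case the bases covered by the spans.

    Assumption: start and end are 1-based and inclusive.
    """
    bases = list(sequence)
    for start, end in spans:
        for i in range(start - 1, min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)
```

Lower-casing is used here only because it matches the convention of
the *.mfa.gz files; replacing the masked bases with N's would work the
same way.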


seq_contig.md.gz file:
----------------------
The seq_contig.md file provides information on the order and
orientation of the scaffolds along the chromosome.

columns:

1. tax_id:       9606 is Homo sapiens

2. chromosome:   1-22, X, Y, MT, value|contig accession, or
                 Un|contig accession, where value|contig indicates
                 that the contig is associated with a chromosome but
                 not localized, and Un|contig indicates that the
                 contig is not placed on any chromosome

3. from:         chromosome coordinate, reported in 1-based coordinates

4. to:           chromosome coordinate, reported in 1-based coordinates

5. orientation:  +, -, 0 - where 0 indicates uncertainty in
                 orientation

6. accession:    accession.version format

7. id:           internal ID

8. type:         designates the type of feature (e.g. scaffold)

9. assembly:     this value is used to associate scaffolds with a
                 particular assembly (e.g., reference assembly vs
                 alternate assemblies provided by other groups or
                 representing other strains)

10. weight:      weight value for object. For all maps, a lower
                 weight signifies a higher confidence value for the
                 map object.
                 1= finished sequence (Blue in MapViewer)
                 3= WGS sequence (Green in MapViewer)
                 5= Draft sequence (Orange in MapViewer)
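
The following Python sketch shows one way to read these ten columns
and list the scaffolds placed on a chromosome in coordinate order. It
assumes the file is tab-delimited and that any header lines begin
with '#'; neither is stated above. The names used are hypothetical.

```python
from collections import namedtuple

SeqContig = namedtuple(
    "SeqContig",
    "tax_id chromosome start stop orientation accession id type "
    "assembly weight",
)


def parse_seq_contig(lines):
    """Yield one SeqContig per data line, skipping '#' and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        c = line.rstrip("\n").split("\t")  # assumption: tab-delimited
        yield SeqContig(c[0], c[1], int(c[2]), int(c[3]), c[4],
                        c[5], c[6], c[7], c[8], c[9])


def ordered_on_chromosome(records, chromosome, assembly):
    """Return records placed on one chromosome, sorted by 'from'."""
    hits = [r for r in records
            if r.chromosome == chromosome and r.assembly == assembly]
    return sorted(hits, key=lambda r: r.start)
```

Matching the chromosome column exactly leaves out unlocalized entries
such as "1|contig accession", which have no position on the
chromosome.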


scaffold_names file:
--------------------
This file provides alternative names used for the genomic scaffolds in
each specified assembly.

columns:

1:   Assembly label

2:   Genome Center name or na

3:   Genomic RefSeq Accession.version

4:   GenBank Accession.version

5:   NCBI name
     (used prior to assignment of the RefSeq Accession.version).

na: not applicable. na in column 4 indicates that the scaffold
sequence was revised and that no GenBank version of the scaffold
exists. This can be due to replacement of foreign contaminants by gaps
in the RefSeq sequence or a difference in orientation.
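
As a sketch of how the na convention might be handled, the Python
function below maps each RefSeq accession.version (column 3) to its
GenBank accession.version (column 4), using None where the file has
na. It assumes a tab-delimited file whose header lines, if any, begin
with '#'. The function name is hypothetical.

```python
def refseq_to_genbank(lines):
    """Map RefSeq accession.version to GenBank accession.version.

    A GenBank value of 'na' is stored as None.
    """
    mapping = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")  # assumption: tab-delimited
        genbank = cols[3]
        mapping[cols[2]] = None if genbank == "na" else genbank
    return mapping
```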


=====================================
CHR_## - Chromosome directories
=====================================

The files in the chromosome directories provide concatenated sequence
data for scaffolds that have been assembled from individual GenBank
records.

The order of the scaffolds in these files does not represent their 
order on the chromosome.

The scaffolds in the chromosome FTP directories are the same ones that
are presented on the NCBI Map Viewer; the sequences include the
reference assembly and may include alternate assemblies when
available.

The constructed scaffolds are reference sequences (RefSeq) and are not
part of the GenBank database. GenBank contains archival sequence
records as they were submitted by the producers of the data. See the 
RefSeq web site for more information:
  http://www.ncbi.nih.gov/RefSeq/


=====================================
Assembled_chromosomes directory
=====================================

The files in this directory, and its sub-directories, provide data for
all the top-level objects in each assembly: assembled chromosomes,
unlocalized scaffolds (those scaffolds that are associated with a
specific chromosome but which cannot be ordered or oriented on that
chromosome), unplaced scaffolds (those scaffolds that are not
associated with any chromosome), and in some cases scaffolds from
alternate locus groups or genome patches (see the NCBI Assembly Model
web page for an explanation of these terms:
http://www.ncbi.nlm.nih.gov/genome/assembly/model).

The filenames include the assembly name. To obtain the complete set of
data for an assembly, download all the files for the desired format
that contain the same assembly name. Depending on the particular
assembly, this set may include multiple chromosomes files with names
including a "chr*" term, an unlocalized scaffold file with
"unlocalized" in its name, an unplaced scaffold file with "unplaced"
in its name, and an alternate scaffold file with "alts" in its name.
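
A short Python sketch of the selection described above: given a
directory listing, keep the files that contain the assembly name and
have the desired suffix. The function name and any filenames passed
to it are hypothetical.

```python
def files_for_assembly(filenames, assembly_name, suffix):
    """Return names that contain the assembly name and end with suffix."""
    return sorted(
        name for name in filenames
        if assembly_name in name and name.endswith(suffix)
    )
```

This is a plain substring match, so an assembly name that is a prefix
of another assembly name would match both; a stricter pattern may be
needed in that case.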


chr_NC_gi file:
---------------
The chr_NC_gi file provides the accession and gi for the reference
sequence (RefSeq) chromosome records, and any complete chromosomes 
from alternate assemblies.

columns:
 1. chromosome
 2. chromosome accession.version
 3. chromosome gi
 4. assembly name
 5. assembly accession.version


chr_accessions_{assembly name} file:
------------------------------------
The chr_accessions_* file provides the correspondence between the
RefSeq and GenBank records for each chromosome in the assembly.

columns:
 1. Chromosome
 2. RefSeq Accession.version
 3. RefSeq gi
 4. GenBank Accession.version
 5. GenBank gi

na: not applicable. na in columns 4 and 5 indicates that the
chromosome sequence was revised and that no GenBank version of the
chromosome exists.


unlocalized_ and unplaced_accessions_{assembly name} files:
-----------------------------------------------------------
The unlocalized_* and unplaced_* files provide the correspondence
between the RefSeq and GenBank records for scaffolds that are,
respectively, unlocalized on a chromosome or unplaced. If an assembly
includes scaffolds from alternate locus groups or genome patches, then
accession, version, and gi data for these scaffolds are provided in a
file named alts_accessions_{assembly name}.

columns:
 1. Chromosome (Un, if unplaced scaffold)
 2. RefSeq Accession.version
 3. RefSeq gi
 4. GenBank Accession.version
 5. GenBank gi

na: not applicable. na in columns 4 and 5 indicates that the
scaffold sequence was revised and that the GenBank and RefSeq 
sequences differ or that no GenBank accession was assigned.


seq sub-directory:
------------------
The files in this directory provide assembled sequences for the
chromosomes and other top-level objects in FASTA format. Runs of Ns
are inserted into the chromosome sequence wherever there is a gap in
the scaffold layout, e.g. between scaffolds, at the centromere, at the
telomeres, or at large regions of heterochromatin. The chromosome
coordinates of features placed on chromosomes, as displayed in Map
Viewer or provided in the sequence based map files located in the
/mapview directory, correspond to positions on these assembled
chromosome sequences. The feature coordinates used for unlocalized or
unplaced scaffolds use the coordinate system of each scaffold.

Files with the suffix .fa.gz contain unmasked sequences; files with 
the suffix .mfa.gz contain sequences masked using RepeatMasker (lower 
case) and the results of a screen against foreign sequences (N's).
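
To illustrate the masking convention, the Python sketch below counts
lower-case bases and N's in the lines of a masked FASTA file. The
function name is hypothetical. Note that N's in the chromosome files
also represent gaps in the scaffold layout, so the N count is not
limited to foreign spans.

```python
def masking_summary(fasta_lines):
    """Count total, lower-case (repeat-masked), and N bases."""
    counts = {"total": 0, "repeat_masked": 0, "n": 0}
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # FASTA definition line, not sequence
        for base in line.strip():
            counts["total"] += 1
            if base in "Nn":
                counts["n"] += 1
            elif base.islower():
                counts["repeat_masked"] += 1
    return counts
```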

Each file is named according to the abbreviation for the species,
whether the assembly is the reference assembly (_ref_) or an alternate
assembly (_alt_), the assembly name, and either the chromosome label
or the scaffold group (unlocalized, unplaced, or alts). 


agp sub-directory:
------------------
Files describing, in AGP format, how the chromosomes and other
top-level objects are assembled from their component sequence
records. Filenames follow the convention described for the seq
sub-directory and have the suffix .agp.gz.

columns:

 1:   chromosome, as chr+chromosome designation, or scaffold name
 
 2:   beginning base on chromosome or scaffold
 
 3:   ending base on chromosome or scaffold
 
 4:   fragment number
 
 5:   fragment type (D=Draft, F=Finished, W=Whole genome shotgun
      (WGS), N=NN gap)
 
 6:   if sequence, value = accession.version of the component 
      sequence from which bases are derived
      if N-gap, value = number of N's
 
 7:   if sequence, value = beginning base of component sequence
      if N-gap, value = keyword "fragment"
      {fragment keyword indicates gap between fragments within a
      clone or between fragments of overlapping clones}
 
 8:   if sequence, value = ending base of component sequence
      if N-gap, value = yes: the flanking fragments are ordered and
      oriented by mRNA, EST, or BAC end pair evidence
      if N-gap, value = no: no order and orientation between
      flanking fragments
 
 9:   + if accession is positive orientation to chromosome
      - otherwise
      (column 9 for sequence only)


gbs sub-directory:
------------------
Files providing annotation, in GenBank flat file format, for the
chromosomes and other top-level objects. Filenames follow the
convention described for the seq sub-directory and have the suffix
.gbs.gz.


======================================
chr_context_for_alt_loci directory
======================================
For each assembly with alternative loci or patches, the files in 
this directory provide information on the relationship between 
each assembly unit's scaffolds and the primary assembly.

genomic_regions_definitions.txt file:
-------------------------------------
A file defining the regions on the primary assembly for which 
alternate loci or patch scaffolds are available.
The file is tab delimited (including a #header) with the following 
columns:
 1. region_name: name for the genomic region
 2. chromosome: accession.version for the chromosome or 
       unlocalized/unplaced scaffold
 3. start: the starting position on the chromosome or scaffold
       (in 1-based coordinates)
 4. stop: the ending position on the chromosome or scaffold
       (in 1-based coordinates)


alt_scaffolds_placements.txt files:
-----------------------------------
A file associating alternate loci or patch scaffolds with the 
corresponding primary assembly chromosome, providing the location on 
the chromosome, the genomic region name, and the length of any
unaligned tails.
The file is tab delimited (including a #header) with the following 
columns:
 1. alt_asm_name: name of the assembly-unit that includes the 
       alternate scaffold
 2. prim_asm_name: name of the primary assembly-unit on which the
       alternate scaffold is being placed
 3. alt_scaf_name: name of the alternate scaffold being placed
 4. alt_scaf_acc: accession.version of the alternate scaffold being
       placed
 5. parent_type: type of object on which the alternate scaffold is
       being placed, either CHROMOSOME or SCAFFOLD
 6. parent_name: name of the object on which the alternate scaffold
       is being placed (can be either a chromosome or a scaffold)
 7. parent_acc: accession.version of the sequence on which the
       alternate scaffold is being aligned
 8. region_name: name of the genomic region on the parent within
       which the alternate scaffold is placed
 9. ori: orientation of the alignment, '+', '-' or 'b' (mixed)
 10. alt_scaf_start: start of the placement on the alternate
       scaffold (in 1-based coordinates)
 11. alt_scaf_stop: end of the placement on the alternate scaffold
       (in 1-based coordinates)
 12. parent_start: start of the placement on the parent sequence
       (in 1-based coordinates)
 13. parent_stop: end of the placement on the parent sequence
       (in 1-based coordinates)
 14. alt_start_tail: number of bases at the start of the alternate 
       scaffold not involved in the placement
 15. alt_stop_tail: number of bases at the end of the alternate 
       scaffold not involved in the placement

Note: Every alternate scaffold associated with the assembly-unit will
be listed in this file. Any alternate scaffold that has no placement
will have 'na' in columns 5 to 15. Any alternate scaffold that has a 
chromosome assignment, but no alignment, would have the chromosome 
name in column 6 and 'na' in columns 7 to 15.
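
The 'na' rules in the note above can be sketched in Python as
follows. The column names are taken from the list above; the function
names and status labels are hypothetical, and lines beginning with
'#' are assumed to be the header.

```python
COLUMNS = [
    "alt_asm_name", "prim_asm_name", "alt_scaf_name", "alt_scaf_acc",
    "parent_type", "parent_name", "parent_acc", "region_name", "ori",
    "alt_scaf_start", "alt_scaf_stop", "parent_start", "parent_stop",
    "alt_start_tail", "alt_stop_tail",
]
INT_COLUMNS = set(COLUMNS[9:])  # columns 10 to 15 hold numbers


def parse_placements(lines):
    """Yield one dict per alternate scaffold; 'na' becomes None."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = {}
        for key, value in zip(COLUMNS, line.rstrip("\n").split("\t")):
            if value == "na":
                row[key] = None
            elif key in INT_COLUMNS:
                row[key] = int(value)
            else:
                row[key] = value
        yield row


def placement_status(row):
    """Classify a row according to which columns are 'na'."""
    if row["parent_name"] is None:
        return "no placement"
    if row["parent_acc"] is None:
        return "chromosome assignment only"
    return "aligned"
```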


{scaffold accession.version}_{chromosome accession.version}.asn files:
----------------------------------------------------------------------
Files providing alignments of the alternate loci or patch scaffolds to
the corresponding primary assembly chromosome, in ASN.1 format. These
alignments indicate how the alternate loci and patch scaffold
sequences differ from the chromosomes of the primary assembly.
[Note: some older files do not have versions in the file names.]


{scaffold accession.version}_{chromosome accession.version}.gff files:
----------------------------------------------------------------------
Files providing alignments of the alternate loci or patch scaffolds to
the corresponding primary assembly chromosome, in CIGAR format 
embedded within a GFF format file. These alignments indicate how the
alternate loci and patch scaffold sequences differ from the 
chromosomes of the primary assembly.


===================================
pseudoautosomal_region directory 
===================================
par.txt file:
-------------
A file defining the pseudo-autosomal regions (PARs) when the sequences
of the sex chromosomes in a mammalian genome assembly are known to
include the pseudo-autosomal regions.
The file is tab delimited (including a #header) with the following 
columns:
 1. Chr: chromosome name
 2. PAR-name: name of the PAR region
 3. start: the starting position of the PAR on the chromosome
       (in 1-based coordinates)
 4. end: the ending position of the PAR on the chromosome
       (in 1-based coordinates)
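
As an example of using this file, the Python sketch below loads the
four columns and reports which PAR, if any, contains a given
chromosome position. The function names are hypothetical, and the
start and end positions are treated as inclusive, which is an
assumption.

```python
def load_pars(lines):
    """Return (chromosome, par_name, start, end) tuples."""
    pars = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the #header
        chrom, name, start, end = line.rstrip("\n").split("\t")[:4]
        pars.append((chrom, name, int(start), int(end)))
    return pars


def par_at(pars, chromosome, position):
    """Return the name of the PAR containing a position, or None."""
    for chrom, name, start, end in pars:
        if chrom == chromosome and start <= position <= end:
            return name
    return None
```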

par_align.asn file:
-------------------
File providing the alignment of the PAR on chromosome X to the
PAR on chromosome Y, in ASN.1 format.

par_align.gff file:
-------------------
File providing the alignment of the PAR on chromosome X to the
PAR on chromosome Y, in GFF3 format.


=====================================
RNA, protein and other directories
=====================================

The RNA and protein directories provide sequence files in three
formats representing all of the mRNA, non-coding transcript, and
protein model reference sequences (RefSeq) exported as part of the
genome annotation process.

In addition, FASTA files containing the comprehensive set of Gnomon
predictions are provided. These correspond to the Map Viewer
'Model Transcripts' map and include a supported subset that is
instantiated as model RefSeq records (with accession prefix XM_,
XR_, or XP_) and an 'Ab initio' subset that is not instantiated into
model RefSeq records. These purely 'Ab initio' models are not assigned
accession numbers or tracked between annotation releases. They are an
experimental dataset. Additional information about Gnomon is
available at:
  http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.shtml


RNA directory:
--------------
File Name             Format         Contents
---------------------------------------------------------------------
Gnomon_mRNA.fsa.gz    FASTA          transcript predictions
rna.asn.gz            ASN.1          annotated transcripts
rna.fa.gz             FASTA          annotated transcripts
rna.gbk.gz            Flat File      annotated transcripts

protein directory:
------------------
File Name             Format         Contents
--------------------------------------------------------------------
Gnomon_prot.fsa.gz    FASTA          protein predictions
protein.fa.gz         FASTA          annotated proteins
protein.gbk.gz        Flat File      annotated proteins


Accession Format      Molecule       Type
----------------------------------------------------
NM_xxxxxx             mRNA           curated RefSeq*
NR_xxxxxx             transcript     curated RefSeq*
NP_xxxxxx             protein        curated RefSeq*
YP_xxxxxx             protein        curated RefSeq*
XM_xxxxxx             mRNA           model@
XR_xxxxxx             transcript     model@
XP_xxxxxx             protein        model@

* curated RefSeq= these RefSeq records are subject to review and
curation by NCBI's RefSeq staff, and may be updated between annotation
releases; the curation process is ongoing. The accession prefix may be
followed by either 6 or 9 digits (e.g., NM_123456 and NM_123456789).

@ model RefSeq= these RefSeq records are products of the genome
annotation processing and are not subject to curation and updates
between annotation releases. Model RefSeqs represent Gnomon 
predictions that are supported by transcript and/or protein homology.
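
The accession table above can be expressed as a simple lookup. The
Python sketch below classifies an accession by its three-character
prefix; the names used are hypothetical, and only the prefixes listed
in the table are covered.

```python
ACCESSION_PREFIXES = {
    "NM_": ("mRNA", "curated RefSeq"),
    "NR_": ("transcript", "curated RefSeq"),
    "NP_": ("protein", "curated RefSeq"),
    "YP_": ("protein", "curated RefSeq"),
    "XM_": ("mRNA", "model"),
    "XR_": ("transcript", "model"),
    "XP_": ("protein", "model"),
}


def classify_accession(accession):
    """Return (molecule, type) for an accession, or None if unlisted."""
    return ACCESSION_PREFIXES.get(accession[:3])
```

Keying on the prefix alone means the lookup works for both the
6-digit and 9-digit accession forms, with or without a version.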

Additional information about the curated RefSeqs (NM_, NR_, NP_ 
accession prefix) is available at: 
  http://www.ncbi.nlm.nih.gov/RefSeq/
  ftp://ftp.ncbi.nih.gov/refseq/

Additional information about the gene models is available at:
  http://www.ncbi.nlm.nih.gov/genome/guide/build.shtml


other directory:
----------------
File Name                     Format     Contents
---------------------------------------------------------------------
pseudo_without_product.fa.gz  FASTA      pseudogenes without products

This file provides the genomic sequence corresponding to pseudogene
and other gene regions which do not have any associated transcribed
RNA products or translated protein products. It includes annotated
gene regions that require rearrangement to provide the final product,
e.g. immunoglobulin segments. These sequences are not assigned
accession numbers, and are derived directly from the assembled genomic
sequences.


====
GFF
====
The files in this directory provide the features annotated on the 
genomic sequences of the assembly(ies) in GFF version 3 format, 
according to specifications version 1.20 at:
http://www.sequenceontology.org/gff3.shtml

{alt,ref}_{assembly_name}_scaffolds.gff3.gz
-------------------------------------------
Features annotated on {assembly_name} in scaffold coordinates.

{alt,ref}_{assembly_name}_top_level.gff3.gz
-------------------------------------------
Features annotated on {assembly_name} in top-level object coordinates.
The top-level objects are: assembled chromosomes, unlocalized 
scaffolds (those scaffolds that are associated with a specific 
chromosome but which cannot be ordered or oriented on that 
chromosome), unplaced scaffolds (those scaffolds that are not
associated with any chromosome), and in some cases scaffolds from
alternate locus groups or genome patches (see the NCBI Assembly Model
web page for an explanation of these terms:
http://www.ncbi.nlm.nih.gov/genome/assembly/model).
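
For readers unfamiliar with GFF3, the Python sketch below parses one
feature line into its nine tab-separated columns and decodes the
attributes column (semicolon-separated key=value pairs with
URL-style escapes, per the GFF3 specification). The function name and
the keys of the returned dict are hypothetical, and multi-valued
attributes are left as comma-joined strings.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Return a dict for a feature line, or None for other lines."""
    if line.startswith("#"):
        return None  # directive or comment
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": attributes,
    }
```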


===========
GFF_interim
===========
The files in this directory provide interim updates to the annotation 
release in GFF version 3 format.
They contain features projected from the current RefSeq transcripts 
and curated genomic sequences (with accession prefixes NM_ or NR_, 
and NG_ respectively) placed on the latest assembly version at a 
given date. The current RefSeqs include transcript variants that are 
new or have been updated since the last full annotation. The latest 
assembly version may include additional or updated genome patches 
compared to the assembly version used for the full annotation.
See the NCBI Assembly Model web page for an explanation of the term 
genome patch:
http://www.ncbi.nlm.nih.gov/genome/assembly/model

Interim updates do not include Gnomon gene predictions.

interim_{assembly_name}_scaffolds.gff3_{freeze_date}.gz 
-------------------------------------------------------
Features projected on {assembly_name} from RefSeq sequences retrieved 
from Entrez on {freeze_date}, in scaffold coordinates.

interim_{assembly_name}_top_level.gff3_{freeze_date}.gz
-------------------------------------------------------
Features projected on {assembly_name} from RefSeq sequences retrieved 
from Entrez on {freeze_date}, in top-level object coordinates.


======
Gnomon
======
The files in this directory provide the Gnomon models predicted on 
the genomic sequences of the assembly(ies) in GFF version 3 format, 
according to specifications version 1.20 at:
http://www.sequenceontology.org/gff3.shtml

These models correspond to the Map Viewer 'Model Transcripts' map and
include a supported subset that is instantiated as model RefSeq
records (with accession prefix XM_, XR_, or XP_) and an 'Ab initio'
subset that is not instantiated into model RefSeq records. These
purely 'Ab initio' models are not assigned accession numbers or
tracked between annotation releases. They are an experimental dataset.
Additional information about Gnomon is available at:
  http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.shtml

{alt,ref}_{assembly_name}_gnomon_scaffolds.gff3.gz
--------------------------------------------------
Gnomon models predicted on {assembly_name} in scaffold coordinates.

{alt,ref}_{assembly_name}_gnomon_top_level.gff3.gz
--------------------------------------------------
Gnomon models predicted on {assembly_name} in top-level object 
coordinates. The top-level objects are: assembled chromosomes,
unlocalized scaffolds (those scaffolds that are associated with a
specific chromosome but which cannot be ordered or oriented on that
chromosome), unplaced scaffolds (those scaffolds that are not
associated with any chromosome), and in some cases scaffolds from
alternate locus groups or genome patches (see the NCBI Assembly Model
web page for an explanation of these terms:
http://www.ncbi.nlm.nih.gov/genome/assembly/model).


=====================================
mapview directory
=====================================

This directory contains assembly and annotation data used to provide
the displays available in the human Map Viewer:
http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606

Most of the files in this directory contain headers that document the
content of the fields in each file. Additional information on some
files is provided below.

org_transcript.gff.gz and zoo_transcript.gff.gz files
-----------------------------------------------------
These files provide cDNA-to-Genomic, or spliced sequence
alignments. These files include same-species and cross-species
alignments, respectively. Alignments are generated via the Splign
alignment tool: 
http://www.ncbi.nlm.nih.gov/sutils/splign
Information on indels has not been included.

The file format is GFF version 3 according to specifications version
1.07: 
http://song.sourceforge.net/gff3.shtml

The content is in chromosomal coordinates or scaffold coordinates for
unplaced scaffolds. The accession.version of a genomic reference
sequence (NCBI RefSeq) is used as the value of the GTF/GFF 'seqid'
column. (Examples of accession.version are NC_* or AC_* for
chromosomes and NW_* or NT_* for scaffolds.) The genome assembly and
chromosome names for the chromosome sequences can be obtained from the
file Assembled_chromosomes/chr_NC_gi. Likewise, the file
mapview/seq_contig.md.gz provides the genome assembly and chromosome
assignment, if any, for the unplaced scaffolds.

These files replace org_transcript.gtf.gz and zoo_transcript.gtf.gz
which were in a format compatible with GFF version 2 and GTF.


=====================================
Mapping_data directory
=====================================

This directory is a link to the UniSTS FTP site, which contains
non-sequence-based mapping information for human STSs.


=====================================
ARCHIVE directory
=====================================

This directory is provided to maintain archival annotation release 
data.


======================================
FOSMIDS directory
======================================

Directory for FOSMID sequence data.


=====================================
File extensions
=====================================

File extensions impart information about the file format as follows:

*.asn.gz = ASN.1 file, print form, compressed

*.fa.gz  = FASTA file format, compressed

*.fsa.gz = FASTA file format, compressed

*.mfa.gz = masked FASTA file format, compressed 
           (repeats identified with RepeatMasker are lower case and 
           foreign spans are replaced with N's). 

*.gbk.gz = GenBank flat file format (annotation + sequence), 
           compressed

*.gbs.gz = GenBank summary file format (annotation only), compressed
           The *.gbs file format does not contain sequence data, 
           but instead contains a "CONTIG" field showing how the 
           scaffold or chromosome is assembled from its components.

*.gff3.gz = GFF version 3 file format, compressed
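
Because all of these files are gzip-compressed, they can be read
without unpacking them first. The Python sketch below uses the
standard gzip module to stream a file as text lines; the function
name is hypothetical, and ASCII content is an assumption (bytes that
are not ASCII are replaced rather than raising an error).

```python
import gzip


def read_gz_lines(path):
    """Yield text lines from a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as fh:
        for line in fh:
            yield line
```

Streaming line by line avoids loading a whole chromosome file into
memory.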


=====
Notes
=====
* The annotations in the *.gbk and *.gbs files currently include genes,
conserved protein domains, microRNAs defined by sequences obtained
from miRBase (Griffiths-Jones S, Grocock RJ, van Dongen S,
Bateman A, Enright AJ. 2006. Nucleic Acids Res. 34:D140-D144) and
placed by Splign (Kapustin Y, Souvorov A, Tatusova T and Lipman D.
2008. Biology Direct 3:20), and tRNA features annotated by
tRNAscan-SE (Lowe TM, Eddy SR. 1997. Nucleic Acids Res. 25:955-64).

* Variation data from the most recent dbSNP build can be obtained from 
the dbSNP FTP site:
 ftp://ftp.ncbi.nih.gov/snp/

* Gene symbols in this directory are not updated with every update
to Entrez Gene. Suggestions for how to convert a set of GeneIDs into
current symbols and names are provided in this FAQ from Entrez Gene:

http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genefaq.html#faq_g4

######################################################################
