Help : Gene Alignment Help
What are the alignment data?
Map positions are assigned to identifiers using the NCBI genome assembly hg17 (NCBI
Build 35), accessed through the UCSC
genome browser, GoldenPath (May 2004 freeze). Each position is
associated with a GenBank accession number. An accession may
have one to many genomic positions within GoldenPath; however, there are
generally two mappings, one on the positive strand and one on the
negative strand.
The data that are returned to the user for unconsolidated positions
include the following:
- Identifier: The input value provided by the user to query
the genome.
- Accession: The accession number that the identifier has
been mapped to. In the case that there are multiple accessions for an
identifier, an individual row will exist for each accession.
- Target Start: Alignment start position in target chromosome.
- Target End: Alignment end position in target chromosome.
- Strand: + or - for chromosome strand.
- Chromosome: Target sequence name.
- Q Start*: Alignment start position in query sequence.
- Q Size*: Query sequence size.
* In the qStart/qEnd fields, the coordinates are where the query matches,
from the point of view of the forward strand. For more information,
see the UCSC site.
How are clones aligned?
A clone is aligned by first being mapped to its associated GenBank
accessions via dbEST. These accessions are
then mapped to the genome via the UCSC data. Each clone can map to one
to many GenBank accessions.
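The two-stage lookup described above can be pictured as a pair of table joins. The following is only a minimal sketch, not the site's actual implementation: the function name, the dictionary-based tables, and all identifiers (CLONE_1, ACC_A, ACC_B) are hypothetical placeholders.

```python
def align_clone(clone_id, clone_to_accessions, accession_to_positions):
    """Return one row per (clone, accession, genomic position).

    clone_to_accessions: hypothetical stand-in for the dbEST lookup,
        mapping a clone ID to a list of GenBank accessions.
    accession_to_positions: hypothetical stand-in for the UCSC data,
        mapping an accession to (chromosome, strand, start, end) tuples.
    """
    rows = []
    # Stage 1: clone -> GenBank accessions
    for accession in clone_to_accessions.get(clone_id, []):
        # Stage 2: accession -> genomic positions
        for position in accession_to_positions.get(accession, []):
            rows.append((clone_id, accession) + position)
    return rows


# Made-up data: one clone, two accessions, only one of which aligns.
clone_to_accessions = {"CLONE_1": ["ACC_A", "ACC_B"]}
accession_to_positions = {
    "ACC_A": [("chr1", "+", 100, 200), ("chr1", "-", 100, 200)],
}
rows = align_clone("CLONE_1", clone_to_accessions, accession_to_positions)
```

In this sketch an accession with no genomic position simply contributes no rows, and an accession with several positions contributes one row per position.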
How are genes aligned?
A gene is first mapped to a UniGene Cluster ID and then the accessions
that map to that cluster are returned. There are generally many
accessions mapped to one cluster.
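As a rough illustration of the gene lookup, the sketch below resolves a gene to a single cluster and then fans out to every accession in that cluster. The function name, the lookup tables, and the identifiers (GENE_X, CLUSTER_1, ACC_*) are hypothetical and are not taken from UniGene.

```python
def accessions_for_gene(gene, gene_to_cluster, cluster_to_accessions):
    """Return (gene, cluster ID, accession) rows for one gene.

    Both lookup tables are hypothetical in-memory stand-ins for the
    UniGene data.
    """
    cluster_id = gene_to_cluster.get(gene)
    if cluster_id is None:
        return []  # gene has no known cluster
    return [(gene, cluster_id, accession)
            for accession in cluster_to_accessions.get(cluster_id, [])]


# Made-up data: one gene, one cluster, three accessions.
gene_to_cluster = {"GENE_X": "CLUSTER_1"}
cluster_to_accessions = {"CLUSTER_1": ["ACC_A", "ACC_B", "ACC_C"]}
gene_rows = accessions_for_gene("GENE_X", gene_to_cluster, cluster_to_accessions)
```

Each returned accession would then be placed on the genome in the same way as the accessions of a clone.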
How are the data consolidated?
Several steps are taken when data are returned as a consolidated
position.
- The identifier is mapped to GenBank accessions.
- If the group of accessions associated with the identifier
maps to more than one chromosome, the data are discarded. You will need
to use the unconsolidated mapping position in order to see all
positions for this identifier.
- Next, the largest end position for all the accessions aligned
to the identifier is compared with the smallest start position. If
this distance is greater than the maximum distance between queries
that you have selected (by default this is 1,000,000 bases), then the
data are discarded.
- The query size that is returned, in the case that the data are not
discarded, is the distance between the smallest accession start
position and the largest accession end position.
- Please note that this option is not recommended for most genes.
Because so many accessions map to a cluster, it will generally not
return any results, since one of the accessions might be mapped to a
different (potentially erroneous) chromosome.
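The consolidation steps above can be sketched as follows. This is an illustrative reading of the description, not the site's actual code: the type names, the function name, and the sample accessions are assumptions, and only the 1,000,000-base default comes from the text.

```python
from typing import List, NamedTuple, Optional


class Alignment(NamedTuple):
    accession: str   # GenBank accession (hypothetical values below)
    chromosome: str  # target sequence name
    t_start: int     # alignment start on the chromosome
    t_end: int       # alignment end on the chromosome


class Consolidated(NamedTuple):
    chromosome: str
    start: int
    end: int
    size: int


def consolidate(alignments: List[Alignment],
                max_distance: int = 1_000_000) -> Optional[Consolidated]:
    """Collapse all alignments of one identifier into a single position.

    Returns None when the data would be discarded.
    """
    if not alignments:
        return None
    # Discard if the accessions map to more than one chromosome.
    chromosomes = {a.chromosome for a in alignments}
    if len(chromosomes) > 1:
        return None
    # Compare the smallest start position with the largest end position.
    start = min(a.t_start for a in alignments)
    end = max(a.t_end for a in alignments)
    if end - start > max_distance:
        return None
    # The returned size is the span from smallest start to largest end.
    return Consolidated(chromosomes.pop(), start, end, end - start)


# Made-up alignments for a single identifier.
same_chrom = [
    Alignment("ACC_A", "chr1", 1000, 1500),
    Alignment("ACC_B", "chr1", 1200, 2500),
]
result = consolidate(same_chrom)
stray = consolidate(same_chrom + [Alignment("ACC_C", "chr2", 10, 20)])
```

Here the two chr1 alignments consolidate to the span 1000 to 2500 with a size of 1500, while adding a single accession on chr2 makes the whole result None. This mirrors why the consolidated option rarely returns data for genes with many accessions.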