ESTS
Investigators are working diligently to sequence and assemble the genomes of various organisms, including the mouse and human, for a number of important reasons. Although important goals of any sequencing project may be to obtain a genomic sequence and identify a complete set of genes, the ultimate goal is to gain an understanding of when, where, and how a gene is turned on, a process commonly referred to as gene expression. Once we begin to understand where and how a gene is expressed under normal circumstances, we can then study what happens in an altered state, such as in disease. To accomplish the latter goal, however, researchers must identify and study the protein, or proteins, coded for by a gene.
WHAT ARE ESTS AND HOW ARE THEY MADE?
ESTs are small pieces of DNA sequence, usually 200 to 500 nucleotides long, that are generated by sequencing either one or both ends of an expressed gene. The idea is to sequence bits of DNA that represent genes expressed in certain cells, tissues, or organs, and to use these tags to pick out the matching gene from a stretch of chromosomal DNA. How hard it is to identify genes from genomic sequence varies among organisms and depends on genome size as well as on the presence or absence of introns.
Using mRNA to generate cDNA:
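Gene identification is difficult in humans because most of our genome consists of non-coding DNA, including introns and the long stretches between genes, with relatively few coding sequences, or genes. These genes are expressed as proteins. To be expressed, each gene must first be transcribed into mRNA, the RNA that serves as the template for protein synthesis. The mRNAs in a cell contain no sequence from the regions between genes, and the non-coding introns present within many genes are spliced out before the mRNA is used. Isolating mRNA is therefore a direct route to expressed genes; because mRNA is unstable outside the cell, scientists use the enzyme reverse transcriptase to copy it into stable complementary DNA (cDNA).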
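The two molecular steps behind this paragraph, removing introns to form mRNA and copying that mRNA back into cDNA, can be illustrated with a small Python sketch. It is only a string-level model under stated assumptions: exon coordinates are supplied by hand (0-based, half-open), the poly(A) tail and 5' cap are ignored, and the names `transcribe_and_splice` and `reverse_transcribe` are hypothetical, not part of any library.

```python
from __future__ import annotations

# Complement table for DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe_and_splice(gene_dna: str, exons: list[tuple[int, int]]) -> str:
    """Return a toy mature mRNA: exons joined in order, introns dropped, T -> U.

    Assumption: exon boundaries are already known (0-based, half-open).
    """
    spliced = "".join(gene_dna[start:end] for start, end in sorted(exons))
    return spliced.replace("T", "U")


def reverse_transcribe(mrna: str) -> str:
    """Return first-strand cDNA: the reverse complement of the mRNA, written as DNA."""
    as_dna = mrna.replace("U", "T")
    return as_dna.translate(DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy gene: exon 1, an intron that starts with GT and ends with AG, exon 2.
    gene = "ATGGCC" + "GTAAGTCCAG" + "TTTAAA"
    mrna = transcribe_and_splice(gene, [(0, 6), (16, 22)])
    print(mrna)                      # AUGGCCUUUAAA
    print(reverse_transcribe(mrna))  # TTTAAAGGCCAT
```

The intron sequence never reaches the cDNA, which is exactly why cDNA, and the ESTs read from it, carry only expressed sequence.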
From cDNAs to ESTs:
Once cDNA representing an expressed gene has been isolated, scientists can sequence a few hundred nucleotides from either end of the molecule to create two different kinds of ESTs. Sequencing only the beginning portion of the cDNA produces what is called a 5' EST. A 5' EST is obtained from the portion of a transcript that usually codes for a protein. These regions tend to be conserved across species and do not change much within a gene family. Sequencing the ending portion of the cDNA molecule produces what is called a 3' EST. Because these ESTs are generated from the 3' end of a transcript, they are likely to fall within non-coding, or untranslated, regions (UTRs) and therefore tend to exhibit less cross-species conservation than coding sequences do.
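A string-level sketch of how the two kinds of EST relate to one cDNA follows. It assumes the cDNA is written 5' to 3' in the same sense as the mRNA, and that the 3' read is primed from the poly(A) end, so it reads the opposite strand and comes out as a reverse complement. The 400-nucleotide default read length is an illustrative choice within the few-hundred-base range mentioned above, and `make_ests` is a hypothetical name.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def make_ests(cdna: str, read_length: int = 400) -> tuple[str, str]:
    """Return (five_prime_est, three_prime_est) for a sense-strand cDNA.

    Assumptions: cdna is written 5'->3' in the mRNA sense, and the 3' read is
    taken from the opposite strand. If read_length exceeds the cDNA length,
    each read simply covers the whole molecule.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    five_prime = cdna[:read_length]                          # start of the transcript
    three_prime = reverse_complement(cdna[-read_length:])    # end, read backwards
    return five_prime, three_prime
```

For instance, `make_ests("ATGCCCGGGTTTAAAA", read_length=5)` returns `("ATGCC", "TTTTA")`: the 3' tag begins with T's because it reads back through the A-rich end of the transcript.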
An EST is a tiny portion of an entire gene that can be used to help identify unknown genes and to map their positions within a genome.
ESTS AS GENOME LANDMARKS
Because ESTs represent a copy of just the interesting part of a genome, the part that is expressed, they have proven themselves again and again as powerful tools in the hunt for genes involved in hereditary diseases. ESTs also have a number of practical advantages: their sequences can be generated rapidly and inexpensively, only one sequencing experiment is needed per cDNA, and they do not have to be checked for sequencing errors, because occasional mistakes do not prevent identification of the gene from which the EST was derived.
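Why sequencing errors are tolerable can be shown with a toy identification routine. This is an illustrative sketch under simplifying assumptions, not how any particular database or BLAST works: it breaks the EST into overlapping k-mers (substrings of length k, 8 by default here) and picks the known gene that shares the largest fraction of them. A single miscalled base spoils at most k of those k-mers, so the true source gene still scores far above an unrelated one. The gene names and sequences passed in are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_gene_match(est: str, genes: dict[str, str], k: int = 8) -> tuple[str, float]:
    """Return (gene name, fraction of the EST's k-mers found in that gene).

    A single base error only removes the k-mers that overlap it (at most k),
    so most of the EST's k-mers still occur in its true source gene.
    """
    est_kmers = kmers(est, k)
    if not est_kmers:
        raise ValueError("EST is shorter than k")
    scores = {name: len(est_kmers & kmers(seq, k)) / len(est_kmers)
              for name, seq in genes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Copying 30 bases out of a known gene and changing one of them still assigns the tag to that gene, with more than half of its k-mers matching, while a sequence unrelated to the tag scores close to zero.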
To find a disease gene using this approach, scientists first use observable biological clues to identify ESTs that may correspond to disease gene candidates. Scientists then examine the DNA of disease patients for mutations in one or more of these candidate genes to confirm gene identity. Using this method, scientists have already isolated genes involved in Alzheimer's disease, colon cancer, and many other diseases. This track record is why ESTs are expected to keep opening new directions in genetic research.
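The confirmation step, examining patients' DNA for mutations in a candidate gene, comes down at its simplest to comparing a patient's sequence with a reference. The sketch below assumes the two sequences have already been aligned to the same length, so it reports only single-base substitutions; real mutation screening must also handle insertions, deletions, and sequencing quality, none of which is modeled here. `find_substitutions` is a hypothetical helper.

```python
from __future__ import annotations


def find_substitutions(reference: str, patient: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, patient base) for every mismatch.

    Assumes the sequences are pre-aligned and equal in length, so only point
    substitutions are detected; indels would need a real alignment first.
    """
    if len(reference) != len(patient):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [(i + 1, r, p)
            for i, (r, p) in enumerate(zip(reference.upper(), patient.upper()))
            if r != p]
```

For example, `find_substitutions("ATGGCCTTT", "ATGACCTTA")` returns `[(4, 'G', 'A'), (9, 'T', 'A')]`.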
ESTS AND NCBI
Because of their utility, the speed with which they can be generated, and the low cost of the technology, many individual scientists as well as large genome sequencing centers have generated hundreds of thousands of ESTs for public use. As ESTs were generated, scientists submitted them to GenBank, the NIH sequence database operated by NCBI. With so many ESTs arriving so rapidly, it became difficult to tell whether a sequence had already been deposited in the database. It was becoming increasingly apparent to NCBI investigators that if ESTs were to be easily accessed and useful as gene discovery tools, they needed to be organized in a searchable database that also provided access to other genome data. Therefore, in 1992, scientists at NCBI developed a new database designed to serve as a collection point for ESTs. Once an EST submitted to GenBank had been screened and annotated, it was deposited in this new database, called dbEST.
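The "has this sequence already been deposited?" problem can be sketched as a lookup keyed on a normalized sequence. In the hypothetical `EstCollection` below, a submission is flagged when the same sequence, or its reverse complement, is already stored. This only illustrates the idea under that assumption and is not a description of how NCBI screened submissions: it catches exact duplicates, not the overlapping, near-identical tags discussed under UniGene.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    return min(seq, rc)


class EstCollection:
    """Toy in-memory store that flags exact duplicates on submission (hypothetical)."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}  # canonical sequence -> first EST id

    def submit(self, est_id: str, seq: str) -> str | None:
        """Store seq and return None, or return the id of the EST it duplicates."""
        key = canonical(seq)
        if key in self._by_key:
            return self._by_key[key]
        self._by_key[key] = est_id
        return None
```

Using the smaller of the two strands as the key means the same tag submitted in either orientation maps to one entry.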
DBEST: A DESCRIPTIVE CATALOG OF ESTS
Scientists at NCBI created dbEST to organize, store, and provide access to the great mass of public EST data that has already accumulated and that continues to grow daily. Using dbEST, a scientist can access not only data on human ESTs but information on ESTs from over 300 other organisms as well. Whenever possible, NCBI scientists annotate the EST record with any known information. For example, if an EST matches a DNA sequence that codes for a known gene with a known function, that gene's name and function are placed on the EST record. Annotating EST records allows public scientists to use dbEST as an avenue for gene discovery. By using a database search tool, such as NCBI’s BLAST, any interested party can conduct sequence similarity searches against dbEST.
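The annotation idea, copying a known gene's name and function onto an EST that matches it, can be sketched with a deliberately simple similarity measure. This is not BLAST: the hypothetical `best_ungapped_identity` slides the EST along each known gene without gaps and reports the best fraction of identical bases, and `annotate` copies annotations from the best-scoring gene when that fraction reaches a threshold (0.95 here, an arbitrary assumption). The record types and field names are also hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnownGene:
    name: str
    function: str
    sequence: str


@dataclass
class EstRecord:
    est_id: str
    sequence: str
    annotations: dict[str, str] = field(default_factory=dict)


def best_ungapped_identity(query: str, subject: str) -> float:
    """Best fraction of identical bases over all gap-free placements of query
    fully inside subject (0.0 if query is empty or longer than subject)."""
    n = len(query)
    if n == 0 or n > len(subject):
        return 0.0
    best = 0
    for offset in range(len(subject) - n + 1):
        matches = sum(q == s for q, s in zip(query, subject[offset:offset + n]))
        best = max(best, matches)
    return best / n


def annotate(record: EstRecord, genes: list[KnownGene],
             min_identity: float = 0.95) -> EstRecord:
    """Copy name and function from the best-matching gene if it clears the threshold."""
    scored = [(best_ungapped_identity(record.sequence, g.sequence), g) for g in genes]
    if scored:
        identity, gene = max(scored, key=lambda pair: pair[0])
        if identity >= min_identity:
            record.annotations.update(gene=gene.name, function=gene.function,
                                      identity=f"{identity:.2f}")
    return record
```

An EST with no sufficiently similar known gene keeps an empty annotation dictionary, mirroring the point that records are annotated only "whenever possible".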
UNIGENE: A NON-REDUNDANT SET OF GENE-ORIENTED CLUSTERS
Because a gene can be expressed as mRNA many, many times, ESTs ultimately derived from this mRNA may be redundant. That is, there may be many identical, or similar, copies of the same EST. Such redundancy and overlap mean that when someone searches dbEST for a particular EST, they may retrieve a long list of tags, many of which represent the same gene. Searching through all of these near-identical ESTs can be very time consuming. To resolve the redundancy and overlap problem, NCBI investigators developed UniGene, a system that automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters.
Although it is widely recognized that generating ESTs is an efficient strategy for identifying genes, the EST approach also has several limitations. First, it is very difficult to isolate mRNA from some tissues and cell types. This results in a paucity of data on genes that may be expressed only in those tissues or cell types.
Second, important gene regulatory sequences may lie within introns. Because ESTs are small segments of cDNA generated from an mRNA whose introns have been removed, much valuable information may be lost by focusing only on cDNA sequencing. Despite these limitations, ESTs continue to be invaluable in characterizing the human genome as well as the genomes of other organisms. They have enabled the mapping of many genes to chromosomal sites and have also assisted in the discovery of many new genes.
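The UniGene clustering idea described earlier can be approximated, very loosely, by linking ESTs that share an exact k-mer and merging those links transitively with a union-find structure. This is a hypothetical sketch, not UniGene's actual procedure: it assumes all ESTs are on the same strand, uses an arbitrary default k of 20, and skips the alignment quality checks a real clustering pipeline would apply.

```python
from __future__ import annotations

from collections import defaultdict


def cluster_ests(ests: dict[str, str], k: int = 20) -> list[set[str]]:
    """Group ESTs linked, directly or transitively, by at least one shared k-mer."""
    parent = {est_id: est_id for est_id in ests}

    def find(x: str) -> str:
        # Walk to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    owner: dict[str, str] = {}  # first EST seen containing each k-mer
    for est_id, seq in ests.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in owner:
                union(est_id, owner[kmer])
            else:
                owner[kmer] = est_id

    clusters: dict[str, set[str]] = defaultdict(set)
    for est_id in ests:
        clusters[find(est_id)].add(est_id)
    return list(clusters.values())
```

Transitivity is the point of the design: if tag A overlaps B and B overlaps C, all three end up in one cluster even when A and C share nothing, which is how overlapping tags from one gene collapse into a single gene-oriented group.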