HOMOLOGY-BASED GENE PREDICTION
Gene finding typically refers to the area of computational biology that is concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically functional. This especially includes protein-coding genes, but may also include other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
In its earliest days, "gene finding" was based on painstaking experimentation on living cells and organisms. Statistical analysis of the rates of homologous recombination between several different genes could determine their order on a certain chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other. Today, with comprehensive genome sequences and powerful computational resources at the disposal of the research community, gene finding has been redefined as a largely computational problem.
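Extrinsic (evidence-based) gene finding searches the target genome for sequences that are similar to extrinsic evidence in the form of known sequences, such as an mRNA or a protein product. Given an mRNA sequence, it is trivial to derive a unique genomic DNA sequence from which it must have been transcribed.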
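Deriving the DNA counterpart of an mRNA can be sketched in a few lines of Python. This is a minimal illustration, assuming the mRNA is written 5' to 3' using only A, C, G and U; the function names are hypothetical. The coding (sense) strand matches the mRNA with T in place of U, and the template strand that was actually transcribed is its reverse complement. Introns are ignored: in a eukaryote, the genomic locus also contains intronic sequence that is absent from the mature mRNA.

```python
# Minimal sketch: recover the DNA strands implied by an mRNA sequence.
# Assumes input is 5' to 3' and uses only A, C, G, U (hypothetical helper names).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mrna_to_coding_dna(mrna: str) -> str:
    """Coding (sense) strand: identical to the mRNA with T instead of U."""
    mrna = mrna.upper()
    if set(mrna) - set("ACGU"):
        raise ValueError("mRNA must contain only A, C, G, U")
    return mrna.replace("U", "T")


def template_strand(mrna: str) -> str:
    """Template (antisense) strand, 5' to 3': reverse complement of the coding strand."""
    return mrna_to_coding_dna(mrna).translate(COMPLEMENT)[::-1]


print(mrna_to_coding_dna("AUGGCUUAA"))  # ATGGCTTAA
print(template_strand("AUGGCUUAA"))     # TTAAGCCAT
```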
BLAST is a widely used system designed for this purpose. This approach requires extensive sequencing of mRNA and protein products in order to collect enough evidence. Because only a subset of genes is expressed in any given cell type, many hundreds or thousands of different cell types must be studied in a complex organism.
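BLAST-style tools do not align a query against every genome position from scratch; they first look for short exact word matches ("seeds") and only then extend them into local alignments. The sketch below shows only that seeding step for nucleotide sequences. It is not BLAST itself: extension, scoring and statistics are omitted, and the word size `k = 11` and the function names are illustrative assumptions.

```python
# Minimal sketch of BLAST-style seeding: index every k-mer of the target,
# then report (query_pos, target_pos) pairs where a query k-mer occurs exactly.
from collections import defaultdict


def find_seeds(query: str, target: str, k: int = 11) -> list[tuple[int, int]]:
    query, target = query.upper(), target.upper()
    index: dict[str, list[int]] = defaultdict(list)
    for t in range(len(target) - k + 1):
        index[target[t:t + k]].append(t)
    seeds = []
    for q in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for t in index.get(query[q:q + k], []):
            seeds.append((q, t))
    return seeds


def seeds_by_diagonal(seeds):
    """Group seeds by diagonal (target_pos - query_pos); hits on the same
    diagonal are candidates for a single ungapped extension."""
    diagonals = defaultdict(list)
    for q, t in seeds:
        diagonals[t - q].append((q, t))
    return dict(diagonals)


seeds = find_seeds("ACGTAC", "TTACGTAA", k=4)
print(seeds)                     # [(0, 2), (1, 3)]
print(seeds_by_diagonal(seeds))  # {2: [(0, 2), (1, 3)]}
```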
HOMOLOGS: features in the species being compared that are similar because they are ancestrally related. Based on genome patterns, homologous genes can be identified using:
1. Genome pattern, based on the number of nucleotides
2. Genome size, based on nucleotide base pairs
3. Percentage of sequence similarity between two related species
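The third criterion, percentage similarity, is usually reported as percent identity over a pairwise alignment. One common convention, assumed here, divides the number of identical aligned columns by the total alignment length and counts columns containing a gap ('-') as non-identical; other tools exclude gap columns, so values are only comparable under the same convention. The function name is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Convention (an assumption): identical columns / all alignment columns,
    with any column containing a gap '-' counted as a mismatch.
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and equal length")
    identical = sum(
        1 for a, b in zip(aln_a.upper(), aln_b.upper()) if a == b and a != "-"
    )
    return 100.0 * identical / len(aln_a)


print(percent_identity("ACGT", "ACGA"))      # 75.0
print(percent_identity("ACGT-A", "ACGTTA"))  # about 83.33 (5 of 6 columns)
```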
Ab initio approaches
Ab initio gene finding might be more accurately characterized as gene prediction. In the genomes of prokaryotes, genes have specific and relatively well-understood promoter sequences, and the sequence coding for a protein occurs as one contiguous open reading frame (ORF), which is typically many hundreds or thousands of base pairs long. Protein-coding DNA also has certain periodicities that distinguish it statistically from non-coding DNA. Ab initio gene finding in eukaryotes, especially complex organisms like humans, is considerably more challenging.
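Because a prokaryotic gene is one contiguous ORF, a naive first pass is simply to scan the three forward reading frames for an ATG followed in-frame by a stop codon. The sketch below does only that: it covers the forward strand only, keeps the first ATG of each ORF (nested ATGs are ignored), and the default minimum length of 300 nt is a hypothetical threshold, not a value from any particular gene finder.

```python
# Naive ORF scan on the forward strand (illustrative sketch, hypothetical names).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 300) -> list[tuple[int, int]]:
    """Return 0-based (start, end) ORFs; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_len=12))  # [(2, 14)]
```

A real scan would also search the reverse complement and use the coding-DNA periodicities mentioned above to score candidates, rather than relying on length alone.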
First, the promoter and other regulatory signals in these genomes are more complex than in prokaryotes. Two classic examples of signals identified by eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail.
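Both signals can be screened with simple sequence statistics. The sketch below applies the widely used Gardiner-Garden and Frommer CpG-island thresholds (a window of 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) and searches for AATAAA, the canonical hexamer that directs polyadenylation. Real eukaryotic gene finders use probabilistic models rather than fixed cut-offs, and the helper names here are hypothetical.

```python
# Minimal sketch of two classic eukaryotic signal screens (hypothetical helpers).


def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were independent is C * G / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq: str, window: int = 200, step: int = 1) -> list[int]:
    """Start positions of windows with GC > 0.5 and obs/exp CpG > 0.6."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc > 0.5 and ratio > 0.6:
            hits.append(start)
    return hits


def polya_signals(seq: str, motif: str = "AATAAA") -> list[int]:
    """0-based positions of the polyadenylation signal (overlaps allowed)."""
    s = seq.upper()
    return [i for i in range(len(s) - len(motif) + 1) if s[i:i + len(motif)] == motif]


print(cpg_island_windows("CG" * 100))     # [0]
print(polya_signals("GGAATAAAGGAATAAA"))  # [2, 10]
```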
COMPARATIVE GENOMICS APPROACHES:-
Comparative gene finding can also be used to project high-quality annotations from one genome to another. Notable examples include Projector, GeneWise and GeneMapper. Such techniques now play a central role in the annotation of all genomes.
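At its core, projecting an annotation means translating coordinates from one genome to the other through an alignment. The sketch below is a hypothetical, much simplified illustration of that idea for a single pairwise alignment given as two gapped strings, using 0-based ungapped coordinates; tools such as Projector or GeneWise do considerably more, including modelling introns and splice sites.

```python
from typing import Optional


def build_position_map(aln_a: str, aln_b: str) -> dict[int, Optional[int]]:
    """Map each 0-based position of sequence A to its aligned position in B.

    Positions of A that face a gap in B map to None.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    mapping: dict[int, Optional[int]] = {}
    pos_a = pos_b = 0
    for a, b in zip(aln_a, aln_b):
        if a != "-":
            mapping[pos_a] = pos_b if b != "-" else None
            pos_a += 1
        if b != "-":
            pos_b += 1
    return mapping


def project_interval(start: int, end: int, mapping) -> Optional[tuple[int, int]]:
    """Project a half-open interval [start, end) from A to B.

    Returns None if either boundary base falls in a gap (a real tool would
    look for the nearest aligned base instead).
    """
    first, last = mapping.get(start), mapping.get(end - 1)
    if first is None or last is None:
        return None
    return first, last + 1


m = build_position_map("AC-GT", "ACTG-")
print(m)                          # {0: 0, 1: 1, 2: 3, 3: None}
print(project_interval(0, 3, m))  # (0, 4)
```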
1. Comparative methods: these are also called extrinsic methods. They include two strategies: those that use homology with the sequences of other known genes, also called homology-based, and those that compare the target with genomic sequences from other genomes, also called comparative-genomics-based.
o Homology-based: these methods predict a gene using the alignment of a protein (or an RNA sequence in the form of full-length mRNA, cDNA or EST) with the genome sequence that we want to annotate. The known sequence (also called evidence) guides the prediction. There are several ways to achieve this: the simplest consists of accepting the alignment of the known sequence to the genome as the gene prediction. More sophisticated methods use the known sequence as a guide and try to extend the evidence into a complete gene structure. The efficacy of this method depends on the number of known gene sequences, so it is limited by how complete the biological databases are.
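The simplest homology-based strategy, accepting the alignment as the prediction, can be sketched as follows. It assumes (hypothetically) that the evidence has already been spliced-aligned to the genome and is supplied as aligned genomic blocks in 0-based, half-open, forward-strand coordinates. Each block is taken as an exon, each gap between consecutive blocks as an intron, and every intron is checked against the canonical GT...AG rule that most eukaryotic introns follow.

```python
def structure_from_blocks(genome: str, blocks: list[tuple[int, int]]) -> dict:
    """Turn aligned evidence blocks into a simple exon/intron gene structure."""
    genome = genome.upper()
    exons = sorted(blocks)
    # Introns are the genomic gaps between consecutive aligned blocks.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    # Canonical introns start with GT (donor) and end with AG (acceptor).
    canonical = [
        genome[s:s + 2] == "GT" and genome[e - 2:e] == "AG" for s, e in introns
    ]
    transcript = "".join(genome[s:e] for s, e in exons)
    return {
        "exons": exons,
        "introns": introns,
        "canonical_splice_sites": canonical,
        "spliced_sequence": transcript,
    }


# Toy genome: exon ATGAAA, intron GTCCCCAG, exon TTTTAA.
result = structure_from_blocks("ATGAAAGTCCCCAGTTTTAA", [(0, 6), (14, 20)])
print(result["introns"], result["canonical_splice_sites"])  # [(6, 14)] [True]
print(result["spliced_sequence"])                           # ATGAAATTTTAA
```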
o Comparative-genomics-based: these methods are based on the hypothesis that sequences conserved between two relatively closely related genomes are functional and, therefore, possibly code for a gene.
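The conservation hypothesis can be made concrete with a sliding-window scan over a pairwise alignment of the two genomes: windows whose identity reaches a threshold are flagged, and overlapping flagged windows are merged into candidate conserved regions. The window size, threshold and function name are arbitrary illustrative assumptions, and the returned coordinates are alignment columns rather than genome positions.

```python
def conserved_regions(aln_a: str, aln_b: str, window: int = 50,
                      min_identity: float = 0.8) -> list[tuple[int, int]]:
    """Half-open alignment-column intervals whose windowed identity >= min_identity."""
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    # One flag per column; a gap in either row counts as a mismatch.
    matches = [a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions: list[tuple[int, int]] = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


print(conserved_regions("ACGTACGTAA", "ACGTACGTTT", window=4, min_identity=0.75))
# [(0, 9)]
```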
Mice and humans (indeed, most or all mammals, including dogs, cats, rabbits, monkeys, and apes) have roughly the same number of nucleotides in their genomes: about 3 billion base pairs.
AB INITIO GENE FINDING:-
In the first approach, we will use all the ab initio tools from the gene prediction section and compare the results of the three programs, in particular the coordinates of the predicted exons.
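Comparing the exon coordinates reported by the different programs can be partly automated. This sketch assumes each prediction has already been reduced to a list of (start, end) exon pairs in 1-based inclusive coordinates; it reports exons predicted identically by both programs and the nucleotide-level overlap (shared bases and Jaccard index). The function name and the toy coordinates in the usage example are hypothetical, not real program output.

```python
from itertools import combinations


def compare_exon_sets(pred_a, pred_b):
    """Compare two exon predictions given as (start, end) pairs, 1-based inclusive."""
    exact = sorted(set(pred_a) & set(pred_b))  # exons with identical boundaries
    bases_a = {p for s, e in pred_a for p in range(s, e + 1)}
    bases_b = {p for s, e in pred_b for p in range(s, e + 1)}
    overlap = len(bases_a & bases_b)
    union = len(bases_a | bases_b)
    return {
        "shared_exons": exact,
        "overlap_nt": overlap,
        "jaccard": overlap / union if union else 0.0,
    }


# Toy coordinates for illustration only, not real predictions.
predictions = {
    "geneid": [(1, 100), (201, 300)],
    "genscan": [(1, 100), (251, 350)],
}
for (name_a, a), (name_b, b) in combinations(predictions.items(), 2):
    print(name_a, name_b, compare_exon_sets(a, b))
```

Using `combinations` means a third program can be added to the dictionary and every pair is compared without further changes.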
STEP 1:- Analysing the human sequence
To use Geneid, follow these steps:
1: Connect to the Geneid server.
2: Paste the DNA sequence.
3: Select the organism (human).
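Geneid can report its predictions in GFF, a tab-separated format whose columns are sequence, source, feature, start, end, score, strand, frame and attribute. Assuming the output has been saved in that format, a minimal parser can pull out the exon coordinates for later comparison. The set of exon feature names used below (First, Internal, Terminal, Single) is an assumption to check against your actual Geneid output, and the function name is hypothetical.

```python
# Hypothetical default: exon feature names assumed to appear in Geneid GFF output.
EXON_TYPES = {"First", "Internal", "Terminal", "Single"}


def parse_gff_exons(lines, exon_types=EXON_TYPES):
    """Return (start, end, strand, feature) tuples for exon features in GFF lines."""
    exons = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a GFF feature line
        feature, strand = fields[2], fields[6]
        if feature in exon_types:
            exons.append((int(fields[3]), int(fields[4]), strand, feature))
    return exons


# Usage: with open("geneid_output.gff") as fh: exons = parse_gff_exons(fh)
```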
To use GENSCAN, follow these steps:
1: Connect to the GENSCAN server.
2: Paste the DNA sequence.
3: Select the organism (vertebrate).
4: Run the gene prediction.
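Sequences are commonly pasted into gene prediction web forms in FASTA format. A small helper, hypothetical and not part of either tool, can clean a pasted sequence (strip whitespace and line breaks, convert to uppercase, reject characters other than A, C, G, T and N) and wrap it as a FASTA record. The 60-character line width is a common convention rather than a requirement of Geneid or GENSCAN.

```python
def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Clean a DNA sequence and format it as a single FASTA record."""
    clean = "".join(seq.split()).upper()  # drop spaces and newlines, uppercase
    invalid = set(clean) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {''.join(sorted(invalid))}")
    lines = [clean[i:i + width] for i in range(0, len(clean), width)]
    return "\n".join([f">{header}", *lines])


print(to_fasta("human_seq", "acgt acgt\nacgt", width=4))
# >human_seq
# ACGT
# ACGT
# ACGT
```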
Now, make the predictions on the mouse sequence with all the ab initio programs.