Daphnia: new tile array expression predicted genes, February 2008, D. Gilbert

Data at http://microbe.bio.indiana.edu:7182/prerelease2/gene-predictions/
with methods at http://microbe.bio.indiana.edu:7182/data/dpx-augtrials/

Summary

We have genome tile array expression data with exon/intron-level resolution. I've used those data as coding hints for Augustus gene prediction, to call the genes that all the predictors are missing but that have good expression signals. What I get that way is about 30% additional CDS sequence (the share of the Augustus+TAR CDS bases that does not overlap JGI genes), although the gene models produced with tile expression hints are not as good as the other ones (they are shorter).

Using tile expression as CDS hints recovers most, but not all, of the expression region hints (depending on weights), but results in short or partial gene models. These models are poorer than those predicted with no hints, and increasing the hint weight to force predictions reduces gene model quality further.

# Augustus with TAR, augmap19.gff.gz, all scaffolds
CDSbases aug19   : ntr=56928, n=197323, m=214.95, cds=42413860, tb=162548203, c/t=0.261
  (CDS overlapping JGI genes: 29511951; no-overlap JGI genes: 12901909; 42% additional relative to JGI_V11 cds)

# Daphnia v1.1 genome gene set coding sequence bases / total genome bases
CDSbases JGI_V11 : ntr=30940, n=142754, m=211.28, cds=30160786, tb=174233412, c/t=0.173
CDSbases Gnomon  : ntr=37466, n=151668, m=237.45, cds=36014074, tb=200738384, c/t=0.179

Key: ntr: number of transcripts; n: number of CDS exons; m: mean exon size; cds: CDS bases; tb: total gene-region genome bases; c/t: cds/total ratio

Using Daphnia tile expression data adds ~40% of new CDS, e.g. daphnia c/t=0.26.

John Manak's study (Nature Genetics, 2007, doi:10.1038/ng1875) with Drosmel tile array expression suggests 30% transcription outside predicted genes, e.g. drosmel c/t=0.24 versus drosmel known genes c/t=0.18. Note, however, that this study found only a small portion in intergenic regions. Due to Daphnia's short introns and early gene prediction set, most of Daphnia's unfound genes are between current gene predictions.
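The derived columns in the CDSbases lines (m and c/t) and the two "additional CDS" percentages can be rechecked from the raw counts. The following is a minimal sketch, not the script used to produce the table: the dictionary layout and the function names `summarize` and `new_cds_fractions` are made up for illustration, and the numbers are copied from the lines above.

```python
# Raw counts copied from the CDSbases lines; layout is illustrative only.
GENE_SETS = {
    "aug19":   {"ntr": 56928, "n": 197323, "cds": 42413860, "tb": 162548203},
    "JGI_V11": {"ntr": 30940, "n": 142754, "cds": 30160786, "tb": 174233412},
    "Gnomon":  {"ntr": 37466, "n": 151668, "cds": 36014074, "tb": 200738384},
}
OVERLAP_JGI = 29511951     # aug19 CDS bases overlapping JGI genes
NO_OVERLAP_JGI = 12901909  # aug19 CDS bases not overlapping JGI genes


def summarize(stats):
    """Return mean exon size (m) and CDS/total ratio (c/t) for one gene set."""
    return {
        "m": round(stats["cds"] / stats["n"], 2),
        "c/t": round(stats["cds"] / stats["tb"], 3),
    }


def new_cds_fractions(new_cds, aug_cds, ref_cds):
    """Express the non-overlapping CDS against two different denominators."""
    return {
        "of_aug": new_cds / aug_cds,  # share of the Augustus+TAR CDS that is new
        "vs_ref": new_cds / ref_cds,  # increase relative to the reference gene set
    }


if __name__ == "__main__":
    for name, stats in GENE_SETS.items():
        s = summarize(stats)
        print(f"{name:8s} m={s['m']:.2f} c/t={s['c/t']:.3f}")
    f = new_cds_fractions(NO_OVERLAP_JGI,
                          GENE_SETS["aug19"]["cds"],
                          GENE_SETS["JGI_V11"]["cds"])
    print(f"new CDS: {f['of_aug']:.1%} of aug19, {f['vs_ref']:.1%} relative to JGI_V11")
```

The two denominators explain why both "about 30%" and "42%" appear: the same 12.9 Mb of non-overlapping CDS is roughly 30% of the aug19 total and roughly 42% when measured against the JGI v1.1 CDS total.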
I also matched these new TAR-predicted genes to known genes and to Daphnia ESTs. Only 10% of Dpx AugTar19 proteins are found in the NCBI-nr database (e-value < 1e-3). ESTs support 12% of the new AugTar19 genes.

Top homology species

  311  Insects
         175  Nasonia + Apis
          82  Tribolium
          50  Aedes + Drosophila
  215  Aquatic (Zebrafish, Sea urchin, anemone)
  238  Bacteria
  132  Transposon genes
  577  Other

  311  Insects
         108  Nasonia
          82  Tribolium
          65  Apis
          32  Aedes
          24  Drosophila
  215  Aquatic
          93  Zebrafish (Danio)
          90  Sea urchin (Strongylocentrotus)
          32  Sea anemone (Nematostella)
  132  Transposon genes
  238  Bacteria
          59  Acidovorax (soil bacteria)
          51  Verminephrobacter (bacteria)
          43  Pseudomonas (bacteria)
          34  Methylibium (soil bacteria)
          26  Delftia (bacteria)
          25  Rhodoferax (bacteria)
  577  Other

Top Homology Descriptions

  Count  Name
    262  Hypothetical protein
     49  Transposase
     19  transcriptional regulator
     11  Peptidase
     11  unnamed protein product
      9  Orf2-encoded protein
      8  Orf2-encoded protein
      7  major facilitator superfamily mfs_1
      7  orf2-encoded protein
      6  Osjnba0011f23.1
      6  ankyrin repeat protein
      5  Lreo_3
      5  abc transporter related
      4  Conserved Hypothetical protein
      4  Of unknown function duf6
      4  abc membrane transporter
      4  heavy metal translocating p-type atpase
      4  inner-membrane translocator
      4  tonb-dependent siderophore receptor
      4  two component transcriptional regulator
      ..
#---------------------
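The "found in NCBI-nr (e-value < 1e-3)" figure is a fraction of predicted proteins with at least one hit below the cutoff. The sketch below shows one way such a fraction could be computed. It assumes the hits are in the 12-column tab-delimited layout that BLAST writes with -outfmt 6 (query ID in column 1, e-value in column 11); the note does not say which search tool or output format was used, and the function names and sample IDs are hypothetical.

```python
def queries_with_hit(tab_lines, max_evalue=1e-3):
    """Collect query IDs that have at least one hit with e-value below the cutoff.

    Assumes 12-column tabular hits: column 1 = query ID, column 11 = e-value.
    """
    hits = set()
    for line in tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows rather than guess their layout
        if float(fields[10]) < max_evalue:
            hits.add(fields[0])
    return hits


def fraction_with_hit(tab_lines, n_proteins, max_evalue=1e-3):
    """Fraction of all predicted proteins that have a qualifying hit."""
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    return len(queries_with_hit(tab_lines, max_evalue)) / n_proteins


if __name__ == "__main__":
    # Hypothetical rows: two queries, only the first passes the cutoff.
    sample = [
        "augtar_g1\tsubjA\t45.0\t200\t110\t0\t1\t200\t5\t204\t2e-10\t150",
        "augtar_g1\tsubjB\t30.0\t100\t70\t0\t1\t100\t1\t100\t0.5\t40",
        "augtar_g2\tsubjC\t28.0\t90\t65\t0\t1\t90\t1\t90\t0.02\t35",
    ]
    print(fraction_with_hit(sample, n_proteins=10))
```

Counting distinct query IDs rather than hit rows matters here, since one protein can match many database entries.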