
Daphnia pulex Genes 2.0


The annotated list dpxmix19_tilede_effects.txt has some of the obvious DE cases, selected as the strongest DE (M value) from new genes and UTR/ncRNA regions in Cadmium, Chaoborus and Mixed Metals. See also the shorter list dpxmix19_tilede_interesting.txt of about 40 of the most interesting new genes and ncRNA regions for Cadmium, Mixed Metals and Chaoborus. Maps of expression for some of these interesting cases are shown in folder dpxmix19_tilede_interestpix/.

Differential expression quandary: Most DE in Unknown regions

Newly predicted genes resolve this high Unknown DE, largely in favor of untranslated regions (UTR) of genes and non-coding RNA genes (maybe...).
The high DE previously seen in Unknown regions for +Male, +Metals, and Chaoborus has shifted in large part to DE in UTR regions. A small increase in DE in gene CDS regions is also seen with the new predictions.

2007: Genes 1.0 (JGI_V11) Genome DE
Showing differential expression partitioned into Gene, Intron or Unknown regions of genome, as percent of genome.
2010: Genes 2.0 (Best3) Genome DE
Showing differential expression partitioned into Gene (CDS), UTR, Intron or Unknown regions of genome, as percent of genome.
Is this real biology? Predictions of UTR are not as confident as for coding exons/genes, and much of this UTR DE is at somewhat lower levels of expression than for CDS. The new gene predictions use all new evidence, especially genome tiling, and thus recover much of the Unknown tile expression regions as gene models. Often these regions become attached to nearby coding genes as UTR, sometimes a very long UTR, as long as the coding gene itself. These may be artifacts of prediction.



Expression MA density plots for Genes 2.0

These density plots show the relative frequency of differential expression (M value, Y axis) versus absolute expression intensity (A value, X axis). They help show that the rather large DE in UTR regions is consistent across treatment types, and associated with somewhat higher absolute expression.

[Figure panels: Sex Coding Genic, Sex UTR; Metal Coding Genic, Metal UTR; Cadmium Coding Genic, Cadmium UTR; Chaoborus Coding Genic, Chaoborus UTR]
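
As a reminder of what the two plot axes mean, here is a minimal sketch of the conventional MA definitions: M is the log2 ratio of two intensities and A is their mean log2 intensity. This is the usual microarray convention and is assumed here, not taken from the tile pipeline itself; the function name ma_values and the example intensities are illustrative only.

```python
import math

def ma_values(treated, control):
    """Return (M, A) for two positive raw intensities (standard MA convention)."""
    log_t = math.log2(treated)
    log_c = math.log2(control)
    m = log_t - log_c          # differential expression: log2 ratio
    a = (log_t + log_c) / 2    # absolute expression: mean log2 intensity
    return m, a

# Illustrative intensities, not real tile data.
m, a = ma_values(treated=64.0, control=16.0)
print(f"M = {m}, A = {a}")
```

Under this convention a point with M near 0 shows no treatment effect, whatever its A value.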


Example gene maps

High +Cadmium DE gene in center is also +MixedMetals (gene 4), +MixedMetals DE UTR/ncRNA gene on left (gene 2), and +Male DE UTR/ncRNA gene on right (gene 6).

Six tandem Chorion peroxidase genes are better resolved in Genes 2.0. Two on right (genes 5,6) show NO Male, and weak Chaoborus expression. One on left (gene 1) shows stronger +Male expression, and strong Juvenile (Cha/Cha-con). Genes 3,4 in middle show +Male equal or more than Female, and Gene 4 also shows +Chaoborus.



Differential expression in gene sets, 2007 - 2010

  Genes with any DE on CDS, all treatments
DE      JGI  Gnomon   Best3   Change (b3-jgi)
cad+    148    130     146   -2 
cad-     84     56      76   -10
cha+    438    389     562   +130    
cha-    567    530     579   +10
met+    696    574    1376   +650
met-   2223   2064    2363   +100
sexf   4196   4089    4662   +450 
sexm   2894   2569    3454   +550 
nul   23226  30273   38779 
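
The Change column above looks rounded, so here is a small sketch that recomputes the exact Best3 minus JGI differences from the counts in the table. The dictionary layout and variable names are illustrative choices; the numbers are copied from the table.

```python
# Genes with any DE on CDS, copied from the table above.
counts = {
    # class: (JGI, Gnomon, Best3)
    "cad+": (148, 130, 146),
    "cad-": (84, 56, 76),
    "cha+": (438, 389, 562),
    "cha-": (567, 530, 579),
    "met+": (696, 574, 1376),
    "met-": (2223, 2064, 2363),
    "sexf": (4196, 4089, 4662),
    "sexm": (2894, 2569, 3454),
}

# Exact change in gene count, Best3 relative to JGI.
changes = {cls: best3 - jgi for cls, (jgi, _gnomon, best3) in counts.items()}

for cls, delta in changes.items():
    jgi = counts[cls][0]
    print(f"{cls}  {delta:+5d}  ({100 * delta / jgi:+.0f}% vs JGI)")
```

For example, met+ comes out at +680 and cad- at -8, against the rounded +650 and -10 listed in the table.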

Daphnia Genes 2.0 Quality Statistics

Details of Daphnia Genes 2.0 Quality Statistics in daphnia_genes2010_beta3.readme.txt

Notable stats for Best3 vs JGI_V11 genes:
  -- Best3 recovers 82% of protein homology, versus only 42% for JGI
  -- Best3 recovers 90% of EST, versus 75% for JGI
  -- Best3 recovers 66% (5Mb) of 7.6 Mb tile expression not in JGI set (tar_genes)

Best3/beta3 (dpulex_aug26_mixin19h):
  48060 mRNA in mixin19h set, no alt-transcripts yet
  42893 mRNA have evidence (homology, est, paralogy or tile expression)
   5167 mRNA have no evidence, but protein >= 40aa minimum
   5661 mRNA are non-coding by weak cds criteria (protein <40aa or <30% coding)

  22717 have protein homology (e-value <= 1e-5)
  18681 have EST evidence
  12409 have differential expression
   6079 have tar-gene expression (outside JGI set)
  10693 have protein paralogy only (e-value <= 1e-5)
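
The counts above are easier to compare as fractions of the full set. This is a quick arithmetic sketch using only the numbers listed; the helper name percent is an illustrative choice.

```python
total_mrna = 48060
with_evidence = 42893
without_evidence = 5167

# The evidence and no-evidence classes should partition the full set.
assert with_evidence + without_evidence == total_mrna

evidence_counts = {
    "protein homology": 22717,
    "EST evidence": 18681,
    "differential expression": 12409,
    "tar-gene expression": 6079,
    "protein paralogy only": 10693,
}

def percent(n, total=total_mrna):
    """Share of the mixin19h mRNA set, in percent, to one decimal place."""
    return round(100 * n / total, 1)

print("with evidence:", percent(with_evidence))
for name, n in evidence_counts.items():
    print(f"{name}: {percent(n)}")
```

The evidence classes sum to more than 42893, so some genes carry more than one kind of evidence and the percentages do not add up to 100.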
Map views are at http://server7.wfleabase.org/cgi-bin/gbrowse/daphnia_pulex8/
  Tracks: Prediction/Best3
  See also evidence tracks:
    DGC Tile expression/ Nimblegen + JGI exons, Cadmium xy and Nimblegen 0810
    Protein_Analysis/Arthropod genes
    EST assembly (Dpulex) and Dmagna EST 09

Details of Tile DE tables

Here is how I created and annotated the list dpxmix19_tilede_effects.txt of some obvious DE cases. There are more that can be found and some may be more obvious DE cases to work on.

In this directory, http://server7.wfleabase.org/prerelease2/gene-predictions/daphnia_genes2010_tilex/
the folder matab19_1004/
has tables of expression A and M (DE) values, and DE significance (p-value), for 4 treatments (cad, cha, met, sex) and five feature types:

  1. gene    : coding sequence of Genes 2.0,
  2. exonutr : untranslated regions of Genes 2.0, including UTR of coding genes and ncRNA
  3. intron  : introns of Genes 2.0
  4. taru    : TAR regions not covered by Genes 2.0
  5. unann   : regions with expression (A) not in any of the above features.

Tables are in this format:
ID                 part  ntiles  A        M         tmean     ndf  pmean
hxAUG25p1s1g100t1  utr   54      1.94029  0.12018   0.46454   95   0.64332
hxAUG25p1s1g101t1  utr   39      1.60615  0.06057   0.27037   65   0.78773
hxAUG25p1s1g103t1  utr   105     0.49133  -0.04401  -0.15340  167  0.87826
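
For anyone who prefers a script to a spreadsheet, the following is a minimal sketch of reading a table in this format into typed records. It assumes the files are tab-delimited with the header row shown above; the names TileStat and read_stats are illustrative, and the sample rows are the three shown above.

```python
import csv
import io
from typing import NamedTuple

class TileStat(NamedTuple):
    id: str
    part: str
    ntiles: int
    A: float
    M: float
    tmean: float
    ndf: int
    pmean: float

# The three example rows from above, assumed to be tab-delimited.
SAMPLE = """\
ID\tpart\tntiles\tA\tM\ttmean\tndf\tpmean
hxAUG25p1s1g100t1\tutr\t54\t1.94029\t0.12018\t0.46454\t95\t0.64332
hxAUG25p1s1g101t1\tutr\t39\t1.60615\t0.06057\t0.27037\t65\t0.78773
hxAUG25p1s1g103t1\tutr\t105\t0.49133\t-0.04401\t-0.15340\t167\t0.87826
"""

def read_stats(handle):
    """Parse an open text handle holding one DE statistics table."""
    rows = csv.DictReader(handle, delimiter="\t")
    return [
        TileStat(r["ID"], r["part"], int(r["ntiles"]), float(r["A"]),
                 float(r["M"]), float(r["tmean"]), int(r["ndf"]), float(r["pmean"]))
        for r in rows
    ]

stats = read_stats(io.StringIO(SAMPLE))
for s in stats:
    print(s.id, s.ntiles, s.M, s.pmean)
```

For a real file, the same function should work on a handle from gzip.open(path, "rt"), for example with gene.txmastat104.cad.tab.gz.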
For Cadmium, see these tables (I think you can ignore the intron and unann ones):
  exonutr.txmastat104.cad.tab.gz
  gene.txmastat104.cad.tab.gz
  taru.txmastat104.cad.tab.gz
  intron.txmastat104.cad.tab.gz
  unann.txmastat104.cad.tab.gz

See also the MA plots showing density of these effects, e.g. introns have little A or M, while gene and exonutr have A up to 6 or so (log scale) and M in -2 to 2 range.

I sorted these tables by highest M value and pulled out the top 20 or so to annotate. For example, for +Cad in new coding genes not in the JGI gene set:

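# gzcat/ggrep are the macOS names for zcat/GNU grep; $dpxe is a local path variable
gzcat gene.txmastat104.cad.tab.gz | perl -ne '
  s/(\d\.\d+e\-\d+)/0/g;                     # exponent-notation values (e.g. 1.2e-10) become 0
  ($g,$tp,$nt,$a,$m,$tm,$nd,$p)=split;       # ID, part, ntiles, A, M, tmean, ndf, pmean
  $am=sprintf "%.3f", $a - abs($m)/2;        # A minus half of |M|
  s/\-?[\d\.]+\t(\d+)\t([\d\.]+)$/$am\t$2/;  # replace tmean, ndf, pmean with am, pmean
  print if ($nt>8);' \
| sort -k5,5nr | ggrep -v -F -f $dpxe/evda/mixin19.jgifull.ids - | head -20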
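For +Cad in UTR regions:
gzcat exonutr.txmastat104.cad.tab.gz | perl -ne '
  s/(\d\.\d+e\-\d+)/0/g;                     # exponent-notation values (e.g. 1.2e-10) become 0
  ($g,$tp,$nt,$a,$m,$tm,$nd,$p)=split;       # ID, part, ntiles, A, M, tmean, ndf, pmean
  $am=sprintf "%.3f", $a - abs($m)/2;        # A minus half of |M|
  s/\-?[\d\.]+\t(\d+)\t([\d\.]+)$/$am\t$2/;  # replace tmean, ndf, pmean with am, pmean
  print if ($nt>4);' \
| sort -k5,5nr | head -20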
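
The same ranking can be sketched in Python for readers who do not use Perl. This is an approximation of the pipelines above, not a replacement: it keeps rows with more than a minimum number of tiles, adds the A - |M|/2 column, sorts by M from highest to lowest, and optionally drops IDs already in another gene set. The function name top_de and its parameters are illustrative. One difference is that it excludes by exact ID, whereas grep -F matches substrings. Under the conventional MA definitions (an assumption here), A - |M|/2 equals the log intensity of the lower-expressed condition.

```python
def top_de(rows, min_tiles=8, exclude_ids=frozenset(), limit=20):
    """Rank rows by M, highest first.

    rows: iterable of (id, ntiles, A, M, pmean) tuples.
    Returns up to `limit` tuples of (id, ntiles, A, M, am, pmean).
    """
    kept = []
    for gene_id, ntiles, a, m, pmean in rows:
        if ntiles <= min_tiles or gene_id in exclude_ids:
            continue
        am = round(a - abs(m) / 2, 3)   # same quantity as $am in the Perl filter
        kept.append((gene_id, ntiles, a, m, am, pmean))
    kept.sort(key=lambda r: r[3], reverse=True)
    return kept[:limit]

# The three example UTR rows from the table format section.
rows = [
    ("hxAUG25p1s1g100t1", 54, 1.94029, 0.12018, 0.64332),
    ("hxAUG25p1s1g101t1", 39, 1.60615, 0.06057, 0.78773),
    ("hxAUG25p1s1g103t1", 105, 0.49133, -0.04401, 0.87826),
]

for row in top_de(rows, min_tiles=4):
    print(*row, sep="\t")
```

Passing the JGI ID list as exclude_ids corresponds to the ggrep -v step used for new coding genes.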
MS Excel should be able to sort results like this also. You will find analogous tile DE results for the JGI gene set (1.1) here:
http://server7.wfleabase.org/prerelease2/dpxtiles0907/matab.ind/

Analysis methods used for tile expression are outlined here http://server7.wfleabase.org/prerelease2/dpxtiles0907/dpxtilegene09-recipe.txt


Tile DE tables for Daphnia Genes 2.0

matab19_1004/ : Tables of DE statistics for Genes (CDS), UTR, intron and tar regions
destat/ : Table and plots of Genome DE partitions (see above)
detile_tar/ : TAR locations calculated from runs of significant tile DE (minrun=3 tiles)
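
To illustrate the idea of calling TAR locations from runs of significant tiles, here is a minimal sketch. It is not the script used to build detile_tar/. The tile tuple layout, the 0.05 significance cutoff, and the function name find_runs are all assumptions; only minrun=3 comes from the description above. The sketch treats neighbouring list entries as adjacent tiles and does not check genomic gaps.

```python
def find_runs(tiles, alpha=0.05, minrun=3):
    """Find runs of consecutive significant tiles.

    tiles: list of (start, end, pvalue), sorted by start, for one scaffold.
    Returns (run_start, run_end, ntiles) for each run of at least `minrun`
    consecutive tiles with pvalue < alpha (alpha is an assumed cutoff).
    """
    runs = []
    current = []
    for tile in tiles:
        if tile[2] < alpha:
            current.append(tile)
            continue
        # A non-significant tile ends the current run.
        if len(current) >= minrun:
            runs.append((current[0][0], current[-1][1], len(current)))
        current = []
    # Close a run that reaches the end of the scaffold.
    if len(current) >= minrun:
        runs.append((current[0][0], current[-1][1], len(current)))
    return runs

# Invented coordinates and p-values, for illustration only.
tiles = [
    (1, 50, 0.01), (51, 100, 0.02), (101, 150, 0.001),   # run of 3: kept
    (151, 200, 0.5),                                     # not significant
    (201, 250, 0.01), (251, 300, 0.03),                  # run of 2: too short
]
print(find_runs(tiles))
```

With these invented tiles only the first run passes, since the second has just two significant tiles.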
      Name                             Last modified       Size  Description

[DIR] Parent Directory                 19-Mar-2014 20:00      -
[DIR] matab19_1004/                    10-Apr-2010 16:25      -
[TXT] dpxtilegene_stats.R              10-Apr-2010 15:33    34k
[DIR] dpxmix19_tilede_interestpix/     21-Apr-2010 13:58      -
[TXT] dpxmix19_tilede_interesting.txt  20-Apr-2010 16:35     8k
[TXT] dpxmix19_tilede_effects.txt      20-Apr-2010 17:13    30k
[DIR] dpx_genes2_pix/                  10-Apr-2010 16:40      -
[TXT] dpx_genes2_info.html             21-Apr-2010 14:02    11k
[DIR] detile_tar/                      19-Apr-2010 14:19      -
[DIR] destat/                          10-Apr-2010 17:53      -
[DIR] dernaseq/                        24-Dec-2010 15:32      -
[   ] daphnia_locgene26mx19.tab.gz     08-Apr-2010 15:49  27.0M