[Coral-List] Acropora tenuis transcriptome

Wed Feb 1 19:08:24 EST 2012

Dear colleagues,

We are pleased to announce the pre-publication release of the Acropora tenuis transcriptome, which can now be downloaded from the Matz lab website:

http://www.bio.utexas.edu/research/matz_lab/matzlab/Data.html 

As always, in keeping with our data sharing policy, we place no restrictions on the use of the data and do not ask for coauthorship on papers using the data.

Our own intended use of the data is comparative analysis of natural selection across coral transcriptomes, so if you have similar interests, we would appreciate it if you contacted us. We would be very glad to collaborate.

We believe that rapid (prior to publication), unconditional release of high-throughput sequencing data can immediately help many labs working on various aspects of coral biology, and can greatly facilitate collaboration within the coral research community. We hope that everyone involved in coral genomics will consider following this policy.

Have fun with the data!

Eli Meyer
Line Bay
Misha Matz

----------------------------------------------

A. tenuis transcriptome summary:

Source: 5-day-old larvae + 12-day-old juveniles, bulk culture (multiple parents)
Sequencing platform: 454 GS-FLX Titanium
Library: normalized cDNA, prepared according to Meyer et al. 2009, BMC Genomics 10:219
(current version of the protocol: http://www.bio.utexas.edu/research/matz_lab/matzlab/Methods_files/cDNAlibraryforTitanium454protocol%2012-18-9.pdf )

Number of reads: 451,088
Number of bases: 137,243,747

Assembly with Newbler v.2.6, with option -urt ("use read tips")

Statistics on assembled sequences:
-------------------------
54060 sequences.
567 average length.
4809 maximum length.
100 minimum length.
N50 = 678
30.6 Mb altogether (30644819 b).
mean coverage: 3.75
-------------------------

Non-redundant singletons:
-------------------------
35131 sequences.
276 average length.
1053 maximum length.
100 minimum length.
N50 = 332
9.7 Mb altogether (9695368 b).
-------------------------
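
For anyone who wants to recompute summary figures like the ones above from a fasta file, here is a minimal Python sketch. It is not the script we used; the function names are made up for illustration, and it assumes the common definition of N50 (the length L such that sequences of length L or longer contain at least half of all bases), which may differ slightly from other tools.

```python
def read_fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize(lengths):
    """Return count, average, maximum, minimum, N50 and total bases."""
    lengths = list(lengths)
    total = sum(lengths)
    return {
        "sequences": len(lengths),
        "average": round(total / len(lengths)) if lengths else 0,
        "maximum": max(lengths, default=0),
        "minimum": min(lengths, default=0),
        "N50": n50(lengths),
        "total_bases": total,
    }


# Toy in-memory example (hypothetical records, not real data).
toy = [">seq1", "ACGTAC", ">seq2", "ACGT"]
toy_summary = summarize(read_fasta_lengths(toy))
```

To run it on the released file, pass an open file handle instead of the toy list, e.g. `summarize(read_fasta_lengths(open("Aten_jan2012.fasta")))`. Note that the figures will only match one of the tables above if the input is restricted to the same subset of sequences (assembled sequences or singletons).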

Annotation:
The sequences were annotated according to blastx matches to the UniProt database, version 2010_09, excluding poorly annotated hypothetical proteins from large-scale sequencing projects. The gene names and GO annotations were extracted from the best hits with an e-value of 1e-4 or lower.

number of annotated sequences: 56,326
number of unique annotations: 12,894
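
As an illustration of the best-hit selection described above, here is a rough Python sketch. It is an assumption-laden example rather than our actual pipeline: it assumes blastx results in the standard 12-column tabular layout (query ID in column 1, subject ID in column 2, e-value in column 11), and the function name and toy identifiers are hypothetical.

```python
EVALUE_CUTOFF = 1e-4  # threshold stated in the annotation notes


def best_hits(rows, cutoff=EVALUE_CUTOFF):
    """Keep, for each query, the hit with the lowest e-value at or below the cutoff.

    `rows` is an iterable of lists of strings, one list per tabular BLAST line
    (assumed layout: query in column 1, subject in column 2, e-value in column 11).
    """
    best = {}
    for row in rows:
        query, subject = row[0], row[1]
        evalue = float(row[10])
        if evalue > cutoff:
            continue  # hit is too weak to use for annotation
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Toy rows with made-up identifiers; only columns 1, 2 and 11 matter here.
toy_rows = [
    ["contig1", "PROT_A", "0", "0", "0", "0", "0", "0", "0", "0", "1e-20", "0"],
    ["contig1", "PROT_B", "0", "0", "0", "0", "0", "0", "0", "0", "1e-30", "0"],
    ["contig2", "PROT_C", "0", "0", "0", "0", "0", "0", "0", "0", "0.5", "0"],
]
toy_best = best_hits(toy_rows)
```

With a real results file, each line can be split on tabs (for example with `csv.reader(handle, delimiter="\t")`) before being passed in. The exclusion of poorly annotated hypothetical proteins is a separate step and is not shown here.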

Files provided in the compressed archive:

Aten_jan2012.fasta : fully annotated fasta records
gene_hits.tab : number of sequences matching to each unique annotation
gene_table.tab : table mapping sequences to gene names
GO_table.tab : table mapping sequences to GO categories