Construction of Transcriptomic Database of Phalaenopsis and Gene Expression Profiling Studies

As one of the largest angiosperm families, Orchidaceae displays great biodiversity resulting from adaptation to diverse habitats. Despite the unique and interesting biological features of orchids, genomic information on them remains limited, which impedes advanced molecular research. Here we report a strategy that integrates sequence outputs for the moth orchid, Phalaenopsis aphrodite, from two high-throughput sequencing platforms, Roche 454 and Illumina/Solexa, to maximize assembly efficiency. Tissues collected for cDNA library preparation covered a wide range of vegetative and reproductive tissues. We also designed an effective workflow for annotation and functional analysis. After assembly and trimming, 233,823 unique sequences were obtained. Among them, 42,590 contigs, averaging 875 base pairs in length, were annotated as protein-coding genes, of which 7,263 were found to be near full length. The sequence accuracy of the assembled contigs was validated to be as high as 99.9%. Genes with tissue-specific expression were categorized by RNA-Seq expression profiling, and gene products targeted to specific subcellular locations were identified from their annotations. We conclude that, when the outputs of different next-generation sequencing platforms are properly assembled together, transcriptome information for a non-model organism can be enriched for gene discovery, functional annotation and expression profiling.
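The summary does not say how tissue-specific genes were categorized from the RNA-Seq data. As one illustrative possibility only, the sketch below scores each contig with the widely used tau (τ) tissue-specificity index, which ranges from 0 (uniform expression across tissues) to 1 (expression in a single tissue). This is not claimed to be the authors' method. The function names, the log2(x + 1) transform, the 0.8 cutoff, the contig IDs, the tissue labels and the expression values are all hypothetical.

```python
"""Hypothetical sketch: flag tissue-specific contigs with the tau index."""
import math


def tau(expression):
    """Tau tissue-specificity index for one contig.

    `expression` holds non-negative expression values (e.g. normalized read
    counts), one per tissue. Returns 0.0 for uniform expression and values
    approaching 1.0 for expression confined to one tissue.
    """
    # log2(x + 1) damps very high values (an assumed preprocessing choice).
    logged = [math.log2(x + 1.0) for x in expression]
    peak = max(logged)
    n = len(logged)
    if peak == 0.0 or n < 2:
        return 0.0  # unexpressed, or only one tissue: specificity undefined
    return sum(1.0 - x / peak for x in logged) / (n - 1)


def tissue_specific(matrix, tissues, threshold=0.8):
    """Return {contig: (top_tissue, tau)} for contigs with tau >= threshold."""
    hits = {}
    for contig, values in matrix.items():
        score = tau(values)
        if score >= threshold:
            top = tissues[values.index(max(values))]  # tissue with peak expression
            hits[contig] = (top, round(score, 3))
    return hits


# Hypothetical example: two contigs measured in four tissues.
tissues = ["leaf", "root", "flower_bud", "petal"]
matrix = {
    "contig_A": [0.0, 0.0, 120.0, 3.0],   # concentrated in flower buds
    "contig_B": [50.0, 48.0, 52.0, 47.0],  # broadly expressed
}
hits = tissue_specific(matrix, tissues)
print(hits)
```

Here `contig_A` is flagged as flower-bud-specific and `contig_B` is not. A per-tissue threshold on tau is simple and transparent. Other criteria, such as fold change over the mean of the remaining tissues, would serve the same purpose.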

 

Co-researchers: Chun-Lin Su, Ya-Ting Chao