Authors: Ye, CT; Long, YQ; Ji, GL; Li, QQ; Wu, XH
Impact factor: 5.48
Journal: BIOINFORMATICS
Publication year: 2018
Volume: 34(11) Pages: 1841-1849
Motivation: Alternative polyadenylation (APA) has been increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage this unprecedented collection of RNA-seq data with new computational methods that extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely on either transcript assembly to determine transcript 3’ ends or on annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage involving more than two poly(A) sites.
Results: We developed an approach called APAtrap, based on the mean squared error model, to identify and quantify APA sites from RNA-seq data. APAtrap is capable of identifying novel 3’ UTRs and 3’ UTR extensions, which contributes to locating potential poly(A) sites in previously overlooked regions and to improving genome annotations. APAtrap also aims to tally all potential poly(A) sites and detect genes with differential APA site usage between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, using RNA-seq datasets from simulation studies, human and Arabidopsis, demonstrate the efficacy and flexibility of APAtrap for any organism with an annotated genome.
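To illustrate the general idea of locating a poly(A) site with a mean squared error criterion, the following is a minimal sketch and not APAtrap's actual implementation. It assumes per-base read coverage along a 3’ UTR is available as a list and searches for the single position that best splits the coverage into two flat segments. The function names (`segment_sse`, `best_single_breakpoint`) and the `min_segment` parameter are hypothetical, and the sketch handles only one breakpoint, whereas the approach described above considers multiple poly(A) sites per gene.

```python
from typing import List, Tuple


def segment_sse(values: List[float]) -> float:
    """Sum of squared deviations of a segment from its own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_breakpoint(
    coverage: List[float], min_segment: int = 1
) -> Tuple[int, float]:
    """Return (position, mse) of the split that minimizes the mean squared
    error when each side of the split is modeled by its mean coverage.

    `position` is the index of the first base of the downstream segment.
    `min_segment` (hypothetical parameter) is the minimum segment length.
    """
    n = len(coverage)
    if min_segment < 1 or n < 2 * min_segment:
        raise ValueError("coverage is too short for the requested min_segment")

    best_pos, best_mse = -1, float("inf")
    for pos in range(min_segment, n - min_segment + 1):
        upstream = coverage[:pos]
        downstream = coverage[pos:]
        mse = (segment_sse(upstream) + segment_sse(downstream)) / n
        if mse < best_mse:
            best_pos, best_mse = pos, mse
    return best_pos, best_mse
```

For example, a coverage profile that drops from 10 to 2 after the fifth base, `[10.0] * 5 + [2.0] * 5`, yields a breakpoint at index 5, which would be the candidate poly(A) site in this simplified model.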
Figure S1. Schema of APAtrap. (a) Sliding window strategy for refining annotated 3’ UTRs or identifying novel 3’ UTRs. If no annotated 3’ UTR is available for the given gene model, a novel 3’ UTR is identified; otherwise, the annotated 3’ UTR may be shortened or lengthened according to the read coverage. The 3’ UTR extension is the extended 3’ UTR portion identified for a gene with a prior annotated 3’ UTR. Refined 3’ UTRs can be novel 3’ UTRs identified for genes without prior annotated 3’ UTRs, or shortened/lengthened 3’ UTRs identified for genes with prior annotated 3’ UTRs. (b) Identification of APA sites based on the mean squared error model. (c) Detection of differential APA site usage between two samples. The PD index and the linear trend test, which considers the coordinates and expression levels of all predicted poly(A) sites, are used to identify genes with significant differential APA site usage.
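The sliding window idea in panel (a) can be sketched as follows. This is an illustrative simplification under stated assumptions, not the procedure used by APAtrap: it only lengthens an annotated 3’ UTR, it assumes coverage is indexed in the direction of transcription, and the function name `extend_utr_end`, the `window` size and the `min_fraction` threshold are all hypothetical choices made for the example.

```python
from typing import Sequence


def extend_utr_end(
    coverage: Sequence[float],
    utr_end: int,
    window: int = 100,
    min_fraction: float = 0.2,
) -> int:
    """Extend an annotated 3' UTR end window by window.

    `coverage` holds per-base read coverage, starting at the 3' UTR start
    and continuing into the downstream region. `utr_end` is the exclusive
    index of the annotated end. A downstream window is accepted while its
    mean coverage stays at or above `min_fraction` times the mean coverage
    of the annotated 3' UTR (an assumed stopping rule).
    """
    if window < 1:
        raise ValueError("window must be positive")
    if utr_end <= 0 or utr_end > len(coverage):
        raise ValueError("utr_end must lie within the coverage array")

    reference = sum(coverage[:utr_end]) / utr_end
    if reference == 0:
        # No expression in the annotated region: nothing to extend from.
        return utr_end

    new_end = utr_end
    while new_end + window <= len(coverage):
        block = coverage[new_end:new_end + window]
        if sum(block) / window < min_fraction * reference:
            break
        new_end += window
    return new_end
```

With coverage `[10] * 10 + [8] * 4 + [0] * 6`, an annotated end at index 10 and a window of 2, the sketch accepts two downstream windows and stops at the first window without reads, returning 14 as the extended end.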