
Table 2 Complementarity of the methods used as input for the integrative analyses

From: Analysis of 3760 hematologic malignancies reveals rare transcriptomic aberrations of driver genes

| Input | Method | Goal | Limitation |
| --- | --- | --- | --- |
| Genetic variants | IntOGen | • Detects different types of recurrence of genomic alterations in genes<br>• Combines seven tools that cover multiple aspects of cancer driver gene detection | • Focuses only on genetic variants<br>• Focuses only on single nucleotide variants and short indels; structural variants, epigenetic silencing events, and germline susceptibility variants are not considered |
| Genetic variants | AbSplice | • Estimates the probability that a genetic variant causes aberrant splicing<br>• Integrates deep learning sequence-based models (SpliceAI and MMSplice) with quantitative maps of splicing levels in tissues of interest (SpliceMap)<br>• Can be used to trace RNA-seq-based aberrant splicing calls back to the genomic-level variant | • Performs less well for deep intronic variants than for variants near splice sites<br>• A SpliceMap needs to be created for each newly added tissue or cell type |
| RNA-seq | OUTRIDER | • Detects RNA expression outliers independently of genetic variants<br>• Accounts for sources of covariation using a denoising autoencoder | • Applies only to genes typically expressed in the cohort considered; fails to call activation of genes that are otherwise not expressed<br>• Requires a sufficiently large cohort (> 60 samples) to detect outliers reliably |
| RNA-seq | NB-act | • Detects aberrantly activated genes in RNA-seq data, complementing OUTRIDER | • Benchmarking data based on rare variant annotation are less certain for gene activation than for underexpression outliers, for which NMD-triggering variants provide orthogonal ground truth |
| RNA-seq | FRASER | • Detects aberrantly spliced genes in RNA-seq data<br>• Accounts for sources of covariation using a denoising autoencoder<br>• Intron-centric: needs no prior annotation and does not require building clusters of introns sharing splice sites, which can become prohibitively large and complicate modeling | • Requires a sufficiently large cohort (> 50 samples) to detect outliers reliably<br>• Can overlook some genuinely pathogenic isoforms, especially rapidly degraded splice isoforms |
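
To illustrate how the complementary outputs listed above might be brought together, the following is a minimal sketch that is not taken from the article. It assumes each method has already produced gene-level calls, represented here as hypothetical `(sample, gene, method)` tuples, and it does not reproduce the authors' integration procedure. The function name `combine_calls`, the dictionary `COHORT_SIZE_ABOVE`, and the sample and gene labels are illustrative assumptions; only the cohort-size thresholds (> 60 samples for OUTRIDER, > 50 for FRASER) come from the table.

```python
from collections import defaultdict

# Cohort-size requirements quoted in the table ("> 60" and "> 50" samples).
# Methods without a stated requirement are treated as having none (assumption).
COHORT_SIZE_ABOVE = {"OUTRIDER": 60, "FRASER": 50}


def combine_calls(calls, cohort_size):
    """Group gene-level calls by (sample, gene) across methods.

    calls: iterable of hypothetical (sample, gene, method) tuples.
    cohort_size: number of samples in the cohort.
    Calls from a method whose stated cohort-size requirement is not met
    are dropped. Returns {(sample, gene): sorted list of method names}.
    """
    evidence = defaultdict(set)
    for sample, gene, method in calls:
        if cohort_size > COHORT_SIZE_ABOVE.get(method, 0):
            evidence[(sample, gene)].add(method)
    return {key: sorted(methods) for key, methods in evidence.items()}


if __name__ == "__main__":
    # Hypothetical calls, for illustration only.
    example_calls = [
        ("S1", "GENE_A", "OUTRIDER"),
        ("S1", "GENE_A", "AbSplice"),
        ("S2", "GENE_B", "FRASER"),
    ]
    for key, methods in combine_calls(example_calls, cohort_size=100).items():
        print(key, methods)
```

Keeping the supporting methods as a list per sample and gene, rather than a single yes/no flag, preserves which kind of evidence (genomic or transcriptomic) backs each call, which is the point of using complementary inputs.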