Bioinformatics Analysis

A. Illumina Sequencing Data Analysis

A.1   Genome-wide Variant Analysis

For WGS, WES, and targeted gene panel data: germline and somatic variant calling; copy number variation analysis; structural variant analysis (ideally combined with PacBio data); and de novo genome assembly (ideally combined with PacBio data).

A.2   Genome-wide Gene Expression Analysis

Primarily for RNA-Seq data, and also for small RNA sequencing data or a combination of both: differential expression analysis; functional annotation of differentially expressed genes using pathways and networks; gene fusion identification; variant identification; isoform identification and differential isoform expression; and allele-specific expression.
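As a rough illustration of the effect-size part of differential expression analysis, the sketch below normalizes raw counts to counts per million (CPM) and computes a per-gene log2 fold change. The gene names, counts, and pseudocount are hypothetical, and this is not a description of the pipeline the core runs. A real analysis would use biological replicates and a statistical model to assign significance, which this sketch does not attempt.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that the sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) on CPM values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one of the samples.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }

# Hypothetical toy counts for one control and one treated sample.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene, value in sorted(lfc.items()):
    print(f"{gene}\t{value:+.2f}")
```

A positive value indicates higher expression in the treated sample, a negative value lower expression, and a value near zero no change.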

A.3   Genome-wide Epigenetic Analysis

For ChIP-Seq, ATAC-Seq, or 3C/HiC-Seq data: peak calling to delineate transcription factor binding sites and histone modifications; and motif enrichment analysis.
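One common way to frame motif enrichment is as an over-representation test: given a set of sequences, are motif-containing sequences more frequent among the peaks than expected by chance? The sketch below computes a one-sided hypergeometric p-value for that question. The function name and the example numbers are hypothetical, and dedicated motif tools use more elaborate background models than this.

```python
from math import comb

def motif_enrichment_pvalue(total, total_with_motif, peaks, peaks_with_motif):
    """One-sided hypergeometric p-value, P(X >= peaks_with_motif).

    total            -- all sequences considered (peaks plus background)
    total_with_motif -- sequences among them that contain the motif
    peaks            -- number of peak sequences
    peaks_with_motif -- peak sequences that contain the motif
    """
    upper = min(peaks, total_with_motif)
    favourable = sum(
        comb(total_with_motif, i) * comb(total - total_with_motif, peaks - i)
        for i in range(peaks_with_motif, upper + 1)
    )
    return favourable / comb(total, peaks)

# Hypothetical example: 1000 sequences, 100 contain the motif,
# 50 are peaks, and 20 of those peaks contain the motif.
p_value = motif_enrichment_pvalue(1000, 100, 50, 20)
print(f"p = {p_value:.3e}")
```

A small p-value suggests the motif occurs in peaks more often than a random draw of the same size would produce.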

A.4   Metagenomics Analysis

For 16S/ITS and DNA/RNA shotgun metagenomics data: genus/species identification; taxonomic classification; microbiota composition analysis; de novo microbiome assembly; and analysis of inter-species and pathogen/microbiota-host interactions.
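Microbiota composition is often summarized by relative abundances and an alpha-diversity measure. The sketch below computes both for a single sample, using the Shannon index as one widely used diversity measure. The taxon names and counts are hypothetical, and the choice of index is an assumption rather than a statement of what the core reports.

```python
import math

def relative_abundance(counts):
    """Convert per-taxon read counts to proportions, dropping empty taxa."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items() if n > 0}

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over the observed taxa."""
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical genus-level read counts for one sample.
sample = {"Bacteroides": 50, "Prevotella": 25, "Lactobacillus": 25}

abundance = relative_abundance(sample)
diversity = shannon_index(sample)
for taxon, p in abundance.items():
    print(f"{taxon}\t{p:.2%}")
print(f"Shannon index: {diversity:.3f}")
```

Higher values indicate a community that is richer in taxa, more evenly distributed, or both. A sample dominated by a single taxon approaches zero.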

B. PacBio Sequencing SMRTLink Data Analysis

The following analysis pipelines are available within PacBio SMRTLink software and can be performed upon request. Additional custom bioinformatics services, including genome assembly and transcriptome annotation, are also available upon request.

B.1   Circular Consensus Sequencing (CCS)

Identify consensus sequences for single molecules.

B.2   Hierarchical Genome Assembly (HGAP)

Generate de novo assemblies of genomes using Continuous Long Reads (CLR).

B.3   Microbial Assembly

Generate de novo assemblies of small prokaryotic genomes between 1.9 and 10 Mb and companion plasmids between 2 and 220 kb.

B.4   Base Modifications

Identify common bacterial base modifications (6mA, 4mC) and optionally analyze the methyltransferase recognition motifs.

B.5   Iso-Seq

Characterize full-length transcript isoforms.

B.6   Minor Variants Analysis

Identify and phase minor single nucleotide substitution variants in complex populations.

B.7   Structural Variant Calling

Identify structural variants (default: ≥ 20 bp) in a sample or set of samples relative to a reference.
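To make the size threshold concrete, the sketch below classifies simple insertions and deletions by the difference in length between the reference and alternate alleles, using the 20 bp default stated above. The `Variant` class and helper functions are hypothetical and do not reflect how SMRTLink represents or calls variants. Symbolic alleles, inversions, and translocations are not handled here.

```python
from dataclasses import dataclass

MIN_SV_LENGTH = 20  # bp, the default threshold stated above

@dataclass
class Variant:
    """Hypothetical minimal variant record with explicit allele sequences."""
    chrom: str
    pos: int
    ref: str
    alt: str

def variant_length(variant):
    """Size of an insertion or deletion, in base pairs."""
    return abs(len(variant.alt) - len(variant.ref))

def is_structural(variant, min_length=MIN_SV_LENGTH):
    """True if the length change meets the structural variant threshold."""
    return variant_length(variant) >= min_length

# Hypothetical calls: a 25 bp insertion, a 19 bp deletion, a 20 bp deletion.
calls = [
    Variant("chr1", 100, "A", "A" + "T" * 25),
    Variant("chr1", 500, "A" + "C" * 19, "A"),
    Variant("chr2", 900, "G" + "A" * 20, "G"),
]
structural = [v for v in calls if is_structural(v)]
print([(v.chrom, v.pos, variant_length(v)) for v in structural])
```

Passing a different `min_length` shows how changing the default threshold would change which calls are retained.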

C. Single-Cell Sequencing Data Analysis

The BIG Core offers a wide range of standard and custom single-cell analysis approaches to explore the molecular aspects of cellular heterogeneity, population diversity, and complexity within single-cell samples. Please contact the BIG Core for details on the available options.

D. AlphaFold2 Protein Structure Predictions

AlphaFold is an artificial intelligence program developed by DeepMind that predicts a protein's three-dimensional structure from its amino acid sequence. The associated AlphaFold Protein Structure Database provides open access to over 200 million protein structure predictions. AlphaFold regularly achieves accuracy competitive with experimental methods. The BIG Core has AlphaFold2 installed and offers protein structure prediction as a service to the research community.