Core Analysis Tools

The Omics platform is uniquely built around powerful, scalable analysis pipelines spanning single-omics features, robust Multi-Omics networking, and Pathways correlation testing.

Single-Omics Analysis

The standard Analysis Configuration page provides form-driven models to run Correlation analysis, PCA matrices, and specialized Machine Learning implementations like Random Forests securely against a loaded dataset.

Analysis Configuration Interface

Multi-Omics Integration

Multi-Omics configurations enable users to fuse orthogonal datasets (e.g. transcriptomics with metabolomics) to find multi-variable biological drivers using models like PLS-DA and DIABLO.

Multi-Omics Setup

Pathway Analysis

Direct biological context mapping is executed within the Pathway Analysis tool, leveraging databases like KEGG to calculate Over-Representation Analysis (ORA).

Pathway Investigation Panel

Parameter Reference Guide

Detailed technical specifications for the core configuration parameters available in the analysis modules.

Analysis Type

Defines the high-level methodology (e.g., Random Forest, Correlation, or GWAS Screening). Controls which downstream parameters are visible.

Number of Estimators

The number of decision trees in the ensemble model. Higher values lead to more stable models but require more memory.

Number of CV Splits

The $k$ partitions used for $k$-fold cross-validation. Typical academic baseline is 5 or 10.

Test Size

Proportion of the dataset used for validation (e.g., 0.2 represents 20% for testing).

Random State (Seed)

Integer seed value ensuring that stochastic operations (like dataset splitting) are identically reproducible across runs.

Correlation Method

Determines how to measure feature association: Pearson (linear), Spearman (rank), or Kendall (robust).

FDR Threshold (q-value)

The false discovery rate cutoff (Benjamini-Hochberg) for reporting significant biological findings.

Permutation Repeats

The number of shuffles performed to generate internal null distributions for feature importance significance testing.

SHAP Analysis Coverage

Enables calculation of Shapley values for model local interpretability, identifying variable contribution per sample.

Gene List Source

Input selection for enrichment. Utilizes high-ranking features from Random Forest or manually curated user lists.

Feature Subset Size (Top N)

The count of highest-ranking genes to include in ORA calculations.

Biological Organism

Species context for gene ID mapping (e.g., Human vs specialized genomic backends).

GO Namespace

Specifies which Gene Ontology branch to query: Biological Process, Molecular Function, or Cellular Component.

Minimum Overlap Threshold

The minimum number of provided genes required within a pathway to trigger statistical testing.