Project Overview

This report details the status of the ibd-quantum-esm project, a proof-of-concept pipeline designed to integrate quantum machine learning (QML) for classifying Inflammatory Bowel Disease (IBD) multi-omics data. It summarizes the project's goals, current outcomes, and performance metrics.

Executive Summary

The project repository now delivers a full, CLI-driven workflow for embedding protein sequences, ingesting multi-omics tables, training both classical and quantum classifiers, and generating summary artifacts. A key limitation is the current reliance on a synthetic dataset due to restricted access to real IBD multi-omics data. Therefore, all metrics reported here serve as pipeline validation rather than conclusive biomedical evidence.

Original Objective

Deliver a proof-of-concept quantum-enhanced classifier on a real IBD multi-omics cohort (e.g., IBDMDB), compare it against classical baselines, and surface interpretable molecular signatures for IBD flare vs. remission.

Current Implementation

Implements the entire workflow on a synthetic, high-separation multi-omics dataset (240 samples, 13 features). No real patient data has been used, and no IBM hardware runs have been executed yet.
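To make "synthetic, high-separation" concrete, the following is a minimal sketch of how a table of this shape could be generated. It is not the project's actual generator: the feature names, the Gaussian class centers, the separation value, and the "flare"/"remission" label strings are all assumptions made for illustration. Only the 240 x 13 shape and the sample_id and clinical_status column names come from this report.

```python
import csv
import io
import random


def make_synthetic_omics(n_per_class=120, n_features=13, separation=4.0, seed=123):
    """Build a two-class table whose classes sit far apart in every feature.

    All parameter defaults are illustrative assumptions, chosen only so the
    result has 240 samples and 13 features like the dataset described above.
    """
    rng = random.Random(seed)
    # Hypothetical feature names; the real column names are not given here.
    header = ["sample_id", "clinical_status"] + [
        f"feature_{i:02d}" for i in range(n_features)
    ]
    rows = []
    for label, center in (("remission", 0.0), ("flare", separation)):
        for _ in range(n_per_class):
            sample_id = f"S{len(rows):04d}"
            # Unit-variance noise around a class-specific center.
            values = [round(rng.gauss(center, 1.0), 4) for _ in range(n_features)]
            rows.append([sample_id, label] + values)
    rng.shuffle(rows)
    return header, rows


header, rows = make_synthetic_omics()

# Serialize to CSV text in memory rather than touching the file system.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

With class centers four standard deviations apart in every feature, a linear model separates the classes almost trivially, which is the kind of behavior the perfect classical score in the Results Dashboard points to.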


Interactive Workflow

The pipeline is orchestrated through a series of CLI commands. The four stages below trace the end-to-end process from data ingestion to model training; the command for each stage is listed in the Command Reference.

  1. Embed Sequences: ESM2 Embedding
  2. Ingest Omics: Load & Process Data
  3. Train Classical: Baseline Model
  4. Train Quantum: QSVC Kernel Model
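
The quantum stage relies on a kernel computed from overlaps between quantum states. The sketch below illustrates that idea with a tiny pure-Python statevector calculation, assuming a simple RY angle encoding with one qubit per PCA component (three, to match --pca-components 3 in the Command Reference). The project's actual feature map is not stated in this report and may well differ, so the functions below are hypothetical stand-ins, not the pipeline's implementation.

```python
import math


def ry_state(angle):
    """Single-qubit state RY(angle)|0> = [cos(angle/2), sin(angle/2)]."""
    return [math.cos(angle / 2.0), math.sin(angle / 2.0)]


def kron(a, b):
    """Kronecker product of two state vectors stored as flat lists."""
    return [x * y for x in a for y in b]


def feature_state(x):
    """Encode a feature vector as a product state, one qubit per feature.

    Assumption: plain angle encoding with no entangling gates.
    """
    state = [1.0]
    for angle in x:
        state = kron(state, ry_state(angle))
    return state


def fidelity_kernel(x, y):
    """k(x, y) = |<phi(x)|phi(y)>|^2; amplitudes are real for this encoding."""
    overlap = sum(a * b for a, b in zip(feature_state(x), feature_state(y)))
    return overlap ** 2


def kernel_matrix(samples):
    """Gram matrix of the fidelity kernel over a list of samples."""
    return [[fidelity_kernel(a, b) for b in samples] for a in samples]


# Made-up three-component inputs: the first two are close, the third is far.
samples = [[0.1, 0.5, 1.2], [0.2, 0.4, 1.0], [2.5, 2.9, 0.1]]
K = kernel_matrix(samples)
```

A Gram matrix like K is what a support vector classifier with a precomputed kernel consumes. Nearby inputs give entries close to 1 and distant inputs give entries close to 0, so the choice of encoding directly shapes what the classifier can separate.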


Results Dashboard

This section compares the performance of the classical and quantum models. All results are based on the synthetic dataset, which was found to be easily separable, leading to perfect scores for the classical model.

  • Classical Model (Logistic Regression): Macro F1 Score 1.00
  • Quantum Model (QSVC Simulator): Macro F1 Score 0.83
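
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below computes it from scratch. The label lists are invented for illustration and are not the project's predictions; the "flare" and "remission" label names are assumed from the project objective.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: four samples per class, one flare predicted as remission.
y_true = ["flare"] * 4 + ["remission"] * 4
y_pred = ["flare"] * 3 + ["remission"] * 5

score = macro_f1(y_true, y_pred)
perfect = macro_f1(y_true, y_true)
```

Because each class counts equally regardless of its size, a macro F1 of 1.00 requires every class to be predicted without error.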


Findings & Next Steps

The primary finding is that the end-to-end pipeline is functional. The performance gap between the classical model (1.00 Macro F1) and the quantum model (0.83 Macro F1) is likely due to the simplicity of the synthetic data and the untuned parameters of the quantum model. The focus must now shift to applying this pipeline to real data.

Key Findings

  • The CLI-driven workflow is fully implemented and functional.
  • Classical model (LogReg) achieves a perfect 1.00 Macro F1 on the synthetic data, indicating high separability.
  • Quantum model (QSVC) achieves ~0.83 Macro F1, showing viability but also a performance gap on this simple data.
  • The synthetic data is insufficient for meaningful biomedical interpretation or model comparison.

Recommended Next Steps

  • Secure access to IBDMDB or an equivalent real patient cohort.
  • Update ingestion scripts to merge different data modalities and labels from the real dataset.
  • Re-run training, tune PCA/qubit settings, and explore IBM managed simulators or hardware backends.
  • Extend reporting notebooks to interpret dominant biomarkers (microbes, metabolites, host genes).
  • Package the workflow (e.g., Makefile or tox) and enable CI tests for reproducibility.

Command Reference

Below is a reference of the key commands used in the project's workflow.

python -m src runtime-test --limit 5
python -m src embed data/demo.fa --backend esm2_t6_8M_UR50D --outdb results/duckdb/embeddings.duckdb
python -m src ingest-omics data/ibd_multiomics_synthetic.csv --id-col sample_id --label-col clinical_status --outdb results/duckdb/ibd.duckdb --overwrite
python -m src train-omics --db results/duckdb/ibd.duckdb --out results/metrics/omics_classifier.json --model-out results/models/omics_classifier.joblib --pred-out results/predictions_omics.csv
python -m src train-qsvc-quantum --db results/duckdb/ibd.duckdb --table embeddings --labels-csv data/ibd_multiomics_synthetic.csv --label-col clinical_status --pca-components 3 --reps 1 --shots 1024 --backend statevector --per-class-limit 120 --C 100 --test-size 0.2 --random-state 123 --out results/metrics/qsvc_quantum.json --model-out results/models/qsvc_quantum.joblib
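
As one possible way to chain the commands above, the following sketch wraps them in a small Python runner with a dry-run mode. The runner itself (STAGES, build_command, run_pipeline) is hypothetical and not part of the repository; the argument strings are copied from the commands listed above. It uses sys.executable in place of a bare "python" so the stages run under the active interpreter.

```python
import shlex
import subprocess
import sys

# Argument strings copied from the command reference; stage names are labels only.
STAGES = [
    ("runtime-test", "runtime-test --limit 5"),
    ("embed", "embed data/demo.fa --backend esm2_t6_8M_UR50D "
              "--outdb results/duckdb/embeddings.duckdb"),
    ("ingest-omics", "ingest-omics data/ibd_multiomics_synthetic.csv "
                     "--id-col sample_id --label-col clinical_status "
                     "--outdb results/duckdb/ibd.duckdb --overwrite"),
    ("train-omics", "train-omics --db results/duckdb/ibd.duckdb "
                    "--out results/metrics/omics_classifier.json "
                    "--model-out results/models/omics_classifier.joblib "
                    "--pred-out results/predictions_omics.csv"),
    ("train-qsvc-quantum", "train-qsvc-quantum --db results/duckdb/ibd.duckdb "
                           "--table embeddings "
                           "--labels-csv data/ibd_multiomics_synthetic.csv "
                           "--label-col clinical_status --pca-components 3 "
                           "--reps 1 --shots 1024 --backend statevector "
                           "--per-class-limit 120 --C 100 --test-size 0.2 "
                           "--random-state 123 "
                           "--out results/metrics/qsvc_quantum.json "
                           "--model-out results/models/qsvc_quantum.joblib"),
]


def build_command(args):
    """Turn one argument string into an argv list for `python -m src ...`."""
    return [sys.executable, "-m", "src"] + shlex.split(args)


def run_pipeline(stages=STAGES, dry_run=True):
    """Print each stage's command, or execute it and stop on the first failure."""
    commands = []
    for name, args in stages:
        cmd = build_command(args)
        commands.append(cmd)
        if dry_run:
            print(f"[{name}] {shlex.join(cmd)}")
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


planned = run_pipeline(dry_run=True)
```

Calling run_pipeline(dry_run=False) from the repository root would execute the stages in sequence. A runner like this could also serve as the entry point for the Makefile or CI packaging proposed under Recommended Next Steps.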