Predict cortical surface activity from text, audio, and video.
Watch NForge step through a full inference pipeline, from model loading to cortical surface predictions.
The brain doesn't process language, sound, and video in isolation — it integrates them into a unified perceptual experience. NForge models this process by predicting human fMRI responses to naturalistic multimodal stimuli.
Built on Meta's TRIBE v2 with significant architectural improvements and four new capabilities.
Visualize which temporal windows most strongly drove each brain region. Hook into transformer self-attention layers and project the scores onto the HCP MMP1.0 parcellation.
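As a rough sketch of the idea (plain PyTorch, not NForge's actual API; all sizes are toy values), a forward hook can capture self-attention weights, and per-vertex scores can be pooled into parcels by label:

```python
import torch
import torch.nn as nn

# Capture self-attention weights with a forward hook.
attn_maps = []
layer = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
hook = layer.register_forward_hook(
    lambda mod, inp, out: attn_maps.append(out[1].detach())
)

x = torch.randn(1, 20, 64)  # (batch, time, features)
layer(x, x, x, need_weights=True)
hook.remove()

# How much attention each temporal window received, averaged over heads
# (nn.MultiheadAttention's default) and then over query positions.
temporal_importance = attn_maps[0][0].mean(dim=0)  # shape: (time,)

# Pool per-vertex scores into parcels by averaging within each label.
# HCP MMP1.0 defines 180 parcels per hemisphere; toy sizes used here.
n_vertices, n_parcels = 1000, 8
vertex_scores = torch.rand(n_vertices)
labels = torch.randint(0, n_parcels, (n_vertices,))
parcel_scores = torch.zeros(n_parcels).index_add_(0, labels, vertex_scores)
parcel_scores /= torch.bincount(labels, minlength=n_parcels).clamp(min=1)
```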
Run sliding-window predictions from a live feature stream without pre-loading the full clip. Thread-safe, with a configurable context window and step size.
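One way to implement the pattern (illustrative only, not NForge's real interface) is a lock-guarded ring buffer that emits a window every `step` new frames:

```python
import threading
from collections import deque

import numpy as np

class StreamingWindow:
    """Thread-safe sliding window over a live feature stream."""

    def __init__(self, window: int = 32, step: int = 8):
        self.window, self.step = window, step
        self.buf = deque(maxlen=window)
        self.lock = threading.Lock()
        self.since_last = 0

    def push(self, feat: np.ndarray):
        """Append one feature frame; return a (window, dim) array whenever
        a full window is available and `step` new frames have arrived."""
        with self.lock:
            self.buf.append(feat)
            self.since_last += 1
            if len(self.buf) == self.window and self.since_last >= self.step:
                self.since_last = 0
                return np.stack(self.buf)
        return None

stream = StreamingWindow(window=4, step=2)
for t in range(10):
    win = stream.push(np.random.randn(8))
    if win is not None:
        print(f"t={t}: run the model on a window of shape {win.shape}")
```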
Per-vertex importance scores showing how much text, audio, and video each contributed to a prediction. Supports both ablation-based and integrated-gradients attribution.
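The ablation variant boils down to zeroing out one modality at a time and measuring the per-vertex change. A toy sketch with a stand-in predictor (the real encoder and feature shapes will differ):

```python
import numpy as np

def predict(feats: dict) -> np.ndarray:
    """Stand-in for the real model: fixed random projections to 100 'vertices'."""
    rng = np.random.default_rng(0)  # reseeded so projections stay fixed across calls
    proj = {m: rng.standard_normal((f.shape[-1], 100)) for m, f in feats.items()}
    return sum(feats[m].mean(axis=0) @ proj[m] for m in feats)

feats = {m: np.random.randn(16, 32) for m in ("text", "audio", "video")}
full = predict(feats)

contrib = {}
for m in feats:
    ablated = {k: np.zeros_like(v) if k == m else v for k, v in feats.items()}
    contrib[m] = np.abs(full - predict(ablated))  # per-vertex importance of m
```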
Few-shot adaptation to unseen subjects via ridge regression or nearest-neighbour matching. No full retraining needed.
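For the ridge-regression route, a minimal sketch (shapes, split, and alpha are all illustrative) fits a linear map from group-level predictions to the unseen subject's measured responses:

```python
import numpy as np
from sklearn.linear_model import Ridge

n_trs, n_vertices = 200, 500
group_pred = np.random.randn(n_trs, n_vertices)  # shared-space predictions
subj_bold = np.random.randn(n_trs, n_vertices)   # new subject's measured BOLD

adapter = Ridge(alpha=10.0)
adapter.fit(group_pred[:150], subj_bold[:150])          # few-shot calibration runs
r2 = adapter.score(group_pred[150:], subj_bold[150:])   # held-out fit quality
adapted = adapter.predict(group_pred[150:])             # subject-specific predictions
```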
Optional backbone compilation for faster training and inference. Compiles the encoder and combiner while preserving dynamic subject layer indexing.
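Assuming this refers to `torch.compile`, the general pattern looks like the sketch below (class names are made up, not NForge's): compile the heavy backbone while leaving the dynamically indexed subject heads eager.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class Model(nn.Module):
    def __init__(self, n_subjects: int = 4, dim: int = 256, n_vertices: int = 100):
        super().__init__()
        self.encoder = Encoder(dim)
        self.heads = nn.ModuleList(nn.Linear(dim, n_vertices) for _ in range(n_subjects))

    def forward(self, x, subject: int):
        h = self.encoder(x)            # compiled path
        return self.heads[subject](h)  # stays eager: dynamic subject indexing

model = Model()
model.encoder = torch.compile(model.encoder)  # PyTorch 2.x: compile only the backbone
out = model(torch.randn(2, 256), subject=1)
```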
Professionally organized with clean subpackages: core, data, training, inference, viz, and configs. Full test coverage with pytest.
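In tree form (the per-directory descriptions are inferred from the names, not taken from the project):

```
nforge/
├── core/       # model components and shared utilities
├── data/       # dataset loaders
├── training/   # training loops
├── inference/  # batch and streaming prediction
├── viz/        # surface plots and attribution maps
└── configs/    # experiment configuration
tests/          # pytest suite
```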
End-to-end architecture from raw stimuli to cortical surface predictions.
Install NForge with pip. Choose the extras you need.
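For example (the extra names below are placeholders; check the project for the actual ones):

```bash
pip install nforge

# Optional extras (hypothetical names):
pip install "nforge[viz,slurm]"
```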
NForge supports training on multiple neuroimaging datasets out of the box.
| Dataset | Subjects | Stimuli | TR (s) |
|---|---|---|---|
| Algonauts2025Bold | 4 | TV sitcom "Friends" + movies | 1.49 |
| Wen2017 | 3 | Short videos (11.7 s) | ~2 |
| Lahner2024Bold | 10 | Short videos (6.2 s) | ~2 |
| Lebel2023Bold | 8 | Spoken narratives (6–18 s) | ~2 |
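Loading one of these might look like the following; the import path and constructor are guesses based on the `data` subpackage and the dataset names above, not NForge's confirmed API:

```python
from nforge.data import Algonauts2025Bold  # hypothetical import path

# Arguments are illustrative; the real constructor may differ.
ds = Algonauts2025Bold(root="/path/to/algonauts2025", subjects=[1, 2, 3, 4])
sample = ds[0]  # expected: multimodal stimulus features plus BOLD targets per TR
```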
NForge enables new research directions in computational neuroscience and beyond.
Run experiments locally or on a SLURM cluster.
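On SLURM, a submission script along these lines should work; the entry point and flags are assumptions based on the `training` and `configs` subpackages, not a confirmed CLI:

```bash
#!/bin/bash
#SBATCH --job-name=nforge-train
#SBATCH --gres=gpu:1
#SBATCH --time=12:00:00
#SBATCH --mem=64G

# Hypothetical entry point and config path.
python -m nforge.training.train --config configs/example.yaml
```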