Resources
Understanding your neural activation scores, what they mean, and how to use them to improve your content.
How BRAIN|RANK Works
BRAIN|RANK uses TRIBE v2 (TRansformer for In-silico Brain Experiments), a brain-predictive foundation model developed by Meta AI Research and released in March 2026. TRIBE v2 was trained on over 1,000 hours of fMRI recordings from more than 700 volunteers watching, hearing, and reading naturalistic content.
BRAIN|RANK accepts three kinds of input — video, image, and text — and routes each through three specialized neural encoders to predict per-segment cortical activation:
V-JEPA2
Processes visual features — motion, composition, faces, objects, scene changes
Wav2Vec-BERT
Processes audio features — music, speech, sound effects, silence, rhythm
LLaMA 3.2
Processes text and narrative — transcribed speech, storytelling, dialogue meaning
Which encoders light up depends on the modality you upload:
Video
All three encoders engage — visual frames through V-JEPA2, the audio track through Wav2Vec-BERT, and transcribed speech through LLaMA 3.2.
Image
Held as a static frame for the duration you set and routed through V-JEPA2, with silence fed to the audio encoder. Use this for thumbnails and stills.
Text — Spoken
Your script is synthesized into speech and run through Wav2Vec-BERT + LLaMA 3.2; a neutral visual is used for V-JEPA2. Use this for voiceovers and scripts you intend to narrate.
Text — Visual
The text is rendered as on-screen typography for V-JEPA2 and processed as written language by LLaMA 3.2. Use this for captions, titles, and on-screen copy.
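The routing described above can be summarized as a small lookup table. This is an illustrative sketch only: the `Modality` enum, the `ENCODER_INPUTS` table, and the `encoders_for` helper are hypothetical names, not part of any BRAIN|RANK or TRIBE v2 API, and the table lists only the inputs this page states explicitly.

```python
from enum import Enum


class Modality(Enum):
    VIDEO = "video"
    IMAGE = "image"
    TEXT_SPOKEN = "text_spoken"
    TEXT_VISUAL = "text_visual"


# What each encoder receives per upload type, as described on this page.
# Hypothetical structure for illustration; not a real BRAIN|RANK API.
ENCODER_INPUTS = {
    Modality.VIDEO: {
        "V-JEPA2": "visual frames",
        "Wav2Vec-BERT": "audio track",
        "LLaMA 3.2": "transcribed speech",
    },
    Modality.IMAGE: {
        "V-JEPA2": "static frame held for the set duration",
        "Wav2Vec-BERT": "silence",
    },
    Modality.TEXT_SPOKEN: {
        "V-JEPA2": "neutral visual",
        "Wav2Vec-BERT": "synthesized speech",
        "LLaMA 3.2": "script text",
    },
    Modality.TEXT_VISUAL: {
        "V-JEPA2": "text rendered as on-screen typography",
        "LLaMA 3.2": "written text",
    },
}


def encoders_for(modality: Modality) -> list[str]:
    """Return the encoder names this page mentions for a modality."""
    return sorted(ENCODER_INPUTS[modality])


for modality in Modality:
    print(modality.value, "->", ", ".join(encoders_for(modality)))
```

Keeping the routing in one table makes it easy to see at a glance which encoders receive real signal and which receive a placeholder such as silence or a neutral visual.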
These encoders feed into a prediction head that outputs per-vertex cortical activation across 20,484 points on the brain surface, every 2 seconds. BRAIN|RANK then maps these activations to functional brain networks you can understand.
Understanding Your Scores
BRAIN|RANK scores are not percentages. They represent normalized neural activation intensity on a scale from 0.00 to 1.00. This is a measure of how strongly the predicted brain response fires — not a grade.
Important: 0.50 is a strong score, not "average"
Your brain is always active: baseline resting activity sits around 0.15-0.25. A score of 0.50 means the content is driving activation well above baseline. You do NOT want 1.00. That is the very top of the scale, which reflects overwhelming stimulation rather than engaged viewing.
Score Benchmark
Baseline
0.00 — 0.25
Resting brain activity — content not yet registering
Moderate
0.25 — 0.45
Content is registering — viewer is casually engaged
Strong
0.45 — 0.65
High engagement — viewer is locked in
Exceptional
0.65 — 0.85
Peak neural response — rare and powerful moments
Intense
0.85 — 1.00
Extremely rare activation — maximum neural intensity
What each tier means for creators
Baseline (0.00 — 0.25)
Brain is at resting state. Content isn't registering meaningful activation. This is normal for low-energy moments like long static shots or silence. If your entire video scores here, the content may not be engaging enough.
Moderate (0.25 — 0.45)
Content is registering — the viewer is casually engaged. This is typical for conversational content, simple tutorials, or calm footage. Most everyday content sits in this range. Solid but not remarkable.
Strong (0.45 — 0.65)
High engagement — the viewer is locked in. This is what you should aim for. Content in this range triggers genuine attention and emotional response. Most viral content peaks in this tier. If your average score is here, your content is performing well.
Exceptional (0.65 — 0.85)
Peak neural response — these are rare, powerful moments. Jump scares, plot twists, emotional climaxes, and breakthrough revelations score here. Individual segments may hit this tier, but sustained exceptional activation across an entire video is uncommon.
Intense (0.85+)
Extremely rare activation — maximum neural intensity. This indicates overwhelming sensory or emotional stimulation. Sustained scores here are not realistic or desirable — think of it as the brain's "red zone." Brief spikes here indicate the single most impactful moment in your content.
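If you export scores and want to label them yourself, the tier table can be expressed as a simple lookup. This is a minimal sketch: the `tier_for` function is a hypothetical helper, and it assumes a score that sits exactly on a boundary belongs to the higher tier, which this page does not specify.

```python
# Upper bounds and names copied from the Score Benchmark table.
TIERS = [
    (0.25, "Baseline"),
    (0.45, "Moderate"),
    (0.65, "Strong"),
    (0.85, "Exceptional"),
]


def tier_for(score: float) -> str:
    """Map a normalized score in [0, 1] to its tier name.

    Assumption: a score exactly on a boundary (for example 0.45)
    is assigned to the higher tier.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.00 and 1.00")
    for upper, name in TIERS:
        if score < upper:
            return name
    return "Intense"  # 0.85 and above


for s in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"{s:.2f} -> {tier_for(s)}")
```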
Brain Regions Explained
BRAIN|RANK tracks 9 functional brain networks. Each represents a different aspect of how viewers process your content. Understanding which networks activate helps you optimize for specific types of engagement.
Attention
Dorsal and ventral attention networks (frontal eye fields, intraparietal sulcus). Measures focused concentration — how well your content commands the viewer's attention.
Creator tip: Strong in: fast cuts, surprising visuals, direct address to camera. Weak in: repetitive footage, long static shots.
Reward
Ventral striatum and orbitofrontal cortex. Measures the brain's pleasure/satisfaction response — the dopamine-associated circuits that make viewers want more.
Creator tip: Strong in: humor, satisfying reveals, achievement moments, ASMR triggers. Key driver of likes, shares, and repeat views.
Emotional Processing
Amygdala and insular cortex. Measures the intensity of emotional response, covering both positive and negative emotions.
Creator tip: Strong in: emotional storytelling, dramatic moments, fear/surprise. High emotional peaks drive memorability and sharing.
Social Cognition
Temporoparietal junction and medial prefrontal cortex. Measures how much the viewer processes social and interpersonal cues.
Creator tip: Strong in: face-to-face content, dialogue, social dynamics, reaction videos. Drives connection and parasocial relationships.
Visual Processing
V1-V5 visual cortex areas. Measures how visually stimulating and complex the content is.
Creator tip: Strong in: high-motion footage, bright colors, visual effects, cinematography. Low in: static or text-heavy content.
Auditory Processing
Primary and secondary auditory cortex. Measures how the brain processes sounds, music, and speech.
Creator tip: Strong in: music drops, sound design, ASMR, vocal emphasis. Critical for audio-driven content like podcasts and music videos.
Language / Narrative
Broca's and Wernicke's areas. Measures how the brain processes narrative structure, storytelling, and verbal content.
Creator tip: Strong in: storytelling, educational content, commentary, interviews. Key for content that relies on verbal communication.
Default Mode
Medial prefrontal cortex and posterior cingulate. Measures mind-wandering and self-reflection — when the viewer's mind drifts inward.
Creator tip: High default mode = viewer is disengaging. Unlike other regions, you want this LOW. High values mean the content isn't holding attention.
Motion Processing
MT/V5 motion-sensitive visual areas. Measures how the brain tracks movement in your content.
Creator tip: Strong in: action sequences, sports, dance, transitions, camera movement. Low in: talking heads, static frames.
Key Metrics
Engagement Score
Your top-line number. A weighted blend of peak moment (40%), hook strength (30%), and retention (30%). It balances "how strong is your best moment" against "how consistently engaging is the whole piece." A single value you can compare across your own uploads and against the leaderboard.
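The weighting can be written out as plain arithmetic. The sketch below uses the 40/30/30 weights stated above; the function name and the sample values are hypothetical, and any rounding or post-processing BRAIN|RANK may apply is not covered here.

```python
def engagement_score(peak: float, hook: float, retention: float) -> float:
    """Weighted blend: peak moment 40%, hook strength 30%, retention 30%."""
    return 0.40 * peak + 0.30 * hook + 0.30 * retention


# Made-up example values: a strong peak, a decent hook, a weak middle.
score = engagement_score(peak=0.70, hook=0.50, retention=0.30)
print(round(score, 2))  # 0.52
```

In this example the weak retention value pulls the blend down to 0.52 even though the best moment reaches 0.70, which is the balancing effect the metric is designed for.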
Hook Strength
Average overall neural activation during the first 10 seconds of your content (the first five 2-second segments, averaged across all brain regions). Measures how quickly your opening captures attention. Above 0.40 is strong. Critical for short-form and social feeds where viewers decide to stay or scroll within seconds.
Retention
The weakest 10-second window across your content. BRAIN|RANK slides a 5-segment window over the full timeline and takes the minimum, so a low retention score means there's a stretch where the brain response drops off. Use this to find the single worst section of your edit and decide whether to tighten, cut, or replace it.
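Hook Strength and Retention are both built on 5-segment (10-second) windows, so they can be sketched together. This is a minimal illustration under stated assumptions: `overall` is a hypothetical list of per-segment activation already averaged across all brain regions, each window is scored by its mean, and content shorter than five segments is treated as a single window. None of these details are confirmed beyond what the descriptions above say.

```python
from statistics import fmean

WINDOW = 5  # five 2-second segments = 10 seconds


def hook_strength(overall: list[float]) -> float:
    """Mean activation over the first 10 seconds (first five segments)."""
    if not overall:
        raise ValueError("need at least one segment")
    return fmean(overall[:WINDOW])


def retention(overall: list[float]) -> float:
    """Lowest mean found by sliding a 5-segment window over the timeline."""
    if not overall:
        raise ValueError("need at least one segment")
    if len(overall) <= WINDOW:
        # Assumption: short content is scored as one window.
        return fmean(overall)
    return min(
        fmean(overall[i:i + WINDOW])
        for i in range(len(overall) - WINDOW + 1)
    )


# Made-up timeline: strong opening, flat middle, late spike.
overall = [0.5, 0.5, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.6]
print(round(hook_strength(overall), 2))
print(round(retention(overall), 2))
```

In this made-up timeline the hook is carried by the first four segments, while the retention value is set by the flat stretch in the middle, which is the section you would look at tightening.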
Peak Moment
The exact 2-second segment with the highest overall activation across all brain regions. This is your content's most neurally engaging moment. Use it as a thumbnail frame, clip highlight, or to understand what drives the strongest brain response.
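Finding the peak moment amounts to taking the position of the largest per-segment value. The sketch below assumes a hypothetical `overall` list of per-segment activation averaged across regions, and it returns the earliest segment when two segments tie, which this page does not specify.

```python
SEGMENT_SECONDS = 2  # each segment covers 2 seconds


def peak_moment(overall: list[float]) -> tuple[int, int, float]:
    """Return (segment index, start time in seconds, activation) of the peak.

    Assumption: on ties, the earliest segment wins.
    """
    if not overall:
        raise ValueError("need at least one segment")
    index = max(range(len(overall)), key=overall.__getitem__)
    return index, index * SEGMENT_SECONDS, overall[index]


index, start, value = peak_moment([0.3, 0.7, 0.5, 0.7])
print(f"segment {index} starts at {start}s with activation {value}")
```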
Arousal
The viewer's physiological activation level — how alert or stimulated their nervous system is. Computed from attention and emotional-processing networks. High arousal (>0.60) = exciting, intense content. Low arousal (<0.30) = calming or boring content. Best content often oscillates between high and moderate arousal rather than pinning at either extreme.
Valence
Combines the reward and social-cognition networks. High valence (>0.60) = content triggers satisfaction and social-emotional approach (joy, connection, warmth). Low valence (<0.30) = disengagement, avoidance, or negative affect. Plotted with arousal on the circumplex chart, it maps the full emotional experience of your viewer.
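To make the two axes concrete, here is a minimal sketch of how arousal and valence could be derived from network scores and labeled with the thresholds above. The exact formula is not given on this page, so the unweighted mean used here is an assumption, and all function names are hypothetical.

```python
def arousal(attention: float, emotional: float) -> float:
    """Assumed: unweighted mean of attention and emotional processing."""
    return (attention + emotional) / 2


def valence(reward: float, social: float) -> float:
    """Assumed: unweighted mean of reward and social cognition."""
    return (reward + social) / 2


def level(value: float) -> str:
    """Label a value using the thresholds stated above."""
    if value > 0.60:
        return "high"
    if value < 0.30:
        return "low"
    return "moderate"


# Made-up network scores for a single segment.
a = arousal(attention=0.70, emotional=0.60)
v = valence(reward=0.20, social=0.30)
print(f"arousal: {level(a)}, valence: {level(v)}")
```

A segment that is high in arousal and low in valence, as in this made-up case, would sit in the tense or unpleasant quadrant of the circumplex chart.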
About the Science
TRIBE v2 (TRansformer for In-silico Brain Experiments) is a foundation model for predicting brain activity from multimodal content. It was developed by Meta AI Research and released in March 2026 under a CC BY-NC 4.0 license. The model was trained on over 1,000 hours of fMRI recordings from more than 700 volunteers viewing naturalistic videos, listening to podcasts, and reading text, which is why it generalizes across genres and formats rather than being locked to a single content type.
Temporal resolution
TRIBE v2 predicts brain state in 2-second windows. That's the same temporal resolution the model was trained on (the repetition time, or TR, of the underlying fMRI scans), and it's why every segment-level metric you see in BRAIN|RANK lands on an even-second boundary. Events shorter than 2 seconds (a single frame, a one-word utterance) contribute to the segment they fall in but can't be isolated below that window.
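The 2-second grid means any timestamp in your content maps to exactly one segment. The sketch below shows that mapping; `segment_for_time` and `mmss` are hypothetical helpers, and the sketch assumes segments start at 0 seconds with no offset.

```python
TR_SECONDS = 2  # length of one prediction window


def segment_for_time(t: float) -> tuple[int, int, int]:
    """Return (segment index, start second, end second) containing time t."""
    if t < 0:
        raise ValueError("time must be non-negative")
    index = int(t // TR_SECONDS)
    start = index * TR_SECONDS
    return index, start, start + TR_SECONDS


def mmss(seconds: int) -> str:
    """Format whole seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


# An event at 42.7 seconds falls inside the segment that starts at 00:42.
index, start, end = segment_for_time(42.7)
print(index, mmss(start), mmss(end))
```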
Cortical parcellation
TRIBE v2 outputs activation at 20,484 cortical vertices: the standard fsaverage5 surface mesh (10,242 vertices per hemisphere). BRAIN|RANK then aggregates those per-vertex values into 9 functional networks using the Destrieux atlas, which divides the cortical surface into roughly 150 anatomically defined regions. Each network is a weighted aggregation of the regions most associated with that function in the neuroscience literature. For example, the Attention network combines dorsal and ventral attention regions (intraparietal sulcus, frontal eye fields, superior parietal lobule) rather than relying on a single anatomical area in isolation. The 9-network aggregation is a BRAIN|RANK design choice, not part of TRIBE v2 itself.
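The two-step aggregation (vertices to atlas regions, then regions to a network) can be sketched as follows. This is an illustration under stated assumptions, not BRAIN|RANK's implementation: the region keys (`ips`, `fef`, `spl`), the weights, and the choice of a mean within each region are all hypothetical, and a real input would have 20,484 values rather than four.

```python
from collections import defaultdict


def region_means(values: list[float], labels: list[str]) -> dict[str, float]:
    """Average vertex activations within each atlas region."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for value, label in zip(values, labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def network_score(means: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of region means for one functional network."""
    total = sum(weights.values())
    return sum(means[region] * w for region, w in weights.items()) / total


# Toy data: four vertices assigned to three hypothetical region labels.
vertex_values = [0.2, 0.4, 0.6, 0.8]
vertex_labels = ["ips", "ips", "fef", "spl"]

# Hypothetical weights for an attention-style network.
attention_weights = {"ips": 2.0, "fef": 1.0, "spl": 1.0}

means = region_means(vertex_values, vertex_labels)
print(round(network_score(means, attention_weights), 2))
```

Averaging within a region first keeps large regions from dominating a network simply because they contain more vertices.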
Normalization
Raw activation values don't come with a fixed scale, so every score you see on BRAIN|RANK is normalized using robust percentile scaling (clipped at the 1st and 99th percentiles of the model's training distribution, then rescaled to 0–1). A 0.50 means the response lands firmly above the population baseline, not that it's "half of maximum." The tier boundaries (baseline, moderate, strong, exceptional, intense) are defined on this normalized scale.
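Robust percentile scaling can be sketched in a few lines: clip the raw value to the 1st and 99th percentiles of a reference distribution, then rescale to the 0 to 1 range. The reference sample below is a toy stand-in for the training distribution, and the function names are hypothetical; only `statistics.quantiles` is a real standard-library call.

```python
from statistics import quantiles


def percentile_bounds(reference: list[float]) -> tuple[float, float]:
    """Return the 1st and 99th percentiles of a reference sample."""
    cuts = quantiles(reference, n=100, method="inclusive")
    return cuts[0], cuts[-1]


def normalize(raw: float, p1: float, p99: float) -> float:
    """Clip raw to [p1, p99], then rescale linearly to [0, 1]."""
    if p99 <= p1:
        raise ValueError("p99 must be greater than p1")
    clipped = min(max(raw, p1), p99)
    return (clipped - p1) / (p99 - p1)


# Toy reference distribution: the integers 0 through 100.
reference = [float(x) for x in range(101)]
p1, p99 = percentile_bounds(reference)

for raw in (-10.0, 50.0, 200.0):
    print(raw, "->", normalize(raw, p1, p99))
```

Clipping before rescaling is what makes the method robust: a single extreme raw value cannot stretch the scale and compress every other score toward zero.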
How multi-modal content is handled
TRIBE v2 is built to receive vision, audio, and text features simultaneously. When you upload an image, BRAIN|RANK holds it as a static frame for the duration you set and pipes silence into the audio encoder, so the prediction isolates the visual contribution. When you upload a text script in spoken mode, the script is synthesized into speech so the audio and language encoders both activate on the same narrative — the way a viewer would actually hear it. Visual-text mode renders the text on-screen so the visual encoder sees the typography exactly as your audience would. The more modalities you supply, the closer the prediction is to the full viewing experience.
What BRAIN|RANK is — and isn't
These predictions are not scans of any individual viewer's brain, and they are not measurements of your own brain either. TRIBE v2 infers the group-average cortical response that would occur if a representative pool of viewers watched your content under naturalistic conditions. Treat the numbers as a directional signal for how content tends to land — strong hook, weak middle, peak at 00:42 — not as a diagnostic about a specific person.
Key research references:
- TRIBE v2: A Predictive Foundation Model Trained to Understand How the Human Brain Processes Complex Stimuli. Meta AI Research (March 2026). The foundation model that powers every BRAIN|RANK prediction. Released under CC BY-NC 4.0.
- Russell Circumplex Model of Affect. James A. Russell (1980). The two-axis emotion model (arousal × valence) we plot on the emotional-state chart.
- Destrieux Atlas. Cortical parcellation scheme that divides the cortical surface into ~150 anatomically defined regions. Used to aggregate the 20,484 vertex-level outputs into our 9 brain networks.
- fMRI (functional Magnetic Resonance Imaging). The neuroimaging modality used to collect TRIBE v2's training data. Measures blood-oxygenation changes as a proxy for neural activity.
- Naturalistic neuroimaging datasets. Multi-hour recordings of real people watching unedited video, the ground truth TRIBE v2 was fit to.
Note: TRIBE v2 is licensed under CC BY-NC 4.0 (non-commercial use). BRAIN|RANK's predictions are computational estimates of brain activity, not actual brain scans. Results should be interpreted as data-informed insights, not medical or scientific measurements.