What Do Neural Networks Really Learn? Exploring the Brain of an AI Model
Neural networks have become increasingly impressive in recent years, but there’s a big catch: we don’t really know what they are doing. We give them data and ways to get feedback, and somehow, they learn all kinds of tasks. It would be really useful, especially for safety purposes, to understand what they have learned and how they work after they’ve been trained. The ultimate goal is not only to understand in broad strokes what they’re doing but to precisely reverse engineer the algorithms encoded in their parameters. This is the ambitious goal of mechanistic interpretability. As an introduction to this field, we show how researchers have been able to partly reverse-engineer how InceptionV1, a convolutional neural network, recognizes images.
▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
This topic is truly a rabbit hole. If you want to learn more about this important research and even contribute to it, check out this list of sources we’ve compiled for you on mechanistic interpretability and interpretability in general:
On Interpreting InceptionV1:
Feature visualization:
Zoom in: An Introduction to Circuits:
The Distill journal contains several articles that try to make sense of how exactly InceptionV1 does what it does:
OpenAI’s Microscope tool lets us visualize the neurons and channels of a number of vision models in great detail:
Here’s OpenAI’s Microscope tool pointed at layer Mixed3b in InceptionV1:
Activation atlases:
More recent work applying SAEs to InceptionV1:
Transformer Circuits Thread, the spiritual successor to the Circuits thread on InceptionV1, this time on transformers:
In the video, we cite “Toy Models of Superposition”:
We also cite “Towards Monosemanticity: Decomposing Language Models With Dictionary Learning”:
More recent progress:
Mapping the Mind of a Large Language Model:
Press:
Paper in the Transformer Circuits Thread:
Extracting Concepts from GPT-4:
Press:
Paper:
Browse features:
Language models can explain neurons in language models (cited in the video):
Press:
Paper:
View neurons:
Neel Nanda on how to get started with Mechanistic Interpretability:
Concrete Steps to Get Started in Transformer Mechanistic Interpretability:
Mechanistic Interpretability Quickstart Guide:
200 Concrete Open Problems in Mechanistic Interpretability:
More work mentioned in the video:
Progress measures for grokking via mechanistic interpretability:
Discovering Latent Knowledge in Language Models Without Supervision:
Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning:
▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, MERCH▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🟠 Patreon:
🔵 Channel membership:
🟢 Merch:
🟤 Ko-fi, for one-time and recurring donations:
▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Discord:
Reddit:
X/Twitter:
▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
AAAA you don’t fit in the description this time! But we thank you from the bottom of our hearts. All of you, in this Google Doc:
▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
All the good doggos who worked on this video: