Do LLMs really “show their work” when they perform chain-of-thought reasoning? “Measuring Faithfulness in Chain-of-Thought Reasoning” is a new paper from Anthropic that aims to study this question empirically with a series of tests.
Timestamps:
00:00 - Measuring Faithfulness in Chain-of-Thought Reasoning
00:53 - What is Chain-of-Thought reasoning?
03:15 - Do the Chain-of-Thought Steps Really Reflect the Model’s Reasoning?
07:03 - Possible Faithfulness Failures
08:44 - Encoded Reasoning/Steganography
12:01 - Experiment Details
15:44 - Does Truncating the Chain of Thought Change the Predicted Answer?
16:53 - Does Editing the Chain of Thought Change the Predicted Answer?
17:14 - Do Uninformative Chain of Thought Tokens Also Improve Performance?
18:28 - Does Rewording the Chain of Thought Change the Predicted Answer?
20:20 - Does Model Size Affect Chain of Thought Faithfulness?
22:04 - Limitations
24:38 - Externalized Reasoning Oversight
Topics: #ai #anthropic #CoT #reasoning
Link to the paper:
For related content:
- Twitter:
- Research lab:
- Personal webpage:
- YouTube: @SamuelAlbanie1
- TikTok: @samuelalbanie
- Instagram:
- LinkedIn:
- Threads: @samuelalbanie
- Discord server for filtir:
(Optional) if you’d like to support the channel:
-
-
Credits:
Image credit (Chelsea photo) – Wikimedia Commons