3Blue1Brown How large language models work, a visual intro to transformers | Chapter 5, Deep Learning
📺 This video belongs to the channel "3Blue1Brown" (@3blue1brown). It is presented in our community solely for informational, scientific, educational, or cultural purposes. Our community claims no rights to this video. Please support the author by visiting the original channel.
📃 Original description:
Breaking down how Large Language Models work
Instead of sponsored ad reads, these lessons are funded directly by viewers:
---
Here are a few other relevant resources
Build a GPT from scratch, by Andrej Karpathy
If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
If you’re interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
Site with exercises related to ML programming and GPTs
History of language models by Brit Cruise, @ArtOfTheProblem
An early paper on how directions in embedding spaces have meaning.
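The remark above about the value and output matrices acting as one combined low-rank map can be checked numerically. The following is only a minimal sketch, not code from the video or from the Transformer Circuits posts: the names `W_V`, `W_O`, `W_OV` and the sizes `d_model` and `d_head` are assumptions chosen for illustration, and the matrices are random rather than trained.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
d_model, d_head = 64, 8

rng = np.random.default_rng(0)
W_V = rng.standard_normal((d_head, d_model))  # value map: embedding space -> small head space
W_O = rng.standard_normal((d_model, d_head))  # output map: head space -> embedding space

# Composing the two gives a single map from the embedding space to itself.
W_OV = W_O @ W_V

# Applying the two matrices in sequence matches applying the combined matrix once.
x = rng.standard_normal(d_model)
two_step = W_O @ (W_V @ x)
one_step = W_OV @ x

# The combined matrix is d_model x d_model, but its rank cannot exceed d_head.
rank = np.linalg.matrix_rank(W_OV)
print(W_OV.shape, rank, np.allclose(two_step, one_step))
```

Because the product passes through a `d_head`-dimensional bottleneck, the combined matrix has rank at most `d_head` even though it is `d_model` by `d_model`, which is what "low-rank map from the embedding space to itself" means here.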
---
Timestamps
- Predict, sample, repeat
- Inside a transformer
- Chapter layout
- The premise of Deep Learning
- Word embeddings
- Embeddings beyond words
- Unembedding
- Softmax with temperature
- Up next