where to find LLMs (03:18:34 - 03:21:46) - Deep Dive into LLMs like ChatGPT


This is a general audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models of how to think about their "psychology", and how to get the best use of them in practical applications. I have one "Intro to LLMs" video already from about a year ago, but that is just a re-recording of a random talk, so I wanted to loop around and do a much more comprehensive version.

Instructor
Andrej was a founding member at OpenAI (2015) and then Sr. Director of AI at Tesla (2017-2022), and is now a founder at Eureka Labs, which is building an AI-native school. His goal in this video is to raise knowledge and understanding of the state of the art in AI, and empower people to effectively use the latest and greatest in their work.
Find more at https://karpathy.ai/ and https://x.com/karpathy

Chapters
00:00:00 introduction
00:01:00 pretraining data (internet)
00:07:47 tokenization
00:14:27 neural network I/O
00:20:11 neural network internals
00:26:01 inference
00:31:09 GPT-2: training and inference
00:42:52 Llama 3.1 base model inference
00:59:23 pretraining to post-training
01:01:06 post-training data (conversations)
01:20:32 hallucinations, tool use, knowledge/working memory
01:41:46 knowledge of self
01:46:56 models need tokens to think
02:01:11 tokenization revisited: models struggle with spelling
02:04:53 jagged intelligence
02:07:28 supervised finetuning to reinforcement learning
02:14:42 reinforcement learning
02:27:47 DeepSeek-R1
02:42:07 AlphaGo
02:48:26 reinforcement learning from human feedback (RLHF)
03:09:39 preview of things to come
03:15:15 keeping track of LLMs
03:18:34 where to find LLMs
03:21:46 grand summary

Links
- ChatGPT: https://chatgpt.com/
- FineWeb (pretraining dataset): https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
- Tiktokenizer: https://tiktokenizer.vercel.app/
- Transformer Neural Net 3D visualizer: https://bbycroft.net/llm
- llm.c Let's Reproduce GPT-2: https://github.com/karpathy/llm.c/discussions/677
- Llama 3 paper from Meta: https://arxiv.org/abs/2407.21783
- Hyperbolic, for inference of base model: https://app.hyperbolic.xyz/
- InstructGPT paper on SFT: https://arxiv.org/abs/2203.02155
- HuggingFace inference playground: https://huggingface.co/spaces/huggingface/inference-playground
- DeepSeek-R1 paper: https://arxiv.org/abs/2501.12948
- TogetherAI Playground for open model inference: https://api.together.xyz/playground
- AlphaGo paper (PDF): https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf
- AlphaGo Move 37 video: https://www.youtube.com/watch?v=HT-UZkiOLv8
- LM Arena for model rankings: https://lmarena.ai/
- AI News Newsletter: https://buttondown.com/ainews
- LMStudio for local inference: https://lmstudio.ai/

- The visualization UI I was using in the video: https://excalidraw.com/
- The specific file of Excalidraw we built up: https://drive.google.com/file/d/1EZh5hNDzxMMy05uLhVryk061QYQGTxiN/view?usp=sharing
- Discord channel for Eureka Labs and this video: https://discord.gg/3zy8kqD9Cp

Educational Use Licensing
This video is freely available for educational and internal training purposes. Educators, students, schools, universities, nonprofit institutions, businesses, and individual learners may use this content freely for lessons, courses, internal training, and learning activities, provided they do not engage in commercial resale, redistribution, external commercial use, or modify content to misrepresent its intent.

#llm #chatgpt #ai #deep dive #deep learning #introduction #large language model
introduction

2025-02-06
00:00:00 - 00:01:00
- Introduction

2025-02-06 @TimeStampBuddy
00:00:01 - 00:01:04
pretraining data (internet)

2025-02-06
00:01:00 - 00:07:47
- LLM Pre-training

2025-02-06 @TimeStampBuddy
00:01:04 - 00:15:13
Around 00:01:49, you explain a really interesting notion: that models need to "think" before producing a complex response, because each layer in a neural network has finite computation. I feel like it's somewhat related to the notion of computational irreducibility Stephen Wolfram talks about. This is also why we humans need to spend some time thinking about complex issues before coming up with a good response.

2025-02-06 @hashiromer7668
00:01:49 - 03:31:24
But what if the ultimate joke about pelicans is actually 'the the the the the the,' but we simply don't have enough intelligence to understand it, just like an unusual move in the game of Go? XD

2025-02-06 @JanKowalski-dm5vr
00:03:02 - 03:31:24
Wow, amazing 3:30 hours, so much in a few hours. Saved me hours of research and inspired me for more. Great work, looking forward to more such interesting videos.

2025-02-06 @adarshkumar-jv4hz
00:03:30 - 03:31:24
At 00:03:50, he talks about eliminating racist sites during corpus preprocessing. This can introduce bias by eliminating candid discussion of, for example, average IQ test scores of racial subgroups. Claude refuses to answer this altogether, calling race a constructed concept. ChatGPT and Gemini, at the time I queried them, both produced valid, honest outputs, which aligned with the research. Those of you so enamored with Claude are still trapped in Dario's echo-chamber. But society has moved on, now (2025). Will you?

2025-02-06 @thomasgilson6206
00:03:50 - 03:31:24
tokenization

2025-02-06
00:07:47 - 00:14:27
neural network I/O

2025-02-06
00:14:27 - 00:20:11
- Neural Net & Training

2025-02-06 @TimeStampBuddy
00:15:13 - 00:40:14
neural network internals

2025-02-06
00:20:11 - 00:26:01
inference

2025-02-06
00:26:01 - 00:31:09
GPT-2: training and inference

2025-02-06
00:31:09 - 00:42:52
Somewhere around 00:36:52, you said something about training on 1 million tokens. Do you mean you train on chunks of 1 million tokens to generate output, or that you train on different tokens that add up to a million to generate output?

2025-02-06 @oteikwufrancis1108
00:36:52 - 03:31:24
- GPUs & Model Costs

2025-02-06 @TimeStampBuddy
00:40:14 - 01:01:06
Llama 3.1 base model inference

2025-02-06
00:42:52 - 00:59:23
00:55:22: Parallel universes!!! Just loving these analogies - awesome!

2025-02-06 @madhurkgpian
00:55:22 - 03:31:24
pretraining to post-training

2025-02-06
00:59:23 - 01:01:06
post-training data (conversations)

2025-02-06
01:01:06 - 01:20:32
- Build LLM Assistant

2025-02-06 @TimeStampBuddy
01:01:06 - 02:07:30
"something went wrong" 😂 lol I love that he left this in there!

2025-02-06 @stephen-torrence
01:18:46 - 03:31:24
His genuine laugh at the ChatGPT error is so pure and spontaneous. How can someone not love Karpathy!!?? Sir, you are pure gold for humanity.

2025-02-06 @MarcoDonadelli
01:18:47 - 03:31:24
hallucinations, tool use, knowledge/working memory

2025-02-06
01:20:32 - 01:41:46
The chapter about hallucinations was so insightful. Never heard about it as an issue of the dataset, i.e., it wasn't trained to say "I don't know" and how one can test the knowledge of the model. Thanks!

2025-02-06 @linusnox
01:20:32 - 03:31:24
Observation: At approximately 01:23:50, Andrej tests the question "Who is Orson Kovacs" using falcon-7b-instruct in the HF playground. The temperature is still 1.0, which makes the model respond with a balance between randomness and determinism. Although it makes up stuff, behaving like hallucination, it is good to test with temperatures lower or higher than 1.0 to understand how the factuality of the data varies.

2025-02-06 @avinashrs6303
01:23:50 - 03:31:24
You mentioned, around the 01:30:00 mark, the reason why you allow the model to say "I don't know" instead of augmenting it with the new knowledge. Is it because there's an infinite amount of knowledge to learn, so that it's virtually impossible to learn all of it, and thus it's better to train it to know when to refuse? In other words, if somehow the model CAN learn ALL the knowledge of the world, would we no longer need to train it to stop hallucinating? Thanks.

2025-02-06 @charlielaw48
01:30:00 - 03:31:24
Thanks for the informative video! I have a question about training language models for tool use, specifically regarding the process you described around 01:33:38

2025-02-06 @marathonour
01:33:38 - 03:31:24
knowledge of self

2025-02-06
01:41:46 - 01:46:56
models need tokens to think

2025-02-06
01:46:56 - 02:01:11
@01:52:00. Question: I was just reading a paper recently (I believe it was from Anthropic, but sadly I can't find it now) that when they have looked at "thinking models", it appears the final answer is generally already determined well before the reasoning process begins. Then the model just fills in the chain of thought to get from the question to where it wants to go. Isn't this exactly what you said is not the correct way to handle this? Can you comment on why, if this is the "wrong" approach, it seems to be what modern models are doing?

2025-02-06 @BangkokBubonaglia
01:52:00 - 03:31:24
@01:55:49 that is elucidating! This is the first time I’ve heard of this concept. Thank you Andrej.

2025-02-06 @seadude
01:55:49 - 03:31:24
This teacher is very good at giving cute examples at 01:55:50. I appreciate it and agree with it.

2025-02-06 @saisrikaranpulluri1472
01:55:50 - 03:31:24
tokenization revisited: models struggle with spelling

2025-02-06
02:01:11 - 02:04:53
Wow, love this explanation of why these models fail at character-related and counting-related tasks.

2025-02-06 @sumitsp01
02:04:04 - 03:31:24
jagged intelligence

2025-02-06
02:04:53 - 02:07:28
supervised finetuning to reinforcement learning

2025-02-06
02:07:28 - 02:14:42
- Model Training in Practice

2025-02-06 @TimeStampBuddy
02:07:30 - 03:31:24
reinforcement learning

2025-02-06
02:14:42 - 02:27:47
DeepSeek-R1

2025-02-06
02:27:47 - 02:42:07
Deepseek says “$3 is a bit expensive for an apple, but maybe they’re organic or something” 😂

2025-02-06 @austinw.1530
02:34:21 - 03:31:24
What a treat!!! At 02:41:08, haha, when you say this is very busy, very ugly, because of Google not being able to nail that, was epic hahah

2025-02-06 @KS-df1cp
02:41:08 - 03:31:24
AlphaGo

2025-02-06
02:42:07 - 02:48:26
Thank you for the video Andrej! One small note: at 02:43:05, the dashed line in the AlphaGo Zero plot is the Elo of the version of AlphaGo that *defeated* Lee in 2016 (not the Elo of Lee himself).

2025-02-06 @nkhr2
02:43:05 - 03:31:24
reinforcement learning from human feedback (RLHF)

2025-02-06
02:48:26 - 03:09:39
Tiny typo "let's add it to the dataset and give it an ordering that's extremely like a score of 5" -> SHOULD BE "let's add it to the dataset and give it an ordering that's extremely like a score of 1"

2025-02-06 @giofou711
03:03:44 - 03:31:24
preview of things to come

2025-02-06
03:09:39 - 03:15:15
keeping track of LLMs

2025-02-06
03:15:15 - 03:18:34
If you have made it to this timestamp, then finish the video and go build something with LLMs. 😊

2025-02-06 @Ronak.Purohit
03:16:59 - 03:31:24
where to find LLMs

2025-02-06
03:18:34 - 03:21:46
grand summary

2025-02-06
03:21:46 - 03:31:24
In principle these models are capable of analogies no human has had. Wow😮

2025-02-06 @xnivaxhzne
03:29:54 - 03:31:24
Thank you Andrej for this! Please continue putting out content like this; you are one of the best teachers in this space who can explain at this level of detail. The entire 03:31:23 is pure gold, and I am very grateful that you are putting in this level of time and effort ❤

2025-02-06 @ericmathews3619
03:31:23 - 03:31:24

Andrej Karpathy
