- Since this feat only took on the order of a year, the trainers must actually be doing quintillions (10^18) of operations per second 🤯 (00:04:27 - 00:07:58)
Large Language Models explained briefly

Dig deeper here: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
Technical details as a talk: https://youtu.be/KJtZARuO3JY
This was made for an exhibit at the Computer History Museum: https://computerhistory.org/
Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

No secret end-screen vlog for this one; the end-screen real estate was all full!

------------------

These animations are largely made using a custom Python library, manim. See the FAQ comments here:
https://3b1b.co/faq#manim
https://github.com/3b1b/manim
https://github.com/ManimCommunity/manim/

All code for specific videos is visible here:
https://github.com/3b1b/videos/

The music is by Vincent Rubinetti.
https://www.vincentrubinetti.com
https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.

Mailing list: https://3blue1brown.substack.com
Twitter:
Instagram: https://www.instagram.com/3blue1brown
Reddit: https://www.reddit.com/r/3blue1brown
Facebook: https://www.facebook.com/3blue1brown
Patreon: https://patreon.com/3blue1brown
Website: https://www.3blue1brown.com
* **Purpose of the video:** The video was created as a short explainer for an exhibit at the Computer History Museum on large language models (LLMs).

November 21, 2024  @wolpumba4099
00:00:00 - 00:00:43
Wow, the museum piece looks like the teletype machine I used in high school to learn BASIC. Now it's safely behind glass on display as an ancient relic. Wow.

November 21, 2024  @DennisDavisEdu
00:00:10 - 00:07:58
👋

November 21, 2024  @BankBhandari
00:00:22 - 00:07:58
* **Introduction to LLMs:** LLMs are sophisticated mathematical functions that predict the next word in a sequence of text by assigning probabilities to all possible words.

November 21, 2024  @wolpumba4099
00:00:43 - 00:01:15
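The "assigning probabilities to all possible words" step in the bullet above can be sketched in a few lines of Python. The vocabulary and raw scores here are invented for illustration; a real model produces scores for tens of thousands of tokens from billions of parameters:

```python
import math

def softmax(scores):
    # Turn arbitrary real-valued scores into probabilities that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-word scores for the context "The Eiffel Tower is in ..."
vocab = ["Paris", "France", "Europe", "banana"]
scores = [4.0, 2.5, 1.0, -3.0]

probs = softmax(scores)
prediction = dict(zip(vocab, probs))  # "Paris" ends up with the highest probability
```

Note that every word gets a nonzero probability, which is why sampled output can vary from run to run.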
* **Chatbot Functionality:** Chatbots utilize LLMs to generate responses by repeatedly predicting the next word based on the user's input and the ongoing conversation.

November 21, 2024  @wolpumba4099
00:01:15 - 00:02:10
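The "repeatedly predicting the next word" loop described above is autoregressive generation. A toy sketch, where a made-up lookup table stands in for the real model:

```python
import random

def predict_next(context):
    # Stand-in for an LLM: returns (word, probability) pairs for the next word.
    # A real model computes these from billions of learned parameters.
    table = {
        (): [("The", 1.0)],
        ("The",): [("cat", 0.7), ("dog", 0.3)],
        ("The", "cat"): [("sat", 0.8), ("<end>", 0.2)],
        ("The", "cat", "sat"): [("<end>", 1.0)],
        ("The", "dog"): [("<end>", 1.0)],
    }
    return table.get(tuple(context), [("<end>", 1.0)])

def generate(seed=0, max_words=10):
    # Autoregressive loop: sample one word, append it, feed the longer
    # context back in, and repeat until an end marker appears.
    rng = random.Random(seed)
    words = []
    for _ in range(max_words):
        choices = predict_next(words)
        word = rng.choices([w for w, _ in choices],
                           weights=[p for _, p in choices])[0]
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)
```

Because each step samples from a distribution, the same prompt can yield different completions on different runs.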
And that's why Mr. ChatGPT behaves like that when answering.

November 21, 2024  @ahmedalhomaide4416
00:01:15 - 00:07:58
You lost the nuance of tokens not being words (probably on purpose) and then used it exactly when describing generating tokens.

November 21, 2024  @connectety
00:01:15 - 00:07:58
- Does that mean there's a web request back to the LLM after each word is produced? If it's just one request, why does each word seem to stream in as opposed to one block response?

November 21, 2024  @yanikjayaram
00:01:30 - 00:07:58
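For context on the question above: chat backends typically keep one connection open and flush each token to the client as soon as it is sampled (e.g. via server-sent events or chunked HTTP), rather than issuing a request per word. A rough sketch of that shape, with a made-up token source:

```python
def generate_tokens():
    # Stand-in for the model's sampling loop: yields one token at a time.
    for token in ["Hello", ",", " world", "!"]:
        yield token

def stream_response():
    # One request, many chunks: a real server would write each token to the
    # socket the moment it exists, which is why the reply appears to
    # "type itself" in the UI instead of arriving as one block.
    chunks = []
    for token in generate_tokens():
        chunks.append(token)  # real server: flush this chunk to the client
    return "".join(chunks)
```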
Ah, yes, "Paris is a place... in Paris." Thank you, AI, very cool!

November 21, 2024  @unholycrusader69
00:01:32 - 00:07:58
The Wim Wenders Paris, Texas reference ❤️

November 21, 2024  @KillBillVaggeli13
00:01:33 - 00:07:58
How important is the system prompt really? And is it even worth fine-tuning a model when you could just alter this prompt and, ta-da, you've created an expert on a specific topic? Which use cases are worth fine-tuning for?

November 21, 2024  @towb0at
00:01:38 - 00:07:58
- Apologies for the pun and the recursion, but I wonder: can the parameters be understood, in some sense, as the meaning of words? Maybe it's all not so complicated, and at the same time not so simple, and we have more in common with machines than we think. In a similar way, the words we hear or read have certain parameters in our heads; in human terms, some meaning, a concept. So thinking is not such a mystery after all. We also predict the next word. Our parameters can also change values depending on inputs and outputs. So don't underestimate us: we meat bags are the same kind of algorithms, working on the same principles, just very slow. We have a transformer algorithm too, especially when we look at or recall images rather than listen. That's probably why some esoteric practices tried to switch off the inner dialogue: speech is a primitive Turing machine, while images and graphics are parallel computation. That's why seeing once is better than hearing a hundred times, which is exactly what is demonstrated here. Recursion again. But back to the topic: I think we also have recurrent neural networks and everything else inherent to neural nets, which is another point in favor of the idea that we can be digitized; I mean the personality itself. In general, this channel is a great find! Thanks to the YouTube algorithms. A pity there's no time, but I subscribed and will watch occasionally while eating. Especially since, unlike most videos, this one is actually video, and there's something to look at. A video like this would be especially useful for people who want to understand how neural networks work; but precisely those who want to understand, not those who keep repeating that none of us knows how they work.

November 21, 2024  @romanbolgar
00:01:54 - 00:07:58
* **Training LLMs:** LLMs are trained on vast amounts of text data (e.g., from the internet) to learn patterns and relationships between words. This process involves adjusting billions of parameters within the model.

November 21, 2024  @wolpumba4099
00:02:10 - 00:03:00
I love how OpenAI said they can't create tools like ChatGPT without stealing. Truly makes you wonder what the hell these companies even do.

November 21, 2024  @volodyad195
00:02:14 - 00:07:58
I could watch an animation like this for a while.

November 21, 2024  @Mitobu1
00:02:24 - 00:07:58
It was the best of times, it was the blurst of times.

November 21, 2024  @Jill-Oakes
00:02:33 - 00:07:58
@ it was the best of times, it was the blurst of times??

November 21, 2024  @jordanriver722
00:02:39 - 00:07:58
* **Backpropagation:** The training process uses backpropagation to refine the model's parameters, increasing the probability of predicting the correct next word in the training examples.

November 21, 2024  @wolpumba4099
00:03:00 - 00:04:27
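Full backpropagation applies the chain rule through every layer, but the core move (nudge each parameter against the gradient of a loss) fits in a one-parameter toy. Everything below is invented for illustration:

```python
def train_one_parameter(examples, lr=0.1, steps=100):
    # Toy "model": predicts y = w * x. The loss is squared error, so the
    # gradient is dL/dw = 2 * (w*x - y) * x. Backpropagation computes this
    # gradient automatically for every one of an LLM's billions of parameters.
    w = 0.0
    for _ in range(steps):
        for x, y in examples:
            grad = 2 * (w * x - y) * x
            w -= lr * grad  # step against the gradient
    return w

examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hidden rule: y = 2x
w = train_one_parameter(examples)  # converges toward 2.0
```

After training, the parameter settles near the value that makes the correct outputs most likely, which is exactly the "increasing the probability of the correct next word" described above, scaled down to one dimension.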
One billion additions and multiplications per second? Easy. I've got billions of neurons doing that every millisecond.

November 21, 2024  @gorkemvids4839
00:03:20 - 00:07:58
🤯 I'm somewhat comfortable with large numbers, and to think we're in the early stages! I'm excited to see how this evolves over the next decade(s).

November 21, 2024  @45414
00:03:44 - 00:07:58
It's incredible they've been working on large language models for over a hundred million years.

November 21, 2024  @EggZu_
00:04:20 - 00:07:58
* **Reinforcement Learning with Human Feedback:** After pre-training on massive text datasets, LLMs undergo further training through reinforcement learning, where human feedback is used to improve the quality and helpfulness of their responses.

November 21, 2024  @wolpumba4099
00:04:27 - 00:05:05
- Since this feat only took on the order of a year, the trainers must actually be doing quintillions (10^18) of operations per second 🤯

November 21, 2024  @SpencerTwiddy
00:04:27 - 00:07:58
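That back-of-the-envelope figure can be checked: if one billion (10^9) operations per second would take on the order of 100 million years, then finishing in roughly a year implies about 10^17 operations per second, and "quintillions" (10^18) follows once the years figure is closer to a billion. A quick sanity check, with the 100-million-year number as the assumed input:

```python
OPS_PER_SECOND_BASELINE = 1e9   # "one billion operations per second"
YEARS_AT_BASELINE = 100e6       # assumed: "over 100 million years"
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_ops = OPS_PER_SECOND_BASELINE * YEARS_AT_BASELINE * SECONDS_PER_YEAR

# If the same work finishes in about one year of wall-clock time,
# the seconds-per-year factor cancels out:
actual_rate = total_ops / SECONDS_PER_YEAR  # about 1e17 operations per second
```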
million of years. mkay ()

November 21, 2024  @NiMareQ
00:04:27 - 00:07:58
"Workers" casually washes over the thousands of underpaid/enslaved people in exploited countries that these models depend on to perform the RLHF. If the museum doesn't address this elsewhere, then it's a bad museum.

November 21, 2024  @Dysiode
00:04:46 - 00:07:58
AFAIK, RLHF does not use human annotation directly for reinforcement learning on the base model. Instead, human feedback is used to align a reward model, which then drives the RL process on the base model.

November 21, 2024  @PAiWExHD
00:04:47 - 00:07:58
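A crude sketch of that two-stage picture: humans supply pairwise preferences, those preferences fit a reward model, and the RL phase then optimizes the base model against the reward model with no human in the loop. The per-response tally below is a deliberately simplistic stand-in for a real learned reward model (which is itself a neural network trained on preference pairs):

```python
def train_reward_model(comparisons):
    # Humans don't score responses directly; they pick which of two responses
    # they prefer. Here each preference bumps the preferred response's score
    # up and the rejected one's score down.
    scores = {}
    for preferred, rejected in comparisons:
        scores[preferred] = scores.get(preferred, 0.0) + 1.0
        scores[rejected] = scores.get(rejected, 0.0) - 1.0
    return scores

comparisons = [
    ("Paris is the capital of France.", "Paris is a place in Paris."),
    ("Paris is the capital of France.", "I refuse to answer."),
]
reward = train_reward_model(comparisons)
# During the RL phase, the base model is tuned to produce responses this
# reward model scores highly; the human annotators are no longer in the loop.
```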
"Workers flag unhelpful or *problematic* predictions ... making them more likely to give predictions that users *prefer*." A bit shocking that he says this with a straight face and seemingly takes no issue with the ethical ramifications of this practice.

November 21, 2024  @Barricade706
00:04:49 - 00:07:58
This staggering amount of computation is also only made possible by an equally staggering amount of power and water consumption. AI training at this scale is exacerbating climate change by rapidly increasing the amount of power big tech companies like Google are using.

November 21, 2024  @VoonNBuddies
00:04:59 - 00:07:58
Why are the two "the"s not associated with the same vector (numbers)?

November 21, 2024  @cuccaio83
00:05:00 - 00:07:58
* **GPUs and Parallel Processing:** Training large language models requires immense computational power, which is made possible by GPUs that can perform many calculations in parallel.

November 21, 2024  @wolpumba4099
00:05:05 - 00:05:25
* **Introduction to Transformers:** Transformers are a type of LLM that process text in parallel rather than sequentially, enabling them to handle larger datasets and learn more complex relationships.

November 21, 2024  @wolpumba4099
00:05:25 - 00:05:59
And that's why ChatGPT doesn't know how many Rs are in "strawberry".

November 21, 2024  @mitigatedrisk4264
00:05:31 - 00:07:58
While the beginning of this video provides a basic overview of LLMs using the "prediction metaphor", it lacks depth in explaining how these models process language and generate text. The later part helps a bit, but the way these numbers are generated is still "black box" magic. This means the connection between word-level processing, sentence structure, and overall meaning is not adequately addressed.

November 21, 2024  @donaldaxel
00:05:35 - 00:07:58
I would have also given more of an explanation of the "long list of numbers" (spend a few seconds describing how words are mapped out in vector space, the classic _man:woman::king:queen_ type of thing) so they just don't seem like random numbers. (I've seen videos about deciphering the language of other animals explaining this type of thing, i.e., for, arguably, an even _less_ technically-inclined audience, and the explanation doesn't seem overly detailed.) Plus it's pretty interesting.

November 21, 2024  @jeff__w
00:05:40 - 00:07:58
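The man:woman::king:queen idea mentioned above is plain vector arithmetic. With made-up 2D coordinates (real embeddings use hundreds or thousands of learned dimensions, and the analogy holds only approximately):

```python
# Tiny invented 2D "embeddings"; one axis loosely tracks royalty, the other
# gender, just to make the arithmetic visible.
emb = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [3.0, 0.0],
    "queen": [3.0, 1.0],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

# The classic analogy: king - man + woman lands on queen.
result = add(sub(emb["king"], emb["man"]), emb["woman"])
```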
* **Attention Mechanism:** Transformers utilize an "attention" mechanism that allows different parts of the input text to interact and influence each other, enhancing the model's understanding of context.

November 21, 2024  @wolpumba4099
00:05:59 - 00:06:23
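A minimal sketch of that interaction, as scaled dot-product attention in plain Python with tiny hand-picked vectors. Real transformers run many such "heads" over long sequences of high-dimensional vectors:

```python
import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(queries, keys, values):
    # Each query scores every key, the scores become weights via softmax,
    # and the output is a weighted mix of the value vectors. This is how
    # different positions in the text influence one another.
    d = len(keys[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        mixed = [sum(w * val[i] for w, val in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Toy example: the query matches the second key, so the output leans
# toward the second value vector.
q = [[1.0, 0.0]]
k = [[0.0, 1.0], [1.0, 0.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
result = attention(q, k, v)
```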
"talk tuah one another". My brain is rotting, but great video.

November 21, 2024  @EyeSackBzar
00:06:01 - 00:07:58
Cheque

November 21, 2024  @aaravkhanna5355
00:06:08 - 00:07:58
* **Feed-Forward Neural Networks:** In addition to attention, transformers also use feed-forward neural networks to further enhance their ability to capture patterns in language.

November 21, 2024  @wolpumba4099
00:06:23 - 00:07:19
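A sketch of that feed-forward block with made-up weights: it is applied to each position's vector independently (expand to a wider hidden layer, apply a nonlinearity, project back down), with no interaction between positions; that interaction is attention's job:

```python
def relu(x):
    return max(0.0, x)

def feed_forward(vec, w1, b1, w2, b2):
    # Linear layer up, ReLU, linear layer back down -- the standard
    # transformer MLP block, here in plain Python on lists.
    hidden = [relu(sum(wi * xi for wi, xi in zip(row, vec)) + b)
              for row, b in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

# Invented weights: 2-dim input, 4-dim hidden layer, 2-dim output.
w1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b1 = [0.0, 0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, -1.0, -1.0], [0.5, -0.5, 0.5, -0.5]]
b2 = [0.0, 0.0]

out = feed_forward([2.0, -3.0], w1, b1, w2, b2)
```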
to

November 21, 2024  @yerwol
00:06:50 - 00:07:10
Awesome stuff! One piece of feedback: at one point you use the word "vector", but up until now you've only been saying "lists of numbers". If this is for a general audience, I think throwing in new terminology right at the end without explaining it could be confusing.

November 21, 2024  @Qudito
00:06:53 - 00:07:58
Is it just me, or does this bit get a lot louder in the audio mix?

November 21, 2024  @yerwol
00:07:10 - 00:07:58
* **Emergent Behavior:** The specific behavior of LLMs is an emergent phenomenon arising from the interplay of billions of parameters tuned during training, making it difficult to fully understand their decision-making process.

November 21, 2024  @wolpumba4099
00:07:19 - 00:07:48
You casually mention emergent behavior; have you been exposed to multilevel evolutionary selection? There appears to be great excitement about emergent behaviors in complex adaptive systems. I haven't dug too deep into the literature, so the question might already be answered: what is the minimum complexity required for specific types of emergent behaviors?

November 21, 2024  @MatthewMosher-s7j
00:07:19 - 00:07:58
"The words that it generates are uncannily fluent, fascinating and even useful."

November 21, 2024  @philipmiesbauer
00:07:36 - 00:07:58
* **Where to learn more:** The video concludes by suggesting a visit to the Computer History Museum exhibit and recommending other resources (a deep learning series and a technical talk) for those interested in learning more about transformers and attention.

November 21, 2024  @wolpumba4099
00:07:48 - 00:07:58
Never thought in a million years that I would ever see Kendrick Lamar in a 3B1B video.

November 21, 2024  @unholycrusader69
00:07:48 - 00:07:58

3Blue1Brown
