
* **Purpose of the video:** The video was created as a short explainer for an exhibit at the Computer History Museum on large language models (LLMs).

Wow, the museum piece at looks like the teletype machine I used in high school to learn BASIC. Now it's safely behind glass on display as an ancient relic. Wow.

👋

* **Introduction to LLMs:** LLMs are sophisticated mathematical functions that predict the next word in a sequence of text by assigning probabilities to all possible words.
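
A minimal sketch of that last step (toy vocabulary and made-up scores, not a real model): the model produces a score for every word it knows, and a softmax turns those scores into probabilities.

```python
import math

# Toy sketch: an LLM's final step turns a score ("logit") for every word in
# its vocabulary into a probability. The scores below are made up.
logits = {"mat": 3.1, "floor": 2.4, "moon": 0.2, "banana": -1.0}

def softmax(scores):
    exps = {word: math.exp(s) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

probs = softmax(logits)
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"P(next word = {word!r}) = {p:.3f}")
# For a prompt like "The cat sat on the ...", 'mat' would dominate here.
```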

* **Chatbot Functionality:** Chatbots utilize LLMs to generate responses by repeatedly predicting the next word based on the user's input and the ongoing conversation.
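
A sketch of that loop with a stand-in stub instead of a real model (the hard-coded distribution is purely illustrative); the predict-sample-append-repeat control flow is the actual idea, and it also hints at why replies tend to stream in word by word.

```python
import random

def predict_next_word(context_words):
    # A real LLM returns a probability for every token in its vocabulary;
    # this stub hard-codes a tiny distribution for illustration only.
    return {"is": 0.5, "can": 0.3, "will": 0.2}

def generate(prompt, max_new_words=3):
    words = prompt.split()
    for _ in range(max_new_words):
        dist = predict_next_word(words)
        choices, weights = zip(*dist.items())
        # Sample rather than always taking the top word, so answers vary.
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("An LLM"))
```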

And that's why Mr. ChatGPT behaves like that when answering.

You lost the nuance of tokens not being words (probably on purpose) and then used it exactly when describing generating tokens.

- Does that mean there’s a web request back to the LLM after each word is produced? If it’s just one request why does each word seem to stream in as opposed to one block response?

Ah, yes, "Paris is a place... in Paris." Thank you, AI, very cool!

the Wim Wenders _Paris, Texas_ reference ❤️

How important is the system prompt really? And is it even worth fine-tuning a model when you could just alter this prompt and, ta-da, you've created an expert on a specific topic? For which use cases is fine-tuning worth it?

- Apologies for the pun and the recursion, but I wonder: can the parameters be understood, in some sense, as the meaning of words? Maybe it's all not so complicated, yet at the same time not so simple, and we have more in common with machines than we think. That is, in a similar way, the words we hear or read have certain parameters in our heads, or, in human terms, some meaning, a concept. So thinking isn't such a mystery after all. We also predict the next word. Our parameters can also change their values depending on the input and output data. So don't underestimate us: meatbags are the same kind of algorithms working on the same principles, only very slow. We have a transformer algorithm too, especially when we look at or recall images rather than listen. That's probably why some esoteric practices tried to switch off the internal dialogue: speech is a primitive Turing machine... while images and graphics are parallel computation. That's why seeing something once is better than hearing about it a hundred times, which is exactly what's demonstrated here. Recursion again. But back to the topic: I think we also have recurrent neural networks and everything else inherent to neural networks, which is one more point in favor of the idea that we can be digitized, and I mean the personality itself. In general, this channel is a great find! Thanks to YouTube's algorithms. A pity I don't have time, but I subscribed and will watch from time to time while eating. Especially since, unlike most videos, this one actually is a video, and there's something to watch. A video like this would be especially useful for people who want to understand how neural networks work, but specifically those who want to understand, not those who keep repeating on a loop that we all don't know how they work.

* **Training LLMs:** LLMs are trained on vast amounts of text data (e.g., from the internet) to learn patterns and relationships between words. This process involves adjusting billions of parameters within the model.
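
A sketch of how raw text turns into next-word training examples, assuming word-level tokens for simplicity (real models split text into sub-word tokens):

```python
# Toy example: every prefix of a sentence becomes a training example whose
# label is the word that actually came next.
text = "it was the best of times it was the blurst of times"
words = text.split()

training_pairs = [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in training_pairs[:4]:
    print(f"context={context} -> next word={target!r}")
```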

I love how OpenAI said they can't create tools like ChatGPT without stealing. Truly makes you wonder what the hell these companies even do

I could watch an animation like this for a while

It was the best of times, it was the blurst of times.

@ it was the best of times, it was the blurst of times??

* **Backpropagation:** The training process uses backpropagation to refine the model's parameters, increasing the probability of predicting the correct next word in the training examples.
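
A minimal sketch of one such update using PyTorch (a single toy embedding-plus-linear model and one made-up training example, nothing like a full LLM): the loss measures how much probability the model puts on the correct next word, `loss.backward()` is the backpropagation step, and the optimizer nudges every parameter accordingly.

```python
import torch
import torch.nn as nn

vocab_size, dim = 50, 16
model = nn.Sequential(
    nn.Embedding(vocab_size, dim),  # token id -> vector
    nn.Flatten(),
    nn.Linear(dim, vocab_size),     # vector -> score for every word in the vocab
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

context = torch.tensor([[7]])   # toy token id standing in for the context
target = torch.tensor([3])      # the word that actually came next in the training text

logits = model(context)         # predicted scores for every word
loss = loss_fn(logits, target)  # low loss <=> high probability on the correct word
loss.backward()                 # backpropagation: gradients w.r.t. every parameter
optimizer.step()                # nudge parameters toward predicting the right word

print(f"loss after one step: {loss_fn(model(context), target).item():.3f} (was {loss.item():.3f})")
```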

one billion additions and multiplications per second? easy. i've got billions of neurons doing that every millisecond.

🤯 I’m somewhat comfortable with large numbers and to think we’re in the early stages! I’m excited to see how this evolves over the next decade(s).

it's incredible they've been working on large language models for over a hundred million years

* **Reinforcement Learning with Human Feedback:** After pre-training on massive text datasets, LLMs undergo further training through reinforcement learning, where human feedback is used to improve the quality and helpfulness of their responses.
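
One common formulation of that human-feedback step (a sketch of the general idea, not necessarily what any particular lab does): humans compare two candidate answers, and a reward model is trained so the preferred answer scores higher; that reward model then guides the reinforcement-learning phase.

```python
import math

def pairwise_preference_loss(reward_preferred, reward_rejected):
    # Bradley-Terry style loss: small when the reward model already rates the
    # human-preferred answer clearly above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-(reward_preferred - reward_rejected))))

print(pairwise_preference_loss(2.0, -1.0))  # ~0.05: reward model agrees with the human
print(pairwise_preference_loss(-1.0, 2.0))  # ~3.05: big penalty, so its parameters get adjusted

# During the RL phase, the trained reward model scores the LLM's outputs,
# nudging it toward responses that human raters tend to prefer.
```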

- Since this feat only took on the order of a year, the trainers must actually be doing quintillions (10^18) of operations per second🤯

millions of years. mkay ()

"Workers" casually washes over the thousands of underpaid/enslaved people in exploited countries that these models depend on to perform the RLHF. If the museum doesn't address this elsewhere then it's a bad museum

AFAIK RLHF does not use human annotation directly for reinforcement learning on the base model. Instead, human feedback is used to train a reward model, which then guides the RL process on the base model.

"Workers flag unhelpful or *problematic* predictions ...making them more likely to give predictions that users *prefer*."A bit shocking that he says this with a straight face and seemingly takes no issue with the ethical ramifications of this practice.

this staggering amount of computation is also only made possible by an equally staggering amount of power and water consumption. AI training at this scale is exacerbating climate change by rapidly increasing the amount of power big tech companies like Google are using.

at Why are the two "the" not associated with the same vector (numbers)?

* **GPUs and Parallel Processing:** Training large language models requires immense computational power, which is made possible by GPUs that can perform many calculations in parallel.
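
A rough sketch of why that matters, using PyTorch (toy sizes; it falls back to the CPU if no GPU is present, so the snippet still runs): a single matrix multiplication bundles billions of independent multiply-adds that a GPU can execute largely in parallel.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b  # roughly 2 * 4096^3 ≈ 137 billion floating-point operations in one call
print(f"ran a {a.shape[0]}x{a.shape[1]} matrix multiplication on {device}")
```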

* **Introduction to Transformers:** Transformers are the neural-network architecture behind most modern LLMs; they process all of the input text in parallel rather than word by word, enabling them to handle larger datasets and learn more complex relationships.
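
A toy contrast of the two styles (random numbers, shapes only): a recurrent-style model has to walk through the tokens one step at a time, while a transformer-style layer pushes every position through the same matrix multiplication at once, which is exactly the kind of work GPUs handle well.

```python
import numpy as np

seq_len, dim = 6, 8
rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(seq_len, dim))  # one vector per token
weights = rng.normal(size=(dim, dim))

# Sequential (RNN-like): step t cannot start until step t-1 has finished.
state = np.zeros(dim)
for t in range(seq_len):
    state = np.tanh(token_vectors[t] @ weights + state)

# Parallel (transformer-like): all positions go through the layer in one shot.
all_positions_at_once = np.tanh(token_vectors @ weights)
print(all_positions_at_once.shape)  # (6, 8): every token processed simultaneously
```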

And that's why ChatGPT doesn't know how many Rs are in "strawberry"

While the beginning of this video provides a basic overview of LLMs using the "prediction metaphor", it lacks depth in explaining how these models process language and generate text. It helps a bit when you reach but the way these numbers are generated is still "black box" magic. This means the connection between word-level processing, sentence structure, and overall meaning is not adequately addressed.

I would have also given more of an explanation of the “long list of numbers” —spend a few seconds describing how words are mapped out in vector space (the classic _man:woman::king:queen_ type of thing)—so they just don’t seem like random numbers. (I’ve seen videos about deciphering the language of other animals explaining this type of thing, i.e., for, arguably, an even _less_ technically-inclined audience, and the explanation doesn’t seem overly detailed.) Plus it’s pretty interesting.
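
To the point above, a hand-made illustration of that _man:woman::king:queen_ idea (two-dimensional toy vectors chosen by hand, not real learned embeddings): if one coordinate loosely encodes "royalty" and the other "gender", simple vector arithmetic lands where you'd expect.

```python
import numpy as np

vec = {
    "king":  np.array([0.9,  0.7]),   # [royalty, masculine-vs-feminine]
    "queen": np.array([0.9, -0.7]),
    "man":   np.array([0.1,  0.7]),
    "woman": np.array([0.1, -0.7]),
}

result = vec["king"] - vec["man"] + vec["woman"]

def closest_word(target):
    return min(vec, key=lambda w: np.linalg.norm(vec[w] - target))

print(closest_word(result))  # 'queen'
```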

* **Attention Mechanism:** Transformers utilize an "attention" mechanism that allows different parts of the input text to interact and influence each other, enhancing the model's understanding of context.
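
A minimal single-head sketch of that mechanism with random toy numbers (a simplified version of the scaled dot-product attention used in transformers): each position builds a weighted mix of every position's values, which is also why two occurrences of "the" can end up with different vectors, as asked above.

```python
import numpy as np

seq_len, dim = 5, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, dim))            # one vector per token

W_q, W_k, W_v = (rng.normal(size=(dim, dim)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v            # queries, keys, values

scores = Q @ K.T / np.sqrt(dim)                # how strongly each token attends to each other token
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
output = weights @ V                           # context-aware vector per token

print(weights.round(2))  # each row sums to 1: one attention distribution per token
print(output.shape)      # (5, 8): same shape as the input, now mixed with context
```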

"talk tuah one another". My brain is rotting, but great video

Cheque

* **Feed-Forward Neural Networks:** In addition to attention, transformers also use feed-forward neural networks to further enhance their ability to capture patterns in language.
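
A sketch of that feed-forward block (toy dimensions, random weights): after attention has mixed in context, the same small two-layer network is applied to every position independently.

```python
import numpy as np

seq_len, dim, hidden = 5, 8, 32
rng = np.random.default_rng(1)
x = rng.normal(size=(seq_len, dim))      # output of the attention step

W1, b1 = rng.normal(size=(dim, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, dim)), np.zeros(dim)

expanded = np.maximum(0, x @ W1 + b1)    # widen, then apply a ReLU nonlinearity
ffn_out = expanded @ W2 + b2             # project back down to the model dimension

print(ffn_out.shape)  # (5, 8): one refined vector per position
```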

Awesome stuff! One piece of feedback: At , you use the word "vector", but up until now you've only been saying "lists of numbers". If this is for a general audience, I think throwing in new terminology right at the end without explaining it could be confusing.

Is it just me or does this bit get a lot louder on the audio mix?

* **Emergent Behavior:** The specific behavior of LLMs is an emergent phenomenon arising from the interplay of billions of parameters tuned during training, making it difficult to fully understand their decision-making process.

You casually mention emergent behavior; have you been exposed to multilevel evolutionary selection? There appears to be great excitement about emergent behaviors in complex adaptive systems. I haven't dug too deep into the literature, so the question might already be answered: what is the minimum complexity required for specific types of emergent behaviors?

"The words that it generates are uncannily fluent, fascinating and even useful."

* **Where to learn more:** The video concludes by suggesting a visit to the Computer History Museum exhibit and recommending other resources (a deep learning series and a technical talk) for those interested in learning more about transformers and attention.
