
- Predict, sample, repeat

- GPT stands for Generative Pretrained Transformer, a core type of neural network model.

- - "Visual Exploration in Chapters"

Can I ask what you used for text-to-voice? Reading up on speech synthesizers myself, the one you used here sounds fairly good at its job, considering how natural it sounds.

@, there is a transliteration from English to Chinese, but the grammar is technically incorrect. As someone who speaks Chinese, as do the folks at the Mandarin Blueprint YT channel, it's important to note that machine translation still has a ways to go.

- - Title: "Predicting Next Passage: Technical Terms Removed". Prediction: The passage will discuss the use of machine learning algorithms in medical imaging.

- - "Predicting Next Word: A Different Goal"

- - "Mystifying Success with Added Technology"

- - "Generate Story with Seed Text"

more of the story

- Overview of data flow through a transformer

see underlying distribution

- Inside a transformer

- - "Transformer Data Flow Overview"

You: “let’s kick things off…” Me: “holy F I thought that WAS the deep dive.”

- - "Attention Block for Vector Sequence"

Referring: ", each other and pass information back and forth to update their values." I understood based on the matrix calculations shown later in the video that while inference the information only moves forward (because of masking)while in training only it goes back and forth...?


- GPT-3 works by predicting the next chunk of text based on a snippet

- - "Predicting Next Text Chunks with AI"

- - Predicting Next Text with Seed

- - Predictive Game of Sampling

- - "Repeating Appended Data"

Why, for example, at , is the word with the highest probability not chosen, but instead one with a far lower value?

Chile reference!! ❤

Santiago mentioned 🗣🗣🗣 📢📢📢📢📢📢📢🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥

- - "Exploring Chapter Details"

- Chapter layout

- - "Reviewing Background Knowledge for Second Nature"

- - "Skip to the Good Stuff with Background Knowledge"

- - "Heart of Deep Learning: Attention Blocks"

Mentioned at , but I don't see any Chapter 7 on your website. I assume you are still working on it.

- The premise of Deep Learning

- Deep learning models use data to determine model behavior through tunable parameters.

- - "Model Behavior Analysis"

- - Predicting Image Labels with AI Model

- - Predicting House Prices with Two Continuous Parameters
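
As a toy illustration of "two continuous parameters": a linear house-price predictor is just a weighted sum whose slope and bias are the tunable parameters. The numbers below are made up; training would fit them to data.

```python
# Toy linear model: the two tunable parameters are the slope and the bias.
w_area = 150.0     # price per square meter (made-up value)
bias = 20_000.0    # base price (made-up value)

def predict_price(square_meters):
    return w_area * square_meters + bias

print(predict_price(80.0))  # 150 * 80 + 20000 = 32000.0
```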

What's the middle image? The left one is linear regression and the rightmost one is deep learning, but the middle one didn't get any mention. Is it referring to decision trees?

Vector with 12,800 entries * vector with 12,800 entries = dot product between cat and a single word (important: this only concerns the embedding matrix!); a vector with 12,800 entries then goes into the unembedding matrix, the dot product is taken with each of the 50,000 tokens, and everything is converted into probability values (1)

- - "Continuing with Previous Context"

- - "Explaining Choices Through Format Knowledge"

Hold on, why do they have to be real? Why wouldn't complex numbers also work?

- Deep learning models use weighted sums and non-linear functions to process data
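
That "weighted sum followed by a non-linear function" can be written out directly. The sketch below uses made-up weights and ReLU as the non-linearity.

```python
import numpy as np

def neuron(x, w, b):
    # A weighted sum of the inputs followed by a non-linear function (ReLU).
    return np.maximum(0.0, np.dot(w, x) + b)

x = np.array([1.0, -2.0, 0.5])   # inputs
w = np.array([0.3, 0.8, -0.5])   # tunable weights (made-up values)
b = 0.1                          # tunable bias
print(neuron(x, w, b))           # 0.0, because 0.3 - 1.6 - 0.25 + 0.1 < 0
```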

- - "Probabilistic Modeling of Next Tokens"

, what is the source for this information? Can you please share it? I really need it.

- - "Charming Model Despite Size"

- Word embeddings

- - "Breaking Up Input Text"

- Words are converted into vectors so that the machine learning model can work with them.

- - "Tokens Include Word Pieces & Punctuation"

- - "Broken into Words"

embedding matrix

- - "Training AI Models with 50k Embeddings"

damn that was schnice

- - "New Foundation Unveiled"

: Use of gensim to get the closest words to tower
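
For anyone who wants to reproduce that demo, a similar lookup can be done with gensim's downloader. Which pretrained vector set the video actually used is an assumption; the small GloVe set below is simply one gensim provides out of the box.

```python
import gensim.downloader as api

# Assumed vector set; any pretrained word vectors from gensim's downloader work.
vectors = api.load("glove-wiki-gigaword-50")

# Words whose embedding vectors point in the most similar direction to "tower".
print(vectors.most_similar("tower", topn=5))
```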

The King - Queen analogy () — It's a classic, but it still gives me goosebumps. The way word embeddings capture relationships like gender and royalty? Chef's kiss. 👑

- Model learns to associate directions with specific concepts
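
The king/queen example can be checked with the same kind of pretrained vectors (again, the exact vector set is an assumption): the direction from "man" to "woman" points roughly the same way as the direction from "king" to "queen".

```python
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # assumed vector set, as above

# The direction from "man" to "woman" resembles the one from "king" to "queen".
gender_direction = vectors["woman"] - vectors["man"]
royalty_direction = vectors["queen"] - vectors["king"]
cosine = np.dot(gender_direction, royalty_direction) / (
    np.linalg.norm(gender_direction) * np.linalg.norm(royalty_direction)
)
print(cosine)  # clearly positive: the two directions roughly agree

# Equivalently, king - man + woman lands near "queen".
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```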

- - "Female Monarch? Find It Here!"

@ Everything I've been studying for the past three months suddenly snapped into focus.

Living for the Kyne reference at

bro! 😂

- - "Dot Product: A Way to Measure Vector Angle"

a token vector with approx.

- - "Hypothesize Embedding Model"

- GPT-3 utilizes a large embedding matrix for word representations.

Getting increasingly higher results from the dot products of the plurality vector and increasing numbers is crazy!

Size of the embedding matrix

- - "617 Million Weights in Total"

note: words

- Embeddings beyond words

This is the clearest layman explanation of how attention works that I've ever seen. Amazing.

- - "Predictive AI Model Empowerment"

Vector with 12,800 entries * vector with 12,800 entries = dot product between cat and a single word (important: this only concerns the embedding matrix!). From 20:58!!! the last vector with 12,800 entries goes into the unembedding matrix, then the dot product is taken with each of the 50,000 tokens and everything is converted into probability values. 24:03 temperature (5)

- - Trained With Context Size for GPT-3

- GPT-3 is trained with a context size of 2048
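
A context size of 2048 means only the most recent 2048 tokens are fed into the model when it predicts the next one; in code that amounts to clipping the running token list, roughly like this:

```python
CONTEXT_SIZE = 2048  # GPT-3's context length

def clip_to_context(tokens):
    # Only the most recent CONTEXT_SIZE tokens are visible to the model
    # when it predicts the next token.
    return tokens[-CONTEXT_SIZE:]

print(len(clip_to_context(list(range(5000)))))  # 2048
```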

- - "Incorporating predictions for fluent conversation"

At I'm curious about what the original question was. I was so surprised to see restaurants in Santiago, and specifically Ñuñoa!!! Hahahaha that's very specific! 🇨🇱

- Unembedding

Unembedding explained () — This one gets overlooked a lot, but it's like the decoder ring of the whole system. I appreciate the spotlight.

- - "Predict Next Token Probability"

- - "Snape Highly Rated Among Harry Potter Fans"

Disagree with Snape. The correct answer is Umbridge. But I assume that since Snape has far more occurrences than Umbridge in the training set, far more association can be established between Snape and negative traits. This is something AI needs to learn in order to grasp the amplitude of a character's emotional impact on readers lol

!!! last vector with

At around , there was something I do not understand: why are the other vectors of the last layer not used, and only the last word's vector multiplied with the unembedding matrix?

- - "Unimbedding Matrix: A Key Player"

As an ML researcher this is an amazing video ❤. But please allow me to nitpick a little at

- Understanding the purpose and function of the softmax function in deep learning

- - Total Billions Ahead: Mini-Lesson Ends Chapter

- Softmax with temperature

I don't understand much when it comes to the mathematical calculations, and to understand I will need to rewatch this series of videos several times and read the accompanying literature. The softmax converts the range from minimum to maximum into a "percentage" ratio from 0 to 1, and the temperature essentially makes it possible to gradually invert these values (provided the range is not limited).

- - "Next Word Distribution"

- - "Smaller values near 0 with 1"

Minor correction about how softmax works. Exponents can get out of hand very quickly and overwhelm the computer's memory. So to prevent that, what we can do is simply subtract the maximum value in our vector from all the other values, effectively turning the new maximum value into 0 and thus e^0 =1, with all the other values being smaller than 1. Then, when you use softmax to turn this new vector into a probability distribution, it's identical to what you would've gotten originally, but without the massive computational problem. It's really ingenious if you think about it.Disclaimer: I'm not an AI expert. I'm just reiterating what I learned from a sentdex video.
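
Here is that subtract-the-max trick written out. Shifting every logit by the same constant cancels in the numerator and denominator of softmax, so the output distribution is unchanged, but the exponentials can no longer overflow.

```python
import numpy as np

def softmax(x):
    # Subtracting the maximum before exponentiating prevents overflow; the
    # shift cancels in numerator and denominator, so the result is identical.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax(x))   # well-behaved probabilities
print(np.exp(x))    # without the shift: overflow, all infinities
```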

- - "Sum Positive Values"

Shouldn't the x values start from 0 too since when we are expressing the aggregate, we are starting from x0 and going till n-1? Like shouldn't it be x0 to x6 for the 7 values?

- - "ChatchyPt Creates Next Word"

Softmax with temperature () — Loved how you broke this down. Watching a robot sweat between “cat” and “bratwurst” at high temps? Hilarious and accurate.

Temperature

At around the mark, I noticed a slight discrepancy in the indexing of the softmax. Currently, it starts from e^x1 instead of e^x0. To maintain consistency, you might consider adjusting the limits to either start from 1 to N or begin from e^x0. It's a minor detail, but I thought it might enhance the overall clarity of your presentation. Keep up the excellent work!

- Isn't it impossible to set the temperature to zero, because you can't divide by zero?


- GPT-3 uses temperature to control word generation.

, the temperature cannot be 0; the closer it gets to zero, the more the "softmax with temperature" sharpens the largest value

A little, humble remark: At , you're talking about the temperature in softmax. I think setting T to 0 does not work, as it would lead to "division by zero". Maybe I'm getting it wrong ...

*approximately 0 (last time I checked you couldn't divide by zero) (Even though I'm smartassing, I find this video freakin awesome!!)
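
To tie the temperature comments together, here is a sketch of softmax with temperature. T = 0 cannot be plugged in directly (division by zero, as noted above), so it is handled as the limiting case of always picking the largest logit.

```python
import numpy as np

def softmax_with_temperature(logits, T):
    if T == 0:
        # The T -> 0 limit: put all probability on the largest logit,
        # since literally dividing by zero is not allowed.
        probs = np.zeros_like(logits, dtype=float)
        probs[np.argmax(logits)] = 1.0
        return probs
    scaled = logits / T
    e = np.exp(scaled - scaled.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
print(softmax_with_temperature(logits, 1.0))  # moderately peaked
print(softmax_with_temperature(logits, 5.0))  # flatter, more random sampling
print(softmax_with_temperature(logits, 0.0))  # [1. 0. 0.]
```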

- - "Maximizing Next Token Predictions with GPT-3"

- - Next Word Prediction Logits

- - "Laying Foundations for Attention Understanding"

- Up next

- - "Next Chapter Awaits Smooth Ride"

- - "Next Chapter Available for Review"

- Dive into attention and support the channel
