Building makemore Part 2: MLP

We implement a multilayer perceptron (MLP) character-level language model. In this video we also introduce many basics of machine learning (e.g. model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.).

Links:
- makemore on github: https://github.com/karpathy/makemore
- jupyter notebook I built in this video: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part2_mlp.ipynb
- Colab notebook (new)!!!: https://colab.research.google.com/drive/1YIfmkftLrz6MPTOO9Vwqrop2Q5llHIGK?usp=sharing
- Bengio et al. 2003 MLP language model paper (pdf): https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
- my website: https://karpathy.ai
- my twitter:
- (new) Neural Networks: Zero to Hero series Discord channel: https://discord.gg/3zy8kqD9Cp , for people who'd like to chat more and go beyond youtube comments

Useful links:
- PyTorch internals ref http://blog.ezyang.com/2019/05/pytorch-internals/

Exercises:
- E01: Tune the hyperparameters of the training to beat my best validation loss of 2.2
- E02: I was not careful with the initialization of the network in this video. (1) What is the loss you'd get if the predicted probabilities at initialization were perfectly uniform? What loss do we achieve? (2) Can you tune the initialization to get a starting loss that is much more similar to (1)? (A quick check of (1) is sketched just after this list.)
- E03: Read the Bengio et al. 2003 paper (link above), implement and try any idea from the paper. Did it work?
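
For E02 part (1), a quick check, assuming the 27-character vocabulary used throughout makemore (26 lowercase letters plus the '.' boundary token):

```python
import torch

# loss if the model assigned a uniform probability of 1/27 to every character
uniform_loss = -torch.log(torch.tensor(1 / 27))
print(uniform_loss)  # tensor(3.2958)
```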

Chapters:
00:00:00 intro
00:01:48 Bengio et al. 2003 (MLP language model) paper walkthrough
00:09:03 (re-)building our training dataset
00:12:19 implementing the embedding lookup table
00:18:35 implementing the hidden layer + internals of torch.Tensor: storage, views
00:29:15 implementing the output layer
00:29:53 implementing the negative log likelihood loss
00:32:17 summary of the full network
00:32:49 introducing F.cross_entropy and why
00:37:56 implementing the training loop, overfitting one batch
00:41:25 training on the full dataset, minibatches
00:45:40 finding a good initial learning rate
00:53:20 splitting up the dataset into train/val/test splits and why
01:00:49 experiment: larger hidden layer
01:05:27 visualizing the character embeddings
01:07:16 experiment: larger embedding size
01:11:46 summary of our final code, conclusion
01:13:24 sampling from the model
01:14:55 Google Colab (new!!) notebook advertisement

#deep learning #neural network #multilayer perceptron #nlp #language model

Timetable:

00:00:27 @akshatsingh6036: [<1809.89it/s] Last Loss: 2.403459072113037, Best Loss: 1.4457638263702393 at epoch 25480

00:01:34 @JuanManuelBerros: PS. At this point I was just uber curious about his previous searches, so I googled them:

00:03:25 @14types: Why is the space small? Even in two-dimensional space you can place an infinite number of points.

00:17:28 @nanuqcz: Every time I think I finally understand what's happening, he does something like this: 😅

00:20:25 @JavArButt: -dimensional vertically scrollable space to describe the functions of PyTorch ()

00:21:24 @JohnDoe-ph6vb: At this point I think it's supposed to be first letter, not first word. It's first word in the paper but first letter in the example.

00:21:39 @dericortiz2713: At this point, when he says words, does he mean the 3-character sequence that was made by block size? And, so, when he refers to the picture behind him, does he mean each of those three blocks represents an index in the block_size array?

00:23:47 @sam.rodriguez: what about just `emb_reshaped = emb.reshape((emb.shape[0], emb.shape[1]*emb.shape[2]))`?
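
For what it's worth, a small check (shapes as in the video: 32 examples, block_size 3, 2-dimensional embeddings) suggesting that `reshape`, `view`, and the explicit `torch.cat(torch.unbind(...))` route all produce the same flattened tensor:

```python
import torch

emb = torch.randn(32, 3, 2)  # stand-in for the embedding tensor in the video

a = emb.view(emb.shape[0], -1)                              # no copy; needs contiguous storage
b = emb.reshape(emb.shape[0], emb.shape[1] * emb.shape[2])  # copies only when it has to
c = torch.cat(torch.unbind(emb, 1), 1)                      # explicit concat; always copies

print(a.shape, torch.equal(a, b), torch.equal(a, c))  # torch.Size([32, 6]) True True
```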

00:24:50 @bloody_albatross: Of course! Memory itself is a one-dimensional "tensor". :D

00:25:36 @rezathr8968: for the PyTorch internals video (@)

00:25:40 @pedroaugustoribeirogomes7999: Please create the "entire video about the internals of pytorch" that you mentioned here. And thank you so much for the content, Andrej!!

00:27:24 @atac8538: At this minute mark at the moment and gotta say, pytorch is amazing. So wonderful how easy they make it for devs with those small tricks.

00:27:27 @JuanManuelBerros: Matthew 27:27-31 > Then the governor’s soldiers took Jesus into the Praetorium and gathered the whole company of soldiers around him. They stripped him and put a scarlet robe on him, and then twisted together a crown of thorns and set it on his head. They put a staff in his right hand. Then they knelt in front of him and mocked him. “Hail, king of the Jews!” they said. They spit on him, and took the staff and struck him on the head again and again. After they had mocked him, they took off the robe and put his own clothes on him. Then they led him away to crucify him.

00:27:27 @JuanManuelBerros: Proverbs 27:27 > You will have plenty of goats’ milk to feed your family and to nourish your female servants.

00:29:20 @louiswang538: we can also use torch.reshape() to get the right shape for W. However, there is a difference between torch.view and torch.reshape. TL;DR: if you just want to reshape tensors, use torch.reshape. If you're also concerned about memory usage and want to ensure that the two tensors share the same data, use torch.view.
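
To illustrate that TL;DR, a small sketch of the case where the two differ: view refuses to work on non-contiguous storage, while reshape silently falls back to copying:

```python
import torch

x = torch.arange(6).view(2, 3)
t = x.t()                     # transpose: same underlying storage, but non-contiguous
print(t.is_contiguous())      # False
# t.view(6) would raise a RuntimeError here, because view never copies;
# reshape copies the data whenever a view is impossible
print(t.reshape(6))           # tensor([0, 3, 1, 4, 2, 5])
```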

00:30:03 @BrutalStrike2: What's tanh?

00:31:47 @Pro-ish: "ideally all of these numbers here of course are one because then we are correctly predicting the next character" hmmmmmm, it's reasonable to say these numbers are high, but not one. If the probability here is one, that will exclude any chance of other characters having similar context.

00:34:05 @mconio: re: using the cross_entropy function around this point, it sounds like pytorch takes the derivative of each step of exponentiation then normalization instead of simplifying them before taking the derivative. Is that a "soft" limitation of the implementation, in that a procedure could be defined to overcome it, or is there a bit of mathematical intuition needed to understand how to rewrite the function to produce a simpler derivative?
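
Whatever the answer on the derivative side, the forward-pass motivation is easy to reproduce; a small sketch of the numerical-stability difference between the manual softmax route and the fused F.cross_entropy:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-5.0, 0.0, 100.0]])
target = torch.tensor([2])

# manual route from the video: exponentiate, normalize, take -log of the target prob
counts = logits.exp()                     # exp(100) overflows to inf in float32
probs = counts / counts.sum(1, keepdim=True)
print(-probs[0, target].log())            # tensor([nan])

# the fused version subtracts the max logit internally, so it stays finite
print(F.cross_entropy(logits, target))    # ~tensor(0.)
```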

00:37:00 @LukaszWiklendt: Since probs are invariant to an offset applied to logits, it's fun to plot the drift in the mean or sum of b2. Looks like Brownian motion.
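
That invariance is easy to verify with a tiny sketch (arbitrary logits, any shared offset):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 27)
shifted = logits + 3.7          # add the same constant to every logit in a row

p1 = F.softmax(logits, dim=1)
p2 = F.softmax(shifted, dim=1)
print(torch.allclose(p1, p2))   # True: the probabilities only see differences between logits
```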

00:37:14 @AbhishekVaid: Who would tell you this when you are reading from a book? Exceptional teaching ability.

00:38:00 @BuFu1O1: pfeeeewwww 😳

00:41:30 @thejessundar6370: I don't understand the mini-batching happening at this point. When using ix = torch.randint(0, X.shape[0], (32,)) and using this to index into X, you are just picking 32 data examples from X, not batching all of the data, right? I thought by batching, you take a batch of data, do a forward pass on all items in the batch, take the mean output, do backprop on that mean result and update the model on that loss. Here I feel like Andrej is just selecting 32 individual data examples. Please do correct me if I'm wrong! I'm new to ML!
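
As far as I can tell that reading is right: each step computes the loss and gradient on just the 32 sampled rows, a noisy but much cheaper estimate of the full-dataset gradient. A minimal self-contained sketch (random stand-in tensors with the video's shapes, not the real names dataset):

```python
import torch
import torch.nn.functional as F

# stand-in data and parameters with the video's shapes
X = torch.randint(0, 27, (200000, 3))    # 3-character contexts
Y = torch.randint(0, 27, (200000,))      # next-character targets
C  = torch.randn(27, 2)
W1 = torch.randn(6, 100);  b1 = torch.randn(100)
W2 = torch.randn(100, 27); b2 = torch.randn(27)

# one optimization step only ever touches 32 randomly chosen examples
ix = torch.randint(0, X.shape[0], (32,))      # 32 random row indices
emb = C[X[ix]]                                # (32, 3, 2)
h = torch.tanh(emb.view(-1, 6) @ W1 + b1)     # (32, 100)
logits = h @ W2 + b2                          # (32, 27)
loss = F.cross_entropy(logits, Y[ix])         # mean loss over just these 32 rows
print(loss)
```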

00:44:25 @GaurangPatel1: life lesson: much better to have an approximate gradient and take many steps than to have an exact gradient and take a few steps

00:45:00 @LambrosPetrou: Awesome videos, thank you for that! I have a question though about "finding a good initial learning rate", which is either a mistake in the video or I misunderstood something.

00:45:34 @leiyang2176: It seems it is slightly different from the approach presented here. Looking at this part, it looks like for each iteration we randomly select a minibatch of size 32 from the whole training set, update the parameters, then go on to the next iteration.

00:45:40 @myao8930: At 'Finding a good initial learning rate', each learning rate is used just one time. The adjustment of the parameters for one learning rate is based on the parameters already adjusted using the prior, smaller learning rates. I feel that each of the 1,000 learning-rate candidates should go through the same number of iterations; then the losses at the end of the iterations are compared. Please tell me if I am wrong. Thanks!
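
That description of the sweep's mechanics matches the code: the candidate rates are tried one per step, in increasing order, on an evolving set of parameters. A toy reproduction of the idea (a quadratic stand-in loss, not the makemore network):

```python
import torch

torch.manual_seed(42)
w = torch.randn(10, requires_grad=True)   # toy parameters
lre = torch.linspace(-3, 0, 1000)         # exponents of the candidate rates
lrs = 10 ** lre                           # learning rates from 0.001 up to 1.0

lri, lossi = [], []
for i in range(1000):
    loss = (w ** 2).mean()                # toy loss standing in for cross_entropy
    w.grad = None
    loss.backward()
    with torch.no_grad():
        w += -lrs[i] * w.grad             # each candidate rate is used for exactly one step
    lri.append(lre[i].item())
    lossi.append(loss.item())
# plotting lossi against lri gives the curve Andrej reads the 0.1 off of; as the
# comments below note, later steps also benefit from all of the earlier updates.
```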

00:45:40 @YYoung1025: I don't quite understand the part about finding a good initial learning rate. Why does the lowest point of the loss value indicate the best learning rate? It takes some time for the loss value to decrease, right?

00:45:45 @gleb.timofeev: At this point I was waiting for Karpathy's constant to appear. Thank you for the lecture, Andrej.

00:48:45 @manu_221b: At this point Andrej says that the learning rate would be low in the beginning and high at the end. Why was it set like that? My intuition is that the learning rate should be in the opposite order.

00:49:22 @koenBotermans: I believe that at this point the losses and the learning rates are misaligned. The first loss (derived from completely random weights) is computed before the first learning rate is used, and therefore the first learning rate should be aligned with the second loss. You can simply solve this problem with this snippet: `lri = lri[:-1]; lossi = lossi[1:]`

00:50:00 @datou666: Question about this part: in the plot, the y axis is the loss and the x axis is the learning rate, but the x axis is also the step number. How do you know whether the y-axis change is because of the learning-rate difference or the step-number increase?

00:50:30 @JayPinho: Great video! One question, @AndrejKarpathy: around this point you show how to graph an optimal learning rate and ultimately you determine that the 0.1 you started with was pretty good. However, unless I'm misunderstanding your code, aren't you iterating over the 1000 different learning-rate candidates while *simultaneously* doing 1000 consecutive passes over the neural net? Meaning, the loss will naturally be lower during later iterations since you've already done a bunch of backward passes, so the biggest loss improvements would always be stacked towards the beginning of the 1000 iterations, right? Won't that bias your optimal learning rate calculation towards the first few candidates?

00:50:42 @przemysawbuczkowski8715: Can anyone explain to me why, looking at the loss plotted against the exponent of the learning rate, the conclusion is that lr < 0.1 "is way too low"? For me, that's where the loss is actually getting lower, isn't it?

00:53:20 @rmajdodin: To break the data into training, development and test sets, one can also use torch.tensor_split: `n1 = int(0.8 * X.shape[0]); n2 = int(0.9 * X.shape[0]); Xtr, Xdev, Xts = X.tensor_split((n1, n2), dim=0); Ytr, Ydev, Yts = Y.tensor_split((n1, n2), dim=0)`
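
For contrast with that example-level split, my recollection is that the video splits at the word level after a shuffle, so all examples derived from one name land in the same split; a rough sketch with a stand-in word list:

```python
import random

# stand-in list; in the notebook this is the full names dataset
words = ["emma", "olivia", "ava", "isabella", "sophia", "charlotte", "mia", "amelia"]

random.seed(42)
random.shuffle(words)
n1 = int(0.8 * len(words))
n2 = int(0.9 * len(words))

train_words, dev_words, test_words = words[:n1], words[n1:n2], words[n2:]
print(len(train_words), len(dev_words), len(test_words))  # roughly 80% / 10% / 10%
```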

00:56:17 @afai264: I'm confused about why care must be taken with how many times you can use the test dataset, as the model will learn from it. Is this because there is no equivalent of 'torch.no_grad()' for LLMs - will the LLM always update the weights when given data?

00:59:01 @yukselkapan9996: Thank you for the lectures! This part made me chuckle.

00:59:15 @armanchaudhary1832: It can take days!! How can someone sleep with such pressure?

01:02:15 @3rdman99: I also just noticed, he explicitly mentions these fluctuations at this point. Doh!

01:05:00 @alois-h: Around this point: the reason why we're not "overfitting" with the larger number of params might be the context size. With a context of 3, no number of params will remove the inherent uncertainty.

01:06:56 @siddhantverma532: Fascinating how the vowels end up clustered together!

01:07:20 @HuifengOu-b5v: It should be 10-dimensional embeddings for each *character*, not word, in this character-level language model.

01:10:09 @danieljaszczyszczykoeczews2616: You shouldn't have plotted the stepi variable against the loss :D It could have worked if you'd plotted just plt.plot(loss_history), or applied two different colours to those two runs.

01:10:30 @suyashkumar1990: The plot of the steps and losses after running the training loop multiple times (~ this point, https://youtu.be/TCH_1BHY58I?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&t=4233) would be wrong because the stepi array keeps appending the same indices [0, 50000). I expect the graph to just get more and more unstable.

01:14:56 @404logicfound: Andrej is learning YouTube tricks 😅
