Andrej Karpathy


Videos

Number of videos: 17

The spelled-out intro to language modeling: building makemore

We implement a bigram character-level language model, which we will further complexify in followup videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks and (2) the overall framework of language modeling that includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).

Links:
- makemore on github: https://github.com/karpathy/makemore
- jupyter notebook I built in this video: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part1_bigrams.ipynb
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- (new) Neural Networks: Zero to Hero series Discord channel: https://discord.gg/3zy8kqD9Cp , for people who'd like to chat more and go beyond youtube comments

Useful links for practice:
- Python + Numpy tutorial from CS231n: https://cs231n.github.io/python-numpy-tutorial/ . We use torch.tensor instead of numpy.array in this video. Their design (e.g. broadcasting, data types, etc.) is so similar that practicing one is basically practicing the other; just be careful with some of the APIs - how various functions are named, what arguments they take, etc. - these details can vary.
- PyTorch tutorial on Tensor: https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html
- Another PyTorch intro to Tensor: https://pytorch.org/tutorials/beginner/nlp/pytorch_tutorial.html

Exercises:
E01: train a trigram language model, i.e. take two characters as an input to predict the 3rd one. Feel free to use either counting or a neural net. Evaluate the loss; did it improve over a bigram model?
E02: split up the dataset randomly into 80% train set, 10% dev set, 10% test set. Train the bigram and trigram models only on the training set. Evaluate them on dev and test splits. What can you see?
E03: use the dev set to tune the strength of smoothing (or regularization) for the trigram model - i.e. try many possibilities and see which one works best based on the dev set loss. What patterns can you see in the train and dev set loss as you tune this strength? Take the best setting of the smoothing and evaluate on the test set once at the end. How good of a loss do you achieve?
E04: we saw that our 1-hot vectors merely select a row of W, so producing these vectors explicitly feels wasteful. Can you delete our use of F.one_hot in favor of simply indexing into rows of W?
E05: look up and use F.cross_entropy instead. You should achieve the same result. Can you think of why we'd prefer to use F.cross_entropy instead?
E06: meta-exercise! Think of a fun/interesting exercise and complete it.

Chapters:
00:00:00 intro
00:03:03 reading and exploring the dataset
00:06:24 exploring the bigrams in the dataset
00:09:24 counting bigrams in a python dictionary
00:12:45 counting bigrams in a 2D torch tensor ("training the model")
00:18:19 visualizing the bigram tensor
00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token
00:24:02 sampling from the model
00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
00:50:14 loss function (the negative log likelihood of the data under our model)
01:00:50 model smoothing with fake counts
01:02:57 PART 2: the neural network approach: intro
01:05:26 creating the bigram dataset for the neural net
01:10:01 feeding integers into neural nets? one-hot encodings
01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication
01:18:46 transforming neural net outputs into probabilities: the softmax
01:26:17 summary, preview to next steps, reference to micrograd
01:35:49 vectorized loss
01:38:36 backward and update, in PyTorch
01:42:55 putting everything together
01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
01:50:18 note 2: model smoothing as regularization loss
01:54:31 sampling from the neural net
01:56:16 conclusion

#deep learning #language model #gpt #bigram #neural network #pytorch #torch #tensor
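
For readers following along, the counting half of the lecture fits in a few lines of PyTorch. The sketch below is not the notebook's exact code: a tiny hardcoded word list stands in for the names dataset used in the video. It counts bigrams into a 2D tensor, normalizes the rows with broadcasting, applies the +1 "fake count" smoothing, and reports the average negative log likelihood.

```python
# Minimal counting-based bigram model (illustrative sketch, not the notebook's code).
import torch

words = ["emma", "olivia", "ava"]            # placeholder for the names dataset
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                                # a single '.' token marks start and end
V = len(stoi)

# "Training": count bigram occurrences into a 2D tensor.
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

P = (N + 1).float()                          # +1 is the "fake count" smoothing
P /= P.sum(dim=1, keepdim=True)              # broadcasting normalizes each row to a distribution

# Evaluate the average negative log likelihood of the data under the model.
log_likelihood, n = 0.0, 0
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
nll = -log_likelihood / n
print(f"average negative log likelihood: {nll.item():.4f}")
```

On exercises E04/E05: a one-hot row vector multiplied by W simply selects a row of W, so indexing `W[ix]` gives the same result as `F.one_hot(ix, V).float() @ W`, and F.cross_entropy fuses the softmax and negative log likelihood in a way that is both more efficient and more numerically stable than computing them by hand.
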
September 8, 2022
00:00:00 - 01:57:45
Stable diffusion dreams of psychedelic faces

Prompt: "psychedelic faces" Stable diffusion takes a noise vector as input and samples an image. To create this video I smoothly (spherically) interpolate between randomly chosen noise vectors and render frames along the way. This video was produced by one A100 GPU taking about 10 tabs and dreaming about the prompt overnight (~8 hours). While I slept and dreamt about other things. Music: Stars by JVNA Links: - Stable diffusion: https://stability.ai/blog - Code used to make this video: https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355 - My twitter: https://twitter.com/karpathy
August 20, 2022
00:00:00 - 00:04:02
Stable diffusion dreams of steampunk brains

Prompt: "ultrarealistic steam punk neural network machine in the shape of a brain, placed on a pedestal, covered with neurons made of gears. dramatic lighting. #unrealengine" Stable diffusion takes a noise vector as input and samples an image. To create this video I smoothly (spherically) interpolate between randomly chosen noise vectors and render frames along the way. This video was produced by one A100 GPU dreaming about the prompt overnight (~8 hours). While I slept and dreamt about other things. This is version 2 video of this prompt, with (I think?) a bit higher quality and trippy AGI music. Music: Wonders by JVNA Links: - Stable diffusion: https://stability.ai/blog - Code used to make this video: https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355 - My twitter: https://twitter.com/karpathy
August 18, 2022
00:00:00 - 00:19:26
Stable diffusion dreams of tattoos

Dreams of tattoos. (There are a few discrete jumps in the video because I had to erase portions that got just a little 🌶️; I believe I got most of it.)

Links:
- Stable diffusion: https://stability.ai/blog
- Code used to make this video: https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355
- My twitter: https://twitter.com/karpathy
August 17, 2022
00:00:00 - 00:01:46
The spelled-out intro to neural networks and backpropagation: building micrograd

This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.

Links:
- micrograd on github: https://github.com/karpathy/micrograd
- jupyter notebooks I built in this video: https://github.com/karpathy/nn-zero-to-hero/tree/master/lectures/micrograd
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- "discussion forum": nvm, use youtube comments below for now :)
- (new) Neural Networks: Zero to Hero series Discord channel: https://discord.gg/3zy8kqD9Cp , for people who'd like to chat more and go beyond youtube comments

Exercises: you should now be able to complete the following Google Colab notebook, good luck!: https://colab.research.google.com/drive/1FPTx1RXtBfc4MaTkf7viZZD4U2F9gtKN?usp=sharing

Chapters:
00:00:00 intro
00:00:25 micrograd overview
00:08:08 derivative of a simple function with one input
00:14:12 derivative of a function with multiple inputs
00:19:09 starting the core Value object of micrograd and its visualization
00:32:10 manual backpropagation example #1: simple expression
00:51:10 preview of a single optimization step
00:52:52 manual backpropagation example #2: a neuron
01:09:02 implementing the backward function for each operation
01:17:32 implementing the backward function for a whole expression graph
01:22:28 fixing a backprop bug when one node is used multiple times
01:27:05 breaking up a tanh, exercising with more operations
01:39:31 doing the same thing but in PyTorch: comparison
01:43:55 building out a neural net library (multi-layer perceptron) in micrograd
01:51:04 creating a tiny dataset, writing the loss function
01:57:56 collecting all of the parameters of the neural net
02:01:12 doing gradient descent optimization manually, training the network
02:14:03 summary of what we learned, how to go towards modern neural nets
02:16:46 walkthrough of the full code of micrograd on github
02:21:10 real stuff: diving into PyTorch, finding their backward pass for tanh
02:24:39 conclusion
02:25:20 outtakes :)

#neural #network #backpropagation #lecture
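
To give a taste of what the lecture builds, here is a heavily condensed sketch of the central idea: a Value node that remembers how it was produced and stores a local chain-rule step, plus a backward() that applies those steps over the expression graph in reverse topological order. It is an illustration with only + and *, not the full micrograd class from the repo (no tanh, exp, or the Neuron/MLP library).

```python
# Condensed illustration of reverse-mode autodiff on a scalar expression graph.
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._backward = lambda: None        # filled in by the op that created this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad            # d(out)/d(self) = 1
            other.grad += out.grad           # += accumulates when a node is reused
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply each node's chain-rule step in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Example: d/da (a*b + a) at a=2, b=-3 is b + 1 = -2; d/db is a = 2.
a, b = Value(2.0), Value(-3.0)
c = a * b + a
c.backward()
print(a.grad, b.grad)                        # -2.0 2.0
```
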
August 17, 2022
00:00:00 - 02:25:52
Stable diffusion dreams of "blueberry spaghetti" for one night

Stable diffusion dreams of "blueberry spaghetti" for one night

Prompt: "blueberry spaghetti" Stable diffusion takes a noise vector as input and samples an image. To create this video I simply smoothly interpolate between randomly chosen noise vectors and render frames along the way. Links - Stable diffusion: https://stability.ai/blog - Code used to make this video: https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355 - My twitter: https://twitter.com/karpathy
August 17, 2022
00:00:00 - 00:05:03
Stable diffusion dreams of steam punk neural networks

A stable diffusion dream. The prompt was "ultrarealistic steam punk neural network machine in the shape of a brain, placed on a pedestal, covered with neurons made of gears. dramatic lighting. #unrealengine"

The new and improved v2 version of this video is now here: https://www.youtube.com/watch?v=2oKjtvYslMY

Generated with this hacky script: https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355 . The script slowly meanders through noise space to explore the space of possible generations for the fixed prompt.

Stable diffusion: https://stability.ai/blog/stable-diffusion-announcement
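
One generic way to "slowly meander" through noise space is to repeatedly blend a small amount of fresh Gaussian noise into the current latent while keeping its variance roughly constant; each intermediate latent is then rendered with the fixed prompt. This is an illustration of the idea only, not necessarily the exact procedure in the linked gist, and the latent shape is an assumption.

```python
# Slow random drift through noise space: a variance-preserving blend of the
# current latent with fresh noise, so successive latents stay close together
# and the rendered frames change gradually.
import torch

z = torch.randn(4, 64, 64)           # assumed latent shape, for illustration only
step = 0.05                          # smaller step = slower, smoother drift
latents = []
for _ in range(300):                 # 300 nearby latents to render as frames
    noise = torch.randn_like(z)
    z = (1 - step**2) ** 0.5 * z + step * noise   # keeps z approximately unit-variance
    latents.append(z.clone())
```
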
August 16, 2022
00:00:00 - 00:02:35