Building makemore Part 5: Building a WaveNet

We take the 2-layer MLP from the previous video and make it deeper with a tree-like structure, arriving at a convolutional neural network architecture similar to WaveNet (2016) from DeepMind. In the WaveNet paper, the same hierarchical architecture is implemented more efficiently using causal dilated convolutions (not yet covered). Along the way we get a better sense of torch.nn (what it is and how it works under the hood) and of what a typical deep learning development process looks like (a lot of reading of documentation, keeping track of multidimensional tensor shapes, moving between Jupyter notebooks and repository code, ...).

Links:
- makemore on github: https://github.com/karpathy/makemore
- Jupyter notebook I built in this video: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part5_cnn1.ipynb
- Colab notebook: https://colab.research.google.com/drive/1CXVEmCO_7r7WYZGb5qnjfyxTvQa13g5X?usp=sharing
- my website: https://karpathy.ai
- my twitter:
- our Discord channel: https://discord.gg/3zy8kqD9Cp

Supplementary links:
- WaveNet 2016 from DeepMind https://arxiv.org/abs/1609.03499
- Bengio et al. 2003 MLP LM https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

Chapters:
intro
00:00:00 intro
00:01:40 starter code walkthrough
00:06:56 let’s fix the learning rate plot
00:09:16 pytorchifying our code: layers, containers, torch.nn, fun bugs
implementing wavenet
00:17:11 overview: WaveNet
00:19:33 dataset: bump the context size to 8
00:19:55 re-running baseline code on block_size 8
00:21:36 implementing WaveNet
00:37:41 training the WaveNet: first pass
00:38:50 fixing batchnorm1d bug
00:45:21 re-training WaveNet with bug fix
00:46:07 scaling up our WaveNet
conclusions
00:46:58 experimental harness
00:47:44 WaveNet but with “dilated causal convolutions”
00:51:34 torch.nn
00:52:28 the development process of building deep neural nets
00:54:17 going forward
00:55:26 improve on my loss! how far can we improve a WaveNet on this data?

intro
November 21, 2022
00:00:00 - 00:01:40

starter code walkthrough
00:01:40 - 00:06:56

*Starter Code Walkthrough*
@wolpumba4099
00:01:43 - 00:09:19

Andrej scrolls through the architecture without commenting on why the first Linear layer has its bias disabled. I have seen this in a couple of other projects. Can somebody clarify why, or say where I should look to find an answer? Thank you.
@LevTelyatnikov
00:05:40 - 00:56:22
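One common explanation for the question above, offered as a sketch rather than as the video's own answer: when a Linear layer feeds directly into batch normalization, the per-feature mean subtraction cancels any constant bias, so the bias is redundant. The tensor sizes and the `normalize` helper below are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 80)    # assumed batch of 32 flattened inputs
W = torch.randn(80, 200)   # assumed Linear weight
b = torch.randn(200)       # a bias we want to show is redundant

def normalize(h):
    # Hypothetical helper: batch-norm style standardization per feature,
    # without the learnable gain and bias.
    mean = h.mean(0, keepdim=True)
    var = h.var(0, keepdim=True)
    return (h - mean) / torch.sqrt(var + 1e-5)

without_bias = normalize(x @ W)
with_bias = normalize(x @ W + b)

# The bias shifts every row by the same amount, so it is removed
# again when the batch mean is subtracted.
same = torch.allclose(without_bias, with_bias, atol=1e-4)
print(same)
```

Batch normalization layers carry their own learnable bias, which takes over the role of the Linear bias.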

let’s fix the learning rate plot
00:06:56 - 00:09:16

When I saw the mean() trick at ~8:50, I let out an audible gasp! That was such a neat trick, going to use that one in the future.
@AndrewOrtman
00:08:50 - 00:56:22
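The trick mentioned in this comment appears to be the reshape-then-average used to smooth the noisy loss plot. A minimal sketch, assuming `lossi` is a flat Python list of per-step losses whose length is divisible by 1000 (the values here are random stand-ins, not real training losses):

```python
import torch

torch.manual_seed(0)
# Stand-in for the per-step loss log collected during training.
lossi = (torch.rand(20000) + 2.0).tolist()

# View the flat log as rows of 1000 consecutive steps,
# then average along each row to get one value per 1000 steps.
smoothed = torch.tensor(lossi).view(-1, 1000).mean(1)
print(smoothed.shape)  # torch.Size([20])
```

Plotting `smoothed` instead of `lossi` gives one point per 1000 steps, which makes the trend much easier to read.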

pytorchifying our code: layers, containers, torch.nn, fun bugs
00:09:16 - 00:17:11

*PyTorchifying Our Code: Layers, Containers, Torch.nn, Fun Bugs*
- Embedding table and view operations are encapsulated into custom Embedding and Flatten modules.
- A Sequential container is created to organize layers, similar to torch.nn.Sequential.
- The forward pass is simplified using these new modules and container.
- A bug related to BatchNorm in training mode with single-example batches is identified and fixed.
@wolpumba4099
00:09:19 - 00:17:12
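The modules summarized above might look roughly like the following. This is a sketch under assumptions: the class bodies, the `out` attribute, and the sizes (27 characters, 10-dimensional embeddings, context of 8) are illustrative and not copied from the notebook.

```python
import torch

class Embedding:
    # Lookup table: integer indices -> embedding vectors.
    def __init__(self, num_embeddings, embedding_dim):
        self.weight = torch.randn((num_embeddings, embedding_dim))

    def __call__(self, IX):
        self.out = self.weight[IX]
        return self.out

    def parameters(self):
        return [self.weight]

class Flatten:
    # Concatenates everything after the batch dimension.
    def __call__(self, x):
        self.out = x.view(x.shape[0], -1)
        return self.out

    def parameters(self):
        return []

class Sequential:
    # Calls the layers in order, mirroring torch.nn.Sequential.
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        self.out = x
        return self.out

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

model = Sequential([Embedding(27, 10), Flatten()])
Xb = torch.randint(0, 27, (4, 8))  # batch of 4 contexts, 8 characters each
out = model(Xb)
print(out.shape)  # torch.Size([4, 80])
```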

At 11:18: why not just call torch.flatten(x, start_dim, end_dim) inside a Flatten(start_dim, end_dim=-1) layer? To use it in your particular case, just create a Flatten(1) layer.
@apivovarov2
00:11:18 - 00:56:22
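The suggestion in this comment could be sketched as below. The class layout (`out` attribute, `parameters` method) is an assumption chosen to match a hand-rolled layer style, and `torch.flatten` is the only library call relied on.

```python
import torch

class Flatten:
    # Flattens dimensions start_dim..end_dim into one, via torch.flatten.
    def __init__(self, start_dim=1, end_dim=-1):
        self.start_dim = start_dim
        self.end_dim = end_dim

    def __call__(self, x):
        self.out = torch.flatten(x, self.start_dim, self.end_dim)
        return self.out

    def parameters(self):
        return []

x = torch.randn(4, 8, 10)   # (batch, context, embedding)
out = Flatten(1)(x)         # keep the batch dim, merge the rest
print(out.shape)  # torch.Size([4, 80])
```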

overview: WaveNet
00:17:11 - 00:19:33

*Overview: WaveNet*
- The limitations of the current MLP architecture are discussed, particularly the issue of squashing information too quickly.
- The video introduces the WaveNet architecture, which progressively fuses information in a tree-like structure.
- The concept of dilated causal convolutions is briefly mentioned as an implementation detail for efficiency.
@wolpumba4099
00:17:12 - 00:19:35
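For readers curious about the dilated causal convolutions mentioned above, here is a rough sketch using `torch.nn.Conv1d`. It is not the video's implementation (the video builds the tree from reshapes and Linear layers), and `CausalConv1d`, the channel count, and the dilation schedule are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    # Hypothetical helper: a kernel-size-2 convolution that only looks
    # at the current and earlier time steps.
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = dilation  # (kernel_size - 1) * dilation, with kernel_size = 2
        self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        # Pad on the left only, so no output depends on future inputs.
        return self.conv(F.pad(x, (self.pad, 0)))

# Dilations 1, 2, 4 give the last output a receptive field of 8 time steps.
layers = nn.Sequential(*[CausalConv1d(16, d) for d in (1, 2, 4)])

x = torch.randn(1, 16, 8)
y = layers(x)
print(y.shape)  # torch.Size([1, 16, 8])
```

Doubling the dilation at each layer is what lets the receptive field grow exponentially with depth while every layer still combines only two inputs.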

dataset: bump the context size to 8
00:19:33 - 00:19:55

*Implementing WaveNet*
- The dataset block size is increased to 8 to provide more context for predictions.
- The limitations of directly scaling up the context length in the MLP are highlighted.
- A hierarchical model is implemented using FlattenConsecutive layers to group and process characters in pairs.
- The shapes of tensors at each layer are inspected to ensure the network functions as intended.
- A bug in the BatchNorm1D implementation is identified and fixed to correctly handle multi-dimensional inputs.
@wolpumba4099
00:19:35 - 00:45:25
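The pairwise grouping described above can be sketched as follows. The layer name FlattenConsecutive comes from the summary; the body, the `out` attribute, and the example sizes are assumptions for illustration.

```python
import torch

class FlattenConsecutive:
    # Concatenates every n consecutive positions along the channel dimension.
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)
        if x.shape[1] == 1:
            # Only one group left: drop the now-redundant time dimension.
            x = x.squeeze(1)
        self.out = x
        return self.out

    def parameters(self):
        return []

x = torch.randn(4, 8, 10)        # (batch, 8 characters, 10-dim embeddings)
h = FlattenConsecutive(2)(x)     # pairs of characters fused
flat = FlattenConsecutive(8)(x)  # all 8 fused at once, like a plain Flatten
print(h.shape, flat.shape)       # torch.Size([4, 4, 20]) torch.Size([4, 80])
```

Printing `layer.out.shape` for each layer after a forward pass is a simple way to inspect the shapes as the tree narrows.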

re-running baseline code on block_size 8
00:19:55 - 00:21:36

implementing WaveNet
00:21:36 - 00:37:41

Does anyone know how to visualize the dimensions of the tensors that are handled from around 23:45 onward? I'm having a really hard time keeping up with what's what. Thanks!
@lucasnevo
00:23:45 - 00:56:22

training the WaveNet: first pass
00:37:41 - 00:38:50

At 38:00, it sounds like we compared two architectures, both with 22k parameters and an 8-character window:
* 1 layer, full connectivity
* 3 layers, tree-like connectivity
In a single layer, full connectivity outperforms partial connectivity. But partial connectivity uses fewer parameters, so we can afford to build more layers.
@davidespinosa1910
00:38:00 - 00:56:22

fixing batchnorm1d bug
00:38:50 - 00:45:21

re-training WaveNet with bug fix
00:45:21 - 00:46:07

*Re-training the WaveNet with Bug Fix*
- The network is retrained with the BatchNorm1D bug fix, resulting in a slight performance improvement.
- The video notes that PyTorch's BatchNorm1D has a different API and behavior compared to the custom implementation.
@wolpumba4099
00:45:25 - 00:46:07

With the batchnorm bug at around 46:00, why does it still work? If the batch norm is producing the wrong shape, why is there no error?
@redthunder6183
00:46:00 - 00:56:22
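A likely answer to this question is broadcasting. The sketch below assumes the custom layer originally reduced over dimension 0 only and that the fix reduces over dimensions 0 and 1; the input shape (32, 4, 68) is an illustrative assumption.

```python
import torch

x = torch.randn(32, 4, 68)  # (batch, positions, channels)

# Reducing over dim 0 only keeps separate statistics for each of the
# 4 positions, so each channel is normalized 4 different ways.
mean_buggy = x.mean(0, keepdim=True)        # shape (1, 4, 68)

# Reducing over dims 0 and 1 gives one statistic per channel.
mean_fixed = x.mean((0, 1), keepdim=True)   # shape (1, 1, 68)

# Both results broadcast cleanly against x, so neither version raises.
out_buggy = x - mean_buggy
out_fixed = x - mean_fixed
print(mean_buggy.shape, mean_fixed.shape, out_buggy.shape, out_fixed.shape)
```

The output shape is the same in both cases, which is why the problem only shows up when the intermediate shapes are inspected by hand.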

scaling up our WaveNet
00:46:07 - 00:46:58

*Scaling up Our WaveNet*
- The embedding size and the number of hidden units are increased, leading to a model with 76,000 parameters.
- Despite longer training times, the validation loss improves to 1.993.
- The need for an experimental harness to efficiently conduct hyperparameter searches is emphasized.
@wolpumba4099
00:46:07 - 00:46:59
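The parameter count quoted above can be reproduced with a little arithmetic. This is a sketch under assumed sizes that are not stated in the summary: a 27-character vocabulary, 24-dimensional embeddings, 128 hidden units, three levels that each fuse 2 consecutive positions, bias-free Linear layers followed by batch norm (one gain and one bias per channel), and a final Linear layer with bias. `count_params` is a hypothetical helper.

```python
def count_params(vocab_size, n_embd, n_hidden, group=2, levels=3):
    total = vocab_size * n_embd              # embedding table
    fan_in = n_embd * group                  # first level fuses `group` embeddings
    for _ in range(levels):
        total += fan_in * n_hidden           # Linear without bias
        total += 2 * n_hidden                # batch norm gain and bias
        fan_in = n_hidden * group            # next level fuses `group` hidden vectors
    total += n_hidden * vocab_size + vocab_size  # output Linear with bias
    return total

scaled_up = count_params(27, 24, 128)
first_pass = count_params(27, 10, 68)
print(scaled_up)   # 76579
print(first_pass)  # 22397
```

Under the same assumptions, 10-dimensional embeddings and 68 hidden units give roughly the 22k parameters mentioned for the first training pass.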

experimental harness
00:46:58 - 00:47:44

*Experimental Harness*
- The lack of a proper experimental setup is acknowledged as a limitation of the current approach.
- Potential future topics are discussed, including:
  - Implementing dilated causal convolutions
  - Exploring residual and skip connections
  - Setting up an evaluation harness
  - Covering recurrent neural networks and transformers
@wolpumba4099
00:46:59 - 00:55:27

WaveNet but with “dilated causal convolutions”
00:47:44 - 00:51:34

The sentence that Andrej said at 49:26 made me realize something, something very deep. 🔥
@enchanted_swiftie
00:49:26 - 00:56:22

torch.nn
00:51:34 - 00:52:28

the development process of building deep neural nets
00:52:28 - 00:54:17

going forward
00:54:17 - 00:55:26

improve on my loss! how far can we improve a WaveNet on this data?
00:55:26 - 00:56:22

*Improve on My Loss! How Far Can We Improve a WaveNet on This Data?*
- The video concludes with a challenge to the viewers to further improve the WaveNet model's performance.
- Suggestions for exploration include:
  - Trying different channel allocations
  - Experimenting with embedding dimensions
  - Comparing the hierarchical network to a large MLP
  - Implementing layers from the WaveNet paper
  - Tuning initialization and optimization parameters
@wolpumba4099
00:55:27 - 00:56:22

Andrej Karpathy
