Building makemore Part 5: Building a WaveNet

We take the 2-layer MLP from the previous video and make it deeper with a tree-like structure, arriving at a convolutional neural network architecture similar to WaveNet (2016) from DeepMind. In the WaveNet paper, the same hierarchical architecture is implemented more efficiently using causal dilated convolutions (not yet covered). Along the way we get a better sense of torch.nn, what it is and how it works under the hood, and of what a typical deep learning development process looks like (a lot of reading documentation, keeping track of multidimensional tensor shapes, moving between Jupyter notebooks and repository code, ...).

Links:
- makemore on github: https://github.com/karpathy/makemore
- jupyter notebook I built in this video: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part5_cnn1.ipynb
- colab notebook: https://colab.research.google.com/drive/1CXVEmCO_7r7WYZGb5qnjfyxTvQa13g5X?usp=sharing
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- our Discord channel: https://discord.gg/3zy8kqD9Cp

Supplementary links:
- WaveNet 2016 from DeepMind https://arxiv.org/abs/1609.03499
- Bengio et al. 2003 MLP LM https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

Chapters:
intro
00:00:00 intro
00:01:40 starter code walkthrough
00:06:56 let’s fix the learning rate plot
00:09:16 pytorchifying our code: layers, containers, torch.nn, fun bugs
implementing wavenet
00:17:11 overview: WaveNet
00:19:33 dataset bump the context size to 8
00:19:55 re-running baseline code on block_size 8
00:21:36 implementing WaveNet
00:37:41 training the WaveNet: first pass
00:38:50 fixing batchnorm1d bug
00:45:21 re-training WaveNet with bug fix
00:46:07 scaling up our WaveNet
conclusions
00:46:58 experimental harness
00:47:44 WaveNet but with “dilated causal convolutions”
00:51:34 torch.nn
00:52:28 the development process of building deep neural nets
00:54:17 going forward
00:55:26 improve on my loss! how far can we improve a WaveNet on this data?

intro

starter code walkthrough

Andrej scrolls through the architecture but doesn't comment on why the first Linear layer has its bias disabled. I've seen this in a couple of other projects. Can somebody clarify why, or say where I should look to find an answer? Thank you.
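
A likely answer, sketched under the assumption that the Linear layer feeds straight into a batch-norm layer (as in the starter code): batch norm subtracts the per-feature batch mean, so any constant bias added by the Linear layer is cancelled, and batch norm's own learnable shift takes over that role. The sketch below uses torch.nn layers rather than the video's hand-written classes, and the sizes and variable names are illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 10)  # illustrative batch: 32 examples, 10 features

lin_bias = torch.nn.Linear(10, 20, bias=True)
lin_nobias = torch.nn.Linear(10, 20, bias=False)
with torch.no_grad():
    # Give both layers the same weights; only the bias differs.
    lin_nobias.weight.copy_(lin_bias.weight)
    # Make the bias deliberately large so its effect would be visible.
    lin_bias.bias.uniform_(-5, 5)

bn = torch.nn.BatchNorm1d(20)  # training mode: normalizes with batch statistics

out_with_bias = bn(lin_bias(x))
out_without_bias = bn(lin_nobias(x))

# The bias shifts each feature's mean, and batch norm subtracts that mean again.
bias_is_redundant = torch.allclose(out_with_bias, out_without_bias, atol=1e-4)
print(bias_is_redundant)
```

So the bias would be a parameter that never affects the output, which is why it is commonly switched off in front of a normalization layer.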

let’s fix the learning rate plot

At the mean() trick I let out an audible gasp! That was such a neat trick; I'm going to use that one in the future.
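
For anyone who wants to reuse it, here is a minimal sketch of the trick as I understand it: reshape the flat list of per-step losses into rows of 1000 consecutive steps and average each row. The loss values below are synthetic stand-ins, not results from the video, and the list length has to be divisible by the row width.

```python
import torch

torch.manual_seed(0)
steps = 20000

# Synthetic noisy loss curve, for illustration only.
lossi = (torch.linspace(0.5, 0.3, steps) + 0.05 * torch.randn(steps)).tolist()

# View the 1-D list as (steps / 1000, 1000) and average along each row.
smoothed = torch.tensor(lossi).view(-1, 1000).mean(1)

print(smoothed.shape)  # one averaged value per 1000 steps
# To plot: import matplotlib.pyplot as plt; plt.plot(smoothed)
```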

pytorchifying our code: layers, containers, torch.nn, fun bugs

implementing wavenet

Why not just call torch.flatten(x, start_dim, end_dim) inside a Flatten(start_dim, end_dim=-1) layer? For this particular case, just create a Flatten(1) layer.
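
A minimal sketch of what that suggestion could look like, written in the style of the hand-rolled layers (a class with `__call__` and `parameters`). The class below is hypothetical and not code from the video; `torch.flatten(input, start_dim, end_dim)` is the real PyTorch function it wraps.

```python
import torch

class Flatten:
    """Hypothetical layer that merges dimensions start_dim..end_dim into one."""

    def __init__(self, start_dim=1, end_dim=-1):
        self.start_dim = start_dim
        self.end_dim = end_dim

    def __call__(self, x):
        self.out = torch.flatten(x, self.start_dim, self.end_dim)
        return self.out

    def parameters(self):
        return []  # no trainable parameters

# Illustrative input: batch of 4, context of 8 characters, 10-dim embeddings.
x = torch.randn(4, 8, 10)
flat = Flatten(1)(x)
print(flat.shape)
```

Note that this merges whole dimensions. Grouping only pairs of neighbouring positions, as the hierarchical model later needs, still requires an explicit reshape such as `x.view(B, T // 2, C * 2)`.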

overview: WaveNet

dataset bump the context size to 8

re-running baseline code on block_size 8

implementing WaveNet

Does anyone know how to visualize the dimensions of the tensors handled in this part of the video? I'm having a really hard time keeping up with what's what. Thanks!
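
One low-tech option is to push a small batch through the layers one at a time and print the output shape after each. The sketch below assumes torch.nn layers and illustrative sizes (vocabulary of 27, context of 8, 10-dim embeddings, 200 hidden units); it is not the exact model from the video.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layers = [
    nn.Embedding(27, 10),
    nn.Flatten(start_dim=1),
    nn.Linear(80, 200, bias=False),
    nn.BatchNorm1d(200),
    nn.Tanh(),
    nn.Linear(200, 27),
]

x = torch.randint(0, 27, (4, 8))  # 4 examples, 8 character indices each
shapes = []
for layer in layers:
    x = layer(x)
    shapes.append((layer.__class__.__name__, tuple(x.shape)))

for name, shape in shapes:
    print(f"{name:12s}: {shape}")
```

Reading the printed list top to bottom shows exactly where each dimension is created, merged, or projected.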

training the WaveNet: first pass

fixing batchnorm1d bug

re-training WaveNet with bug fix

*Re-training the WaveNet with Bug Fix*
- The network is retrained with the BatchNorm1d bug fix, resulting in a slight performance improvement.
- The video notes that PyTorch's BatchNorm1d has a different API and behavior compared to the custom implementation.

With the batchnorm bug, why does it still work? If the batch norm is producing the wrong shape, why is there no error?
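
The short version, as far as I can tell: broadcasting hides it. With a 3-D input, averaging over dimension 0 alone still produces a tensor that broadcasts cleanly against the input, so nothing crashes; the layer just keeps separate statistics for every position instead of one per channel. A minimal sketch with an illustrative input shape:

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 4, 68)  # (batch, positions, channels), illustrative sizes

# Buggy: reduce over the batch dimension only.
mean_buggy = x.mean(0, keepdim=True)       # one mean per (position, channel)
# Fixed: reduce over batch and position dimensions together.
mean_fixed = x.mean((0, 1), keepdim=True)  # one mean per channel

# Both subtractions broadcast without error and give the same output shape.
out_buggy = x - mean_buggy
out_fixed = x - mean_fixed

print(mean_buggy.shape, mean_fixed.shape, out_buggy.shape, out_fixed.shape)
```

The buggy version estimates each statistic from 32 values rather than 32 * 4, so it is noisier, but it is still a valid normalization, which is why training keeps working.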

scaling up our WaveNet

conclusions

experimental harness

WaveNet but with “dilated causal convolutions”

Something Andrej said in this chapter made me realize something, something very deep. 🔥

torch.nn

the development process of building deep neural nets

going forward

improve on my loss! how far can we improve a WaveNet on this data?

Andrej Karpathy
