
intro

starter code walkthrough

*Starter Code Walkthrough (****)*

Andrej is scrolling through the architecture but doesn't comment on why the first Linear layer has its bias deactivated. I saw this in a couple of other projects; can somebody clarify why, or say where I should look to find an answer? Thank you
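(A likely answer, sketched below: when a Linear layer is immediately followed by BatchNorm, its bias is redundant, because BatchNorm subtracts the per-feature batch mean and cancels any constant offset, so the bias would be dead weight. A minimal demonstration, using torch.nn rather than the video's custom layers:)

```python
import torch

# Demonstration: a bias before BatchNorm has no effect on the output,
# because BatchNorm's mean subtraction cancels any constant offset.
x = torch.randn(32, 10)                       # a batch of 32 examples
linear = torch.nn.Linear(10, 64, bias=True)
bn = torch.nn.BatchNorm1d(64)                 # training mode by default

h = bn(linear(x))
with torch.no_grad():
    linear.bias += 5.0                        # shift every pre-activation by a constant
h_shifted = bn(linear(x))
print(torch.allclose(h, h_shifted, atol=1e-5))  # True: the bias was cancelled
```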

let’s fix the learning rate plot

When I saw the mean() trick at ~ I let out an audible gasp! That was such a neat trick; I'm going to use that one in the future.
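For anyone who missed it, the trick (as I remember it) is to reshape the per-step loss history into rows and average each row, so the plot shows one smoothed point per chunk of steps. A sketch with stand-in data:

```python
import torch
import matplotlib.pyplot as plt

# Sketch of the mean() trick: average the noisy per-step losses in chunks
# of 1000 consecutive steps, then plot the chunk means.
lossi = (2.5 - 0.5 * torch.rand(200_000)).tolist()  # stand-in for the real loss log
plt.plot(torch.tensor(lossi).view(-1, 1000).mean(1))
plt.show()
```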

pytorchifying our code: layers, containers, torch.nn, fun bugs

*PyTorchifying Our Code: Layers, Containers, Torch.nn, Fun Bugs (****)*
- Embedding table and view operations are encapsulated into custom Embedding and Flatten modules.
- A Sequential container is created to organize layers, similar to torch.nn.Sequential.
- The forward pass is simplified using these new modules and container.
- A bug related to BatchNorm in training mode with single-example batches is identified and fixed.
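A rough sketch of what two of these modules might look like, following the video's convention of stashing the output in .out and exposing a parameters() list (details condensed; the notebook is the reference):

```python
import torch

class Embedding:
    def __init__(self, num_embeddings, embedding_dim):
        self.weight = torch.randn((num_embeddings, embedding_dim))

    def __call__(self, IX):
        self.out = self.weight[IX]  # index rows: (B, T) of ints -> (B, T, embedding_dim)
        return self.out

    def parameters(self):
        return [self.weight]

class Sequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        self.out = x
        return self.out

    def parameters(self):
        # gather the parameters of all layers into one flat list
        return [p for layer in self.layers for p in layer.parameters()]
```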

@ Why not just call torch.flatten(start_dim, end_dim) inside a Flatten(start_dim, end_dim=-1) layer? To cover this particular case, just create a Flatten(1) layer.
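A sketch of the commenter's suggestion: instead of hand-rolled view() arithmetic, the layer simply defers to torch.flatten:

```python
import torch

# A Flatten layer that delegates to torch.flatten instead of view() logic.
class Flatten:
    def __init__(self, start_dim=1, end_dim=-1):
        self.start_dim = start_dim
        self.end_dim = end_dim

    def __call__(self, x):
        self.out = torch.flatten(x, self.start_dim, self.end_dim)
        return self.out

    def parameters(self):
        return []

# Flatten(1) collapses everything after the batch dimension:
x = torch.randn(32, 8, 10)
print(Flatten(1)(x).shape)  # torch.Size([32, 80])
```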

overview: WaveNet

*Overview: WaveNet (****)*
- The limitations of the current MLP architecture are discussed, particularly the issue of squashing information too quickly.
- The video introduces the WaveNet architecture, which progressively fuses information in a tree-like structure.
- The concept of dilated causal convolutions is briefly mentioned as an implementation detail for efficiency.

dataset bump the context size to 8

*Implementing WaveNet (****)*
- The dataset block size is increased to 8 to provide more context for predictions.
- The limitations of directly scaling up the context length in the MLP are highlighted.
- A hierarchical model is implemented using FlattenConsecutive layers to group and process characters in pairs (see the sketch below).
- The shapes of tensors at each layer are inspected to ensure the network functions as intended.
- A bug in the BatchNorm1D implementation is identified and fixed to correctly handle multi-dimensional inputs.
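A sketch of the grouping layer, as I understand it from the video (the name matches the video; exact details may differ from the notebook):

```python
import torch

# FlattenConsecutive(n): instead of flattening the whole context at once,
# concatenate the embeddings of every n consecutive characters, so
# information is fused progressively, layer by layer.
class FlattenConsecutive:
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)  # (B, T, C) -> (B, T/n, C*n)
        if x.shape[1] == 1:
            x = x.squeeze(1)  # drop the spurious time dimension at the top of the tree
        self.out = x
        return self.out

    def parameters(self):
        return []

x = torch.randn(4, 8, 10)
print(FlattenConsecutive(2)(x).shape)  # torch.Size([4, 4, 20])
```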

re-running baseline code on block_size 8

implementing WaveNet

Does anyone know how to visualize the dimensions of the tensors that are being handled from around ? I'm having a really hard time keeping up with what's what. Thanks!
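One low-tech way (roughly what the video itself does): after a forward pass, each custom layer has stored its output in .out, so you can walk the container and print the shapes. This assumes the Sequential `model` object from the walkthrough:

```python
# Assumes `model` is the Sequential container from the walkthrough, after
# at least one forward pass (so each layer's .out attribute is populated).
for layer in model.layers:
    print(layer.__class__.__name__, ':', tuple(layer.out.shape))
```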

training the WaveNet: first pass

At , it sounds like we compared two architectures, both with 22k parameters and an 8-character window:
* 1 layer, full connectivity
* 3 layers, tree-like connectivity
In a single layer, full connectivity outperforms partial connectivity. But partial connectivity uses fewer parameters, so we can afford to build more layers.

fixing batchnorm1d bug

re-training WaveNet with bug fix

*Re-training the WaveNet with Bug Fix (****)*
- The network is retrained with the BatchNorm1D bug fix, resulting in a slight performance improvement.
- The video notes that PyTorch's BatchNorm1D has a different API and behavior compared to the custom implementation.

With the batchnorm bug at around , why does it still work? If the batch norm is producing the wrong shape, why is there no error?
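(As far as I can tell: the buggy version always averages over dim 0, so on a (B, T, C) input the statistics come out as (1, T, C). That shape still broadcasts against the input, so the math runs without complaint; the layer just silently maintains separate statistics for each of the T positions instead of one set per channel. A condensed sketch of the fixed layer, which also reduces over the time dimension:)

```python
import torch

# Sketch of the fixed BatchNorm1d (condensed from the video's version).
class BatchNorm1d:
    def __init__(self, dim, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        self.training = True
        self.gamma = torch.ones(dim)     # learnable scale
        self.beta = torch.zeros(dim)     # learnable shift
        self.running_mean = torch.zeros(dim)
        self.running_var = torch.ones(dim)

    def __call__(self, x):
        if self.training:
            # The fix: reduce over batch AND time dims for 3D inputs.
            # The buggy version used dim = 0 always, yielding (1, T, C)
            # stats on 3D input, which still broadcasts -> no error raised.
            dim = 0 if x.ndim == 2 else (0, 1)
            xmean = x.mean(dim, keepdim=True)
            xvar = x.var(dim, keepdim=True)
            with torch.no_grad():
                self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * xmean
                self.running_var = (1 - self.momentum) * self.running_var + self.momentum * xvar
        else:
            xmean, xvar = self.running_mean, self.running_var
        self.out = self.gamma * (x - xmean) / torch.sqrt(xvar + self.eps) + self.beta
        return self.out

    def parameters(self):
        return [self.gamma, self.beta]
```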

scaling up our WaveNet

conclusions

*Scaling up Our WaveNet (****)*
- The number of embedding and hidden units are increased, leading to a model with 76,000 parameters.
- Despite longer training times, the validation performance improves to 1.993.
- The need for an experimental harness to efficiently conduct hyperparameter searches is emphasized.
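For reference, a sketch of the scaled-up configuration. The n_embd/n_hidden values are my reconstruction (they land near 76k parameters), and it assumes the custom Linear, BatchNorm1d, Tanh, Embedding, FlattenConsecutive, and Sequential modules built earlier in the video:

```python
vocab_size = 27   # 26 letters + '.' terminator
n_embd = 24       # embedding dimension per character (reconstructed value)
n_hidden = 128    # hidden units per layer (reconstructed value)

model = Sequential([
    Embedding(vocab_size, n_embd),
    FlattenConsecutive(2), Linear(n_embd * 2, n_hidden, bias=False), BatchNorm1d(n_hidden), Tanh(),
    FlattenConsecutive(2), Linear(n_hidden * 2, n_hidden, bias=False), BatchNorm1d(n_hidden), Tanh(),
    FlattenConsecutive(2), Linear(n_hidden * 2, n_hidden, bias=False), BatchNorm1d(n_hidden), Tanh(),
    Linear(n_hidden, vocab_size),
])
print(sum(p.nelement() for p in model.parameters()))  # ~76k parameters
```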

experimental harness

*Experimental Harness (****)*
- The lack of a proper experimental setup is acknowledged as a limitation of the current approach.
- Potential future topics are discussed, including:
  - Implementing dilated causal convolutions
  - Exploring residual and skip connections
  - Setting up an evaluation harness
  - Covering recurrent neural networks and transformers

WaveNet but with “dilated causal convolutions”

The sentence that Andrej said at made me realize something, something very deep. 🔥

torch.nn

the development process of building deep neural nets

going forward

improve on my loss! how far can we improve a WaveNet on this data?
