Karpathy's makemore YouTube videos -> the best ML lectures!
Hand-crafted NN (mlp3) vs. torch.nn (mlp4)
Hand-crafted NN
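- You have to handle the difference between training and evaluation yourself
- BatchNorm behaves differently in the two modes (batch statistics in training, running statistics in evaluation)
- batch = 1 often causes problems when computing the variance! With a single sample the variance is always a problem: the biased estimate is 0 and the unbiased estimate is undefined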
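The points above can be illustrated with a minimal sketch in plain Python. The class name `HandBatchNorm1d`, its `training` flag, and the toy data are hypothetical and are not taken from mlp3; the sketch assumes the unbiased variance (divide by n - 1), which is also the default of `torch.Tensor.var`.

```python
import math

class HandBatchNorm1d:
    """Hypothetical hand-written batch norm over a list of rows (batch x features)."""

    def __init__(self, dim, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        self.training = True              # must be flipped by hand
        self.gamma = [1.0] * dim
        self.beta = [0.0] * dim
        self.running_mean = [0.0] * dim
        self.running_var = [1.0] * dim

    def __call__(self, x):
        n = len(x)
        dim = len(self.gamma)
        out = [[0.0] * dim for _ in range(n)]
        for j in range(dim):
            col = [row[j] for row in x]
            if self.training:
                # Training: normalize with the statistics of the current batch.
                mean = sum(col) / n
                sq = sum((v - mean) ** 2 for v in col)
                # Unbiased variance divides by n - 1, so it is undefined for n == 1.
                var = sq / (n - 1) if n > 1 else float("nan")
                m = self.momentum
                self.running_mean[j] = (1 - m) * self.running_mean[j] + m * mean
                self.running_var[j] = (1 - m) * self.running_var[j] + m * var
            else:
                # Evaluation: normalize with the stored running statistics.
                mean, var = self.running_mean[j], self.running_var[j]
            scale = math.sqrt(var + self.eps)
            for i, v in enumerate(col):
                out[i][j] = self.gamma[j] * (v - mean) / scale + self.beta[j]
        return out

bn = HandBatchNorm1d(dim=2)
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
train_out = bn(batch)              # training mode, batch of 4: works

bn.training = False
eval_out = bn([[2.5, 25.0]])       # evaluation mode, batch = 1: works

bn.training = True
bad_out = bn([[2.5, 25.0]])        # training mode, batch = 1: every value is NaN
```

Note that the last call also writes NaN into `running_var`, so in this sketch a single batch = 1 training step would break every later evaluation pass as well.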
torch.nn (CPU)
- Split into model.train() and model.eval() in order to handle
- the different behavior of BatchNorm1d and BatchNorm2d in training and evaluation
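A minimal sketch of the mode switch with `torch.nn`, assuming a tiny MLP whose layer sizes are arbitrary and are not taken from mlp4. In training mode `nn.BatchNorm1d` rejects a batch of one sample on 2D input with a `ValueError`, while in evaluation mode the same sample passes through using the running statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny MLP; the layer sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.Tanh(),
    nn.Linear(8, 2),
)

single = torch.randn(1, 4)

model.train()                       # BatchNorm uses batch statistics
model(torch.randn(32, 4))           # a normal batch also updates the running statistics
try:
    model(single)                   # batch = 1 in training mode
    train_single_ok = True
except ValueError:
    train_single_ok = False

model.eval()                        # BatchNorm uses running statistics
with torch.no_grad():
    out_a = model(single)           # batch = 1 is fine here
    out_b = model(single)           # deterministic: same input, same output
```

Forgetting `model.eval()` before inference is the usual way to hit the batch = 1 problem with `torch.nn`, since the module defaults to training mode after construction.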
torch.nn (GPU)
- Very slow; the cause is still unclear!
Reference
Karpathy, Building makemore Part 3: Activations & Gradients, BatchNorm (youtube.com)
[Build Better Deep Learning Models with Batch and Layer Normalization | Pinecone](https://www.pinecone.io/learn/batch-layer-normalization/)
[2003.07845] PowerNorm: Rethinking Batch Normalization in Transformers (arxiv.org)