Makemore Karpathy

Karpathy make more youtube video -> best ML lectures!

Hand craft nn (mlp3) vs. Torch.nn (mlp4)

Hand craft nn

  • 自己要處理 train 和 evaluation 的不同
    • BatchNormal 不一樣
    • batch = 1 常常會有問題: 計算 var!!! batch = 1 計算 var 一定有問題!

Torch nn (CPU)

  • 分成 model.train() and model.eval() 爲了
    • BatchNormal1d, BatchNormal2d 在 training and evaluation 的不同

Torch nn (GPU)

  • Very slow, not sure why!!!

Reference

Karpathy youtube video make more part 3: Building makemore Part 3: Activations & Gradients, BatchNorm (youtube.com)

[Build Better Deep Learning Models with Batch and Layer Normalization Pinecone](https://www.pinecone.io/learn/batch-layer-normalization/)

[2003.07845] PowerNorm: Rethinking Batch Normalization in Transformers (arxiv.org)

Building makemore Part 3: Activations & Gradients, BatchNorm (youtube.com)