Nano GPT
#ai
#colab
#gpt
#transformer
This post contains some notes on building nanoGPT, following Andrej Karpathy’s YouTube video *Let’s build GPT: from scratch, in code, spelled out*.
Setup
See https://yxiong.github.io/2023/05/19/colab-with-custom-gce-vm.html for more instructions.
- Create a new project in https://console.cloud.google.com/ named `Nano GPT`
  - Enable billing (I do this at the project level in order to properly keep track of costs)
  - Request a GPU quota increase to 1 (the default is 0)
- Deploy the pre-configured Colab VM from the marketplace: https://console.cloud.google.com/marketplace/product/colab-marketplace-image-public/colab
  - Need to try different `Zone` values to see the available `Machine type` options
  - Started with a CPU instance `c2-standard-4`
- Go to https://colab.research.google.com/ and “Connect to a custom GCE VM”
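Once connected, it is worth sanity-checking what hardware the runtime actually sees. A minimal sketch (on the CPU-only `c2-standard-4` instance this should report that CUDA is unavailable):

```python
import torch

# Report the hardware that the custom GCE VM exposes to the Colab runtime.
print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```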
Bigram baseline model
- Random initial state, before any optimization:
  - Cross-entropy loss is 4.68 (for comparison, uniformly random predictions over the 65-character vocabulary would give $-\log(1/65) \approx 4.17$; the initial loss is a bit higher because the randomly initialized logits are not exactly uniform)
  - Output looks like:
    pdcbf?pGXepydZJSrF$Jrqt!:wwWSzPNxbjPiD&Q!a;yNt$Kr$o-gC$WSjJqfBKBySKtSKpwNNfyl&w:q-jluBatD$Lj;?yzyUca!UQ!vrpxZQgC-hlkq,ptKqHoiX-jjeLJ &slERj KUsBOL!mpJO!zLg'wNfqHAMgq'hZCWhu.W.IBcP RFJ&DEs,nw?pxE?xjNHHVxJ&D&vWWToiERJFuszPyZaNw$ EQJMgzaveDDIoiMl&sMHkzdRptRCPVjwW.RSVMjs-bgRkzrBTEa!!oP fRSxq.PLboTMkX'D
- After 10,000 steps of optimization (batch size 32, Adam optimizer):
  - Cross-entropy loss is 2.43
  - Output looks like:
    CYOx? DUThinqunt. LaZAnde. athave l. KEONH: ARThanco be y,-hedarwnoddy scace, tridesar, wnl'shenous s ls, theresseys PlorseelapinghiybHen yof GLUCEN t l-t E: I hisgothers je are!-e! QLYotouciullle'z, Thitertho s? NDan'spererfo cist ripl chys er orlese; Yo jehof h hecere ek? wferommot mowo soaf yoi
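For reference, the numbers above come from Karpathy’s bigram baseline: a single embedding table in which each character directly reads off the logits for the next character. A minimal sketch of that model and training loop, assuming the Tiny Shakespeare corpus is in `input.txt` (the learning rate and block size are my assumptions; the batch size, step count, and optimizer follow the notes above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1337)

# Tiny Shakespeare corpus; the file name is an assumption.
text = open("input.txt").read()
chars = sorted(set(text))
vocab_size = len(chars)  # 65 for Tiny Shakespeare
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

batch_size = 32  # as in the notes above
block_size = 8   # context length; my assumption

def get_batch():
    # Sample batch_size random chunks; targets are the inputs shifted by one.
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Each token's embedding row is read directly as the logits for the next token.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        if targets is None:
            return logits, None
        B, T, C = logits.shape
        # At initialization this is ~4.7, near -log(1/65) ≈ 4.17 for uniform logits.
        loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)  # next-char distribution
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat((idx, idx_next), dim=1)
        return idx

model = BigramLanguageModel(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption

for step in range(10_000):
    xb, yb = get_batch()
    _, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.2f}")

# Sample 300 characters, starting from token 0 (the newline character).
context = torch.zeros((1, 1), dtype=torch.long)
print("".join(itos[i] for i in model.generate(context, 300)[0].tolist()))
```

Under these (partly assumed) hyperparameters the loss should land near the 2.43 reported above, and the samples look like the second excerpt: still gibberish, but with word lengths and punctuation that start to resemble the training text.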
References
- *Let’s build GPT: from scratch, in code, spelled out*, YouTube video by Andrej Karpathy