mirror of https://github.com/karpathy/minGPT synced 2024-06-08 15:56:05 +02:00
Commit Graph

9 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Thomas Viehmann | 176be2d9bf | initialize position embeddings | 2022-03-26 13:36:20 +00:00 |
| waynemystir | 8fcaafb367 | move instantiation of DataLoader | 2020-11-20 13:44:49 -05:00 |
| Andrej Karpathy | 339f4e7ad3 | fix dataloader issue pointed out by @fpgaminer in #28 and introduce shuffle=True and pin_memory=True as defaults. That said I'm still not very happy with this demo because we're likely overfitting a massive model to tiny text and nothing is really tuned at all. This needs a real train/test dataset and a tiny bit of hyperparameter search, todo. | 2020-08-24 23:23:53 -07:00 |
| Andrej Karpathy | 63902c8d09 | remove passive aggressive comment. control yourself andrej. | 2020-08-23 19:36:23 -07:00 |
| Andrej Karpathy | 38d7327dfd | instead of -1e10 use float -inf, which I think will play nicer with fp16 down the line | 2020-08-23 17:47:05 -07:00 |
| “Andrej | bbbdac74fa | properly separate params that should be weight decayed, and make a small incremental step towards Lightning compatibility by creating the optimizer object inside the model's configure_optimizers | 2020-08-23 15:48:20 -07:00 |
| “Andrej | 23982656df | add early stopping logic | 2020-08-23 15:09:09 -07:00 |
| Andrej Karpathy | d708b1e5e2 | fix a dumb bug, intended to use -1e10 instead of 1e-10. thank you @fpgaminer for spotting and bringing to my attention | 2020-08-18 17:05:59 -07:00 |
| Andrej Karpathy | 0d9d098cd2 | first commit, able to multigpu train fp32 GPTs on math and character-level data, but have done barely any tuning. | 2020-08-17 00:39:02 -07:00 |
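
Commits d708b1e5e2 and 38d7327dfd both change the fill value used to mask attention scores (1e-10, then -1e10, then float -inf). The sketch below is not the repository's code. It is a minimal pure-Python illustration of why the fill value matters, assuming the fill is applied to masked positions before a softmax. The helper names `softmax` and `masked_softmax` and the example scores are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_softmax(scores, keep, fill):
    # Hypothetical helper: positions where keep[i] is False get `fill`
    # in place of their score before the softmax is taken.
    return softmax([s if k else fill for s, k in zip(scores, keep)])

scores = [2.0, 1.0, 3.0]
keep = [True, True, False]  # assumed mask: the last position must be hidden

# 1e-10 is a tiny positive number, so the hidden position still gets weight.
buggy = masked_softmax(scores, keep, 1e-10)

# -1e10 drives the hidden position's weight to zero in double precision.
large_neg = masked_softmax(scores, keep, -1e10)

# -inf also gives exactly zero and does not rely on a large finite constant.
neg_inf = masked_softmax(scores, keep, float("-inf"))

print(buggy[2], large_neg[2], neg_inf[2])
```

With 1e-10 the masked position keeps a noticeable share of the attention weight, while both -1e10 and -inf remove it. One plausible reading of the fp16 remark in 38d7327dfd is that -1e10 lies far outside the half-precision range (largest finite magnitude 65504), whereas -inf is representable in every IEEE floating-point format.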