Thomas Viehmann | 176be2d9bf | initialize position embeddings | 2022-03-26 13:36:20 +00:00
waynemystir | 8fcaafb367 | move instantiation of DataLoader | 2020-11-20 13:44:49 -05:00
Andrej Karpathy | 339f4e7ad3 | fix dataloader issue pointed out by @fpgaminer in #28 and introduce shuffle=True and pin_memory=True as defaults. That said I'm still not very happy with this demo because we're likely overfitting a massive model to tiny text and nothing is really tuned at all. This needs a real train/test dataset and a tiny bit of hyperparameter search, todo. | 2020-08-24 23:23:53 -07:00
Andrej Karpathy | 63902c8d09 | remove passive aggressive comment. control yourself andrej. | 2020-08-23 19:36:23 -07:00
Andrej Karpathy | 38d7327dfd | instead of -1e10 use float -inf, which I think will play nicer with fp16 down the line | 2020-08-23 17:47:05 -07:00
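The reasoning behind this commit can be illustrated with a minimal sketch (not the repository's actual code): fp16's largest finite magnitude is about 65504, so a sentinel like -1e10 is unrepresentable in half precision, while `-inf` masks positions to exactly zero attention weight after the softmax. The `causal_softmax` helper below is hypothetical, written with NumPy for self-containment.

```python
import numpy as np

def causal_softmax(logits):
    # Mask out future positions with -inf (rather than a large negative
    # constant such as -1e10, which overflows in fp16), then softmax.
    T = logits.shape[-1]
    mask = np.tril(np.ones((T, T), dtype=bool))  # lower-triangular causal mask
    masked = np.where(mask, logits, -np.inf)
    # exp(-inf) is exactly 0, so masked positions get zero probability
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = causal_softmax(np.random.randn(4, 4))
# every row sums to 1; entries above the diagonal are exactly 0
```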
"Andrej | bbbdac74fa | properly separate params that should be weight decayed, and make a small incremental step towards Lightning compatibility by creating the optimizer object inside the model's configure_optimizers | 2020-08-23 15:48:20 -07:00
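The idea behind this commit is a common PyTorch pattern: apply weight decay to matrix weights but not to biases or normalization/embedding parameters. The sketch below is illustrative only; the parameter names and the `split_decay_params` helper are hypothetical, not the repository's exact `configure_optimizers` code.

```python
def split_decay_params(named_params):
    # Partition parameter names into decayed vs. non-decayed groups.
    # Biases, LayerNorm, and embedding parameters are conventionally
    # excluded from weight decay.
    decay, no_decay = [], []
    for name, _param in named_params:
        if name.endswith("bias") or "ln" in name or "emb" in name:
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay

# hypothetical parameter names standing in for model.named_parameters()
names = [("attn.weight", None), ("attn.bias", None),
         ("ln1.weight", None), ("tok_emb.weight", None)]
decay, no_decay = split_decay_params(names)
# decay -> ["attn.weight"]; the bias, layernorm, and embedding params land in no_decay
```

In practice the two groups would be handed to the optimizer as separate param groups, e.g. one with `weight_decay` set and one with `weight_decay=0`.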
"Andrej | 23982656df | add early stopping logic | 2020-08-23 15:09:09 -07:00
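Early stopping, as named in this commit, typically means halting training once validation loss stops improving for a set number of epochs. A minimal sketch of that logic, assuming a simple patience counter (the function and its signature are illustrative, not the commit's actual implementation):

```python
def run_until_early_stop(val_losses, patience=2):
    # Stop once validation loss has failed to improve for `patience`
    # consecutive epochs; return the epoch index at which we stop.
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, bad_epochs = loss, 0  # new best: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # patience exhausted: stop here
    return len(val_losses) - 1  # never triggered: ran to completion

stopped_at = run_until_early_stop([3.0, 2.5, 2.6, 2.7, 2.4])
# stops at epoch 3, before ever seeing the 2.4
```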
Andrej Karpathy | d708b1e5e2 | fix a dumb bug, intended to use -1e10 instead of 1e-10. thank you @fpgaminer for spotting and bringing to my attention | 2020-08-18 17:05:59 -07:00
Andrej Karpathy | 0d9d098cd2 | first commit, able to multigpu train fp32 GPTs on math and character-level data, but have done barely any tuning. | 2020-08-17 00:39:02 -07:00