Thomas Viehmann | 176be2d9bf | initialize position embeddings | 2022-03-26 13:36:20 +00:00
waynemystir | 8fcaafb367 | move instantiation of DataLoader | 2020-11-20 13:44:49 -05:00
Andrej Karpathy | 339f4e7ad3 | fix dataloader issue pointed out by @fpgaminer in #28 and introduce shuffle=True and pin_memory=True as defaults. That said I'm still not very happy with this demo because we're likely overfitting a massive model to tiny text and nothing is really tuned at all. This needs a real train/test dataset and a tiny bit of hyperparameter search, todo. | 2020-08-24 23:23:53 -07:00
Andrej Karpathy | 63902c8d09 | remove passive aggressive comment. control yourself andrej. | 2020-08-23 19:36:23 -07:00
Andrej Karpathy | 38d7327dfd | instead of -1e10 use float -inf, which I think will play nicer with fp16 down the line | 2020-08-23 17:47:05 -07:00
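The reasoning behind this commit can be illustrated with a minimal sketch (not the repository's actual code): fp16's largest finite magnitude is about 65504, so a sentinel like -1e10 is unrepresentable in half precision, while `-inf` masks positions to exactly zero attention weight after the softmax. The `causal_softmax` helper below is hypothetical, written with NumPy for self-containment.

```python
import numpy as np

def causal_softmax(logits):
    # Mask out future positions with -inf (rather than a large negative
    # constant such as -1e10, which overflows in fp16), then softmax.
    T = logits.shape[-1]
    mask = np.tril(np.ones((T, T), dtype=bool))  # lower-triangular causal mask
    masked = np.where(mask, logits, -np.inf)
    # exp(-inf) is exactly 0, so masked positions get zero probability
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = causal_softmax(np.random.randn(4, 4))
# every row sums to 1; entries above the diagonal are exactly 0
```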
"Andrej | bbbdac74fa | properly separate params that should be weight decayed, and make a small incremental step towards Lightning compatibility by creating the optimizer object inside the model's configure_optimizers | 2020-08-23 15:48:20 -07:00
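The idea behind this commit is a common PyTorch pattern: apply weight decay to matrix weights but not to biases or normalization/embedding parameters. The sketch below is illustrative only; the parameter names and the `split_decay_params` helper are hypothetical, not the repository's exact `configure_optimizers` code.

```python
def split_decay_params(named_params):
    # Partition parameter names into decayed vs. non-decayed groups.
    # Biases, LayerNorm, and embedding parameters are conventionally
    # excluded from weight decay.
    decay, no_decay = [], []
    for name, _param in named_params:
        if name.endswith("bias") or "ln" in name or "emb" in name:
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay

# hypothetical parameter names standing in for model.named_parameters()
names = [("attn.weight", None), ("attn.bias", None),
         ("ln1.weight", None), ("tok_emb.weight", None)]
decay, no_decay = split_decay_params(names)
# decay -> ["attn.weight"]; the bias, layernorm, and embedding params land in no_decay
```

In practice the two groups would be handed to the optimizer as separate param groups, e.g. one with `weight_decay` set and one with `weight_decay=0`.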
"Andrej | 23982656df | add early stopping logic | 2020-08-23 15:09:09 -07:00
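Early stopping, as named in this commit, typically means halting training once validation loss stops improving for a set number of epochs. A minimal sketch of that logic, assuming a simple patience counter (the function and its signature are illustrative, not the commit's actual implementation):

```python
def run_until_early_stop(val_losses, patience=2):
    # Stop once validation loss has failed to improve for `patience`
    # consecutive epochs; return the epoch index at which we stop.
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, bad_epochs = loss, 0  # new best: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # patience exhausted: stop here
    return len(val_losses) - 1  # never triggered: ran to completion

stopped_at = run_until_early_stop([3.0, 2.5, 2.6, 2.7, 2.4])
# stops at epoch 3, before ever seeing the 2.4
```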
Andrej Karpathy | d708b1e5e2 | fix a dumb bug, intended to use -1e10 instead of 1e-10. thank you @fpgaminer for spotting and bringing to my attention | 2020-08-18 17:05:59 -07:00
Andrej Karpathy | 0d9d098cd2 | first commit, able to multigpu train fp32 GPTs on math and character-level data, but have done barely any tuning. | 2020-08-17 00:39:02 -07:00