mirror of https://github.com/karpathy/minGPT synced 2024-05-21 06:56:03 +02:00
Commit Graph

75 Commits

Author SHA1 Message Date
Andrej 12f346a63d make generation script into a notebook, makes much more sense that way i think, and much easier to use 2022-07-08 23:53:14 +00:00
Andrej b14c99191a rename the script to make sense 2022-07-08 22:58:49 +00:00
Andrej 803f38800d refactor pretrained weight loading into from_pretrained and add unit tests 2022-07-08 22:56:15 +00:00
Andrej 4a56b20f80 fix parameter counting 2022-07-08 21:10:54 +00:00
Andrej 28bcbd0ad9 ocd is killing me 2022-07-08 20:35:08 +00:00
Andrej 449a980d39 updates to readme 2022-07-08 20:34:23 +00:00
Andrej b7c9acc46c remove legacy notebooks 2022-07-08 20:20:35 +00:00
Andrej e7fe54898d refactor readme to match the repo 2022-07-08 20:19:17 +00:00
Andrej Karpathy 7569ab9d7f simple notebook demo showing how to use minGPT 2022-07-07 18:15:26 -07:00
Andrej 2e979dde5f ummm eyeroll 2022-07-01 15:34:34 +00:00
Andrej Karpathy 2f3400f42a split out register_callback to set/add 2022-07-01 08:32:19 -07:00
Andrej Karpathy d9ea878100 add maxiters to trainer 2022-07-01 08:31:46 -07:00
Andrej Karpathy 1f33571400 changing default seed to the suspect SOTA value as per https://arxiv.org/abs/2109.08203 2022-06-29 10:05:41 -07:00
Andrej 7f6775a671 no 2022-06-27 20:41:51 +00:00
Andrej 00aa9cb2ed ok i hated the previous global/local config idea. reverting it and simplifying and i think this is the best api so far 2022-06-27 20:41:01 +00:00
Andrej ea20661f78 be more defensive around model_type, don't let the user shoot themselves in the foot 2022-06-27 19:26:26 +00:00
Andrej 1c8842dbe9 small tweaks 2022-06-27 19:20:05 +00:00
Andrej b483fbe8db suppress warnings and lightweight docs and changes 2022-06-24 20:48:05 +00:00
Andrej c6c973738b implement scaled init per gpt-2 paper 2022-06-24 17:48:20 +00:00
Andrej 7e68832554 delete ugly Conv1D, a real abomination of this Universe 2022-06-24 03:22:27 +00:00
Andrej a1cad3f37a oops i forgot to push this file for generation and testing that we load the weights well 2022-06-24 02:52:15 +00:00
Andrej 13a42a6ce0 ok step 1, create a get_pretrained function that inits with openai weights' 2022-06-24 01:43:39 +00:00
Andrej dfb892044d big big refactor so that we can load actual gpt2 weights from openai. this is still wip, want to clean it up good 2022-06-23 23:33:44 +00:00
Andrej 3cf811e67c delegate more stuff to the Trainer class 2022-06-01 17:55:36 +00:00
Andrej 8860486f66 attempt to make model config a little bit better, still hate it 2022-06-01 17:14:22 +00:00
Andrej 2db3a4b7b3 first implementation of chargpt, just pasting it into current api, but i really hate the way the code is set up atm, wip 2022-05-31 23:07:54 +00:00
Andrej 52cb434db2 tiny tweaks to printing and some function apis 2022-05-31 23:07:00 +00:00
Andrej Karpathy 9ec160cd8c small tweaks. found an issue with my brilliant plan to solve all configuration problems. have to think about more 2022-05-28 15:05:34 -07:00
Andrej 82768a7a95 small tweaks and a bug fix that makes me doubt the current approach with the configs a bit... shot myself in the foot a bit 2022-05-28 03:44:32 +00:00
Andrej b162d3f44e fix small bugs and add ability to train/eval on either cpu or gpu 2022-05-28 03:17:24 +00:00
Andrej Karpathy fa1b46f78a bit more logging, including saving a model but only if it's the best one yet 2022-05-27 16:06:31 -07:00
Andrej Karpathy a330148c22 add ability to override config params from command line args. re-inventing the wheel a bit here, should i just use yacs or something? i just really really really do not like dependencies 2022-05-27 12:16:07 -07:00
Andrej Karpathy 8425759c24 early work, refactoring the adder first 2022-05-27 10:04:52 -07:00
Andrej Karpathy 3ed14b2cec i know it doesn't look like much, but this kwarg was not used lol :D 2022-03-27 17:48:05 +01:00
Andrej Karpathy 107b6d7e31 add comment to clarify #39 . Ty @JonathanSum for inspiration PR 2022-03-26 13:52:51 +00:00
Andrej Karpathy dffb6a14e2 Merge branch 'waynemystir-master' 2022-03-26 13:48:03 +00:00
Andrej Karpathy 031ad36f29 don't use default kwargs, in my experience lead to bugs always 2022-03-26 13:47:52 +00:00
Thomas Viehmann 176be2d9bf initialize position embeddings 2022-03-26 13:36:20 +00:00
Mishig Davaadorj fd2977c4e0 Fix broken hugging face link & add link to huggingface / transformers 2022-03-26 13:36:20 +00:00
Andrej 94d880648e Merge pull request #62 from t-vi/init: initialize position embeddings 2022-03-26 12:23:31 +00:00
Andrej ea8706d964 Merge pull request #63 from mishig25/patch-1: Fix broken hugging face link & add link to huggingface / transformers 2022-03-26 12:21:39 +00:00
Mishig Davaadorj bac74347ff Fix broken hugging face link & add link to huggingface / transformers 2021-12-08 11:37:43 +01:00
Thomas Viehmann 744d41003a initialize position embeddings 2021-10-25 14:43:04 +02:00
waynemystir 8fcaafb367 move instantiation of DataLoader 2020-11-20 13:44:49 -05:00
Andrej 4050db6040 Merge pull request #32 from brchristian/patch-1: Fix typo in comment in play_char.ipynb 2020-08-25 22:41:18 -07:00
Andrej c43600576e Merge pull request #31 from fpgaminer/master: fix CharDataset::__len__ off by one error 2020-08-25 22:40:47 -07:00
brchristian 4b5d96b99c Fix typo in comment in play_char.ipynb 2020-08-25 20:40:17 -07:00
fpgaminer a7b13e02ff fix CharDataset::__len__ off by one error 2020-08-25 18:16:49 -07:00
Andrej Karpathy 339f4e7ad3 fix dataloader issue pointed out by @fpgaminer in #28 and introduce shuffle=True and pin_memory=True as defaults. That said I'm still not very happy with this demo because we're likely overfitting a massive model to tiny text and nothing is really tuned at all. This needs a real train/test dataset and a tiny bit of hyperparameter search, todo. 2020-08-24 23:23:53 -07:00
Andrej 94187b944c Merge pull request #25 from michaellavelle/shakespeare: Correcting the Bard's name 2020-08-23 22:38:42 -07:00