Andrej Karpathy
|
a330148c22
|
add ability to override config params from command line args. re-inventing the wheel a bit here, should i just use yacs or something? i just really really really do not like dependencies
|
2022-05-27 12:16:07 -07:00 |
|
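For readers unfamiliar with the idea in the commit above, here is a minimal sketch of overriding config values from command-line style arguments without a third-party dependency. The function name `override_from_args`, the `--key=value` syntax, and the example config fields are assumptions for illustration; they are not taken from the repository.

```python
import ast
from types import SimpleNamespace


def override_from_args(config, args):
    """Apply overrides of the form --key=value to `config` in place.

    Hypothetical helper: the name and the --key=value syntax are assumptions.
    """
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, raw = arg[2:].split("=", 1)
        if not hasattr(config, key):
            # Refuse unknown keys so that typos do not pass silently.
            raise AttributeError(f"unknown config key: {key}")
        try:
            # Parse Python literals such as 32, 3e-4, True, (1, 2).
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a literal is kept as a plain string.
            value = raw
        setattr(config, key, value)
    return config


# Example config with made-up fields.
config = SimpleNamespace(batch_size=64, learning_rate=3e-4, name="adder")
override_from_args(config, ["--batch_size=32", "--name=chargpt"])
```

In a script, the list would typically come from `sys.argv[1:]`. Rejecting unknown keys is a design choice of this sketch, not something the commit message specifies.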
Andrej Karpathy
|
8425759c24
|
early work, refactoring the adder first
|
2022-05-27 10:04:52 -07:00 |
|
Andrej Karpathy
|
3ed14b2cec
|
i know it doesn't look like much, but this kwarg was not used lol :D
|
2022-03-27 17:48:05 +01:00 |
|
Andrej Karpathy
|
107b6d7e31
|
add comment to clarify #39. Ty @JonathanSum for inspiration PR
|
2022-03-26 13:52:51 +00:00 |
|
Andrej Karpathy
|
dffb6a14e2
|
Merge branch 'waynemystir-master'
|
2022-03-26 13:48:03 +00:00 |
|
Andrej Karpathy
|
031ad36f29
|
don't use default kwargs; in my experience they always lead to bugs
|
2022-03-26 13:47:52 +00:00 |
|
Thomas Viehmann
|
176be2d9bf
|
initialize position embeddings
|
2022-03-26 13:36:20 +00:00 |
|
Mishig Davaadorj
|
fd2977c4e0
|
Fix broken hugging face link & add link to huggingface / transformers
|
2022-03-26 13:36:20 +00:00 |
|
Andrej
|
94d880648e
|
Merge pull request #62 from t-vi/init
initialize position embeddings
|
2022-03-26 12:23:31 +00:00 |
|
Andrej
|
ea8706d964
|
Merge pull request #63 from mishig25/patch-1
Fix broken hugging face link & add link to huggingface / transformers
|
2022-03-26 12:21:39 +00:00 |
|
Mishig Davaadorj
|
bac74347ff
|
Fix broken hugging face link & add link to huggingface / transformers
|
2021-12-08 11:37:43 +01:00 |
|
Thomas Viehmann
|
744d41003a
|
initialize position embeddings
|
2021-10-25 14:43:04 +02:00 |
|
waynemystir
|
8fcaafb367
|
move instantiation of DataLoader
|
2020-11-20 13:44:49 -05:00 |
|
Andrej
|
4050db6040
|
Merge pull request #32 from brchristian/patch-1
Fix typo in comment in play_char.ipynb
|
2020-08-25 22:41:18 -07:00 |
|
Andrej
|
c43600576e
|
Merge pull request #31 from fpgaminer/master
fix CharDataset::__len__ off by one error
|
2020-08-25 22:40:47 -07:00 |
|
brchristian
|
4b5d96b99c
|
Fix typo in comment in play_char.ipynb
|
2020-08-25 20:40:17 -07:00 |
|
fpgaminer
|
a7b13e02ff
|
fix CharDataset::__len__ off by one error
|
2020-08-25 18:16:49 -07:00 |
|
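The commit message above does not say which way the off-by-one went, so the following is only a sketch of the kind of length calculation where it matters. It assumes a character dataset in which each example is a window of `block_size + 1` characters, split into an input and a target shifted by one; the class body is illustrative and not copied from the repository.

```python
class CharDataset:
    """Illustrative sliding-window character dataset (assumed layout)."""

    def __init__(self, data, block_size):
        self.data = data
        self.block_size = block_size

    def __len__(self):
        # Each example consumes block_size + 1 characters, so valid start
        # indices run from 0 to len(data) - block_size - 1 inclusive.
        return len(self.data) - self.block_size

    def __getitem__(self, idx):
        chunk = self.data[idx : idx + self.block_size + 1]
        # Input is the first block_size characters, target is shifted by one.
        return chunk[:-1], chunk[1:]


dataset = CharDataset("hello world", block_size=4)
last_x, last_y = dataset[len(dataset) - 1]
```

With a length that is one too large, the final index would yield a truncated window; with one too small, the last full window would never be sampled.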
Andrej Karpathy
|
339f4e7ad3
|
fix dataloader issue pointed out by @fpgaminer in #28 and introduce shuffle=True and pin_memory=True as defaults. That said I'm still not very happy with this demo because we're likely overfitting a massive model to tiny text and nothing is really tuned at all. This needs a real train/test dataset and a tiny bit of hyperparameter search, todo.
|
2020-08-24 23:23:53 -07:00 |
|
Andrej
|
94187b944c
|
Merge pull request #25 from michaellavelle/shakespeare
Correcting the Bard's name
|
2020-08-23 22:38:42 -07:00 |
|
Michael Lavelle
|
effa35fd93
|
Correcting the Bard's name
|
2020-08-24 06:35:45 +01:00 |
|
Andrej Karpathy
|
63902c8d09
|
remove passive aggressive comment. control yourself andrej.
|
2020-08-23 19:36:23 -07:00 |
|
Andrej Karpathy
|
38d7327dfd
|
instead of -1e10 use float -inf, which I think will play nicer with fp16 down the line
|
2020-08-23 17:47:05 -07:00 |
|
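One plausible reading of the fp16 remark above (an assumption, since the commit message does not elaborate) is that -1e10 lies outside the finite range of IEEE half precision, whose largest finite magnitude is 65504, whereas negative infinity is representable. The sketch below checks this with the standard library's half-precision `struct` format rather than with a tensor library; `fits_in_fp16` is a hypothetical helper name.

```python
import math
import struct


def fits_in_fp16(x):
    """Return True if x can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        # Finite values beyond the half-precision range cannot be packed.
        return False


large_negative_ok = fits_in_fp16(-1e10)
neg_inf_ok = fits_in_fp16(float("-inf"))

# exp(-inf) is exactly zero, so a position filled with -inf receives
# exactly zero weight after a softmax.
masked_weight = math.exp(float("-inf"))
```

How a given tensor library handles an out-of-range fill value in half precision varies, so this only illustrates the representability argument.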
“Andrej
|
f683085892
|
resolve merge conflict, this is not going well at all
|
2020-08-23 17:30:23 -07:00 |
|
“Andrej
|
a8835cfebc
|
bleh resolve merge conflicts
|
2020-08-23 17:26:19 -07:00 |
|
“Andrej
|
5a67ab913d
|
add early stopping to cifar10 image demo
|
2020-08-23 17:19:45 -07:00 |
|
“Andrej
|
421caf8b20
|
mit license file
|
2020-08-23 17:09:21 -07:00 |
|
“Andrej
|
bbbdac74fa
|
properly separate params that should be weight decayed, and make a small incremental step towards Lightning compatibility by creating the optimizer object inside the model's configure_optimizers
|
2020-08-23 15:48:20 -07:00 |
|
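As a rough illustration of separating weight-decayed parameters from the rest, the sketch below partitions parameter names into two optimizer groups. The rule used here (no decay for biases and for LayerNorm and Embedding weights) is a common convention and an assumption; the commit message does not state the exact rule. The function name `split_decay_groups`, the parameter names, and the decay value are likewise hypothetical, and plain strings stand in for real parameter tensors so the example runs without a deep learning framework.

```python
def split_decay_groups(named_params, weight_decay=0.1):
    """Split (name, module_kind) pairs into decay and no-decay groups.

    Assumed rule: biases and LayerNorm/Embedding weights are not decayed.
    """
    decay, no_decay = [], []
    for name, kind in named_params:
        if name.endswith("bias") or kind in ("LayerNorm", "Embedding"):
            no_decay.append(name)
        else:
            decay.append(name)
    # The list-of-dicts layout mirrors per-parameter option groups that
    # optimizers commonly accept.
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Made-up parameter names for illustration.
named_params = [
    ("tok_emb.weight", "Embedding"),
    ("blocks.0.ln1.weight", "LayerNorm"),
    ("blocks.0.ln1.bias", "LayerNorm"),
    ("blocks.0.attn.key.weight", "Linear"),
    ("blocks.0.attn.key.bias", "Linear"),
    ("head.weight", "Linear"),
]
groups = split_decay_groups(named_params)
```

Building these groups inside a model method such as `configure_optimizers` keeps the grouping rule next to the module definitions it depends on.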
“Andrej
|
23982656df
|
add early stopping logic
|
2020-08-23 15:09:09 -07:00 |
|
Andrej Karpathy
|
d100e2251a
|
Merge branch 'master' of github.com:karpathy/minGPT
|
2020-08-22 17:32:10 -07:00 |
|
Andrej Karpathy
|
eca27f6316
|
Merge branch 'master' of github.com:karpathy/minGPT
|
2020-08-22 17:32:10 -07:00 |
|
Andrej Karpathy
|
4e152c7aee
|
add demo of image gpt trained on CIFAR-10
|
2020-08-22 17:29:53 -07:00 |
|
Andrej Karpathy
|
d15a85719e
|
add demo of image gpt trained on CIFAR-10
|
2020-08-22 17:29:53 -07:00 |
|
Andrej
|
382ac70290
|
Merge pull request #21 from jkravanja/master
Sort characters to always return same mapping
|
2020-08-21 16:39:49 -07:00 |
|
Andrej
|
ebcc03ec7e
|
Merge pull request #21 from jkravanja/master
Sort characters to always return same mapping
|
2020-08-21 16:39:49 -07:00 |
|
Jaka Kravanja
|
88bf19a869
|
Sort characters to always return same mapping
|
2020-08-22 02:09:25 +03:00 |
|
Jaka Kravanja
|
004b807eb2
|
Sort characters to always return same mapping
|
2020-08-22 02:09:25 +03:00 |
|
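To show why sorting matters for the commit above: iterating over a Python `set` of strings can produce a different order from one interpreter run to the next (string hashing is randomized by default), so a vocabulary built directly from a set may assign different ids to the same character across runs. A minimal sketch, with variable names chosen for illustration:

```python
text = "hello world"

# Sorting the unique characters fixes their order, and therefore the ids.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

encoded = [stoi[ch] for ch in "hello"]
decoded = "".join(itos[i] for i in encoded)
```

A stable mapping matters whenever a saved model is reloaded in a new process, because the embedding rows are tied to these integer ids.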
Andrej
|
3f1d1036d7
|
Merge pull request #6 from shivamtawari/patch-1
Update README.md
|
2020-08-19 00:17:36 -07:00 |
|
Andrej
|
c97efac9a9
|
Merge pull request #6 from shivamtawari/patch-1
Update README.md
|
2020-08-19 00:17:36 -07:00 |
|
Andrej Karpathy
|
8909e1b646
|
fix a dumb bug, intended to use -1e10 instead of 1e-10. thank you @fpgaminer for spotting and bringing to my attention
|
2020-08-18 17:05:59 -07:00 |
|
Andrej Karpathy
|
d708b1e5e2
|
fix a dumb bug, intended to use -1e10 instead of 1e-10. thank you @fpgaminer for spotting and bringing to my attention
|
2020-08-18 17:05:59 -07:00 |
|
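The difference between the two constants in the commit above is easy to see numerically. The sketch below applies a softmax to three made-up scores and masks the last one, first with the mistaken fill value 1e-10 and then with the intended -1e10. It uses plain Python rather than the repository's tensor code, and the scores are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]
keep = [True, True, False]  # the last position should be masked out


def mask(fill):
    # Replace masked positions with `fill` before the softmax.
    return [s if k else fill for s, k in zip(scores, keep)]


buggy = softmax(mask(1e-10))   # tiny positive number: behaves like a score of 0
fixed = softmax(mask(-1e10))   # huge negative number: weight underflows to 0
```

With 1e-10 the masked position still receives a noticeable share of the probability mass, whereas with -1e10 its weight is zero.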
Shivam Tawari
|
31cd989610
|
Update README.md
|
2020-08-18 15:14:28 +05:30 |
|
Shivam Tawari
|
25c2ad25dd
|
Update README.md
|
2020-08-18 15:14:28 +05:30 |
|
Andrej Karpathy
|
5433de6158
|
first commit, able to multigpu train fp32 GPTs on math and character-level data, but have done barely any tuning.
|
2020-08-17 00:39:02 -07:00 |
|
Andrej Karpathy
|
0d9d098cd2
|
first commit, able to multigpu train fp32 GPTs on math and character-level data, but have done barely any tuning.
|
2020-08-17 00:39:02 -07:00 |
|