minGPT

mirror/minGPT

mirror of https://github.com/karpathy/minGPT synced 2024-11-15 19:10:39 +01:00

Commit Graph

5b22aacc53 typo marxav 2020-11-10 22:58:28 +0100
17efd27da0 replace language by orthography marxav 2020-11-10 22:56:50 +0100
a500aea83a Create marxav 2020-10-27 15:46:44 +0100
7c2a08d8f3 Reverting BlingDL changes due to errors Alex 2020-10-02 12:38:04 -0700
754480332e Removed another wrong code line Alex 2020-10-02 12:31:29 -0700
b59e120227 Trying BlinkDL improvements separately from PytorchLXLA Alex 2020-10-02 12:26:33 -0700
c9ee99858f Reverted back to old code due to serious issues with the new one Alex 2020-10-02 12:18:21 -0700
17bfe78350 Added all improvements by BlinkDL Alex 2020-10-02 04:30:46 -0700
705483ec3a Added auto-TQDM Alex 2020-10-02 04:24:48 -0700
20e5d4e654 An attempt to tune the minGPT code. Please stand-by... Alex 2020-10-02 04:23:31 -0700
b139d2fc72 Updated tqdm Alex 2020-09-22 05:52:28 -0700
b0906c2117 #39 Happy Sugar Life 2020-09-20 21:48:57 -1200
3ea94c6e7b typo 'terations' Happy Sugar Life 2020-09-14 04:50:21 -1200
359137b457 Add files via upload Alex 2020-09-09 20:46:55 -0700
686b41c0f8 [Racially Neutral Code] Happy Sugar Life 2020-09-05 02:07:22 -1200
0fa482b35d Merge a796899f656345ac541aba49eccb368f49b7d730 into 4050db60409b5bbaaa3302cee1e49847fc145c65 Andrej 2020-08-30 11:41:57 -0700
a796899f65 reorg the bench code to support multigpu training, have to indent properly under __main__ feature/lightning Andrej Karpathy 2020-08-30 11:40:31 -0700
492b79fb31 get rid of spurious function for the model Andrej Karpathy 2020-08-30 11:39:55 -0700
d91bb1c0be make labels non-blocking transfer to overlap them, but i don't really expect this to do too much to latency Andrej Karpathy 2020-08-30 11:11:46 -0700
4817231b23 testing now works with both lightning and minLightning Andrej Karpathy 2020-08-30 11:11:17 -0700
9b1e5a461f delete Result structs in favor of dicts Andrej Karpathy 2020-08-30 10:46:32 -0700
452a5ab9a0 massive refactor yet again. this was all probably a pretty bad idea Andrej Karpathy 2020-08-29 23:58:45 -0700
1aa67ca527 switch to a faster version of zero_grad() Andrej Karpathy 2020-08-29 20:50:48 -0700
ebd40f112c support fp16/32 precision in bench Andrej Karpathy 2020-08-29 17:47:06 -0700
0ed3376b3f move instantiation of text dataset into the constructor so we don't have to create it twice Andrej Karpathy 2020-08-29 17:33:31 -0700
fa10298a8d use a standard benchmark (text8) and implement train/val/test splits Andrej Karpathy 2020-08-29 17:30:41 -0700
fb37e03cd1 refactor into a datamodule, attempt number 1 Andrej Karpathy 2020-08-29 16:38:58 -0700
81650ae4d7 one more refactor, this is better because the equivalence to lightning is now much cleaner and all of lightning functionality is in one file Andrej Karpathy 2020-08-29 15:40:21 -0700
990c0c7d9a final integration pieces, now runs with both, but it ain't super pretty yet... Andrej Karpathy 2020-08-29 15:19:55 -0700
a5a6d1a638 add training_step to the model and remove DataParallel functionality from the base Trainer, will go to Lightning Andrej Karpathy 2020-08-29 14:01:51 -0700
923b6fcf17 and finally get rid of Config object for the Trainer Andrej Karpathy 2020-08-29 13:39:00 -0700
c0823ec247 model is also passed into fit() instead of __init__ ,sure. Andrej Karpathy 2020-08-29 12:48:24 -0700
e88f0767cb data loaders are passed directly to fit() instead of the dataset, version 2 haha Andrej Karpathy 2020-08-29 12:41:12 -0700
3fa57cd175 data loaders are passed directly to fit() instead of the datasets Andrej Karpathy 2020-08-29 12:39:29 -0700
08f5b9ac03 refactor out the learning rate decay class as a callback Andrej Karpathy 2020-08-29 12:28:49 -0700
61102983f5 step 1: free the GPT module of config and flatten out the args. WIP, breaks notebooks Andrej Karpathy 2020-08-29 11:36:15 -0700
4050db6040 Merge pull request #32 from brchristian/patch-1 Andrej 2020-08-25 22:41:18 -0700
c43600576e Merge pull request #31 from fpgaminer/master Andrej 2020-08-25 22:40:47 -0700
4b5d96b99c Fix typo in comment in play_char.ipynb brchristian 2020-08-25 20:40:17 -0700
a7b13e02ff fix CharDataset::__len__ off by one error fpgaminer 2020-08-25 18:16:49 -0700
a17028e8bc Merge pull request #3 from abiller/karpathy-master Ariel Biller 2020-08-25 22:17:29 +0300
f7560c1b0e i hate merging notebooks. there ought to be a law! ariel 2020-08-25 22:14:25 +0300
b8703820b7 Create CODE_OF_CONDUCT.md Gaushik M.R 2020-08-25 07:29:02 -0400
339f4e7ad3 fix dataloader issue pointed out by @fpgaminer in #28 and introduce shuffle=True and pin_memory=True as defaults. That said I'm still not very happy with this demo because we're likely overfitting a massive model to tiny text and nothing is really tuned at all. This needs a real train/test dataset and a tiny bit of hyperparameter search, todo. Andrej Karpathy 2020-08-24 23:23:53 -0700
f00dbe408c bugfix, pycharm com. does not refactor jupyter notebooks (!!) ariel 2020-08-24 23:28:59 +0300
448c847781 Convert config to attrs - even less lines and now immutability is enforced. ariel 2020-08-24 23:16:22 +0300
ed8b745ea4 Renaming for enhanced readability, addint some type annotations ariel 2020-08-24 23:15:29 +0300
18ce6dff5a Merge pull request #1 from karpathy/master Ariel Biller 2020-08-24 22:22:31 +0300
8b6e3a0d83 remove comment j-planet 2020-08-24 00:10:45 -0700
1f16e14924 replace the first 2 digits of the answer with 0s j-planet 2020-08-24 00:07:19 -0700
94187b944c Merge pull request #25 from michaellavelle/shakespeare Andrej 2020-08-23 22:38:42 -0700
effa35fd93 Correcting the Bard's name Michael Lavelle 2020-08-24 06:35:45 +0100
63902c8d09 remove passive aggressive comment. control yourself andrej. Andrej Karpathy 2020-08-23 19:36:23 -0700
38d7327dfd instead of -1e10 use float -inf, which I think will play nicer with fp16 down the line Andrej Karpathy 2020-08-23 17:47:05 -0700
f683085892 resolve merge conflict, this is not going well at all “Andrej 2020-08-23 17:30:23 -0700
a8835cfebc bleh resolve merge conflicts “Andrej 2020-08-23 17:26:19 -0700
5a67ab913d add early stopping to cifar10 image demo “Andrej 2020-08-23 17:19:45 -0700
421caf8b20 mit license file “Andrej 2020-08-23 17:09:21 -0700
bbbdac74fa properly separate params that should be weight decayed, and make a small incremental step towards Lightning compatibility by creating the optimizer object inside the model's configure_optimizers “Andrej 2020-08-23 15:48:20 -0700
23982656df add early stopping logic “Andrej 2020-08-23 15:09:09 -0700
2a2e8d5388 Excluding LayerNorm, Embedding, freestanding Parameters and parameters with bias in the name from weight decay. Allowing an override list to be configured on TrainerConfig Michael Lavelle 2020-08-23 21:30:44 +0100
08c08d8db0 Merge f8ce2784903c93d7cec8d91fcb93c7d38a9e4baf into d100e2251a258ea6c72e59eeba83539567e8fc8c Benjamin Wild 2020-08-22 21:21:54 -0400
14d2a75f20 Merge ad77167036e87b72d6db117678020741005da6c6 into d100e2251a258ea6c72e59eeba83539567e8fc8c William Falcon 2020-08-23 07:55:33 +0700
3065d61408 Merge d8e08a2471468142838bd4e97d132fe52d89896c into d100e2251a258ea6c72e59eeba83539567e8fc8c Sergey Kolesnikov 2020-08-23 06:22:08 +0530