minGPT

mirror/minGPT

mirror of https://github.com/karpathy/minGPT synced 2024-11-15 19:10:39 +01:00

Commit Graph

d9ea878100 add maxiters to trainer Andrej Karpathy 2022-07-01 08:31:46 -0700
1f33571400 changing default seed to the suspect SOTA value as per https://arxiv.org/abs/2109.08203 Andrej Karpathy 2022-06-29 10:05:41 -0700
7f6775a671 no Andrej 2022-06-27 20:41:51 +0000
00aa9cb2ed ok i hated the previous global/local config idea. reverting it and simplying and i think this is the best api so far Andrej 2022-06-27 20:41:01 +0000
ea20661f78 be more defensive around model_type, don't let the user shoot themselves in the foot Andrej 2022-06-27 19:26:26 +0000
1c8842dbe9 small tweaks Andrej 2022-06-27 19:20:05 +0000
b483fbe8db suppress warnings and lightweight docs and changes Andrej 2022-06-24 20:48:05 +0000
c6c973738b implement scaled init per gpt-2 paper Andrej 2022-06-24 17:48:20 +0000
7e68832554 delete ugly Conv1D, a real abomination of this Universe Andrej 2022-06-24 03:22:27 +0000
a1cad3f37a oops i forgot to push this file for generation and testing that we load the weights well Andrej 2022-06-24 02:52:15 +0000
13a42a6ce0 ok step 1, create a get_pretrained function that inits with openai weights' Andrej 2022-06-24 01:43:39 +0000
dfb892044d big big refactor so that we can load actual gpt2 weights from openai. this is will wip, want to clean it up good Andrej 2022-06-23 23:33:44 +0000
3cf811e67c delegate more stuff to the Trainer class Andrej 2022-06-01 17:55:36 +0000
8860486f66 attempt to make model config a little bit better, still hate it Andrej 2022-06-01 17:14:22 +0000
2db3a4b7b3 first implementation of chargpt, just pasting it into current api, but i really hate the way the code is set up atm, wip Andrej 2022-05-31 23:07:54 +0000
52cb434db2 tiny tweaks to printing and some function apis Andrej 2022-05-31 23:07:00 +0000
9ec160cd8c small tweaks. found an issue with my brilliant plan to solve all configuration problems. have to think about more Andrej Karpathy 2022-05-28 15:05:34 -0700
82768a7a95 small tweaks and a bug fix that makes me doubt the current approach with the configs a bit... shop myself in the foot a bit Andrej 2022-05-28 03:44:32 +0000
b162d3f44e fix small bugs and add ability to train/eval on either cpu or gpu Andrej 2022-05-28 03:17:24 +0000
fa1b46f78a bit more logging, including saving a model but only if it's the best one yet Andrej Karpathy 2022-05-27 16:06:31 -0700
a330148c22 add ability to override config params from command line args. re-inventing the wheel a bit here, should i just use yacs or something? i just really really really do not like dependencies Andrej Karpathy 2022-05-27 12:16:07 -0700
8425759c24 early work, refactoring the adder first Andrej Karpathy 2022-05-27 10:04:52 -0700
288ef99ee1 Better comment Mishig Davaadorj 2022-04-28 16:07:33 +0200
c79fc9dc78 Chore Mishig Davaadorj 2022-04-28 16:06:47 +0200
7e4da7c8d9 Add tests Mishig Davaadorj 2022-04-28 00:19:41 +0200
a81eaef8d7 Add tests Mishig Davaadorj 2022-04-28 00:19:41 +0200
b43f3d8b9c Merge 0ac226d73b85dc68aa15eaf1ec070f98b8d5881c into 3ed14b2cec0dfdad3f4b2831f2b4a86d11aef150 SpeedCoder5 2022-04-25 16:02:09 -0400
0ac226d73b #71 use config n_head instead of hardcoded 4 heads speedcoder5 2022-04-25 15:57:28 -0400
6290adf5c9 Merge 8565f3867a64947df366165acd0ce9bf2eff901c into 3ed14b2cec0dfdad3f4b2831f2b4a86d11aef150 Rohan Awhad 2022-04-21 20:00:27 +0530
8565f3867a Added the condition for test_dataset's presence. Rohan Awhad 2022-04-21 19:59:40 +0530
3ed14b2cec i know it doesn't look like much, but this kwarg was not used lol :D Andrej Karpathy 2022-03-27 17:48:05 +0100
107b6d7e31 add comment to clarify #39 . Ty @JonathanSum for inspiration PR Andrej Karpathy 2022-03-26 13:52:51 +0000
dffb6a14e2 Merge branch 'waynemystir-master' Andrej Karpathy 2022-03-26 13:48:03 +0000
031ad36f29 don't use default kwargs, in my experience lead to bugs always Andrej Karpathy 2022-03-26 13:47:52 +0000
176be2d9bf initialize position embeddings Thomas Viehmann 2021-10-25 14:43:04 +0200
fd2977c4e0 Fix broken hugging face link & add link to huggingface / transformers Mishig Davaadorj 2021-12-08 11:37:43 +0100
79b1cfe6e2 Merge 5a88af8014d0201eb24519cef8253b0aec63111c into 94d880648e4fe5ccb2ff8f4b6cea0e30c8a63cc7 Eduard Mukans 2022-03-26 12:26:44 +0000
94d880648e Merge pull request #62 from t-vi/init Andrej 2022-03-26 12:23:31 +0000
ea8706d964 Merge pull request #63 from mishig25/patch-1 Andrej 2022-03-26 12:21:39 +0000
5c53da1609 Merge cc68da585e93bbb833c1e7d0d380b1ad1de63352 into 4050db60409b5bbaaa3302cee1e49847fc145c65 Aravind Srinivas 2022-03-18 00:17:42 -0700
cc68da585e add util for load_checkpoint; and make sure sampling works aravindsrinivas 2022-03-18 07:16:20 +0000
6f1fccdd4b Update .gitignore Aravind Srinivas 2022-03-17 16:30:25 -0700
d70fc028e6 remove input.txt Aravind Srinivas 2022-03-17 16:29:29 -0700
0630dbe926 add distributed data parallel trainer aravindsrinivas 2022-03-17 23:14:27 +0000
c12d1884fa modify to use pooling instead annasajkh 2022-01-03 03:41:02 +0700
bac74347ff Fix broken hugging face link & add link to huggingface / transformers Mishig Davaadorj 2021-12-08 11:37:43 +0100
744d41003a initialize position embeddings Thomas Viehmann 2021-10-25 14:43:04 +0200
4a5522f18e Add play_copilot notebook Matt Potma 2021-07-17 14:32:38 -0700
5b208e7fbc Remove play_copilot python script Matt Potma 2021-07-17 14:32:27 -0700
e50265911d Add python script for cloud training Matt Potma 2021-07-13 21:20:48 -0700
5a88af8014 Merge pull request #3 from emukans/feature/play-words Eduard Mukans 2021-04-02 01:14:07 +0300
1a92a7df66 Clean-up Eduards Mukans 2021-04-02 01:12:01 +0300
b3adfdedb7 Merge pull request #2 from emukans/feature/play-words Eduard Mukans 2021-04-02 01:10:38 +0300
4e578f2a7f Clean-up Eduards Mukans 2021-04-02 01:09:49 +0300
aee14d6d9b Merge pull request #1 from emukans/feature/play-words Eduard Mukans 2021-04-02 01:06:40 +0300
43a6b6eb0c Refactored tokenization and trained a model Eduards Mukans 2021-04-02 01:05:46 +0300
af71bff018 Add play_word notebook with BPE tokenizer Eduards Mukans 2021-03-30 18:49:06 +0300
cc8eda2120 update README (#1) Ravi Kalia 2021-02-25 18:30:39 -0500
bb8b24234a Update README.md Ravi Kalia 2021-02-25 18:29:33 -0500
a492a38901 sqrt-d scaling change Aravind 2020-12-30 10:43:19 +0000
b4f8d57aaf Reverted back to old version because the fix did not work properly Alex 2020-12-27 21:19:16 -0800
67a54091c8 Trying out the layer norm fix by Scikud Alex 2020-12-27 19:22:36 -0800
8fcaafb367 move instantiation of DataLoader waynemystir 2020-11-20 13:44:49 -0500
99ff4719d7 Layer norm should be after residual block Scikud 2020-11-12 00:19:58 -0800