mirror of https://github.com/karpathy/minGPT synced 2024-11-15 19:10:39 +01:00

Commit Graph

  • d9ea878100 add maxiters to trainer Andrej Karpathy 2022-07-01 08:31:46 -0700
  • 1f33571400 changing default seed to the suspect SOTA value as per https://arxiv.org/abs/2109.08203 Andrej Karpathy 2022-06-29 10:05:41 -0700
  • 7f6775a671 no Andrej 2022-06-27 20:41:51 +0000
  • 00aa9cb2ed ok i hated the previous global/local config idea. reverting it and simplying and i think this is the best api so far Andrej 2022-06-27 20:41:01 +0000
  • ea20661f78 be more defensive around model_type, don't let the user shoot themselves in the foot Andrej 2022-06-27 19:26:26 +0000
  • 1c8842dbe9 small tweaks Andrej 2022-06-27 19:20:05 +0000
  • b483fbe8db suppress warnings and lightweight docs and changes Andrej 2022-06-24 20:48:05 +0000
  • c6c973738b implement scaled init per gpt-2 paper Andrej 2022-06-24 17:48:20 +0000
  • 7e68832554 delete ugly Conv1D, a real abomination of this Universe Andrej 2022-06-24 03:22:27 +0000
  • a1cad3f37a oops i forgot to push this file for generation and testing that we load the weights well Andrej 2022-06-24 02:52:15 +0000
  • 13a42a6ce0 ok step 1, create a get_pretrained function that inits with openai weights' Andrej 2022-06-24 01:43:39 +0000
  • dfb892044d big big refactor so that we can load actual gpt2 weights from openai. this is will wip, want to clean it up good Andrej 2022-06-23 23:33:44 +0000
  • 3cf811e67c delegate more stuff to the Trainer class Andrej 2022-06-01 17:55:36 +0000
  • 8860486f66 attempt to make model config a little bit better, still hate it Andrej 2022-06-01 17:14:22 +0000
  • 2db3a4b7b3 first implementation of chargpt, just pasting it into current api, but i really hate the way the code is set up atm, wip Andrej 2022-05-31 23:07:54 +0000
  • 52cb434db2 tiny tweaks to printing and some function apis Andrej 2022-05-31 23:07:00 +0000
  • 9ec160cd8c small tweaks. found an issue with my brilliant plan to solve all configuration problems. have to think about more Andrej Karpathy 2022-05-28 15:05:34 -0700
  • 82768a7a95 small tweaks and a bug fix that makes me doubt the current approach with the configs a bit... shop myself in the foot a bit Andrej 2022-05-28 03:44:32 +0000
  • b162d3f44e fix small bugs and add ability to train/eval on either cpu or gpu Andrej 2022-05-28 03:17:24 +0000
  • fa1b46f78a bit more logging, including saving a model but only if it's the best one yet Andrej Karpathy 2022-05-27 16:06:31 -0700
  • a330148c22 add ability to override config params from command line args. re-inventing the wheel a bit here, should i just use yacs or something? i just really really really do not like dependencies Andrej Karpathy 2022-05-27 12:16:07 -0700
  • 8425759c24 early work, refactoring the adder first Andrej Karpathy 2022-05-27 10:04:52 -0700
  • 288ef99ee1 Better comment Mishig Davaadorj 2022-04-28 16:07:33 +0200
  • c79fc9dc78 Chore Mishig Davaadorj 2022-04-28 16:06:47 +0200
  • 7e4da7c8d9 Add tests Mishig Davaadorj 2022-04-28 00:19:41 +0200
  • a81eaef8d7 Add tests Mishig Davaadorj 2022-04-28 00:19:41 +0200
  • b43f3d8b9c Merge 0ac226d73b85dc68aa15eaf1ec070f98b8d5881c into 3ed14b2cec0dfdad3f4b2831f2b4a86d11aef150 SpeedCoder5 2022-04-25 16:02:09 -0400
  • 0ac226d73b #71 use config n_head instead of hardcoded 4 heads speedcoder5 2022-04-25 15:57:28 -0400
  • 6290adf5c9 Merge 8565f3867a64947df366165acd0ce9bf2eff901c into 3ed14b2cec0dfdad3f4b2831f2b4a86d11aef150 Rohan Awhad 2022-04-21 20:00:27 +0530
  • 8565f3867a Added the condition for test_dataset's presence. Rohan Awhad 2022-04-21 19:59:40 +0530
  • 3ed14b2cec i know it doesn't look like much, but this kwarg was not used lol :D Andrej Karpathy 2022-03-27 17:48:05 +0100
  • 107b6d7e31 add comment to clarify #39 . Ty @JonathanSum for inspiration PR Andrej Karpathy 2022-03-26 13:52:51 +0000
  • dffb6a14e2 Merge branch 'waynemystir-master' Andrej Karpathy 2022-03-26 13:48:03 +0000
  • 031ad36f29 don't use default kwargs, in my experience lead to bugs always Andrej Karpathy 2022-03-26 13:47:52 +0000
  • 176be2d9bf initialize position embeddings Thomas Viehmann 2021-10-25 14:43:04 +0200
  • fd2977c4e0 Fix broken hugging face link & add link to huggingface / transformers Mishig Davaadorj 2021-12-08 11:37:43 +0100
  • 79b1cfe6e2 Merge 5a88af8014d0201eb24519cef8253b0aec63111c into 94d880648e4fe5ccb2ff8f4b6cea0e30c8a63cc7 Eduard Mukans 2022-03-26 12:26:44 +0000
  • 94d880648e Merge pull request #62 from t-vi/init Andrej 2022-03-26 12:23:31 +0000
  • ea8706d964 Merge pull request #63 from mishig25/patch-1 Andrej 2022-03-26 12:21:39 +0000
  • 5c53da1609 Merge cc68da585e93bbb833c1e7d0d380b1ad1de63352 into 4050db60409b5bbaaa3302cee1e49847fc145c65 Aravind Srinivas 2022-03-18 00:17:42 -0700
  • cc68da585e add util for load_checkpoint; and make sure sampling works aravindsrinivas 2022-03-18 07:16:20 +0000
  • 6f1fccdd4b Update .gitignore Aravind Srinivas 2022-03-17 16:30:25 -0700
  • d70fc028e6 remove input.txt Aravind Srinivas 2022-03-17 16:29:29 -0700
  • 0630dbe926 add distributed data parallel trainer aravindsrinivas 2022-03-17 23:14:27 +0000
  • c12d1884fa modify to use pooling instead annasajkh 2022-01-03 03:41:02 +0700
  • bac74347ff Fix broken hugging face link & add link to huggingface / transformers Mishig Davaadorj 2021-12-08 11:37:43 +0100
  • 744d41003a initialize position embeddings Thomas Viehmann 2021-10-25 14:43:04 +0200
  • 4a5522f18e Add play_copilot notebook Matt Potma 2021-07-17 14:32:38 -0700
  • 5b208e7fbc Remove play_copilot python script Matt Potma 2021-07-17 14:32:27 -0700
  • e50265911d Add python script for cloud training Matt Potma 2021-07-13 21:20:48 -0700
  • 5a88af8014 Merge pull request #3 from emukans/feature/play-words Eduard Mukans 2021-04-02 01:14:07 +0300
  • 1a92a7df66 Clean-up Eduards Mukans 2021-04-02 01:12:01 +0300
  • b3adfdedb7 Merge pull request #2 from emukans/feature/play-words Eduard Mukans 2021-04-02 01:10:38 +0300
  • 4e578f2a7f Clean-up Eduards Mukans 2021-04-02 01:09:49 +0300
  • aee14d6d9b Merge pull request #1 from emukans/feature/play-words Eduard Mukans 2021-04-02 01:06:40 +0300
  • 43a6b6eb0c Refactored tokenization and trained a model Eduards Mukans 2021-04-02 01:05:46 +0300
  • af71bff018 Add play_word notebook with BPE tokenizer Eduards Mukans 2021-03-30 18:49:06 +0300
  • cc8eda2120 update README (#1) Ravi Kalia 2021-02-25 18:30:39 -0500
  • bb8b24234a Update README.md Ravi Kalia 2021-02-25 18:29:33 -0500
  • a492a38901 sqrt-d scaling change Aravind 2020-12-30 10:43:19 +0000
  • b4f8d57aaf Reverted back to old version because the fix did not work properly Alex 2020-12-27 21:19:16 -0800
  • 67a54091c8 Trying out the layer norm fix by Scikud Alex 2020-12-27 19:22:36 -0800
  • 8fcaafb367 move instantiation of DataLoader waynemystir 2020-11-20 13:44:49 -0500
  • 99ff4719d7 Layer norm should be after residual block Scikud 2020-11-12 00:19:58 -0800