Andrej
|
37baab71b9
|
Update README.md
explain why a lot of the action going forward has moved ---> nanoGPT
|
2023-01-08 08:50:20 -08:00 |
|
Andrej
|
7218bcfa52
|
Merge pull request #84 from ericjang/master
Add setup.py to allow mingpt to be used as a third-party library
|
2022-08-03 21:35:21 -07:00 |
|
Eric
|
48c815bb16
|
Add setup.py to allow mingpt to be used as a third-party library
|
2022-08-03 16:55:11 -07:00 |
|
Andrej
|
cafce4544b
|
Merge pull request #83 from mishig25/patch-1
Use XOR operator `^` for checking assertion `type_given XOR params_given`
|
2022-07-28 17:04:56 -07:00 |
|
Mishig Davaadorj
|
90420ee978
|
Use XOR operator ^ for checking assertion type_given XOR params_given
Use XOR operator `^` for checking assertion `type_given XOR params_given` in `GPT.__init__`
|
2022-07-28 22:33:51 +02:00 |
|
Andrej
|
ca74e9a13c
|
Merge pull request #82 from neverix/patch-1
Fix README.md typo
|
2022-07-27 11:46:53 -07:00 |
|
neverix
|
e461bf6f00
|
Fix README.md typo
Funnily enough, this error is also in the published paper
|
2022-07-27 21:10:29 +03:00 |
|
Andrej
|
31559f7dc5
|
Merge pull request #81 from luigidisotto/callbacks-optimizer
Add optimizer to Trainer's self for callbacks.
|
2022-07-26 09:44:31 -07:00 |
|
Luigi Di Sotto
|
c4c650e3d5
|
Add optimizer to Trainer's self for callbacks.
|
2022-07-26 10:17:44 +02:00 |
|
Andrej
|
e2065c59c6
|
use a bit more extended example that has my last name too because nice to show how it breaks up into more tokens
|
2022-07-12 04:31:31 +00:00 |
|
Andrej
|
d8dd157f9c
|
add a full example into the script as well
|
2022-07-12 04:25:17 +00:00 |
|
Andrej
|
59fea1ba1f
|
Merge pull request #78 from nat/patch-1
Typos
|
2022-07-11 21:05:32 -07:00 |
|
Nat Friedman
|
e9f6e3d448
|
Typos
Fixed some small typos.
|
2022-07-11 20:55:38 -07:00 |
|
Andrej
|
0fc12d703d
|
adjust the readme docs to reflect bpe changes
|
2022-07-12 02:14:39 +00:00 |
|
Andrej
|
9642f40b83
|
add a refactored BPE encoder from openai. Basically I don't super trust the huggingface tokenizer, the implementation sprawls across multiple files and inheritance and has special magic handling around AddedTokens that I don't fully follow. Prefer to roll our own explicit implementation here that exactly mirrors the code of OpenAI and nothing else
|
2022-07-12 02:01:41 +00:00 |
|
Andrej
|
40635a91f4
|
few added todo notes to readme
|
2022-07-11 18:59:29 +00:00 |
|
Andrej
|
acaadacd59
|
refactor sequence generation into the model and match the huggingface/transformers api. touches everything but this makes a lot more sense to me aesthetically
|
2022-07-11 18:50:53 +00:00 |
|
Andrej
|
5af9e5c5d7
|
small typo fixes in readme
|
2022-07-09 00:41:50 +00:00 |
|
Andrej
|
610baf2314
|
quick docs on some planned todos
|
2022-07-09 00:27:51 +00:00 |
|
Andrej
|
12f346a63d
|
make generation script into a notebook, makes much more sense that way i think, and much easier to use
|
2022-07-08 23:53:14 +00:00 |
|
Andrej
|
b14c99191a
|
rename the script to make sense
|
2022-07-08 22:58:49 +00:00 |
|
Andrej
|
803f38800d
|
refactor pretrained weight loading into from_pretrained and add unit tests
|
2022-07-08 22:56:15 +00:00 |
|
Andrej
|
4a56b20f80
|
fix parameter counting
|
2022-07-08 21:10:54 +00:00 |
|
Andrej
|
28bcbd0ad9
|
ocd is killing me
|
2022-07-08 20:35:08 +00:00 |
|
Andrej
|
449a980d39
|
updates to readme
|
2022-07-08 20:34:23 +00:00 |
|
Andrej
|
b7c9acc46c
|
remove legacy notebooks
|
2022-07-08 20:20:35 +00:00 |
|
Andrej
|
e7fe54898d
|
refactor readme to match the repo
|
2022-07-08 20:19:17 +00:00 |
|
Andrej Karpathy
|
7569ab9d7f
|
simple notebook demo showing how to use minGPT
|
2022-07-07 18:15:26 -07:00 |
|
Andrej
|
2e979dde5f
|
ummm eyeroll
|
2022-07-01 15:34:34 +00:00 |
|
Andrej Karpathy
|
2f3400f42a
|
split out register_callback to set/add
|
2022-07-01 08:32:19 -07:00 |
|
Andrej Karpathy
|
d9ea878100
|
add maxiters to trainer
|
2022-07-01 08:31:46 -07:00 |
|
Andrej Karpathy
|
1f33571400
|
changing default seed to the suspect SOTA value as per https://arxiv.org/abs/2109.08203
|
2022-06-29 10:05:41 -07:00 |
|
Andrej
|
7f6775a671
|
no
|
2022-06-27 20:41:51 +00:00 |
|
Andrej
|
00aa9cb2ed
|
ok i hated the previous global/local config idea. reverting it and simplifying and i think this is the best api so far
|
2022-06-27 20:41:01 +00:00 |
|
Andrej
|
ea20661f78
|
be more defensive around model_type, don't let the user shoot themselves in the foot
|
2022-06-27 19:26:26 +00:00 |
|
Andrej
|
1c8842dbe9
|
small tweaks
|
2022-06-27 19:20:05 +00:00 |
|
Andrej
|
b483fbe8db
|
suppress warnings and lightweight docs and changes
|
2022-06-24 20:48:05 +00:00 |
|
Andrej
|
c6c973738b
|
implement scaled init per gpt-2 paper
|
2022-06-24 17:48:20 +00:00 |
|
Andrej
|
7e68832554
|
delete ugly Conv1D, a real abomination of this Universe
|
2022-06-24 03:22:27 +00:00 |
|
Andrej
|
a1cad3f37a
|
oops i forgot to push this file for generation and testing that we load the weights well
|
2022-06-24 02:52:15 +00:00 |
|
Andrej
|
13a42a6ce0
|
ok step 1, create a get_pretrained function that inits with openai weights
|
2022-06-24 01:43:39 +00:00 |
|
Andrej
|
dfb892044d
|
big big refactor so that we can load actual gpt2 weights from openai. this is still wip, want to clean it up good
|
2022-06-23 23:33:44 +00:00 |
|
Andrej
|
3cf811e67c
|
delegate more stuff to the Trainer class
|
2022-06-01 17:55:36 +00:00 |
|
Andrej
|
8860486f66
|
attempt to make model config a little bit better, still hate it
|
2022-06-01 17:14:22 +00:00 |
|
Andrej
|
2db3a4b7b3
|
first implementation of chargpt, just pasting it into current api, but i really hate the way the code is set up atm, wip
|
2022-05-31 23:07:54 +00:00 |
|
Andrej
|
52cb434db2
|
tiny tweaks to printing and some function apis
|
2022-05-31 23:07:00 +00:00 |
|
Andrej Karpathy
|
9ec160cd8c
|
small tweaks. found an issue with my brilliant plan to solve all configuration problems. have to think about it more
|
2022-05-28 15:05:34 -07:00 |
|
Andrej
|
82768a7a95
|
small tweaks and a bug fix that makes me doubt the current approach with the configs a bit... shot myself in the foot a bit
|
2022-05-28 03:44:32 +00:00 |
|
Andrej
|
b162d3f44e
|
fix small bugs and add ability to train/eval on either cpu or gpu
|
2022-05-28 03:17:24 +00:00 |
|
Andrej Karpathy
|
fa1b46f78a
|
bit more logging, including saving a model but only if it's the best one yet
|
2022-05-27 16:06:31 -07:00 |
|