minGPT

mirror of https://github.com/karpathy/minGPT synced 2024-11-15 19:10:39 +01:00

Author	SHA1	Message	Date
Andrej	37baab71b9	Update README.md explain why a lot of the action going forward has moved ---> nanoGPT	2023-01-08 08:50:20 -08:00
Andrej	7218bcfa52	Merge pull request #84 from ericjang/master Add setup.py to allow mingpt to be used as a third-party library	2022-08-03 21:35:21 -07:00
Eric	48c815bb16	Add setup.py to allow mingpt to be used as a third-party library	2022-08-03 16:55:11 -07:00
Andrej	cafce4544b	Merge pull request #83 from mishig25/patch-1 Use XOR operator `^` for checking assertion `type_given XOR params_given`	2022-07-28 17:04:56 -07:00
Mishig Davaadorj	90420ee978	Use XOR operator `^` for checking assertion `type_given XOR params_given` in `GPT.__init__`	2022-07-28 22:33:51 +02:00
Andrej	ca74e9a13c	Merge pull request #82 from neverix/patch-1 Fix README.md typo	2022-07-27 11:46:53 -07:00
neverix	e461bf6f00	Fix README.md typo Funnily enough, this error is also in the published paper	2022-07-27 21:10:29 +03:00
Andrej	31559f7dc5	Merge pull request #81 from luigidisotto/callbacks-optimizer Add optimizer to Trainer's self for callbacks.	2022-07-26 09:44:31 -07:00
Luigi Di Sotto	c4c650e3d5	Add optimizer to Trainer's self for callbacks.	2022-07-26 10:17:44 +02:00
Andrej	e2065c59c6	use a bit more extended example that has my last name too because nice to show how it breaks up into more tokens	2022-07-12 04:31:31 +00:00
Andrej	d8dd157f9c	add a full example into the script as well	2022-07-12 04:25:17 +00:00
Andrej	59fea1ba1f	Merge pull request #78 from nat/patch-1 Typos	2022-07-11 21:05:32 -07:00
Nat Friedman	e9f6e3d448	Typos Fixed some small typos.	2022-07-11 20:55:38 -07:00
Andrej	0fc12d703d	adjust the readme docs to reflect bpe changes	2022-07-12 02:14:39 +00:00
Andrej	9642f40b83	add a refactored BPE encoder from openai. Basically I dont super trust the huggingface tokenizer, the implementation sprawls multiple files and inheritance and has special magic handling around AddedTokens that I don't fully follow. Prefer to roll our own explicit implementation here that exactly mirrors the code of OpenAI and nothing else	2022-07-12 02:01:41 +00:00
Andrej	40635a91f4	few added todo notes to readme	2022-07-11 18:59:29 +00:00
Andrej	acaadacd59	refactor sequence generation into the model and match the huggingface/transformers api. touches everything but this makes a lot more sense to me aesthetically	2022-07-11 18:50:53 +00:00
Andrej	5af9e5c5d7	small typo fixes in readme	2022-07-09 00:41:50 +00:00
Andrej	610baf2314	quick docs on some planned todos	2022-07-09 00:27:51 +00:00
Andrej	12f346a63d	make generation script into a notebook, makes much more sense that way i think, and much easier to use	2022-07-08 23:53:14 +00:00
Andrej	b14c99191a	rename the script to make sense	2022-07-08 22:58:49 +00:00
Andrej	803f38800d	refactor pretrained weight loading into from_pretrained and add unit tests	2022-07-08 22:56:15 +00:00
Andrej	4a56b20f80	fix parameter counting	2022-07-08 21:10:54 +00:00
Andrej	28bcbd0ad9	ocd is killing me	2022-07-08 20:35:08 +00:00
Andrej	449a980d39	updates to readme	2022-07-08 20:34:23 +00:00
Andrej	b7c9acc46c	remove legacy notebooks	2022-07-08 20:20:35 +00:00
Andrej	e7fe54898d	refactor readme to match the repo	2022-07-08 20:19:17 +00:00
Andrej Karpathy	7569ab9d7f	simple notebook demo showing how to use minGPT	2022-07-07 18:15:26 -07:00
Andrej	2e979dde5f	ummm eyeroll	2022-07-01 15:34:34 +00:00
Andrej Karpathy	2f3400f42a	split out register_callback to set/add	2022-07-01 08:32:19 -07:00
Andrej Karpathy	d9ea878100	add maxiters to trainer	2022-07-01 08:31:46 -07:00
Andrej Karpathy	1f33571400	changing default seed to the suspect SOTA value as per https://arxiv.org/abs/2109.08203	2022-06-29 10:05:41 -07:00
Andrej	7f6775a671	no	2022-06-27 20:41:51 +00:00
Andrej	00aa9cb2ed	ok i hated the previous global/local config idea. reverting it and simplying and i think this is the best api so far	2022-06-27 20:41:01 +00:00
Andrej	ea20661f78	be more defensive around model_type, don't let the user shoot themselves in the foot	2022-06-27 19:26:26 +00:00
Andrej	1c8842dbe9	small tweaks	2022-06-27 19:20:05 +00:00
Andrej	b483fbe8db	suppress warnings and lightweight docs and changes	2022-06-24 20:48:05 +00:00
Andrej	c6c973738b	implement scaled init per gpt-2 paper	2022-06-24 17:48:20 +00:00
Andrej	7e68832554	delete ugly Conv1D, a real abomination of this Universe	2022-06-24 03:22:27 +00:00
Andrej	a1cad3f37a	oops i forgot to push this file for generation and testing that we load the weights well	2022-06-24 02:52:15 +00:00
Andrej	13a42a6ce0	ok step 1, create a get_pretrained function that inits with openai weights'	2022-06-24 01:43:39 +00:00
Andrej	dfb892044d	big big refactor so that we can load actual gpt2 weights from openai. this is will wip, want to clean it up good	2022-06-23 23:33:44 +00:00
Andrej	3cf811e67c	delegate more stuff to the Trainer class	2022-06-01 17:55:36 +00:00
Andrej	8860486f66	attempt to make model config a little bit better, still hate it	2022-06-01 17:14:22 +00:00
Andrej	2db3a4b7b3	first implementation of chargpt, just pasting it into current api, but i really hate the way the code is set up atm, wip	2022-05-31 23:07:54 +00:00
Andrej	52cb434db2	tiny tweaks to printing and some function apis	2022-05-31 23:07:00 +00:00
Andrej Karpathy	9ec160cd8c	small tweaks. found an issue with my brilliant plan to solve all configuration problems. have to think about more	2022-05-28 15:05:34 -07:00
Andrej	82768a7a95	small tweaks and a bug fix that makes me doubt the current approach with the configs a bit... shop myself in the foot a bit	2022-05-28 03:44:32 +00:00
Andrej	b162d3f44e	fix small bugs and add ability to train/eval on either cpu or gpu	2022-05-28 03:17:24 +00:00
Andrej Karpathy	fa1b46f78a	bit more logging, including saving a model but only if it's the best one yet	2022-05-27 16:06:31 -07:00

94 Commits