go-enry

Author	SHA1	Message	Date
Alex	335b4a64d8	Merge pull request #53 from go-enry/mcuadros-patch-1 ci: update go version	2021-06-18 16:45:29 +02:00
Alex	be3b43a42e	Merge pull request #52 from look/look/update-linguist-again Update generated code for Linguist 7.14.0	2021-06-18 16:45:05 +02:00
Máximo Cuadros	6511190bd8	ci: update go version	2021-05-18 04:00:27 +02:00
Luke Francl	a81924ae12	Update README	2021-04-26 15:41:10 -07:00
Luke Francl	dfb8041dcc	Update generated code for Linguist 7.14.0	2021-04-26 09:36:25 -07:00
Alex	a724bce4a1	Merge pull request #49 from go-enry/bzz-doc-facelift Docs: mention Rust bindings and IsGenerated	2021-04-24 09:01:10 +02:00
Alex	2ddd4985bc	doc: mention Rust bindings and IsGenerated	2021-04-24 08:56:06 +02:00
Alex	7168084e5e	Merge pull request #44 from zeripath/speed-up-is-vendor Make IsVendor quicker	2021-04-24 08:35:22 +02:00
Alex	0a9864e6ec	Merge pull request #46 from look/look/add-language-id Add GetLanguageID function	2021-04-24 08:32:32 +02:00
Andrew Thornton	20726a1de3	Make IsVendor quicker Although iterating across the regexps is quicker than naively concatenating them, it is still quite slow. This PR proposes a slightly cleverer solution. First instead of just concatenating with groups this PR uses non-capturing groups. This speeds up the regexp processing. Secondly we group the regexps in to 3 groups - those that have to be at the start, those that are segments or at the start and the rest. This makes a considerable speed improvement. Thirdly the regexps are sorted within those groups - which also speeds things up. All in all for a non-vendored file this makes IsVendor around twice as fast. Signed-off-by: Andrew Thornton <art27@cantab.net>	2021-04-23 10:18:28 +01:00
Luke Francl	cabfdaffc0	Update GetLanguageID to return a found boolean per code review	2021-04-22 16:55:42 -07:00
6543	d2d4c32d4d	Extend & simplify the test for IsVendor (#45)	2021-04-22 22:24:27 +02:00
Alex	b60e5c6f5a	Merge pull request #47 from look/look/mimic-linguist-detect Rewrite GetLanguages to work like Linguist.detect	2021-04-22 21:38:22 +02:00
Máximo Cuadros	11cbde8956	Merge pull request #48 from look/look/rm-travis Remove .travis.yml	2021-04-17 01:10:19 +02:00
Luke Francl	ed7a1e67b4	Remove .travis.yml This file doesn't appear to be used any more, since the builds are run using GitHub Actions. This file is affected by the recent Codecov Bash Uploader exploit[1], but since it hasn't been running, I don't think the project is affected. [1] https://about.codecov.io/security-update/	2021-04-15 15:11:39 -07:00
Luke Francl	bf7167fc44	Rewrite GetLanguages to work like Linguist.detect Prior to this change, GetLanguages collected all candidate languages from each strategy to pass to the next strategy (without de-duplicating them). Linguist only uses the previous strategy's candidates for the next strategy. Also, it would overwrite languages with nil if a strategy returned that, so you could get into a situation where you go from multiple languages to no language. See the Ruby code for details: `aad49acc06/lib/linguist.rb (L14-L49)` This addresses https://github.com/src-d/enry/issues/207 because GetLanguages should not return all candidates detected, otherwise it would work differently than Linguist.	2021-04-13 12:04:47 -07:00
Luke Francl	eb043e80a8	Add GetLanguageID function The Linguist-defined language IDs are important to our use case because they are used as database identifiers. This adds a new generator to extract the language IDs into a map and uses that to implement GetLanguageID. Because one language has the ID 0, there is no way to tell if a language name is found or not. If desired, we could add this by returning (string, bool) from GetLanguageID. But none of the other functions that take language names do this, so I didn't want to introduce it here.	2021-04-13 11:49:21 -07:00
Alex	7f5d84ad74	Merge pull request #43 from lafriks-fork/feat/v7.13.0 Sync with Liguist v7.13.0	2021-03-12 08:02:57 +01:00
Lauris BH	323d739170	Fix test	2021-03-07 18:34:08 +02:00
Lauris BH	c40b34c351	Sync with Liguist v7.13.0	2021-03-07 18:02:04 +02:00
Alexander	1ad7deb89e	Merge pull request #42 from lafriks-fork/feat/sync_v7.12.2 Sync with github/linguist version v7.12.2	2021-03-06 15:35:46 +01:00
Lauris BH	497e2f85d3	Sync with github/linguist version v7.12.2	2021-01-17 14:10:38 +02:00
Alexander	3faf9450da	Merge pull request #40 from lafriks-fork/feat/strategy_xml Add XML strategy	2020-12-02 00:10:52 +01:00
Lauris BH	0596fda1a4	Fix strategy order	2020-11-26 13:56:25 +02:00
Alexander	6edbff3dec	Merge pull request #38 from softagram/fix-readme-cmd Fix typo in the pip command in README.md	2020-11-26 12:46:28 +01:00
Alexander	8de21f365e	Merge pull request #39 from lafriks-fork/feat/sync_7.12.1 Sync with linguist 7.12.1	2020-11-26 12:38:46 +01:00
Lauris BH	8ac98f4b77	Update readme	2020-11-15 15:48:03 +02:00
Lauris BH	6d8f15af5b	Add XML strategy	2020-11-15 15:43:37 +02:00
Lauris BH	289ac3d9f0	Sync with linguist 7.12.1	2020-11-15 14:32:56 +02:00
Ville Laitila	8d83871580	Fix typo in the pip command in README.md	2020-11-14 23:29:53 +02:00
Alexander	0fb4b8a768	Merge pull request #35 from lafriks-fork/feat/manpage_strategy Add support for Roff man pages filenames	2020-10-22 00:10:39 +02:00
Alexander	7688057adc	Merge pull request #37 from lafriks-fork/sync_7_11_1 sync to the latest github/linguist v7.11.1	2020-10-22 00:08:04 +02:00
Lauris BH	bc76dd38b0	sync to the latest github/linguist v7.11.1	2020-10-12 12:32:48 +03:00
Lauris BH	cb353b4b05	Add support for Roff man pages filenames	2020-10-12 12:18:57 +03:00
Alexander	d7f6b27b7d	Merge pull request #34 from lafriks-fork/sync_7_11 sync to the latest github/linguist v7.11.0	2020-09-24 12:24:42 +02:00
Lauris BH	7c562a6c34	sync to the latest github/linguist v7.11.0	2020-09-17 10:34:41 +03:00
Alexander	5717abd4c0	Merge pull request #30 from bzz/python-ci CI for Python bindings	2020-08-17 11:56:30 +02:00
Alexander Bezzubov	e98983b3f9	ci: add Python tests profile (\wo gopy) Signed-off-by: Alexander Bezzubov <alexander.bezzubov@jetbrains.com>	2020-08-12 15:23:01 +02:00
Alexander Bezzubov	328c16f948	py: use readme as pypy description Signed-off-by: Alexander Bezzubov <alexander.bezzubov@jetbrains.com>	2020-08-12 15:22:55 +02:00
Alexander Bezzubov	7ee65cc9d0	doc: upd build instructions Signed-off-by: Alexander Bezzubov <alexander.bezzubov@jetbrains.com>	2020-08-12 15:22:50 +02:00
Alexander	5d58b1aaaf	Merge pull request #29 from vsmaxim/master python: cover the rest of python bindings from shared library, add tests, add docstrings for API	2020-08-12 14:35:58 +02:00
Maxim Vasilev	59f0f17834	Remove unneded todos	2020-08-11 00:29:33 +03:00
Maxim Vasilev	08bc9bca0e	Cover the rest of python bindings from shared library, add tests, add docstrings, add setup.py.	2020-08-11 00:12:43 +03:00
Máximo Cuadros	dc6fc02209	Merge pull request #24 from erizocosmico/fix/bail-out-if-not-enough-lines data: bailout in some cases if there arent enough lines	2020-05-28 16:45:10 +02:00
Miguel Molina	78696c2272	data: bailout in some cases if there arent enough lines Signed-off-by: Miguel Molina <miguel@erizocosmi.co>	2020-05-28 13:39:59 +02:00
Máximo Cuadros	2880ccae4a	Merge pull request #23 from erizocosmico/fix/get-first-line data: fix getting the first line for empty content	2020-05-28 11:52:49 +02:00
Miguel Molina	79398a925d	data: fix getting the first line for empty content Signed-off-by: Miguel Molina <miguel@erizocosmi.co>	2020-05-28 11:28:49 +02:00
Máximo Cuadros	e1f1b57a84	Merge pull request #22 from erizocosmico/feature/generated implement IsGenerated helper to filter out generated files	2020-05-28 10:34:37 +02:00
Miguel Molina	8ff885a3a8	implement IsGenerated helper to filter out generated files Closes #17 Implements the IsGenerated helper function to filter out generated files using the rules and matchers in: - https://github.com/github/linguist/blob/master/lib/linguist/generated.rb Since the vast majority of matchers have very different logic, it cannot be autogenerated directly from linguist like other logics in enry, so it's translated by hand. There are three different types of matchers in this implementation: - By extension, which mark as generated based only in the extension. These are the fastest matchers, so they're done first. - By file name, which matches patterns against the filename. These are performed in second place. Unlike linguist, we try to use string functions instead of regexps as much as possible. - Finally, the rest of the matchers, which go into the content and try to identify if they're generated or not based on the content. Unlike linguist, we try to only read the content we need and not split it all unless it's necessary and use byte functions instead of regexps as much as possible. Signed-off-by: Miguel Molina <miguel@erizocosmi.co>	2020-05-28 08:55:13 +02:00
Máximo Cuadros	bda45fdc8e	go.mod: update go-oniguruma v1.2.1	2020-05-06 21:42:07 +02:00

570 Commits