Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up enry.IsVendor #15213

Merged
merged 6 commits into from
Apr 1, 2021
Merged

Speed up enry.IsVendor #15213

merged 6 commits into from
Apr 1, 2021

Conversation

zeripath
Copy link
Contributor

enry.IsVendor is kinda slow as it simply iterates across all regexps.
This PR ajdusts the regexps to combine them to make this process a
little quicker.

Related #15143

Signed-off-by: Andrew Thornton art27@cantab.net

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR ajdusts the regexps to combine them to make this process a
little quicker.

Related go-gitea#15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath added the performance/speed performance issues with slow downs label Mar 30, 2021
@zeripath zeripath added this to the 1.15.0 milestone Mar 30, 2021
@zeripath
Copy link
Contributor Author

This currently is about twice as fast as enry.IsVendor(...) on a non-vendored file

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 30, 2021
@silverwind
Copy link
Member

Seems like a good idea. Linguist does similar, but they simply | the individual regex parts. Is this not an option here? Also, should upstream this.

@zeripath
Copy link
Contributor Author

  • Simply | the individual parts leads to a 4 times slow down.
  • Sorting the regexps and | is 3 times slower.
  • making the groups non capturing makes it twice as slow
  • Placing all of the ^ first and making it common to the ones that use it is around equal
  • Following those by (^|/) and with the common prefix extracted is 2 times faster.

This iterative process should suggest that some further simplifications should speed things up further. However if we drop full regexp compatibility we should be able to make it faster with a path trie. (I recently came across a library that looked reasonable for this - but its name escapes me.)

@zeripath
Copy link
Contributor Author

My suspicion is that go's regexp simplifier is just not as good as ruby's or even that GitHub is using something like hyperscan to make it faster.

It's really worth noting that twice as fast here is still not really enough - it really would be better to be of the order of at least 10 times as fast.

Now in the associated issue the profile indicated more than a few other places that needed speed ups - however I suspect that some of these may be already on their way to being improved. (For example I think since the chi migration we shouldn't be listing branches or tags to do a http push anymore and we don't do a hash to assert if a password is set - we do however test password authentication twice during pushing!!)

@silverwind
Copy link
Member

If enry.isVendor is a pure function (e.g. not reading files like .gitattributes), memoization could be another big gain if it's being called repeatedly on the same paths.

@zeripath
Copy link
Contributor Author

I don't think we can assume that but I guess there are likely to be some common paths that could be captured. It's possible though that a trie can be just as fast as a cache.

Certainly though there are lots of places we could be memoizing by throwing stuff in to caches or precomputing - in particular commit verification.

@lafriks
Copy link
Member

lafriks commented Mar 30, 2021

Maybe this could be contributed upstream?

Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Mar 31, 2021
@silverwind
Copy link
Member

Should definitely upstream this, including the tests which already seem more extensive than enry's own. Thought I guess as a short-term solution it's fine.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Apr 1, 2021
@6543
Copy link
Member

6543 commented Apr 1, 2021

🚀

@6543 6543 merged commit ff460ca into go-gitea:master Apr 1, 2021
@zeripath zeripath deleted the speed-up-is-vendor branch April 1, 2021 18:19
zeripath added a commit to zeripath/gitea that referenced this pull request Apr 1, 2021
Backport go-gitea#15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR ajdusts the regexps to combine them to make this process a
little quicker.

Related go-gitea#15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
zeripath added a commit to zeripath/gitea that referenced this pull request Apr 1, 2021
Backport go-gitea#15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR ajdusts the regexps to combine them to make this process a
little quicker.

Related go-gitea#15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
6543 pushed a commit that referenced this pull request Apr 1, 2021
Backport #15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR ajdusts the regexps to combine them to make this process a
little quicker.

Related #15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
6543 pushed a commit that referenced this pull request Apr 1, 2021
Backport #15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR ajdusts the regexps to combine them to make this process a
little quicker.

Related #15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
@6543 6543 added the backport/done All backports for this PR have been created label Apr 7, 2021
@go-gitea go-gitea locked and limited conversation to collaborators May 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. performance/speed performance issues with slow downs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants