mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-03 22:26:15 +02:00

Commit Graph

  • 38c7cc46ce
    Merge be694bf5f8 into 258a499330 dependabot[bot] 2024-01-15 20:55:32 +0000
  • be694bf5f8
    Bump org.apache.maven.plugins:maven-surefire-plugin from 3.2.2 to 3.2.5 dependabot/maven/org.apache.maven.plugins-maven-surefire-plugin-3.2.5 dependabot[bot] 2024-01-15 20:55:30 +0000
  • c1c5c446df
    Bump org.apache.maven.plugins:maven-surefire-plugin from 3.2.2 to 3.2.3 dependabot[bot] 2023-12-18 20:43:28 +0000
  • a7fe96bf61
    Merge b1302b9f53 into 258a499330 dependabot[bot] 2023-12-04 20:26:21 +0000
  • b1302b9f53
    Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.6.0 to 3.6.3 dependabot/maven/org.apache.maven.plugins-maven-javadoc-plugin-3.6.3 dependabot[bot] 2023-12-04 20:26:18 +0000
  • 43626c08fb
    Merge 41ed1f3df6 into 258a499330 dependabot[bot] 2023-12-04 20:26:14 +0000
  • 41ed1f3df6
    Bump commons-io:commons-io from 2.15.0 to 2.15.1 dependabot/maven/commons-io-commons-io-2.15.1 dependabot[bot] 2023-12-04 20:26:12 +0000
  • c0ff1bc5f2
    Merge a9003ce8bb into 258a499330 Alex Karezin 2023-11-23 02:38:43 -0700
  • fed8a6dd13
    Merge ef34c50599 into 258a499330 dependabot[bot] 2023-11-14 09:34:39 +0000
  • 222293b948
    Merge e7032c58cd into 258a499330 dependabot[bot] 2023-11-14 09:34:39 +0000
  • 258a499330 Bump org.apache.maven.plugins:maven-surefire-plugin from 3.1.2 to 3.2.2 master dependabot[bot] 2023-11-13 20:12:34 +0000
  • 9c33970a12
    Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.6.0 to 3.6.2 dependabot[bot] 2023-11-13 20:12:38 +0000
  • 93bd24b9cb
    Bump org.apache.maven.plugins:maven-surefire-plugin from 3.1.2 to 3.2.2 dependabot[bot] 2023-11-13 20:12:34 +0000
  • e7032c58cd
    Bump junit.version from 5.9.3 to 5.10.1 dependabot/maven/junit.version-5.10.1 dependabot[bot] 2023-11-06 20:38:30 +0000
  • 21b2dcfbf8
    Bump org.apache.maven.plugins:maven-surefire-plugin from 3.1.2 to 3.2.1 dependabot[bot] 2023-10-30 20:32:57 +0000
  • 27432bbde0
    Bump com.googlecode.maven-download-plugin:download-maven-plugin dependabot[bot] 2023-10-30 20:32:49 +0000
  • ef34c50599
    Bump org.jacoco:jacoco-maven-plugin from 0.8.10 to 0.8.11 dependabot/maven/org.jacoco-jacoco-maven-plugin-0.8.11 dependabot[bot] 2023-10-30 20:32:38 +0000
  • ccb218a86a Update CHANGES.txt to include recent fixes and upgrades Sebastian Nagel 2023-10-29 10:49:21 +0100
  • 03b5543451 Bump de.thetaphi:forbiddenapis from 3.5.1 to 3.6 dependabot[bot] 2023-10-02 20:31:52 +0000
  • 54c65dae65 Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.5.0 to 3.6.0 dependabot[bot] 2023-09-18 20:02:10 +0000
  • 2c64f79773 Bump commons-io:commons-io from 2.13.0 to 2.15.0 dependabot[bot] 2023-10-29 09:39:39 +0000
  • bfa80f4fef
    Bump commons-io:commons-io from 2.13.0 to 2.15.0 dependabot[bot] 2023-10-29 09:39:39 +0000
  • e7421e9785 Bump org.slf4j:slf4j-api from 2.0.7 to 2.0.9 dependabot[bot] 2023-09-04 20:20:02 +0000
  • 54576e810d
    [Sitemaps] Google Sitemap PageMap extensions, implements #388 (#442) Sebastian Nagel 2023-10-28 17:09:45 +0200
  • 2a4e32c05a [Sitemaps] Google Sitemap PageMap extensions, implements #388 Sebastian Nagel 2023-06-09 14:32:55 +0200
  • ed1cebeff7 [Domains] Installation of a gzip-compressed public suffix list from cache breaks EffectiveTldFinder, fixes #441 - downgrade Maven download plugin (1.7.1 -> 1.6.8) Sebastian Nagel 2023-10-27 05:45:06 +0200
  • b843ff812e [Domains] Installation of a gzip-compressed public suffix list from cache breaks EffectiveTldFinder, fixes #441 - downgrade Maven download plugin (1.7.1 -> 1.6.8) Sebastian Nagel 2023-10-27 05:45:06 +0200
  • 72c2fe8ad4
    Bump commons-io:commons-io from 2.13.0 to 2.14.0 dependabot[bot] 2023-10-02 20:31:54 +0000
  • bb8ef3834e
    Bump de.thetaphi:forbiddenapis from 3.5.1 to 3.6 dependabot[bot] 2023-10-02 20:31:52 +0000
  • 8392c5e15e
    Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.5.0 to 3.6.0 dependabot[bot] 2023-09-18 20:02:10 +0000
  • 2d177c3bfd
    Bump org.slf4j:slf4j-api from 2.0.7 to 2.0.9 dependabot[bot] 2023-09-04 20:20:02 +0000
  • 4192e3fab7
    Fix typo in README.md Ken Krugler 2023-08-30 12:59:02 -0700
  • a9003ce8bb
    Update README.md - add a link to repository map Alex Karezin 2023-08-01 11:19:08 -0400
  • 4b1117943b Bump com.googlecode.maven-download-plugin:download-maven-plugin dependabot[bot] 2023-07-24 20:11:19 +0000
  • ed0d9ac884
    Bump junit.version from 5.9.3 to 5.10.0 dependabot[bot] 2023-07-24 20:11:24 +0000
  • 55e2d495c0
    Bump com.googlecode.maven-download-plugin:download-maven-plugin dependabot[bot] 2023-07-24 20:11:19 +0000
  • 69c4f606f7 Release 1.4 - fix release data in news section - add note that user-agent product tokens must lower-case Sebastian Nagel 2023-07-18 15:03:51 +0200
  • a3ff95502f Release 1.4 - update news section - add 1.4 Javadocs to README Sebastian Nagel 2023-07-18 13:44:15 +0200
  • 80f287ecfd Javadoc 1.4 gh-pages Sebastian Nagel 2023-07-18 12:56:54 +0200
  • 0e1758fcee Update CHANGES.txt for next development iteration (1.5-SNAPSHOT) Sebastian Nagel 2023-07-13 11:25:00 +0200
  • 3e958801f6 [maven-release-plugin] prepare for next development iteration Sebastian Nagel 2023-07-13 10:30:12 +0200
  • ce9cf46020 Update CHANGES.txt for release of crawler-commons 1.4 crawler-commons-1.4 Sebastian Nagel 2023-07-13 11:28:48 +0200
  • 2b8717d9e5 [maven-release-plugin] prepare release crawler-commons-1.4 Sebastian Nagel 2023-07-13 10:30:08 +0200
  • a62bd80140 Updates changelog for #376, #380, #401, #414, #425, #428, #422/#424, #114/#390/#430, #245/#360 Sebastian Nagel 2023-07-12 16:16:30 +0200
  • 6fb34cf856
    Implement Robots Exclusion Protocol (REP) IETF Draft: port unit tests (#360) Sebastian Nagel 2023-07-12 15:28:59 +0200
  • 83454bd8b1 Apply code formatting template Sebastian Nagel 2023-07-12 10:38:51 +0200
  • 47e294acb3 Implement Robots Exclusion Protocol (REP) IETF RFC 9309 - avoid locale-sensitive methods Sebastian Nagel 2023-07-03 21:20:09 +0200
  • d5a41154b6 Implement Robots Exclusion Protocol (REP) IETF RFC 9309 - port unit tests from https://github.com/google/robotstxt - adapt unit tests dealing with overlong lines and percent-encoded URL paths where the behavior of SimpleRobotRulesParser is not wrong and maybe even seen as an improvement compared to restrictions put on API input params by the Google robots.txt parser Sebastian Nagel 2023-05-11 18:04:22 +0200
  • cae3908680 Implement Robots Exclusion Protocol (REP) IETF RFC 9309 - port unit tests from https://github.com/google/robotstxt - adapt "Google-only" unit tests dealing with overlong lines and non-standard user-agent names Sebastian Nagel 2023-04-22 20:39:26 +0200
  • c45c9cc788 Implement Robots Exclusion Protocol (REP) IETF RFC 9309 - port unit tests from https://github.com/google/robotstxt Sebastian Nagel 2022-02-06 19:32:41 +0100
  • 871e4e61d2
    Merge pull request #430 from sebastian-nagel/cc-390-114-robots-closing-rule-group Sebastian Nagel 2023-07-12 10:35:48 +0200
  • d685bafb2d
    [Robots.txt] SimpleRobotRulesParser main() to follow five redirects (#428) Sebastian Nagel 2023-07-11 15:49:00 +0200
  • de7221dafc
    [Robots.txt] Empty disallow statement not to clear other rules, fixes #422 (#424) Sebastian Nagel 2023-07-11 15:47:33 +0200
  • 7ae8617563
    [Robots.txt] Add more spelling variants and typos of robots.txt directives (#425) Sebastian Nagel 2023-07-11 15:46:07 +0200
  • e67299432c [Robots.txt] Clarify behavior when to close blocks of multiple user-agents - must keep state whether Crawl-delay is already set for a specific agent as separate variable - add unit test to ensure that no already set Crawl-delay is overridden by a (lower) value of another agent Sebastian Nagel 2023-07-10 15:18:23 +0200
  • 17e8544980 [Robots.txt] Clarify behavior when to close blocks of multiple user-agents - fix unit test broken by introducing compliance with RFC 9309 Sebastian Nagel 2023-06-16 16:35:21 +0200
  • 4524cfb5c0 [Robots.txt] Clarify behavior when to close blocks of multiple user-agents, closes #390 [Robots.txt] Handle robots.txt with missing sections (and implicit master rules), fixes #114 - do not close rule blocks / groups on other directives than specified in RFC 9309: groups are only closed on a user-agent line if at least one allow/disallow line was read before - set Crawl-delay independently from grouping, but never override or set the value for a specific agent using a value defined for the wildcard agent Sebastian Nagel 2023-06-13 12:31:07 +0200
  • d710c85871 BaseRobotRules: Document that Crawl-delay is stored in milliseconds Sebastian Nagel 2023-06-16 16:04:38 +0200
  • a3900425f3 [Robots.txt] Handle robots.txt with missing sections (and implicit master rules) - add unit test to verify solution of #114 Sebastian Nagel 2023-07-10 12:59:15 +0200
  • 86109c029a Updates changelog for #423/#426, #427, #429 Sebastian Nagel 2023-07-10 10:23:20 +0200
  • 54498a0e5a [Robots.txt] Rename default user-agent / robot name in unit tests - replace occurrences of the user-agent name supposed to match the wildcard user-agent rule group by "anybot" Sebastian Nagel 2023-06-16 13:54:55 +0200
  • 9412dff606 [Robots.txt] SimpleRobotRulesParser main() to follow five redirects when fetching robots.txt over HTTP as required by RFC 9309 Sebastian Nagel 2023-06-15 11:55:48 +0200
  • 99289f7835 [Robots.txt] Pass empty collection of agent names to select rules for any robot (wildcard user-agent name) - in SimpleRobotRulesParser main() - add unit test to verify that wildcard user-agent rules are selected if empty collection of agent names is passed Sebastian Nagel 2023-06-15 11:17:50 +0200
  • a5bd9645fa [Robots.txt] Update Javadoc to document changes in Robots.txt classes related to RFC 9309 compliance - document effect of rules merging in combination with multiple agent names, fixes #423 - document that rules addressed to the wildcard agent are followed if none of the passed agent names matches - without any need to pass the wildcard agent name as one of the agent names - complete documentation - use @inheritDoc to avoid duplicated documentation - strip doc strings where inherited automatically by @Override annotations Sebastian Nagel 2023-06-14 16:52:20 +0200