mirror of https://github.com/crawler-commons/crawler-commons synced 2024-06-24 03:37:44 +02:00
crawler-commons/src/test/java/crawlercommons
Sebastian Nagel 6fb34cf856
Implement Robots Exclusion Protocol (REP) IETF Draft: port unit tests (#360)
- port unit tests from https://github.com/google/robotstxt
- adapt "Google-only" unit tests dealing with overlong lines
  and non-standard user-agent names
- adapt unit tests dealing with overlong lines and percent-encoded
  URL paths where the behavior of SimpleRobotRulesParser is not
  wrong and could even be seen as an improvement compared to
  the restrictions put on API input params by the Google robots.txt parser
2023-07-12 15:28:59 +02:00
domains Merge pull request #252 from sebastian-nagel/cc-251-domain-max-length-check 2019-10-15 16:24:47 +02:00
filters/basic Add a builder API for configuring the BasicURLNormalizer 2021-10-04 17:24:26 +02:00
mimetypes Upgraded to Junit v5.5 (#250) 2019-07-15 21:29:03 +03:00
robots Implement Robots Exclusion Protocol (REP) IETF Draft: port unit tests (#360) 2023-07-12 15:28:59 +02:00
sitemaps [Sitemaps] Disable support for DTDs in sitemaps by default 2022-03-02 16:03:13 +01:00