mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-03 22:26:15 +02:00
Commit Graph

622 Commits

Author SHA1 Message Date
Sebastian Nagel 8bb1694669 Updates changelog for #192, #362, #383, #389 and merged dependabot pull requests 2023-05-11 16:52:23 +02:00
dependabot[bot] 1eefc10ce1 Bump maven-surefire-plugin from 3.0.0 to 3.1.0
Bumps [maven-surefire-plugin](https://github.com/apache/maven-surefire) from 3.0.0 to 3.1.0.
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.0.0...surefire-3.1.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-11 16:20:45 +02:00
dependabot[bot] 5c317d3c23 Bump maven-gpg-plugin from 3.0.1 to 3.1.0
Bumps [maven-gpg-plugin](https://github.com/apache/maven-gpg-plugin) from 3.0.1 to 3.1.0.
- [Commits](https://github.com/apache/maven-gpg-plugin/compare/maven-gpg-plugin-3.0.1...maven-gpg-plugin-3.1.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-gpg-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-11 16:20:16 +02:00
Sebastian Nagel 79bef97d40
Merge pull request #401 from sebastian-nagel/cc-389-allow-disallow-unicode-paths
[Robots.txt] Handle allow/disallow directives containing unescaped Unicode characters
2023-05-11 16:19:23 +02:00
dependabot[bot] e691cec4cf Bump junit.version from 5.9.2 to 5.9.3
Bumps `junit.version` from 5.9.2 to 5.9.3.

Updates `junit-jupiter-engine` from 5.9.2 to 5.9.3
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.2...r5.9.3)

Updates `junit-jupiter-params` from 5.9.2 to 5.9.3
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.2...r5.9.3)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-02 08:08:32 +02:00
dependabot[bot] 764ef96ea1 Bump download-maven-plugin from 1.6.8 to 1.7.0
Bumps [download-maven-plugin](https://github.com/maven-download-plugin/maven-download-plugin) from 1.6.8 to 1.7.0.
- [Release notes](https://github.com/maven-download-plugin/maven-download-plugin/releases)
- [Commits](https://github.com/maven-download-plugin/maven-download-plugin/compare/1.6.8...1.7.0)

---
updated-dependencies:
- dependency-name: com.googlecode.maven-download-plugin:download-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-02 08:08:26 +02:00
dependabot[bot] 1291f5cddb Bump forbiddenapis from 3.4 to 3.5.1
Bumps [forbiddenapis](https://github.com/policeman-tools/forbidden-apis) from 3.4 to 3.5.1.
- [Release notes](https://github.com/policeman-tools/forbidden-apis/releases)
- [Commits](https://github.com/policeman-tools/forbidden-apis/compare/3.4...3.5.1)

---
updated-dependencies:
- dependency-name: de.thetaphi:forbiddenapis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-25 09:51:52 +02:00
dependabot[bot] a980ae10da Bump maven-surefire-plugin from 2.22.2 to 3.0.0
Bumps [maven-surefire-plugin](https://github.com/apache/maven-surefire) from 2.22.2 to 3.0.0.
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-2.22.2...surefire-3.0.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-25 09:51:42 +02:00
Sebastian Nagel a395cfee73 Add link to RFC 9309 to Javadoc class description 2023-04-24 17:36:08 +02:00
Sebastian Nagel be2d5c24d3 Fix line wrapping in comments 2023-04-24 17:27:16 +02:00
Sebastian Nagel 2c2cb3bf7a [Robots.txt] Handle allow/disallow directives containing unescaped
Unicode characters, fixes #389
- use UTF-8 as default input encoding of robots.txt files
- add unit test
  - test matching of Unicode paths in allow/disallow directives
  - test for proper matching of ASCII paths if encoding is not
    UTF-8 (and no byte order mark present)
2023-04-24 17:27:16 +02:00
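
The two points of this commit (UTF-8 as the default input encoding, and matching of Unicode paths) can be illustrated with a small self-contained sketch. It is not the crawler-commons code: the class `RobotsUnicodePathSketch` and both method names are hypothetical, and it assumes that a path with unescaped Unicode characters is compared against URLs after percent-encoding its UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of crawler-commons.
class RobotsUnicodePathSketch {

    /** Decode robots.txt bytes as UTF-8, skipping a leading UTF-8 byte order mark if present. */
    static String decode(byte[] content) {
        int offset = 0;
        if (content.length >= 3
                && (content[0] & 0xFF) == 0xEF
                && (content[1] & 0xFF) == 0xBB
                && (content[2] & 0xFF) == 0xBF) {
            offset = 3;
        }
        // Malformed byte sequences are replaced by U+FFFD; ASCII bytes are kept as they are.
        return new String(content, offset, content.length - offset, StandardCharsets.UTF_8);
    }

    /** Percent-encode every non-ASCII character of a path via its UTF-8 bytes. */
    static String percentEncodeNonAscii(String path) {
        StringBuilder sb = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with acute accent, written as an escape
        // so that the source file itself stays ASCII.
        System.out.println(percentEncodeNonAscii("/caf\u00e9")); // /caf%C3%A9
    }
}
```

With both the rule path and the URL path normalized this way, a plain string prefix comparison is enough for the Unicode case, and pure ASCII paths are unaffected even if the file was not actually encoded in UTF-8.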
Sebastian Nagel d8a6126365
[Robots.txt] RFC compliance: matching user-agent names when selecting rule blocks (#362)
* RFC compliance: matching user-agent names when selecting rule blocks
- add unit test to verify that the rule with the completely
  matched user-agent name is selected, and no partial prefix match
  is preferred (cf. also #192)

* RFC compliance: matching user-agent names when selecting rule blocks

- refactor agent name matching and move splitting robotNames string
  at comma into a separate method to be called once at the beginning
  of parsing the robots.txt file

- extend the robots parser API and add a method to pass agent names
  as a collection following the RFC 9309 with no splitting of the
  names into words/tokens.

- deprecate "old" method which splits the robot name into tokens and
  performs prefix matching

- by default user-agent names are matched literally but case-insensitively
  following RFC 9309. Add method to "restore" the prefix matching:
  "setExactUserAgentMatching(false)"

- BaseRobotRulesParser: move the documented details about how
  user-agent names are matched into SimpleRobotRulesParser

- unit tests: add tests for issues described in #192, configure exact
  user-agent matching if required

* RFC compliance: matching user-agent names when selecting rule blocks
- match user-agent product token at beginning of user-agent
  line/statement followed by ignored non-token characters,
  e.g. "foo" is matched in "User-agent: foo/1.2"

* RFC compliance: matching user-agent names when selecting rule blocks
- match user-agent product tokens followed by ignored characters
  also in legacy prefix matching mode, e.g. match "butterfly" in
  "User-agent: Butterfly/1.0"
- refactor prefix matching: switch inner and outer loop, handle
  check for (common) wild-card user-agent outside of loop

* RFC compliance: matching user-agent names when selecting rule blocks
- make exact user-agent matching the default in unit tests,
  explicitly pass flag for legacy prefix user-agent matching
  in unit tests where needed
  - names not following the ua pattern in the specification "[a-zA-Z_-]+"
  - user-agent lines with multiple user-agent names

* RFC compliance: matching user-agent names when selecting rule blocks
- make the method to handle prefix/partial user-agent product token
  matches protected, so that it can be overridden to match non-standard
  user-agent product tokens, e.g. "Go!zilla"
2023-04-24 17:24:59 +02:00
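
The difference between the two matching modes named in this commit can be sketched as follows. This is a minimal illustration, not the crawler-commons implementation: the class `UserAgentMatchSketch` and its methods are hypothetical, the product token pattern is the `[a-zA-Z_-]+` quoted above, and the direction of the legacy prefix test (the crawler's name starts with the token from robots.txt) is an assumption. The wildcard `*` is deliberately left out.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not part of crawler-commons.
class UserAgentMatchSketch {

    /** Leading product token of a user-agent line value, lowercased: "foo/1.2" gives "foo". */
    static String productToken(String userAgentValue) {
        String v = userAgentValue.trim();
        int end = 0;
        while (end < v.length()) {
            char c = v.charAt(end);
            boolean tokenChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '-';
            if (!tokenChar) {
                break; // everything after the token is ignored
            }
            end++;
        }
        return v.substring(0, end).toLowerCase(Locale.ROOT);
    }

    /** Exact mode: the token must equal one of the crawler's names, ignoring case. */
    static boolean matchesExact(Collection<String> robotNames, String userAgentValue) {
        String token = productToken(userAgentValue);
        if (token.isEmpty()) {
            return false;
        }
        for (String name : robotNames) {
            if (name.toLowerCase(Locale.ROOT).equals(token)) {
                return true;
            }
        }
        return false;
    }

    /** Legacy mode (assumed direction): one of the crawler's names starts with the token. */
    static boolean matchesPrefix(Collection<String> robotNames, String userAgentValue) {
        String token = productToken(userAgentValue);
        if (token.isEmpty()) {
            return false;
        }
        for (String name : robotNames) {
            if (name.toLowerCase(Locale.ROOT).startsWith(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesExact(List.of("foo"), "foo/1.2"));  // true
        System.out.println(matchesExact(List.of("foobot"), "foo"));   // false
        System.out.println(matchesPrefix(List.of("foobot"), "foo"));  // true
    }
}
```

A name such as "Go!zilla" falls outside the token pattern, so this sketch would reduce it to "go"; that is the kind of case the overridable method mentioned in the last bullet is meant to cover.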
dependabot[bot] f2982c5d11 Bump maven-deploy-plugin from 3.0.0 to 3.1.1
Bumps [maven-deploy-plugin](https://github.com/apache/maven-deploy-plugin) from 3.0.0 to 3.1.1.
- [Release notes](https://github.com/apache/maven-deploy-plugin/releases)
- [Commits](https://github.com/apache/maven-deploy-plugin/compare/maven-deploy-plugin-3.0.0...maven-deploy-plugin-3.1.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-deploy-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:07:56 +02:00
dependabot[bot] a0bf1c9167 Bump slf4j-api from 1.7.36 to 2.0.7
Bumps [slf4j-api](https://github.com/qos-ch/slf4j) from 1.7.36 to 2.0.7.
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](https://github.com/qos-ch/slf4j/commits)

---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-api
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:07:39 +02:00
dependabot[bot] 8da99dc711 Bump maven-release-plugin from 2.5.3 to 3.0.0
Bumps [maven-release-plugin](https://github.com/apache/maven-release) from 2.5.3 to 3.0.0.
- [Release notes](https://github.com/apache/maven-release/releases)
- [Commits](https://github.com/apache/maven-release/compare/maven-release-2.5.3...maven-release-3.0.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-release-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:06:48 +02:00
dependabot[bot] 0ac4df1ef1 Bump maven-compiler-plugin from 3.10.1 to 3.11.0
Bumps [maven-compiler-plugin](https://github.com/apache/maven-compiler-plugin) from 3.10.1 to 3.11.0.
- [Release notes](https://github.com/apache/maven-compiler-plugin/releases)
- [Commits](https://github.com/apache/maven-compiler-plugin/compare/maven-compiler-plugin-3.10.1...maven-compiler-plugin-3.11.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-compiler-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:04:49 +02:00
dependabot[bot] 62828f8d35 Bump maven-javadoc-plugin from 3.4.1 to 3.5.0
Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.4.1 to 3.5.0.
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases)
- [Commits](https://github.com/apache/maven-javadoc-plugin/compare/maven-javadoc-plugin-3.4.1...maven-javadoc-plugin-3.5.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-javadoc-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:04:23 +02:00
dependabot[bot] 6b674b1894 Bump junit.version from 5.9.1 to 5.9.2
Bumps `junit.version` from 5.9.1 to 5.9.2.

Updates `junit-jupiter-engine` from 5.9.1 to 5.9.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.1...r5.9.2)

Updates `junit-jupiter-params` from 5.9.1 to 5.9.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.1...r5.9.2)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-19 12:06:17 +01:00
Sebastian Nagel cec5716a4c
Upgrade the project to use Java 11 (#376) 2023-01-19 11:55:58 +01:00
dependabot[bot] fce53a0933 Bump maven-jar-plugin from 3.2.2 to 3.3.0
Bumps [maven-jar-plugin](https://github.com/apache/maven-jar-plugin) from 3.2.2 to 3.3.0.
- [Release notes](https://github.com/apache/maven-jar-plugin/releases)
- [Commits](https://github.com/apache/maven-jar-plugin/compare/maven-jar-plugin-3.2.2...maven-jar-plugin-3.3.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-jar-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:59:15 +02:00
dependabot[bot] 838d3d1e26 Bump maven-javadoc-plugin from 3.4.0 to 3.4.1
Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.4.0 to 3.4.1.
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases)
- [Commits](https://github.com/apache/maven-javadoc-plugin/compare/maven-javadoc-plugin-3.4.0...maven-javadoc-plugin-3.4.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-javadoc-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:59:04 +02:00
dependabot[bot] c78bf500f7 Bump junit.version from 5.9.0 to 5.9.1
Bumps `junit.version` from 5.9.0 to 5.9.1.

Updates `junit-jupiter-engine` from 5.9.0 to 5.9.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.0...r5.9.1)

Updates `junit-jupiter-params` from 5.9.0 to 5.9.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.0...r5.9.1)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:58:00 +02:00
dependabot[bot] 546ea4d633 Bump forbiddenapis from 3.3 to 3.4
Bumps [forbiddenapis](https://github.com/policeman-tools/forbidden-apis) from 3.3 to 3.4.
- [Release notes](https://github.com/policeman-tools/forbidden-apis/releases)
- [Commits](https://github.com/policeman-tools/forbidden-apis/compare/3.3...3.4)

---
updated-dependencies:
- dependency-name: de.thetaphi:forbiddenapis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:57:45 +02:00
Sebastian Nagel 3368cb53ef
Improve readability of robots.txt unit tests (#383)
- put lines of embedded robots.txt test files
  into separate code lines (except for empty lines)
- apply code formatting template
2022-10-06 13:26:13 +02:00
Sebastian Nagel 09bc9c064c Updates changelog for #351 2022-08-11 14:12:20 +02:00
Eduardo Jimenez 4ad101cf0d Ran java formatter 2022-08-11 14:08:36 +02:00
Eduardo Jimenez 1f0e79b72a Improve robots check draft rfc compliance 2022-08-11 14:08:36 +02:00
Sebastian Nagel d3ccb553df Updates changelog for #378/#380, #377, #379 2022-08-10 10:17:01 +02:00
Sebastian Nagel 0a5a7ff217
Merge pull request #379 from crawler-commons/dependabot/maven/junit.version-5.9.0
Bump junit.version from 5.8.2 to 5.9.0
2022-08-10 10:13:34 +02:00
Sebastian Nagel 5b63dce5c8
Merge pull request #377 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-deploy-plugin-3.0.0
Bump maven-deploy-plugin from 2.8.2 to 3.0.0
2022-08-10 10:13:24 +02:00
Sebastian Nagel 08629d53ff
Merge pull request #380 from sebastian-nagel/cc-378-javadoc-search
Javadoc: ensure Javascript search is working, fixes #378
2022-08-10 10:11:41 +02:00
Sebastian Nagel 9253f676b8 Javadoc: ensure Javascript search is working, fixes #378
- (only for JDK / Java 11) pass option --no-module-directories
2022-08-08 15:53:54 +02:00
dependabot[bot] 17b88275f7
Bump junit.version from 5.8.2 to 5.9.0
Bumps `junit.version` from 5.8.2 to 5.9.0.

Updates `junit-jupiter-engine` from 5.8.2 to 5.9.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.8.2...r5.9.0)

Updates `junit-jupiter-params` from 5.8.2 to 5.9.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.8.2...r5.9.0)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-minor
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-08-01 20:16:08 +00:00
Sebastian Nagel 39d0f5fd97 Release 1.3
- README: add 1.3 to the News
2022-07-28 13:25:30 +02:00
Sebastian Nagel 1e859ccb16 Release 1.3
- add 1.3 Javadocs to README
- update previous version Javadoc links to use https://
- prepare CHANGES.txt for next development iteration
2022-07-28 12:12:38 +02:00
dependabot[bot] 037d58f867
Bump maven-deploy-plugin from 2.8.2 to 3.0.0
Bumps [maven-deploy-plugin](https://github.com/apache/maven-deploy-plugin) from 2.8.2 to 3.0.0.
- [Release notes](https://github.com/apache/maven-deploy-plugin/releases)
- [Commits](https://github.com/apache/maven-deploy-plugin/compare/maven-deploy-plugin-2.8.2...maven-deploy-plugin-3.0.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-deploy-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-25 20:29:06 +00:00
Sebastian Nagel 89ce5d216e [maven-release-plugin] prepare for next development iteration 2022-07-19 09:51:21 +02:00
Sebastian Nagel e2e7898369 [maven-release-plugin] prepare release crawler-commons-1.3 2022-07-19 09:51:18 +02:00
Sebastian Nagel 39ab109dca Prepare release of crawler-commons-1.3
- update CHANGES.txt
2022-07-19 09:19:56 +02:00
Sebastian Nagel 527e4cf229 Update change log: add #354, #361, #373, #374 2022-07-15 08:59:13 +02:00
Sebastian Nagel 9d558f6532
Merge pull request #361 from crawler-commons/dependabot/maven/org.slf4j-slf4j-api-1.7.36
Bump slf4j-api from 1.7.32 to 1.7.36
2022-07-14 14:22:10 +02:00
Sebastian Nagel b07f4e0172
Merge pull request #354 from crawler-commons/dependabot/maven/org.slf4j-slf4j-log4j12-1.7.33
Bump slf4j-log4j12 from 1.7.32 to 1.7.33
2022-07-14 14:22:03 +02:00
Sebastian Nagel 4e94ff52dc
Merge pull request #373 from crawler-commons/dependabot/maven/de.thetaphi-forbiddenapis-3.3
Bump forbiddenapis from 3.2 to 3.3
2022-07-14 14:21:46 +02:00
Sebastian Nagel 6925d4dac3
Merge pull request #374 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-javadoc-plugin-3.4.0
Bump maven-javadoc-plugin from 3.3.2 to 3.4.0
2022-07-14 14:21:34 +02:00
dependabot[bot] 0ea8aea915
Bump maven-javadoc-plugin from 3.3.2 to 3.4.0
Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.3.2 to 3.4.0.
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases)
- [Commits](https://github.com/apache/maven-javadoc-plugin/compare/maven-javadoc-plugin-3.3.2...maven-javadoc-plugin-3.4.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-javadoc-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-04-25 20:17:35 +00:00
dependabot[bot] b577f0519b
Bump forbiddenapis from 3.2 to 3.3
Bumps [forbiddenapis](https://github.com/policeman-tools/forbidden-apis) from 3.2 to 3.3.
- [Release notes](https://github.com/policeman-tools/forbidden-apis/releases)
- [Commits](https://github.com/policeman-tools/forbidden-apis/compare/3.2...3.3)

---
updated-dependencies:
- dependency-name: de.thetaphi:forbiddenapis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-28 20:09:57 +00:00
dependabot[bot] 25385042e2 Bump maven-compiler-plugin from 3.10.0 to 3.10.1
Bumps [maven-compiler-plugin](https://github.com/apache/maven-compiler-plugin) from 3.10.0 to 3.10.1.
- [Release notes](https://github.com/apache/maven-compiler-plugin/releases)
- [Commits](https://github.com/apache/maven-compiler-plugin/compare/maven-compiler-plugin-3.10.0...maven-compiler-plugin-3.10.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-compiler-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-15 11:09:56 +01:00
Sebastian Nagel 5e67b46b11 Merge pull request #371 from ebx/1.3-SNAPSHOT-EBX 2022-03-02 16:13:48 +01:00
Sebastian Nagel 23ee0634dc [Sitemaps] Disable support for DTDs in sitemaps by default
- update change log
- apply code formatting
- add support for parsing sitemaps with DTD in SiteMapTester
2022-03-02 16:03:13 +01:00
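
What "DTDs disabled by default, enabled by option" can look like with the JDK's own XML parser is sketched below. This is not the crawler-commons sitemap parser: the class `DtdOptionSketch` and the `allowDtd` parameter are hypothetical names, and the sketch assumes the JDK's built-in JAXP implementation, which supports the `disallow-doctype-decl` feature.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical illustration, not part of crawler-commons.
class DtdOptionSketch {

    /** Parse XML; a DOCTYPE declaration is rejected unless allowDtd is true. */
    static Document parse(String xml, boolean allowDtd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // When this feature is true, any DOCTYPE declaration causes a fatal parse error.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", !allowDtd);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Convenience wrapper: true if the document parses under the given setting. */
    static boolean parses(String xml, boolean allowDtd) {
        try {
            parse(xml, allowDtd);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String withDtd = "<?xml version=\"1.0\"?><!DOCTYPE urlset [<!ELEMENT urlset ANY>]><urlset/>";
        System.out.println(parses(withDtd, false)); // false
        System.out.println(parses(withDtd, true));  // true
    }
}
```

Passing the flag per parser instance, rather than reading a JVM-wide system property, lets one application parse trusted and untrusted sitemaps with different settings.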
kennethwong-hc 273ac6ac7e Allow setting an option to allow DTDs, instead of using a system setting 2022-03-02 13:15:13 +00:00