Sebastian Nagel
8bb1694669
Updates changelog for #192, #362, #383, #389 and merged dependabot pull requests
2023-05-11 16:52:23 +02:00
dependabot[bot]
1eefc10ce1
Bump maven-surefire-plugin from 3.0.0 to 3.1.0
...
Bumps [maven-surefire-plugin](https://github.com/apache/maven-surefire) from 3.0.0 to 3.1.0.
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.0.0...surefire-3.1.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-05-11 16:20:45 +02:00
dependabot[bot]
5c317d3c23
Bump maven-gpg-plugin from 3.0.1 to 3.1.0
...
Bumps [maven-gpg-plugin](https://github.com/apache/maven-gpg-plugin) from 3.0.1 to 3.1.0.
- [Commits](https://github.com/apache/maven-gpg-plugin/compare/maven-gpg-plugin-3.0.1...maven-gpg-plugin-3.1.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-gpg-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-05-11 16:20:16 +02:00
Sebastian Nagel
79bef97d40
Merge pull request #401 from sebastian-nagel/cc-389-allow-disallow-unicode-paths
...
[Robots.txt] Handle allow/disallow directives containing unescaped Unicode characters
2023-05-11 16:19:23 +02:00
dependabot[bot]
e691cec4cf
Bump junit.version from 5.9.2 to 5.9.3
...
Bumps `junit.version` from 5.9.2 to 5.9.3.
Updates `junit-jupiter-engine` from 5.9.2 to 5.9.3
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.2...r5.9.3)
Updates `junit-jupiter-params` from 5.9.2 to 5.9.3
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.2...r5.9.3)
---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-05-02 08:08:32 +02:00
dependabot[bot]
764ef96ea1
Bump download-maven-plugin from 1.6.8 to 1.7.0
...
Bumps [download-maven-plugin](https://github.com/maven-download-plugin/maven-download-plugin) from 1.6.8 to 1.7.0.
- [Release notes](https://github.com/maven-download-plugin/maven-download-plugin/releases)
- [Commits](https://github.com/maven-download-plugin/maven-download-plugin/compare/1.6.8...1.7.0)
---
updated-dependencies:
- dependency-name: com.googlecode.maven-download-plugin:download-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-05-02 08:08:26 +02:00
dependabot[bot]
1291f5cddb
Bump forbiddenapis from 3.4 to 3.5.1
...
Bumps [forbiddenapis](https://github.com/policeman-tools/forbidden-apis) from 3.4 to 3.5.1.
- [Release notes](https://github.com/policeman-tools/forbidden-apis/releases)
- [Commits](https://github.com/policeman-tools/forbidden-apis/compare/3.4...3.5.1)
---
updated-dependencies:
- dependency-name: de.thetaphi:forbiddenapis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-04-25 09:51:52 +02:00
dependabot[bot]
a980ae10da
Bump maven-surefire-plugin from 2.22.2 to 3.0.0
...
Bumps [maven-surefire-plugin](https://github.com/apache/maven-surefire) from 2.22.2 to 3.0.0.
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-2.22.2...surefire-3.0.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-04-25 09:51:42 +02:00
Sebastian Nagel
a395cfee73
Add link to RFC 9309 to Javadoc class description
2023-04-24 17:36:08 +02:00
Sebastian Nagel
be2d5c24d3
Fix line wrapping in comments
2023-04-24 17:27:16 +02:00
Sebastian Nagel
2c2cb3bf7a
[Robots.txt] Handle allow/disallow directives containing unescaped Unicode characters, fixes #389
...
- use UTF-8 as default input encoding of robots.txt files
- add unit test
- test matching of Unicode paths in allow/disallow directives
- test for proper matching of ASCII paths if encoding is not
UTF-8 (and no byte order mark present)
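
One way to picture the matching of Unicode paths described in this commit is to percent-encode non-ASCII characters as UTF-8 bytes on both the rule and the URL path before comparing them. The sketch below is only an illustration under that assumption, not the code from this commit; the class `UnicodePathSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class UnicodePathSketch {

    /**
     * Percent-encodes every non-ASCII character of a path as its UTF-8
     * bytes and leaves ASCII characters (including existing escapes) untouched.
     */
    public static String percentEncodeNonAscii(String path) {
        StringBuilder sb = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    /** Hypothetical helper: simple prefix match after normalizing both paths. */
    public static boolean ruleMatches(String rulePath, String urlPath) {
        return percentEncodeNonAscii(urlPath).startsWith(percentEncodeNonAscii(rulePath));
    }

    public static void main(String[] args) {
        // A rule written with an unescaped "é" matches the escaped URL path.
        System.out.println(percentEncodeNonAscii("/caf\u00e9/"));            // /caf%C3%A9/
        System.out.println(ruleMatches("/caf\u00e9/", "/caf%C3%A9/menu"));   // true
    }
}
```

The sketch does not normalize the case of existing escapes (`%c3%a9` versus `%C3%A9`) and ignores wildcards; a real parser would need to handle both.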
2023-04-24 17:27:16 +02:00
Sebastian Nagel
d8a6126365
[Robots.txt] RFC compliance: matching user-agent names when selecting rule blocks (#362)
...
* RFC compliance: matching user-agent names when selecting rule blocks
- add unit test to verify that the rule with the completely
matched user-agent name is selected, and no partial prefix match
is preferred (cf. also #192)
* RFC compliance: matching user-agent names when selecting rule blocks
- refactor agent name matching and move splitting robotNames string
at comma into a separate method to be called once at the beginning
of parsing the robots.txt file
- extend the robots parser API and add a method to pass agent names
as a collection following the RFC 9309 with no splitting of the
names into words/tokens.
- deprecate "old" method which splits the robot name into tokens and
performs prefix matching
- by default user agent names are matched literally but case-insensitive
following RFC 9309. Add method to "restore" the prefix matching:
"setExactUserAgentMatching(false)"
- BaseRobotRulesParser: move the documented details about how
user-agent names are matched into SimpleRobotRulesParser
- unit tests: add tests for issues described in #192, configure exact
user-agent matching if required
* RFC compliance: matching user-agent names when selecting rule blocks
- match user-agent product token at beginning of user-agent
line/statement followed by ignored non-token characters,
e.g. "foo" is matched in "User-agent: foo/1.2"
* RFC compliance: matching user-agent names when selecting rule blocks
- match user-agent product tokens followed by ignored characters
also in legacy prefix matching mode, e.g. match "butterfly" in
"User-agent: Butterfly/1.0"
- refactor prefix matching: switch inner and outer loop, handle
check for (common) wild-card user-agent outside of loop
* RFC compliance: matching user-agent names when selecting rule blocks
- make exact user-agent matching the default in unit tests,
explicitly pass flag for legacy prefix user-agent matching
in unit tests where needed
- names not following the ua pattern in the specification "[a-zA-Z_-]+"
- user-agent lines with multiple user-agent names
* RFC compliance: matching user-agent names when selecting rule blocks
- make the method to handle prefix/partial user-agent product token
matches protected, so that it can be overridden to match non-standard
user-agent product tokens, e.g. "Go!zilla"
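
The difference between exact and legacy prefix matching described above can be sketched as follows. This is a self-contained illustration, not the crawler-commons implementation: the class and method names are hypothetical, and the prefix rule (the robots.txt token is a prefix of the crawler's name) is an assumed reading of the legacy behaviour. The wildcard `*` is deliberately left out.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;

public class UserAgentMatchSketch {

    /** Returns the leading run of [a-zA-Z_-] characters, lower-cased, e.g. "foo" for "foo/1.2". */
    public static String productToken(String userAgentLineValue) {
        String value = userAgentLineValue.trim();
        int end = 0;
        while (end < value.length()) {
            char c = value.charAt(end);
            boolean tokenChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '-';
            if (!tokenChar) {
                break; // everything after the product token is ignored
            }
            end++;
        }
        return value.substring(0, end).toLowerCase(Locale.ROOT);
    }

    /** Exact matching: the whole product token equals a robot name, ignoring case. */
    public static boolean matchesExact(Collection<String> robotNames, String userAgentLineValue) {
        String token = productToken(userAgentLineValue);
        if (token.isEmpty()) {
            return false;
        }
        for (String name : robotNames) {
            if (name.toLowerCase(Locale.ROOT).equals(token)) {
                return true;
            }
        }
        return false;
    }

    /** Assumed legacy behaviour: the product token only needs to be a prefix of a robot name. */
    public static boolean matchesPrefix(Collection<String> robotNames, String userAgentLineValue) {
        String token = productToken(userAgentLineValue);
        if (token.isEmpty()) {
            return false;
        }
        for (String name : robotNames) {
            if (name.toLowerCase(Locale.ROOT).startsWith(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesExact(List.of("butterfly"), "Butterfly/1.0")); // true
        System.out.println(matchesExact(List.of("googlebot"), "google"));        // false
        System.out.println(matchesPrefix(List.of("googlebot"), "google"));       // true
    }
}
```

Names such as "Go!zilla" fall outside the `[a-zA-Z_-]+` pattern, so a sketch like this would truncate them to "go"; that is the kind of case the overridable method in the last bullet is meant to cover.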
2023-04-24 17:24:59 +02:00
dependabot[bot]
f2982c5d11
Bump maven-deploy-plugin from 3.0.0 to 3.1.1
...
Bumps [maven-deploy-plugin](https://github.com/apache/maven-deploy-plugin) from 3.0.0 to 3.1.1.
- [Release notes](https://github.com/apache/maven-deploy-plugin/releases)
- [Commits](https://github.com/apache/maven-deploy-plugin/compare/maven-deploy-plugin-3.0.0...maven-deploy-plugin-3.1.1)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-deploy-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:07:56 +02:00
dependabot[bot]
a0bf1c9167
Bump slf4j-api from 1.7.36 to 2.0.7
...
Bumps [slf4j-api](https://github.com/qos-ch/slf4j) from 1.7.36 to 2.0.7.
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](https://github.com/qos-ch/slf4j/commits)
---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-api
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:07:39 +02:00
dependabot[bot]
8da99dc711
Bump maven-release-plugin from 2.5.3 to 3.0.0
...
Bumps [maven-release-plugin](https://github.com/apache/maven-release) from 2.5.3 to 3.0.0.
- [Release notes](https://github.com/apache/maven-release/releases)
- [Commits](https://github.com/apache/maven-release/compare/maven-release-2.5.3...maven-release-3.0.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-release-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:06:48 +02:00
dependabot[bot]
0ac4df1ef1
Bump maven-compiler-plugin from 3.10.1 to 3.11.0
...
Bumps [maven-compiler-plugin](https://github.com/apache/maven-compiler-plugin) from 3.10.1 to 3.11.0.
- [Release notes](https://github.com/apache/maven-compiler-plugin/releases)
- [Commits](https://github.com/apache/maven-compiler-plugin/compare/maven-compiler-plugin-3.10.1...maven-compiler-plugin-3.11.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-compiler-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:04:49 +02:00
dependabot[bot]
62828f8d35
Bump maven-javadoc-plugin from 3.4.1 to 3.5.0
...
Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.4.1 to 3.5.0.
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases)
- [Commits](https://github.com/apache/maven-javadoc-plugin/compare/maven-javadoc-plugin-3.4.1...maven-javadoc-plugin-3.5.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-javadoc-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-04-20 17:04:23 +02:00
dependabot[bot]
6b674b1894
Bump junit.version from 5.9.1 to 5.9.2
...
Bumps `junit.version` from 5.9.1 to 5.9.2.
Updates `junit-jupiter-engine` from 5.9.1 to 5.9.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.1...r5.9.2)
Updates `junit-jupiter-params` from 5.9.1 to 5.9.2
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.1...r5.9.2)
---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-01-19 12:06:17 +01:00
Sebastian Nagel
cec5716a4c
Upgrade the project to use Java 11 (#376)
2023-01-19 11:55:58 +01:00
dependabot[bot]
fce53a0933
Bump maven-jar-plugin from 3.2.2 to 3.3.0
...
Bumps [maven-jar-plugin](https://github.com/apache/maven-jar-plugin) from 3.2.2 to 3.3.0.
- [Release notes](https://github.com/apache/maven-jar-plugin/releases)
- [Commits](https://github.com/apache/maven-jar-plugin/compare/maven-jar-plugin-3.2.2...maven-jar-plugin-3.3.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-jar-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:59:15 +02:00
dependabot[bot]
838d3d1e26
Bump maven-javadoc-plugin from 3.4.0 to 3.4.1
...
Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.4.0 to 3.4.1.
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases)
- [Commits](https://github.com/apache/maven-javadoc-plugin/compare/maven-javadoc-plugin-3.4.0...maven-javadoc-plugin-3.4.1)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-javadoc-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:59:04 +02:00
dependabot[bot]
c78bf500f7
Bump junit.version from 5.9.0 to 5.9.1
...
Bumps `junit.version` from 5.9.0 to 5.9.1.
Updates `junit-jupiter-engine` from 5.9.0 to 5.9.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.0...r5.9.1)
Updates `junit-jupiter-params` from 5.9.0 to 5.9.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.0...r5.9.1)
---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:58:00 +02:00
dependabot[bot]
546ea4d633
Bump forbiddenapis from 3.3 to 3.4
...
Bumps [forbiddenapis](https://github.com/policeman-tools/forbidden-apis) from 3.3 to 3.4.
- [Release notes](https://github.com/policeman-tools/forbidden-apis/releases)
- [Commits](https://github.com/policeman-tools/forbidden-apis/compare/3.3...3.4)
---
updated-dependencies:
- dependency-name: de.thetaphi:forbiddenapis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 17:57:45 +02:00
Sebastian Nagel
3368cb53ef
Improve readability of robots.txt unit tests (#383)
...
- put lines of embedded robots.txt test files
into separate code lines (except for empty lines)
- apply code formatting template
2022-10-06 13:26:13 +02:00
Sebastian Nagel
09bc9c064c
Updates changelog for #351
2022-08-11 14:12:20 +02:00
Eduardo Jimenez
4ad101cf0d
Ran java formatter
2022-08-11 14:08:36 +02:00
Eduardo Jimenez
1f0e79b72a
Improve robots check compliance with the draft RFC
2022-08-11 14:08:36 +02:00
Sebastian Nagel
d3ccb553df
Updates changelog for #378/#380, #377, #379
2022-08-10 10:17:01 +02:00
Sebastian Nagel
0a5a7ff217
Merge pull request #379 from crawler-commons/dependabot/maven/junit.version-5.9.0
...
Bump junit.version from 5.8.2 to 5.9.0
2022-08-10 10:13:34 +02:00
Sebastian Nagel
5b63dce5c8
Merge pull request #377 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-deploy-plugin-3.0.0
...
Bump maven-deploy-plugin from 2.8.2 to 3.0.0
2022-08-10 10:13:24 +02:00
Sebastian Nagel
08629d53ff
Merge pull request #380 from sebastian-nagel/cc-378-javadoc-search
...
Javadoc: ensure JavaScript search is working, fixes #378
2022-08-10 10:11:41 +02:00
Sebastian Nagel
9253f676b8
Javadoc: ensure JavaScript search is working, fixes #378
...
- (only for JDK / Java 11) pass option --no-module-directories
2022-08-08 15:53:54 +02:00
dependabot[bot]
17b88275f7
Bump junit.version from 5.8.2 to 5.9.0
...
Bumps `junit.version` from 5.8.2 to 5.9.0.
Updates `junit-jupiter-engine` from 5.8.2 to 5.9.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.8.2...r5.9.0)
Updates `junit-jupiter-params` from 5.8.2 to 5.9.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.8.2...r5.9.0)
---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-minor
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-08-01 20:16:08 +00:00
Sebastian Nagel
39d0f5fd97
Release 1.3
...
- README: add 1.3 to the News
2022-07-28 13:25:30 +02:00
Sebastian Nagel
1e859ccb16
Release 1.3
...
- add 1.3 Javadocs to README
- update previous version Javadoc links to use https://
- prepare CHANGES.txt for next development iteration
2022-07-28 12:12:38 +02:00
dependabot[bot]
037d58f867
Bump maven-deploy-plugin from 2.8.2 to 3.0.0
...
Bumps [maven-deploy-plugin](https://github.com/apache/maven-deploy-plugin) from 2.8.2 to 3.0.0.
- [Release notes](https://github.com/apache/maven-deploy-plugin/releases)
- [Commits](https://github.com/apache/maven-deploy-plugin/compare/maven-deploy-plugin-2.8.2...maven-deploy-plugin-3.0.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-deploy-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-07-25 20:29:06 +00:00
Sebastian Nagel
89ce5d216e
[maven-release-plugin] prepare for next development iteration
2022-07-19 09:51:21 +02:00
Sebastian Nagel
e2e7898369
[maven-release-plugin] prepare release crawler-commons-1.3
2022-07-19 09:51:18 +02:00
Sebastian Nagel
39ab109dca
Prepare release of crawler-commons-1.3
...
- update CHANGES.txt
2022-07-19 09:19:56 +02:00
Sebastian Nagel
527e4cf229
Update change log: add #354, #361, #373, #374
2022-07-15 08:59:13 +02:00
Sebastian Nagel
9d558f6532
Merge pull request #361 from crawler-commons/dependabot/maven/org.slf4j-slf4j-api-1.7.36
...
Bump slf4j-api from 1.7.32 to 1.7.36
2022-07-14 14:22:10 +02:00
Sebastian Nagel
b07f4e0172
Merge pull request #354 from crawler-commons/dependabot/maven/org.slf4j-slf4j-log4j12-1.7.33
...
Bump slf4j-log4j12 from 1.7.32 to 1.7.33
2022-07-14 14:22:03 +02:00
Sebastian Nagel
4e94ff52dc
Merge pull request #373 from crawler-commons/dependabot/maven/de.thetaphi-forbiddenapis-3.3
...
Bump forbiddenapis from 3.2 to 3.3
2022-07-14 14:21:46 +02:00
Sebastian Nagel
6925d4dac3
Merge pull request #374 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-javadoc-plugin-3.4.0
...
Bump maven-javadoc-plugin from 3.3.2 to 3.4.0
2022-07-14 14:21:34 +02:00
dependabot[bot]
0ea8aea915
Bump maven-javadoc-plugin from 3.3.2 to 3.4.0
...
Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.3.2 to 3.4.0.
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases)
- [Commits](https://github.com/apache/maven-javadoc-plugin/compare/maven-javadoc-plugin-3.3.2...maven-javadoc-plugin-3.4.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-javadoc-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-04-25 20:17:35 +00:00
dependabot[bot]
b577f0519b
Bump forbiddenapis from 3.2 to 3.3
...
Bumps [forbiddenapis](https://github.com/policeman-tools/forbidden-apis) from 3.2 to 3.3.
- [Release notes](https://github.com/policeman-tools/forbidden-apis/releases)
- [Commits](https://github.com/policeman-tools/forbidden-apis/compare/3.2...3.3)
---
updated-dependencies:
- dependency-name: de.thetaphi:forbiddenapis
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-03-28 20:09:57 +00:00
dependabot[bot]
25385042e2
Bump maven-compiler-plugin from 3.10.0 to 3.10.1
...
Bumps [maven-compiler-plugin](https://github.com/apache/maven-compiler-plugin) from 3.10.0 to 3.10.1.
- [Release notes](https://github.com/apache/maven-compiler-plugin/releases)
- [Commits](https://github.com/apache/maven-compiler-plugin/compare/maven-compiler-plugin-3.10.0...maven-compiler-plugin-3.10.1)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-compiler-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-03-15 11:09:56 +01:00
Sebastian Nagel
5e67b46b11
Merge pull request #371 from ebx/1.3-SNAPSHOT-EBX
2022-03-02 16:13:48 +01:00
Sebastian Nagel
23ee0634dc
[Sitemaps] Disable support for DTDs in sitemaps by default
...
- update change log
- apply code formatting
- add support for parsing sitemaps with DTD in SiteMapTester
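
For readers unfamiliar with what "disabling DTD support by default" can look like with the JDK's XML parser, here is a minimal sketch. It assumes the standard JAXP `DocumentBuilderFactory` and the Xerces feature `disallow-doctype-decl`; the class `DtdOptionSketch` and the `allowDtd` flag are hypothetical and may differ from what the sitemap parser in this commit does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class DtdOptionSketch {

    /** Builds a parser that rejects any DOCTYPE declaration unless allowDtd is true. */
    public static DocumentBuilder newBuilder(boolean allowDtd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", !allowDtd);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser does not support the feature", e);
        }
    }

    /** Returns true if the document parses, false if the parser rejects it. */
    public static boolean parses(String xml, boolean allowDtd) {
        try {
            newBuilder(allowDtd).parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. DOCTYPE encountered while disallowed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String withDtd = "<!DOCTYPE urlset [<!ELEMENT urlset ANY>]><urlset/>";
        System.out.println(parses(withDtd, false)); // false: DTDs rejected by default
        System.out.println(parses(withDtd, true));  // true: explicitly allowed
    }
}
```

Making the flag a per-parser option rather than a JVM-wide system property keeps one component's choice from affecting every other XML parser in the same process.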
2022-03-02 16:03:13 +01:00
kennethwong-hc
273ac6ac7e
Allow setting an option to allow DTDs, instead of relying on a system setting
2022-03-02 13:15:13 +00:00