dependabot[bot]
804d909e09
Bump junit.version from 5.5.0 to 5.8.1
...
Bumps `junit.version` from 5.5.0 to 5.8.1.
Updates `junit-jupiter-engine` from 5.5.0 to 5.8.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.5.0...r5.8.1)
Updates `junit-jupiter-params` from 5.5.0 to 5.8.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.5.0...r5.8.1)
---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-minor
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 13:42:37 +00:00
Sebastian Nagel
eaeae620d0
Merge pull request #346 from crawler-commons/dependabot/maven/org.slf4j-slf4j-log4j12-1.7.32
...
Bump slf4j-log4j12 from 1.7.7 to 1.7.32
2021-10-19 15:41:48 +02:00
dependabot[bot]
9877ad255a
Bump slf4j-log4j12 from 1.7.7 to 1.7.32
...
Bumps [slf4j-log4j12](https://github.com/qos-ch/slf4j) from 1.7.7 to 1.7.32.
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](https://github.com/qos-ch/slf4j/compare/v1.7.7...v_1.7.32)
---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-log4j12
  dependency-type: direct:development
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 13:37:17 +00:00
Sebastian Nagel
306aa31554
Merge pull request #347 from crawler-commons/dependabot/maven/commons-io-commons-io-2.7
...
Bump commons-io from 2.4 to 2.7
2021-10-19 15:36:53 +02:00
dependabot[bot]
6dad0dc9c1
Bump commons-io from 2.4 to 2.7
...
Bumps commons-io from 2.4 to 2.7.
---
updated-dependencies:
- dependency-name: commons-io:commons-io
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 13:30:27 +00:00
Sebastian Nagel
92fb496aa6
Merge pull request #337 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-deploy-plugin-2.8.2
...
Bump maven-deploy-plugin from 2.5 to 2.8.2
2021-10-19 15:30:11 +02:00
Sebastian Nagel
ea160e2f3a
Merge pull request #336 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-gpg-plugin-3.0.1
...
Bump maven-gpg-plugin from 1.4 to 3.0.1
2021-10-19 15:29:34 +02:00
Sebastian Nagel
adb09121ec
Merge pull request #335 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-release-plugin-2.5.3
...
Bump maven-release-plugin from 2.5.1 to 2.5.3
2021-10-19 15:29:03 +02:00
Sebastian Nagel
36dcf55de4
Merge pull request #339 from crawler-commons/dependabot/maven/org.slf4j-slf4j-api-1.7.32
...
Bump slf4j-api from 1.7.7 to 1.7.32
2021-10-19 15:28:37 +02:00
Sebastian Nagel
c692c3a637
Merge pull request #333 from valfirst/master
...
Migrate CI from Travis to GitHub Actions
2021-10-19 14:51:30 +02:00
dependabot[bot]
49e5f810e5
Bump maven-gpg-plugin from 1.4 to 3.0.1
...
Bumps [maven-gpg-plugin](https://github.com/apache/maven-gpg-plugin) from 1.4 to 3.0.1.
- [Release notes](https://github.com/apache/maven-gpg-plugin/releases)
- [Commits](https://github.com/apache/maven-gpg-plugin/compare/maven-gpg-plugin-1.4...maven-gpg-plugin-3.0.1)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-gpg-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 12:46:45 +00:00
Sebastian Nagel
67b0971121
Merge pull request #343 from rzo1/fix-javadoc-generation
...
Upgrades JavaDoc Plugin to version 3.3.1
2021-10-19 14:45:56 +02:00
Richard Zowalla
5e922e4d9d
Fixes two JavaDoc warnings
2021-10-19 14:09:58 +02:00
Richard Zowalla
1ebccbca6d
Updates Maven JavaDoc Plugin to 3.3.1
2021-10-19 14:05:52 +02:00
Sebastian Nagel
2dc0210614
Merge pull request #341 from rzo1/fix-jar-plugin
...
Fixes wrong jar-plugin version
2021-10-19 13:58:38 +02:00
Richard Zowalla
1004fe51fd
Fixes wrong jar-plugin version, updates jar-plugin to 3.2.0
...
Converts http to https
2021-10-19 13:48:52 +02:00
Sebastian Nagel
94d7347d76
Bump maven-compiler-plugin from 3.2.0 to 3.8.1 (#340)
2021-10-19 14:27:22 +03:00
Sebastian Nagel
6e2c5c4e87
Release 1.2 - add Javadoc link
2021-10-14 16:40:42 +02:00
dependabot[bot]
c4f5deaa3c
Bump slf4j-api from 1.7.7 to 1.7.32
...
Bumps [slf4j-api](https://github.com/qos-ch/slf4j) from 1.7.7 to 1.7.32.
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](https://github.com/qos-ch/slf4j/compare/v1.7.7...v_1.7.32)
---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-api
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 20:11:20 +00:00
dependabot[bot]
c42bdaea55
Bump maven-deploy-plugin from 2.5 to 2.8.2
...
Bumps maven-deploy-plugin from 2.5 to 2.8.2.
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-deploy-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 20:11:09 +00:00
dependabot[bot]
6a38638a27
Bump maven-release-plugin from 2.5.1 to 2.5.3
...
Bumps maven-release-plugin from 2.5.1 to 2.5.3.
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-release-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 20:11:00 +00:00
Valery Yatsynovich
8a23385b19
Migrate CI from Travis to GitHub Actions
2021-10-11 14:36:25 +03:00
dependabot[bot]
22ce3703fd
Bump mockito-core from 1.8.0 to 4.0.0 (#334)
...
Bumps [mockito-core](https://github.com/mockito/mockito) from 1.8.0 to 4.0.0.
- [Release notes](https://github.com/mockito/mockito/releases)
- [Commits](https://github.com/mockito/mockito/compare/v1.8.0...v4.0.0)
---
updated-dependencies:
- dependency-name: org.mockito:mockito-core
  dependency-type: direct:development
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:36:17 +03:00
dependabot[bot]
a6545f6610
Bump maven-compiler-plugin.version from 2.3.2 to 3.2.0 (#331)
...
Bumps `maven-compiler-plugin.version` from 2.3.2 to 3.2.0.
Updates `maven-jar-plugin` from 2.3.2 to 3.2.0
- [Release notes](https://github.com/apache/maven-jar-plugin/releases)
- [Commits](https://github.com/apache/maven-jar-plugin/compare/maven-jar-plugin-2.3.2...maven-jar-plugin-3.2.0)
Updates `maven-compiler-plugin` from 2.3.2 to 3.2.0
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-jar-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: org.apache.maven.plugins:maven-compiler-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:33:37 +03:00
dependabot[bot]
b07fde3194
Bump checksum-maven-plugin from 1.0.1 to 1.4 (#330)
...
Bumps [checksum-maven-plugin](https://github.com/nicoulaj/checksum-maven-plugin) from 1.0.1 to 1.4.
- [Release notes](https://github.com/nicoulaj/checksum-maven-plugin/releases)
- [Commits](https://github.com/nicoulaj/checksum-maven-plugin/compare/1.0.1...1.4)
---
updated-dependencies:
- dependency-name: net.ju-n.maven.plugins:checksum-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:33:27 +03:00
dependabot[bot]
fd21b3f493
Bump maven-source-plugin from 2.1.2 to 3.2.1 (#329)
...
Bumps [maven-source-plugin](https://github.com/apache/maven-source-plugin) from 2.1.2 to 3.2.1.
- [Release notes](https://github.com/apache/maven-source-plugin/releases)
- [Commits](https://github.com/apache/maven-source-plugin/compare/maven-source-plugin-2.1.2...maven-source-plugin-3.2.1)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-source-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:32:48 +03:00
dependabot[bot]
a5fc56a307
Bump download-maven-plugin from 1.6.0 to 1.6.7 (#328)
...
Bumps [download-maven-plugin](https://github.com/maven-download-plugin/maven-download-plugin) from 1.6.0 to 1.6.7.
- [Release notes](https://github.com/maven-download-plugin/maven-download-plugin/releases)
- [Commits](https://github.com/maven-download-plugin/maven-download-plugin/compare/1.6.0...1.6.7)
---
updated-dependencies:
- dependency-name: com.googlecode.maven-download-plugin:download-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:30:45 +03:00
Ken Krugler
dacda63b8c
Remove Sonatype repo from gradle description, only needed for RC builds
2021-10-09 11:48:11 -07:00
Ken Krugler
f5ad86a58f
Add Gradle info
2021-10-09 11:43:34 -07:00
Sebastian Nagel
e66579ba74
Merge pull request #327 from valfirst/patch-1
...
Enable Dependabot
2021-10-07 12:30:54 +02:00
Valery Yatsynovich
12bd46b5a3
Enable Dependabot
2021-10-07 09:55:50 +03:00
Sebastian Nagel
24da43e4c2
[maven-release-plugin] prepare for next development iteration
2021-10-06 22:24:07 +02:00
Sebastian Nagel
b5b500f58b
[maven-release-plugin] prepare release crawler-commons-1.2
2021-10-06 22:24:00 +02:00
Sebastian Nagel
1f9e238db4
Prepare release of crawler-commons-1.1
...
- update CHANGES.txt
- complete KEYS
2021-10-06 21:41:41 +02:00
Sebastian Nagel
0493878f80
Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions (#326)
...
* Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions
(fixes #322)
- compare URL strings so that java.net.URL::equals does not trigger unwanted and potentially slow
DNS lookups to resolve the host part. Replace:
  - Objects::equals in equals methods of sitemap extensions
  - URL::equals and URL::hashCode in SiteMapIndex and SiteMapURL
- enable check for URL::equals and URL::hashCode in Forbidden API Checker
* Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions
- avoid NPEs in equals and hashCode methods
* Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions
- avoid NPE, return null as before if null is passed to SiteMapIndex::getSitemap
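
The idea described above can be sketched in plain Java. This is a minimal illustration only, not the crawler-commons code: the class `UrlEntrySketch` and its helper `url(...)` are hypothetical names introduced for the example. It compares and hashes the string form of the URL, so neither `URL::equals` nor `URL::hashCode` is called, and it stays null-safe.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Objects;

/** Hypothetical value class for illustration; not the crawler-commons SiteMapURL. */
final class UrlEntrySketch {
    private final URL url;

    UrlEntrySketch(URL url) {
        this.url = url;
    }

    /** Example helper: builds a URL and wraps the checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UrlEntrySketch)) {
            return false;
        }
        UrlEntrySketch other = (UrlEntrySketch) o;
        // Compare string forms: URL::equals may resolve host names via DNS.
        String a = (url == null) ? null : url.toString();
        String b = (other.url == null) ? null : other.url.toString();
        return Objects.equals(a, b);
    }

    @Override
    public int hashCode() {
        // Hash the string form for the same reason; null-safe.
        return (url == null) ? 0 : url.toString().hashCode();
    }
}
```

One consequence of this choice: two URLs whose hosts resolve to the same address but are spelled differently are no longer considered equal.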
2021-10-06 12:07:02 +03:00
Sebastian Nagel
ec1f2e54ec
Merge pull request #324 from aecio/issue-321-builder
...
Add a builder API for configuring the BasicURLNormalizer
2021-10-05 10:22:14 +02:00
Sebastian Nagel
4b45097441
Add a builder API for configuring the BasicURLNormalizer
...
- allow normalizing host names to Unicode (add to changelog)
2021-10-05 10:21:34 +02:00
Sebastian Nagel
10d3021055
Add a builder API for configuring the BasicURLNormalizer
...
- allow normalizing host names to Unicode
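
For readers unfamiliar with the two host name forms, the conversion can be sketched with the JDK's `java.net.IDN` class. This only illustrates the Punycode and Unicode forms; the class `IdnSketch` is a hypothetical name, and the sketch does not claim to show how BasicURLNormalizer performs the conversion.

```java
import java.net.IDN;

/** Hypothetical helper for illustration; not part of crawler-commons. */
final class IdnSketch {
    /** Converts a host name to its ASCII-compatible (Punycode) form. */
    static String toPunycode(String host) {
        return IDN.toASCII(host);
    }

    /** Converts a host name to its Unicode form. */
    static String toUnicode(String host) {
        return IDN.toUnicode(host);
    }
}
```

For example, `IdnSketch.toPunycode("b\u00fccher.example")` yields `xn--bcher-kva.example`, and `IdnSketch.toUnicode` reverses that.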
2021-10-04 17:24:26 +02:00
Aécio Santos
12e2c389b2
Add a builder API for configuring the BasicURLNormalizer
...
Usage example:
```
normalizer = BasicURLNormalizer.newBuilder()
    .idnNormalization(IdnNormalization.PUNYCODE)
    .queryParamsToRemove(
        asList("sid", "phpsessid", "sessionid", "jsessionid")
    )
    .build();
```
Closes #321.
2021-10-04 10:15:09 -04:00
Sebastian Nagel
47ee966024
Merge branch 'kovyrin/sitemap-xxe'
...
Fix XXE vulnerability in Sitemap parser #323
2021-10-01 10:10:54 +02:00
Sebastian Nagel
4841242390
Fix XXE vulnerability in Sitemap parser
...
- add unit test to verify that the parser is not vulnerable
to XInclude attacks
- apply code formatter
- add changelog entry
2021-10-01 10:07:14 +02:00
Oleksiy Kovyrin
2b66ad2060
Do not use a temporary file
2021-09-30 17:38:35 -04:00
Oleksiy Kovyrin
7555bcbbbe
Disable entity resolution features in Java SAX XML parser to avoid XXE vulnerabilities while parsing Sitemaps
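
A minimal sketch of this kind of hardening, using only JDK classes. The exact set of features the commit changes is not shown in this log, so the feature list below is an assumption based on commonly recommended settings; the `disallow-doctype-decl` setting in particular is stricter than disabling entity resolution alone. `HardenedSaxSketch` and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for illustration; not the crawler-commons parser setup. */
final class HardenedSaxSketch {
    static SAXParserFactory newHardenedFactory() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setXIncludeAware(false);
            // Do not resolve external general or parameter entities.
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            // Do not fetch external DTDs.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            // Assumption: reject any DOCTYPE declaration outright (stricter option).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory;
        } catch (Exception e) {
            throw new IllegalStateException("XML parser does not support the requested features", e);
        }
    }

    /** Returns true if the document parses, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            SAXParser parser = newHardenedFactory().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With these settings a sitemap that declares a DOCTYPE with an external entity is rejected before any entity is resolved, while a plain `<urlset>` document still parses.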
2021-09-29 12:56:17 -04:00
Sebastian Nagel
a10cf2540a
Merge branch 'aecio:aecio/query-params-normalization', fixes #246, closes #309
...
- rebase to master and squash commits
- fix failing sitemaps unit tests with URL filtering using BasicURLNormalizer
(sort query params in test sitemap)
- CHANGES.txt: updated to follow style, added missing entry for preceding commit
2021-09-21 12:34:39 +02:00
Aécio Santos
94bac65639
Query parameters normalization
...
- Sort query parameters (fix #246)
- Allow (optionally) removing common irrelevant query parameters
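
The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the BasicURLNormalizer implementation: `QueryParamsSketch` is a hypothetical class, parameter names are matched case-insensitively against a lowercase removal set, and parameters are sorted as whole `name=value` strings.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of crawler-commons. */
final class QueryParamsSketch {
    /**
     * Drops parameters whose name is in namesToRemove (expected in lowercase)
     * and sorts the remaining "name=value" pairs.
     */
    static String normalizeQuery(String query, Set<String> namesToRemove) {
        return Arrays.stream(query.split("&"))
                .filter(param -> !param.isEmpty())
                .filter(param -> !namesToRemove.contains(name(param).toLowerCase(Locale.ROOT)))
                .sorted()
                .collect(Collectors.joining("&"));
    }

    /** Returns the part before '=', or the whole parameter if there is no '='. */
    private static String name(String param) {
        int eq = param.indexOf('=');
        return eq < 0 ? param : param.substring(0, eq);
    }
}
```

Sorting makes URLs that differ only in parameter order normalize to the same string, which helps deduplication in a crawler.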
2021-09-21 12:02:00 +02:00
Sebastian Nagel
7a8bbb6ba3
Merge pull request #307 from sebastian-nagel/cc-305-sitemaps-normalize-urls
...
Allow to normalize URLs in sitemaps, resolves #305
2021-08-14 13:45:21 +02:00
Avi Hayun
0ea45f4c5c
Normalizing CHANGES.txt (#313)
...
* This normalization adds the [Unit_Name] prefix to changelog entries where it is missing and the unit is obvious
Added the [Domains] unit name (as in the Java package name)
Didn't touch changelog entries prior to v0.7
This resolves #270
* Updated according to Sebastian's code review
2021-08-11 17:16:22 +03:00
Avi Hayun
44304581bc
Readme.md Overhaul (#312)
...
Added Table-of-Contents
Removed issue tracking section
Added Maven installation
Added License
2021-08-09 09:00:06 +03:00
Sebastian Nagel
386608f7e8
Allow to normalize URLs in sitemaps, resolves #305
...
- extend SiteMapParser by methods to register a URLFilter (function)
used to normalize or filter (if null is returned) URLs found in
sitemaps
- implement URL filtering in sitemap parsers / XML handlers
- add unit tests to verify URL filtering for text and XML sitemaps
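
The filter contract described above (a function that returns a normalized URL, or null to drop it) can be sketched independently of the parser. This is not the SiteMapParser API: `UrlFilterSketch` and `apply` are hypothetical names, and URLs are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper for illustration; not the crawler-commons SiteMapParser. */
final class UrlFilterSketch {
    /**
     * Applies urlFilter to every URL. A non-null result replaces the URL
     * (normalization); a null result removes it (filtering).
     */
    static List<String> apply(List<String> urls, Function<String, String> urlFilter) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            String result = urlFilter.apply(url);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }
}
```

Using a single function for both jobs keeps the contract small: a caller can pass any normalizer whose result is null for URLs it rejects.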
2020-12-08 15:28:58 +01:00
Sebastian Nagel
9630f4c09c
Merge pull request #306 from sebastian-nagel/cc-271-urlnormalizer-basic-url-without-scheme
...
Normalize URL without a scheme, resolves #271
2020-11-13 12:15:04 +01:00