mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-11 08:16:04 +02:00
Commit Graph

503 Commits

Author SHA1 Message Date
dependabot[bot] 804d909e09
Bump junit.version from 5.5.0 to 5.8.1
Bumps `junit.version` from 5.5.0 to 5.8.1.

Updates `junit-jupiter-engine` from 5.5.0 to 5.8.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.5.0...r5.8.1)

Updates `junit-jupiter-params` from 5.5.0 to 5.8.1
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.5.0...r5.8.1)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter-engine
  dependency-type: direct:development
  update-type: version-update:semver-minor
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 13:42:37 +00:00
Sebastian Nagel eaeae620d0
Merge pull request #346 from crawler-commons/dependabot/maven/org.slf4j-slf4j-log4j12-1.7.32
Bump slf4j-log4j12 from 1.7.7 to 1.7.32
2021-10-19 15:41:48 +02:00
dependabot[bot] 9877ad255a
Bump slf4j-log4j12 from 1.7.7 to 1.7.32
Bumps [slf4j-log4j12](https://github.com/qos-ch/slf4j) from 1.7.7 to 1.7.32.
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](https://github.com/qos-ch/slf4j/compare/v1.7.7...v_1.7.32)

---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-log4j12
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 13:37:17 +00:00
Sebastian Nagel 306aa31554
Merge pull request #347 from crawler-commons/dependabot/maven/commons-io-commons-io-2.7
Bump commons-io from 2.4 to 2.7
2021-10-19 15:36:53 +02:00
dependabot[bot] 6dad0dc9c1
Bump commons-io from 2.4 to 2.7
Bumps commons-io from 2.4 to 2.7.

---
updated-dependencies:
- dependency-name: commons-io:commons-io
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 13:30:27 +00:00
Sebastian Nagel 92fb496aa6
Merge pull request #337 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-deploy-plugin-2.8.2
Bump maven-deploy-plugin from 2.5 to 2.8.2
2021-10-19 15:30:11 +02:00
Sebastian Nagel ea160e2f3a
Merge pull request #336 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-gpg-plugin-3.0.1
Bump maven-gpg-plugin from 1.4 to 3.0.1
2021-10-19 15:29:34 +02:00
Sebastian Nagel adb09121ec
Merge pull request #335 from crawler-commons/dependabot/maven/org.apache.maven.plugins-maven-release-plugin-2.5.3
Bump maven-release-plugin from 2.5.1 to 2.5.3
2021-10-19 15:29:03 +02:00
Sebastian Nagel 36dcf55de4
Merge pull request #339 from crawler-commons/dependabot/maven/org.slf4j-slf4j-api-1.7.32
Bump slf4j-api from 1.7.7 to 1.7.32
2021-10-19 15:28:37 +02:00
Sebastian Nagel c692c3a637
Merge pull request #333 from valfirst/master
Migrate CI from Travis to GitHub Actions
2021-10-19 14:51:30 +02:00
dependabot[bot] 49e5f810e5
Bump maven-gpg-plugin from 1.4 to 3.0.1
Bumps [maven-gpg-plugin](https://github.com/apache/maven-gpg-plugin) from 1.4 to 3.0.1.
- [Release notes](https://github.com/apache/maven-gpg-plugin/releases)
- [Commits](https://github.com/apache/maven-gpg-plugin/compare/maven-gpg-plugin-1.4...maven-gpg-plugin-3.0.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-gpg-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 12:46:45 +00:00
Sebastian Nagel 67b0971121
Merge pull request #343 from rzo1/fix-javadoc-generation
Upgrades JavaDoc Plugin to version 3.3.1
2021-10-19 14:45:56 +02:00
Richard Zowalla 5e922e4d9d Fixes two JavaDoc warnings 2021-10-19 14:09:58 +02:00
Richard Zowalla 1ebccbca6d Updates Maven JavaDoc Plugin to 3.3.1 2021-10-19 14:05:52 +02:00
Sebastian Nagel 2dc0210614
Merge pull request #341 from rzo1/fix-jar-plugin
Fixes wrong jar-plugin version
2021-10-19 13:58:38 +02:00
Richard Zowalla 1004fe51fd Fixes wrong jar-plugin version, updates jar-plugin to 3.2.0
Converts http to https
2021-10-19 13:48:52 +02:00
Sebastian Nagel 94d7347d76
Bump maven-compiler-plugin from 3.2.0 to 3.8.1 (#340) 2021-10-19 14:27:22 +03:00
Sebastian Nagel 6e2c5c4e87 Release 1.2 - add Javadoc link 2021-10-14 16:40:42 +02:00
dependabot[bot] c4f5deaa3c
Bump slf4j-api from 1.7.7 to 1.7.32
Bumps [slf4j-api](https://github.com/qos-ch/slf4j) from 1.7.7 to 1.7.32.
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](https://github.com/qos-ch/slf4j/compare/v1.7.7...v_1.7.32)

---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-api
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 20:11:20 +00:00
dependabot[bot] c42bdaea55
Bump maven-deploy-plugin from 2.5 to 2.8.2
Bumps maven-deploy-plugin from 2.5 to 2.8.2.

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-deploy-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 20:11:09 +00:00
dependabot[bot] 6a38638a27
Bump maven-release-plugin from 2.5.1 to 2.5.3
Bumps maven-release-plugin from 2.5.1 to 2.5.3.

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-release-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 20:11:00 +00:00
Valery Yatsynovich 8a23385b19
Migrate CI from Travis to GitHub Actions 2021-10-11 14:36:25 +03:00
dependabot[bot] 22ce3703fd
Bump mockito-core from 1.8.0 to 4.0.0 (#334)
Bumps [mockito-core](https://github.com/mockito/mockito) from 1.8.0 to 4.0.0.
- [Release notes](https://github.com/mockito/mockito/releases)
- [Commits](https://github.com/mockito/mockito/compare/v1.8.0...v4.0.0)

---
updated-dependencies:
- dependency-name: org.mockito:mockito-core
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:36:17 +03:00
dependabot[bot] a6545f6610
Bump maven-compiler-plugin.version from 2.3.2 to 3.2.0 (#331)
Bumps `maven-compiler-plugin.version` from 2.3.2 to 3.2.0.

Updates `maven-jar-plugin` from 2.3.2 to 3.2.0
- [Release notes](https://github.com/apache/maven-jar-plugin/releases)
- [Commits](https://github.com/apache/maven-jar-plugin/compare/maven-jar-plugin-2.3.2...maven-jar-plugin-3.2.0)

Updates `maven-compiler-plugin` from 2.3.2 to 3.2.0

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-jar-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: org.apache.maven.plugins:maven-compiler-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:33:37 +03:00
dependabot[bot] b07fde3194
Bump checksum-maven-plugin from 1.0.1 to 1.4 (#330)
Bumps [checksum-maven-plugin](https://github.com/nicoulaj/checksum-maven-plugin) from 1.0.1 to 1.4.
- [Release notes](https://github.com/nicoulaj/checksum-maven-plugin/releases)
- [Commits](https://github.com/nicoulaj/checksum-maven-plugin/compare/1.0.1...1.4)

---
updated-dependencies:
- dependency-name: net.ju-n.maven.plugins:checksum-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:33:27 +03:00
dependabot[bot] fd21b3f493
Bump maven-source-plugin from 2.1.2 to 3.2.1 (#329)
Bumps [maven-source-plugin](https://github.com/apache/maven-source-plugin) from 2.1.2 to 3.2.1.
- [Release notes](https://github.com/apache/maven-source-plugin/releases)
- [Commits](https://github.com/apache/maven-source-plugin/compare/maven-source-plugin-2.1.2...maven-source-plugin-3.2.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-source-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:32:48 +03:00
dependabot[bot] a5fc56a307
Bump download-maven-plugin from 1.6.0 to 1.6.7 (#328)
Bumps [download-maven-plugin](https://github.com/maven-download-plugin/maven-download-plugin) from 1.6.0 to 1.6.7.
- [Release notes](https://github.com/maven-download-plugin/maven-download-plugin/releases)
- [Commits](https://github.com/maven-download-plugin/maven-download-plugin/compare/1.6.0...1.6.7)

---
updated-dependencies:
- dependency-name: com.googlecode.maven-download-plugin:download-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-11 09:30:45 +03:00
Ken Krugler dacda63b8c
Remove Sonatype repo from gradle description, only needed for RC builds 2021-10-09 11:48:11 -07:00
Ken Krugler f5ad86a58f
Add Gradle info 2021-10-09 11:43:34 -07:00
Sebastian Nagel e66579ba74
Merge pull request #327 from valfirst/patch-1
Enable Dependabot
2021-10-07 12:30:54 +02:00
Valery Yatsynovich 12bd46b5a3
Enable Dependabot 2021-10-07 09:55:50 +03:00
Sebastian Nagel 24da43e4c2 [maven-release-plugin] prepare for next development iteration 2021-10-06 22:24:07 +02:00
Sebastian Nagel b5b500f58b [maven-release-plugin] prepare release crawler-commons-1.2 2021-10-06 22:24:00 +02:00
Sebastian Nagel 1f9e238db4 Prepare release of crawler-commons-1.1
- update CHANGES.txt
- complete KEYS
2021-10-06 21:41:41 +02:00
Sebastian Nagel 0493878f80
Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions (#326)
* Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions
(fixes #322)
- compare URL strings so that java.net.URL::equals cannot trigger unwanted and potentially slow
  DNS lookups to resolve the host part. Replace:
  - Objects::equals in equals methods of sitemap extensions
  - URL::equals and URL::hashCode in SiteMapIndex and SiteMapURL
- enable check for URL::equals and URL::hashCode in Forbidden API Checker

* Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions
- avoid NPEs in equals and hashCode methods

* Sitemaps: avoid calling java.net.URL::equals in equals method of sitemaps and sitemap extensions
- avoid NPE, return null as before if null is passed to SiteMapIndex::getSitemap
2021-10-06 12:07:02 +03:00
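
The idea behind the commit above (comparing URL strings rather than calling `java.net.URL::equals`, which may resolve host names) can be illustrated with a minimal sketch. The `Entry` class and the `of` helper are hypothetical stand-ins and are not taken from the crawler-commons sources; only the `java.net` and `java.util` calls are real.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Objects;

public class UrlStringEquality {

    /** Hypothetical stand-in for a sitemap entry that holds a URL. */
    static final class Entry {
        private final URL url;

        Entry(URL url) {
            this.url = url;
        }

        // String form of the URL, or null if no URL is set (avoids NPEs).
        private String key() {
            return url == null ? null : url.toString();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Entry)) return false;
            // Compare strings, never URL.equals, so no host resolution happens.
            return Objects.equals(key(), ((Entry) o).key());
        }

        @Override
        public int hashCode() {
            // Hash the string form for the same reason.
            return Objects.hashCode(key());
        }
    }

    /** Demo helper (hypothetical): builds an Entry from a string; null gives a null-URL Entry. */
    static Entry of(String url) {
        try {
            return new Entry(url == null ? null : URI.create(url).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(of("https://example.com/a").equals(of("https://example.com/a")));
        System.out.println(of("https://example.com/a").equals(of("https://example.com/b")));
        System.out.println(of(null).equals(of(null)));
    }
}
```

Note that string comparison is stricter than `URL::equals`, which treats two hosts as equivalent when both resolve to the same IP addresses; whether that difference matters depends on the caller.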
Sebastian Nagel ec1f2e54ec
Merge pull request #324 from aecio/issue-321-builder
Add a builder API for configuring the BasicURLNormalizer
2021-10-05 10:22:14 +02:00
Sebastian Nagel 4b45097441 Add a builder API for configuring the BasicURLNormalizer
- allow to normalize host names to Unicode (add to changelog)
2021-10-05 10:21:34 +02:00
Sebastian Nagel 10d3021055 Add a builder API for configuring the BasicURLNormalizer
- allow to normalize host names to Unicode
2021-10-04 17:24:26 +02:00
Aécio Santos 12e2c389b2
Add a builder API for configuring the BasicURLNormalizer
Usage example:
```
normalizer = BasicURLNormalizer.newBuilder()
  .idnNormalization(IdnNormalization.PUNYCODE)
  .queryParamsToRemove(
    asList("sid", "phpsessid", "sessionid", "jsessionid")
  )
  .build();
```

Closes #321.
2021-10-04 10:15:09 -04:00
Sebastian Nagel 47ee966024 Merge branch 'kovyrin/sitemap-xxe'
Fix XXE vulnerability in Sitemap parser #323
2021-10-01 10:10:54 +02:00
Sebastian Nagel 4841242390 Fix XXE vulnerability in Sitemap parser
- add unit test to verify that the parser is not vulnerable
  to XInclude attacks
- apply code formatter
- add changelog entry
2021-10-01 10:07:14 +02:00
Oleksiy Kovyrin 2b66ad2060 Do not use a temporary file 2021-09-30 17:38:35 -04:00
Oleksiy Kovyrin 7555bcbbbe Disable entity resolution features in Java SAX XML parser to avoid XXE vulnerabilities while parsing Sitemaps 2021-09-29 12:56:17 -04:00
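
As an illustration of what disabling entity resolution in a Java SAX parser can look like, here is a minimal sketch. The exact set of features changed in the commits above is not shown in this log, so the feature list below is an assumption based on commonly used SAX/Xerces feature URIs, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HardenedSaxSketch {

    /** Builds a factory with external entity resolution and XInclude switched off. */
    static SAXParserFactory hardenedFactory() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setXIncludeAware(false);
        // Assumed feature set; the real commit may set a different combination.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory;
    }

    /** Parses the XML string and returns its concatenated character data. */
    static String extractText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = hardenedFactory().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The external entity points at a file that is never opened by the hardened parser.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///nonexistent-for-demo\">]>"
                + "<r>a&x;b</r>";
        System.out.println(extractText(xml));
    }
}
```

With these features off, the parser skips the external entity reference instead of fetching the referenced resource, which is the behaviour a sitemap parser wants for untrusted input.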
Sebastian Nagel a10cf2540a Merge branch 'aecio:aecio/query-params-normalization', fixes #246, closes #309
- rebase to master and squash commits
- fix failing sitemaps unit tests with URL filtering using BasicURLNormalizer
  (sort query params in test sitemap)
- CHANGES.txt: updated to follow style, added missing entry for preceding commit
2021-09-21 12:34:39 +02:00
Aécio Santos 94bac65639 Query parameters normalization
- Sort query parameters (fix #246)
- Optionally remove common irrelevant query parameters
2021-09-21 12:02:00 +02:00
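
The two steps named in the commit above (sorting query parameters and optionally dropping irrelevant ones) can be sketched as follows. This is not the BasicURLNormalizer implementation: the class and method names are hypothetical, and sorting by the whole `name=value` string as well as matching parameter names case-insensitively are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class QueryParamSketch {

    /** Returns the part of "name=value" before the first '=', or the whole pair. */
    static String paramName(String pair) {
        int eq = pair.indexOf('=');
        return eq < 0 ? pair : pair.substring(0, eq);
    }

    /**
     * Drops parameters whose lower-cased name is in namesToRemove and sorts
     * the remaining "name=value" pairs lexicographically.
     */
    static String normalizeQuery(String query, Set<String> namesToRemove) {
        if (query == null || query.isEmpty()) {
            return query;
        }
        return Arrays.stream(query.split("&"))
                .filter(pair -> !pair.isEmpty())
                .filter(pair -> !namesToRemove.contains(paramName(pair).toLowerCase(Locale.ROOT)))
                .sorted()
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Set<String> remove = new HashSet<>(Arrays.asList("sid", "phpsessid", "sessionid", "jsessionid"));
        System.out.println(normalizeQuery("b=2&a=1&sid=xyz", remove));
    }
}
```

Sorting makes URLs that differ only in parameter order normalize to the same string, which helps deduplication in a crawler.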
Sebastian Nagel 7a8bbb6ba3
Merge pull request #307 from sebastian-nagel/cc-305-sitemaps-normalize-urls
Allow to normalize URLs in sitemaps, resolves #305
2021-08-14 13:45:21 +02:00
Avi Hayun 0ea45f4c5c
Normalizing CHANGES.txt (#313)
* This normalization basically adds the [Unit_Name] in front of the issue when it is obvious and when it is missing
Added the [Domains] unit name (as in the java package name)
Didn't touch the issues changelog prior to v0.7

This resolves #270

* Updated according to Sebastian's code review
2021-08-11 17:16:22 +03:00
Avi Hayun 44304581bc
Readme.md Overhaul (#312)
Added Table-of-Contents
Removed issue tracking section
Added Maven installation
Added License
2021-08-09 09:00:06 +03:00
Sebastian Nagel 386608f7e8 Allow to normalize URLs in sitemaps, resolves #305
- extend SiteMapParser by methods to register a URLFilter (function)
  used to normalize or filter (if null is returned) URLs found in
  sitemaps
- implement URL filtering in sitemap parsers / XML handlers
- add unit tests to verify URL filtering for text and XML sitemaps
2020-12-08 15:28:58 +01:00
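
The contract described in the commit above (a function that either returns a normalized URL or returns null to filter the URL out) can be sketched with a plain `java.util.function.Function`. The names below are hypothetical and do not reproduce the SiteMapParser API; the sketch only shows the normalize-or-drop behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class UrlFilterSketch {

    /** Applies the filter to every URL; a null result means the URL is dropped. */
    static List<String> applyFilter(List<String> urls, Function<String, String> filter) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            String result = filter.apply(url);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    /** Example filter (hypothetical): keep only http(s) URLs and trim whitespace. */
    static String httpOnly(String url) {
        String trimmed = url.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed;
        }
        return null; // filtered out
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(" https://example.com/a ", "ftp://example.com/b");
        System.out.println(applyFilter(urls, UrlFilterSketch::httpOnly));
    }
}
```

Using one function for both normalization and filtering keeps the hook small: callers that only want to normalize never return null, and callers that only want to filter return the input unchanged.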
Sebastian Nagel 9630f4c09c
Merge pull request #306 from sebastian-nagel/cc-271-urlnormalizer-basic-url-without-scheme
Normalize URL without a scheme, resolves #271
2020-11-13 12:15:04 +01:00