mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-29 12:46:04 +02:00
crawler-commons/src/test
Sebastian Nagel d98a3f14cf Allow for legacy URIs when checking sitemap namespaces (#211)
* Allow for legacy URIs when checking sitemap namespaces
- e.g., allow legacy namespace URI but ignore URLs
  from image and video sitemap extensions
- resolve relative namespace URIs
- add namespace URIs of sitemap extensions (news, images, videos)

* Address kkrugler's review comments:
- document addition of sitemap namespace required by sitemap
  protocol specification when calling setStrictNamespace(true)
- remove early return on <rss> root element
2018-06-05 11:20:26 +01:00
java/crawlercommons Allow for legacy URIs when checking sitemap namespaces (#211) 2018-06-05 11:20:26 +01:00
resources Improve sitemap parsing 2018-04-25 09:36:27 +02:00