mirror of https://github.com/crawler-commons/crawler-commons synced 2024-06-03 05:56:04 +02:00
crawler-commons/src/test/resources/sitemaps/sitemap.badns.xml
Julien Nioche 6adb771b72 Add namespace aware DOM/SAX parsing for XML Sitemaps (#176)
* Add namespace-aware DOM/SAX parsing for XML Sitemaps. RSS and Atom parsing is also namespace-aware, but element lookup stays "relaxed": elements are matched only on their "localName".

* Lenient namespacing in non-strict mode + applied formatting

* Introduced separate field strictNamespace to sitemapparsers + added test to saxparser

* Fixes Javadoc

* Fixes the fix for the Javadoc

* Allow setting strictNamespace in SiteMapTester

- Fix strict namespace handling in SitemapParserSAX:
  - pass strictNamespace from DelegatorHandler to delegates
  - ignore text if inside an element of invalid namespace
  - use SAX parser in unit test
  - set exception and pass it to calling DelegatorHandler if namespace
    does not match
2017-10-17 10:47:17 +01:00
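
The strict versus lenient namespace behaviour described in the commit message can be illustrated with a minimal sketch that uses only the JDK's DOM API. It is not the crawler-commons implementation: the class NamespaceDemo and the helper countLocs are hypothetical names, and treating http://www.sitemaps.org/schemas/sitemap/0.9 as the only accepted namespace in strict mode is an assumption made for this example. The inline document mirrors the fixture shown further down, which declares a Google namespace instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo class; not part of crawler-commons.
class NamespaceDemo {
    // Namespace defined by the sitemaps.org protocol. Assumed here to be
    // the only namespace accepted in strict mode.
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Counts <loc> elements. In strict mode only elements in SITEMAP_NS
    // are counted; in lenient mode the namespace wildcard "*" matches on
    // the local name alone.
    static int countLocs(String xml, boolean strictNamespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Without this call the DOM parser does not report namespace URIs.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        String namespace = strictNamespace ? SITEMAP_NS : "*";
        NodeList locs = doc.getElementsByTagNameNS(namespace, "loc");
        return locs.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the fixture: the default namespace is not the
        // sitemaps.org one.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.9\">"
                + "<url><loc>http://www.example.com/1</loc></url>"
                + "<url><loc>\nhttp://www.example.com/2\n</loc></url>"
                + "</urlset>";
        System.out.println("strict:  " + countLocs(xml, true));  // prints "strict:  0"
        System.out.println("lenient: " + countLocs(xml, false)); // prints "lenient: 2"
    }
}
```

In this sketch the strict lookup finds no URLs because every element sits in the unexpected namespace, while the lenient lookup finds both. The library itself may accept additional namespaces or signal the mismatch differently, for example by raising an exception as the last commit note suggests.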


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>http://www.example.com/1</loc>
<changefreq>daily</changefreq>
</url>
<url>
<loc>
http://www.example.com/2
</loc>
<changefreq>
daily
</changefreq>
</url>
</urlset>