mirror of
https://github.com/crawler-commons/crawler-commons
synced 2024-06-03 05:56:04 +02:00
6adb771b72
* Add namespace aware DOM/SAX parsing for XML Sitemaps. RSS and Atom parsing is also namespace aware, but finding elements is left "relaxed" by only matching on the element "localName".
* Lenient namespacing in non strict mode + applied formatting
* Introduced separate field strictNamespace to sitemapparsers + added test to saxparser
* Fixes Javadoc
* Fixes the fix for the Javadoc
* Allow to set strictNamespace in SiteMapTester
- Fix strict namespace handling in SitemapParserSAX:
  - pass strictNamespace from DelegatorHandler to delegates
  - ignore text if inside an element of invalid namespace
  - use SAX parser in unit test
  - set exception and pass it to calling DelegatorHandler if namespace does not match
17 lines
338 B
XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/1</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>
      http://www.example.com/2
    </loc>
    <changefreq>
      daily
    </changefreq>
  </url>
</urlset>