mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-03 22:26:15 +02:00

Merge pull request #421 from sebastian-nagel/cc-308-url-normalizer-empty-query

[BasicNormalizer] Query parameters normalization in BasicURLNormalizer
Sebastian Nagel 2023-06-13 13:56:52 +02:00 committed by GitHub
commit 7a95069f0e
Signed by: GitHub
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 4 additions and 0 deletions


@@ -1,6 +1,7 @@
Crawler-Commons Change Log
Current Development 1.4-SNAPSHOT (yyyy-mm-dd)
- [BasicNormalizer] Query parameters normalization in BasicURLNormalizer (aecio, sebastian-nagel) #308
- [Robots.txt] Handle allow/disallow directives containing unescaped Unicode characters (sebastian-nagel, Richard Zowalla, aecio) #389
- Improve readability of robots.txt unit tests (sebastian-nagel, Richard Zowalla) #383
- Upgrade project to use Java 11 (Avi Hayun, Richard Zowalla, aecio, sebastian-nagel) #320


@@ -142,6 +142,9 @@ http:///////, http:/
http://example.com?,http://example.com/
http://example.com?a=1,http://example.com/?a=1
# empty query #308
http://example.com/?,http://example.com/
# normalizing percent escapes #263
https://www.last.fm/music/Prefuse+73/_/90%+of+My+Mind+Is+With+You,https://www.last.fm/music/Prefuse+73/_/90%25+of+My+Mind+Is+With+You
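
The empty-query rows above pair an input URL with its expected normalized form. As a rough illustration of what those pairs require, here is a minimal Java sketch. It is not the BasicURLNormalizer implementation; the class name EmptyQuerySketch and the method normalize are hypothetical, and the sketch assumes an absolute URL containing "://" and no fragment.

```java
// Hypothetical illustration only; not code from crawler-commons.
final class EmptyQuerySketch {

    // Assumes an absolute URL with "://" and no "#fragment" part.
    static String normalize(String url) {
        int q = url.indexOf('?');
        String base = q < 0 ? url : url.substring(0, q);
        String query = q < 0 ? "" : url.substring(q + 1);

        // If nothing follows the authority, use "/" as the path.
        int authorityStart = base.indexOf("://") + 3;
        if (base.indexOf('/', authorityStart) < 0) {
            base = base + "/";
        }

        // Drop the "?" entirely when the query is empty.
        return query.isEmpty() ? base : base + "?" + query;
    }

    public static void main(String[] args) {
        // Inputs taken from the test rows in the hunk above.
        System.out.println(normalize("http://example.com?"));
        System.out.println(normalize("http://example.com?a=1"));
        System.out.println(normalize("http://example.com/?"));
    }
}
```

The sketch only covers the empty path and empty query cases; percent-escape handling such as the #263 row is out of its scope.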
