From e5563c3049e8bcb7215adbd0f2edaf8a19f73dae Mon Sep 17 00:00:00 2001
From: Sebastian Nagel
Date: Mon, 12 Jun 2023 22:29:54 +0200
Subject: [PATCH] [BasicNormalizer] Query parameters normalization in BasicURLNormalizer, closes #308

- add unit test to prove that an empty query is removed
---
 CHANGES.txt                                             | 1 +
 src/test/resources/normalizer/weirdToNormalizedUrls.csv | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/CHANGES.txt b/CHANGES.txt
index 2fb2a09..0ca41cd 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,6 +1,7 @@
 Crawler-Commons Change Log
 
 Current Development 1.4-SNAPSHOT (yyyy-mm-dd)
+- [BasicNormalizer] Query parameters normalization in BasicURLNormalizer (aecio, sebastian-nagel) #308
 - [Robots.txt] Handle allow/disallow directives containing unescaped Unicode characters (sebastian-nagel, Richard Zowalla, aecio) #389
 - Improve readability of robots.txt unit tests (sebastian-nagel, Richard Zowalla) #383
 - Upgrade project to use Java 11 (Avi Hayun, Richard Zowalla, aecio, sebastian-nagel) #320
diff --git a/src/test/resources/normalizer/weirdToNormalizedUrls.csv b/src/test/resources/normalizer/weirdToNormalizedUrls.csv
index ad38772..84de942 100644
--- a/src/test/resources/normalizer/weirdToNormalizedUrls.csv
+++ b/src/test/resources/normalizer/weirdToNormalizedUrls.csv
@@ -142,6 +142,9 @@ http:///////, http:/
 http://example.com?,http://example.com/
 http://example.com?a=1,http://example.com/?a=1
 
+# empty query #308
+http://example.com/?,http://example.com/
+
 # normalizing percent escapes #263
 https://www.last.fm/music/Prefuse+73/_/90%+of+My+Mind+Is+With+You,https://www.last.fm/music/Prefuse+73/_/90%25+of+My+Mind+Is+With+You