Julien Nioche
0dc699f3e9
Updated CHANGES pre-1.0 release
2019-03-19 22:04:31 +00:00
Sebastian Nagel
e8b598b2e8
[Sitemaps] Unit tests depend on system timezone, fixes #238 (#239)
...
- fix unit test to format dates in time zone UTC
- improve documentation of `convertToZonedDateTime`:
  add note that UTC is assumed if the date string contains
  no time zone
2019-03-19 15:00:04 +00:00
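The UTC default noted in the commit above can be illustrated with a minimal sketch. It is not the project's `convertToZonedDateTime` implementation: the class `UtcDefaultSketch` and the helper `parseAssumingUtc` are hypothetical names, and the set of accepted formats (ISO offset date-time, ISO local date-time, ISO local date) is an assumption made for illustration only.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class UtcDefaultSketch {

    // Hypothetical helper (not part of crawler-commons): keeps an explicit
    // offset if the string has one, otherwise assumes UTC.
    static ZonedDateTime parseAssumingUtc(String date) {
        try {
            // Explicit offset present, e.g. 2019-03-19T15:00:04+01:00
            return ZonedDateTime.parse(date, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        } catch (DateTimeParseException e) {
            // no offset in the string: try the zone-less forms below
        }
        try {
            // Date and time without zone, e.g. 2019-03-19T15:00:04 -> UTC assumed
            return LocalDateTime.parse(date, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
                    .atZone(ZoneOffset.UTC);
        } catch (DateTimeParseException e) {
            // not a date-time: try a plain date
        }
        // Plain date, e.g. 2019-03-19 -> midnight UTC assumed
        return LocalDate.parse(date, DateTimeFormatter.ISO_LOCAL_DATE)
                .atStartOfDay(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(parseAssumingUtc("2019-03-19"));
        System.out.println(parseAssumingUtc("2019-03-19T15:00:04+01:00"));
    }
}
```

Because the result never depends on `ZoneId.systemDefault()`, a test that formats or compares such values in UTC gives the same answer on any machine, which is the kind of timezone dependence the commit addresses.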
Sebastian Nagel
40531efe25
EffectiveTldFinder: upgrade public suffix list, implements #219 (#235)
...
- upgrade the public suffix list to the latest version retrieved right now from
https://publicsuffix.org/list/public_suffix_list.dat
resp.
802c469416/public_suffix_list.dat
2019-03-14 11:22:50 +00:00
Sebastian Nagel
0349fbe1f0
Update changelog for #144/#234
2019-03-14 10:29:17 +01:00
Sebastian Nagel
eb74336bd3
Update changelog for #225 and #226
2019-02-21 23:01:32 +01:00
Sebastian Nagel
916415d262
Merge branch 'master' into cc-231-etld-invalid-idns
2019-02-21 22:16:48 +01:00
Sebastian Nagel
40b1c44d68
Update changelog for #231
2019-02-21 22:15:40 +01:00
Sebastian Nagel
67db8bf1be
[Sitemaps] Trim Unicode whitespace around URLs, fixes #224
2019-02-20 16:27:16 +01:00
Sebastian Nagel
78e935f83b
Update changelog for #213
2019-02-03 13:49:11 +01:00
Sebastian Nagel
ab9e33a5f9
Update changelog for #220 and #221
2019-01-18 17:35:33 +01:00
Sebastian Nagel
862af9416f
Sitemap extension support
...
- add extension support to SiteMapTester
- list extension attributes in SiteMapURL.toString()
- update change log
2018-09-28 12:14:02 +02:00
Sebastian Nagel
9318de951f
Use the Java 8 date and time API (java.time.*) to parse dates in sitemaps (#217)
...
* Use the Java 8 date and time API (java.time.*) to parse dates in sitemaps
  - use thread-safe DateTimeFormatter instead of ThreadLocal<DateFormat>
  - simplify parsing of RSS publication dates
  - remove obsolete regex pattern to catch dates with time zone
    but without seconds (covered by DateTimeFormatter.ISO_OFFSET_DATE_TIME)
  - extend unit tests
* Fix Javadoc error and warnings, update change log
* Remove obsolete dependency on jaxb-api
  - import of javax.xml.bind.DatatypeConverter has been removed
    by updating to Java 8 date and time API
2018-09-24 10:09:58 +01:00
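Two points from the commit above can be shown in a short sketch: a `DateTimeFormatter` is immutable and can be shared across threads without a `ThreadLocal`, and `DateTimeFormatter.ISO_OFFSET_DATE_TIME` accepts a time without seconds. The class and method names are hypothetical, and the use of `RFC_1123_DATE_TIME` for RSS publication dates is an assumption for illustration, not a statement about what the project's parser does.

```java
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class SharedFormatterSketch {

    // DateTimeFormatter instances are immutable and thread-safe, so one
    // shared constant can stand in for a per-thread ThreadLocal<DateFormat>.
    static final DateTimeFormatter SITEMAP_DATE_TIME = DateTimeFormatter.ISO_OFFSET_DATE_TIME;

    // Assumption: RSS publication dates in RFC 1123 form.
    static final DateTimeFormatter RSS_PUB_DATE = DateTimeFormatter.RFC_1123_DATE_TIME;

    // Hypothetical helper: parses a date-time with offset; seconds are optional.
    static OffsetDateTime parseSitemapDate(String s) {
        return OffsetDateTime.parse(s, SITEMAP_DATE_TIME);
    }

    // Hypothetical helper: parses an RFC 1123 style publication date.
    static ZonedDateTime parseRssDate(String s) {
        return ZonedDateTime.parse(s, RSS_PUB_DATE);
    }

    public static void main(String[] args) {
        // Time zone present but no seconds: no extra regex is needed.
        System.out.println(parseSitemapDate("2018-09-24T10:09+01:00"));
        System.out.println(parseRssDate("Mon, 24 Sep 2018 10:09:58 GMT"));
    }
}
```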
Ken Krugler
a5c5091d64
Update CHANGES.txt
2018-07-31 17:14:27 -07:00
Julien Nioche
6213784e8b
Updated README for 0.10 release
2018-06-07 09:20:43 +01:00
Julien Nioche
0da1b8b8b5
Minor changes + applied formatting pre 0.10 release
2018-06-05 11:33:27 +01:00
Julien Nioche
8195140e21
Update CHANGES.txt
...
added #211
2018-06-05 11:23:00 +01:00
Julien Nioche
a8b474551a
Update CHANGES.txt
...
Add main to SimpleRobotRulesParser for testing (#193)
2018-06-04 21:28:09 +01:00
Ken Krugler
d99c034dd0
Merge branch 'master' into issue-134
2018-05-14 11:20:17 -07:00
Aecio Santos
47c2cad8b8
Add getters/setters and update CHANGES.txt
2018-05-14 12:00:02 -04:00
Aecio Santos
aaa3113e55
Update CHANGES.txt
2018-05-14 11:51:43 -04:00
Aecio Santos
fd1e7fcffe
SimpleRobotRulesParser: Expose MAX_CRAWL_DELAY #194
...
- Makes MAX_CRAWL_DELAY configurable through class constructor
2018-05-13 20:10:55 -04:00
Aecio Santos
7bef14d386
Make RobotRules accessible #134
...
- Makes SimpleRobotRulesParser._rules property protected
and adds getters for SimpleRobotRulesParser._rules and
RobotRules's properties
- Changes SimpleRobotRulesParser return type from BaseRobotRules
to SimpleRobotRules to allow access to concrete class without
nasty type casts while still obeying super class contract
2018-05-13 20:07:54 -04:00
Julien Nioche
e25309d26c
Add JAX-B dependencies to POM (#207)
...
* Add JAX-B dependencies to POM, fixes #196
* mentioned in CHANGES.txt
2018-05-03 11:04:03 +01:00
Sebastian Nagel
7d3eccfa63
Add changelog entry and fix unit test
2018-04-25 14:06:33 +02:00
Sebastian Nagel
0ef7cf87fa
Improve sitemap parsing
...
- ignore query part of URL to determine sitemap location prefix
for URL validation, fixes #202
- resolve relative links in RSS feeds, fixes #203
- allow non-continuous content (containing XML entities or CDATA)
when parsing links in RSS feeds, fixes #204
- extract links from <guid> elements in RSS feeds, fixes #201
2018-04-25 09:36:27 +02:00
Sebastian Nagel
a9277acde2
Merge pull request #200 from sebastian-nagel/cc-198-fix-regressions
...
Improve MIME detection for sitemaps
2018-04-25 09:19:27 +02:00
Sebastian Nagel
a6b3178fc7
Simplify MIME detection:
...
- handle BOM and leading white space together
- remove parameter to detect patterns at a specific offset
2018-04-24 14:32:28 +02:00
Sebastian Nagel
907be2343f
Format fix: add braces, complete CHANGES.txt
2018-04-16 13:36:06 +02:00
Ken Krugler
12155888bc
Add reference to issue #199
2018-04-02 12:59:17 -07:00
Sebastian Nagel
49bf37c6d9
Update CHANGES.txt
2017-12-08 09:42:52 +01:00
Ken Krugler
2b58c5050c
merge with master
2017-11-05 14:55:30 -08:00
Ken Krugler
aeb0cb91a2
Update CHANGES.txt
2017-11-05 14:53:58 -08:00
Julien Nioche
af0a013776
Released 0.9
2017-10-31 09:42:57 +00:00
Julien Nioche
f3e37f37da
Updated change log prior to 0.9 release
2017-10-27 11:08:47 +01:00
Sebastian Nagel
2afdf5b04d
Sitemap SAX parser mangles sitemap URLs in sitemap index, fixes #169
...
- completely add sitemap URLs from sitemap index if URL contains
XML entities or CDATA
2017-08-12 17:28:08 +02:00
Ken Krugler
7e08c1da49
Update CHANGES.txt
2017-06-20 15:18:47 -07:00
Julien Nioche
694e74207b
release notes for 0.8
2017-06-09 10:15:44 +01:00
Julien Nioche
2c72ba8708
Update CHANGES.txt
...
added tika 1.15 to changes
2017-06-02 15:02:20 +01:00
Sebastian Nagel
02e62c12cb
Disable XML resolvers: update changelog
2017-05-04 22:36:40 +02:00
Julien Nioche
4ba1295c17
Update forbiddenapis to v2.0. Fixes #99
2017-03-20 15:58:55 +00:00
Sebastian Nagel
772f02fcb0
Fix parsing of gzipped text sitemaps, fixes #143
...
- detect gzip embedded media type to decide
whether to parse as text or XML
2017-03-20 16:24:46 +01:00
Sebastian Nagel
f7c7cab7a8
Merge branch 'matt-deboer-master'
...
- provide SAX parser optionally to DOM-based parser
- SiteMapTester: trigger usage of SAX parser by property sitemap.useSax
2017-02-27 23:09:45 +01:00
Sebastian Nagel
61a500ad21
Use constants from StandardCharsets where applicable, fixes #141
2017-02-02 14:59:14 +01:00
Sebastian Nagel
49b3097083
Increase size limit of sitemaps (10MB -> 50MB), fixes #132
2017-02-02 12:00:47 +01:00
Julien Nioche
eefeda558c
Update CHANGES.txt
...
#137
2017-01-13 17:31:10 +00:00
Sebastian Nagel
cb38a5fc8f
BasicURLNormalizer: NPE for URLs without authority
...
- check whether URL.getAuthority() returns null
- recompose URLs without authority with empty authority/host
2017-01-11 17:05:53 +01:00
Sebastian Nagel
e39aa60373
BasicURLNormalizer to remove empty port
2016-12-09 14:54:00 +01:00
Ken Krugler
5783046f4e
Issue #96
...
Remove fetcher support
2016-11-27 09:18:21 -08:00
Julien Nioche
1a6c1b0dce
Released 0.7
2016-11-24 10:07:27 +00:00
Julien Nioche
9ea4f1b514
added ref to #126
...
and changed presentation of issue number
2016-09-30 12:33:06 +01:00