Julien Nioche
9ea4f1b514
added ref to #126
...
and changed presentation of issue number
2016-09-30 12:33:06 +01:00
Julien Nioche
36a4bd420e
Updated CHANGES with 124
2016-09-21 14:59:34 +01:00
Julien Nioche
4625a358f2
Update CHANGES.txt
...
added #117 and #113
2016-09-20 10:30:31 +01:00
Julien Nioche
0775bb216e
Fix license headers + applied formatting. Fixes #108
2016-06-30 11:45:08 +01:00
Julien Nioche
be52b770ff
Rename package crawlercommons.url Fixes #107
2016-06-30 11:11:49 +01:00
Ken Krugler
31a6c80ea7
Fix sitemap extraction from robots.txt
2015-12-30 22:03:49 -08:00
Julien Nioche
f3f34844d4
Deprecate fetcher classes #97
2015-12-02 10:30:54 +00:00
Julien Nioche
c1b3f4b086
Added URLFilter interface + BasicURLNormalizer borrowed from Nutch #106
2015-11-13 10:58:48 +00:00
Ken Krugler
4c43c48ef7
Merged conflict with CHANGES.txt
2015-10-20 07:50:58 -07:00
Ken Krugler
940cbfd0e8
Merged with aecio
2015-10-20 07:48:51 -07:00
Aecio Santos
f2bf9300e6
Upgrades httpclient to version 4.5.1 ( fixes #84 )
...
and do not ignore test failures during maven build
2015-10-09 14:08:39 -04:00
Julien Nioche
98316a51fc
issue #100 in CHANGES.txt
2015-10-06 18:48:58 +01:00
Ken Krugler
9e9f5df884
Fixed up CHANGES.txt file
2015-09-15 07:57:13 -07:00
Julien Nioche
f0d71b4729
mentioned issue 89 in CHANGES
2015-09-15 11:38:40 +01:00
Lewis John McGibbney
f2e41af53c
Trivial commit to update CHANGES.txt for recent commits.
2015-09-14 22:40:04 -07:00
Julien Nioche
20861baf47
Issue 75: [Sitemaps] more robust parsing of XML elements (jnioche, kkrugler)
2015-05-22 11:08:21 +01:00
Julien Nioche
63a837d5d7
updated CHANGES.txt for #76
2015-05-06 10:10:25 +01:00
Ken Krugler
ee23e1fb0d
Update CHANGES.txt
2015-04-22 07:31:20 -07:00
Ken Krugler
798dc59839
Update CHANGES.txt
2015-04-22 07:30:57 -07:00
Ken Krugler
53375168ab
Update CHANGES.txt
2015-04-22 07:30:25 -07:00
Julien Nioche
1647b90f7d
Upgraded to Tika 1.8 fixes #71
2015-04-22 13:19:02 +01:00
kkrugler_lists@transpac.com
10c4dfbd99
Update with info about issues 67 & 68
2015-03-27 22:12:14 +00:00
kkrugler_lists@transpac.com
b2a92ce442
Add note about issue 59 being fixed
2015-01-26 13:21:29 +00:00
avraham2@gmail.com
c8ef5e1083
Issue 65: [Sitemaps] Make SiteMapTool simpler by removing the Recursive flag
...
Adding the CHANGES file
2015-01-25 09:19:06 +00:00
digitalpebble@googlemail.com
92408e37d4
Issue 64: Upgraded to Tika 1.7 (jnioche)
2015-01-22 14:43:34 +00:00
digitalpebble@googlemail.com
4f8614c85e
Issue 32:[Robots] Resolve relative URL for sitemaps
2015-01-22 10:54:14 +00:00
digitalpebble@googlemail.com
8a0034c1f1
Issue 62:[Sitemaps] Add new parseSiteMap method
2015-01-21 08:59:01 +00:00
avraham2@gmail.com
34195de153
Issue 57: [Sitemaps] SiteMap should contain a list of SitemapUrls instead of a table of them
2015-01-12 10:53:20 +00:00
avraham2@gmail.com
546b9ff60e
Issue51: Upgrade httpclient to the latest version
2015-01-12 10:30:30 +00:00
avraham2@gmail.com
823ea3221c
Issue 61: [Sitemaps] Sitemap Parser changes the processed flag unnecessarily
2014-11-25 12:06:53 +00:00
avraham2@gmail.com
bcc2c7fe26
Issue 56: [Sitemaps] SiteMap.setBaseUrl(...) lowercases the domain name, which shouldn't happen
2014-11-21 12:01:55 +00:00
avraham2@gmail.com
c8261cbbc4
Issue55: fix setPriority
2014-10-26 11:48:57 +00:00
lewis.mcgibbney@gmail.com
fa18129bcf
Issue 50: Add Fetch Report to FetchedResult
2014-10-19 18:59:15 +00:00
avraham2@gmail.com
87331c6bd6
2014-10-10 10:49:28 +00:00
lewis.mcgibbney@gmail.com
7dfda7e46e
Update CHANGES ready for 0.5 release
2014-10-10 04:15:23 +00:00
digitalpebble@googlemail.com
fef6d41ef8
Issue 45:[Sitemaps] Upgrade code after release of Tika v1.6
2014-09-24 13:43:09 +00:00
digitalpebble@googlemail.com
64530bc52b
Issue 48:Upgraded to Tika 1.6
2014-09-10 12:37:17 +00:00
avraham2@gmail.com
983cce7c07
Issue 47: [Sitemaps] SiteMapParser Tika detection doesn't work well in some cases
...
new Tika().detect(URL) would solve the problem described above,
but it would cause our library to fetch the sitemap twice.
A better solution should be sought,
perhaps new Tika().detect(bytes, filename).
2014-08-19 19:08:27 +00:00
avraham2@gmail.com
19e2918aca
Change the Mime type parsing to use Tika's MediaType.
...
I want to identify the media type:
MediaType mediaType = MediaType.parse(contentType);
and then process it as follows:
1. Recurse through the media type's supertypes until we reach the root, comparing each one to the XML media type (or others).
2. If no match is found, check the aliases (for example, text/xml is an alias of application/xml, which is the more accurate form).
3. If there is still no match, it is a bad MediaType and an exception should be thrown.
2014-08-06 19:06:45 +00:00
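
The three matching steps described in commit 19e2918aca can be sketched as below. This is only an illustration under stated assumptions, not the code from that commit: it does not use Tika, and the class name MediaTypeMatcher, the method names, and the two small lookup tables are hypothetical stand-ins for the supertype and alias information that Tika's media type registry would supply.

```java
import java.util.Locale;
import java.util.Map;

class MediaTypeMatcher {
    static final String XML = "application/xml";

    // Hypothetical, hand-written tables for illustration only.
    // A real registry (such as Tika's) holds far more entries.
    private static final Map<String, String> SUPERTYPES = Map.of(
            "application/rss+xml", "application/xml",
            "application/atom+xml", "application/xml",
            "application/xml", "text/plain",
            "text/plain", "application/octet-stream");
    private static final Map<String, String> ALIASES = Map.of(
            "text/xml", "application/xml");

    /** Drops parameters such as "; charset=UTF-8" and lowercases the type. */
    static String baseType(String contentType) {
        if (contentType == null) {
            throw new IllegalArgumentException("Missing content type");
        }
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        if (!base.matches("[\\w.+-]+/[\\w.+-]+")) {
            throw new IllegalArgumentException("Malformed media type: " + contentType);
        }
        return base;
    }

    /** Step 1: walk up the supertype chain until the target or the root is reached. */
    static boolean descendsFrom(String type, String target) {
        for (String t = type; t != null; t = SUPERTYPES.get(t)) {
            if (t.equals(target)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the XML media type if contentType resolves to it, otherwise throws. */
    static String resolveXml(String contentType) {
        String type = baseType(contentType);
        if (descendsFrom(type, XML)) {
            return XML;
        }
        // Step 2: retry with the canonical form of an alias.
        String canonical = ALIASES.get(type);
        if (canonical != null && descendsFrom(canonical, XML)) {
            return XML;
        }
        // Step 3: nothing matched, so treat it as a bad media type.
        throw new IllegalArgumentException("Not an XML media type: " + contentType);
    }

    public static void main(String[] args) {
        System.out.println(resolveXml("application/rss+xml"));
        System.out.println(resolveXml("text/xml; charset=UTF-8"));
    }
}
```

Trying the supertype walk before the alias lookup follows the order given in the commit message; a registry that normalizes aliases up front would reach the same result in a single pass.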
lewis.mcgibbney@gmail.com
4793307adb
Issue 39: [Sitemaps] Add a convenience method to the Parser with only a URL argument
2014-07-07 14:27:49 +00:00
lewis.mcgibbney@gmail.com
01e4feef8b
Issue 42 [Sitemaps] Add more JUnit tests
2014-07-01 05:11:27 +00:00
kkrugler_lists@transpac.com
59344e878a
Issue 37: Upgrade slf4j to v1.7.7
2014-06-24 02:49:01 +00:00
lewis.mcgibbney@gmail.com
e1c264f1df
prepare CHANGES.txt for 0.4 release
2014-03-20 21:50:05 +00:00
lewis.mcgibbney@gmail.com
efaf0aec6c
update CHANGES.txt
2014-03-19 19:15:05 +00:00
lewis.mcgibbney@gmail.com
51b0593b75
Port all code changes to CHANGES.txt
2014-03-16 21:53:28 +00:00
digitalpebble@googlemail.com
9b6bf65b1a
cleanup of ANT build remnants [lib and lib-ext]
2013-10-21 15:31:14 +00:00
digitalpebble@googlemail.com
baed790af1
upgraded version of Tika + reverted to 0.3-SNAPSHOT
2013-10-11 10:40:00 +00:00
digitalpebble@googlemail.com
14919f77f0
marking version 0.3 in CHANGES
2013-10-03 09:12:38 +00:00
digitalpebble@googlemail.com
4ce4b358b6
issue 29 : more robust parsing when loc element is missing
2013-10-02 13:40:50 +00:00
digitalpebble@googlemail.com
d9e3cb4cbb
Issue 25:Robots.txt parser should not lowercase sitemap URLs
2013-09-06 12:33:02 +00:00