lewis.mcgibbney@gmail.com
429db36c35
Update release plugin version
2014-10-10 04:23:06 +00:00
lewis.mcgibbney@gmail.com
7dfda7e46e
Update CHANGES ready for 0.5 release
2014-10-10 04:15:23 +00:00
kkrugler_lists@transpac.com
6fe3770889
Fix for issue #53 - handle spaces in comma-separated list of agent names
2014-10-04 16:32:12 +00:00
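
The fix for issue #53 above concerns whitespace around entries in a comma-separated list of agent names. A minimal sketch of that idea follows, using only the standard library. The class and method names are hypothetical and are not taken from the crawler-commons source.

```java
import java.util.ArrayList;
import java.util.List;

class AgentNames {
    // Hypothetical helper: splits "bot-a, bot-b ,bot-c" into names with
    // surrounding whitespace removed, skipping empty entries.
    static List<String> split(String commaSeparated) {
        List<String> names = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

Trimming each entry after the split means "bot-a,bot-b" and "bot-a, bot-b" yield the same list.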
digitalpebble@googlemail.com
fef6d41ef8
Issue 45: [Sitemaps] Upgrade code after release of Tika v1.6
2014-09-24 13:43:09 +00:00
digitalpebble@googlemail.com
64530bc52b
Issue 48: Upgraded to Tika 1.6
2014-09-10 12:37:17 +00:00
avraham2@gmail.com
5823288428
Removed commented-out code that I had mistakenly added earlier
2014-08-19 19:14:00 +00:00
avraham2@gmail.com
983cce7c07
Issue 47: [Sitemaps] SiteMapParser Tika detection doesn't work well in some cases
...
new Tika().detect(URL) would solve the problem mentioned above,
BUT it would cause our library to fetch the sitemap twice.
A better solution should be sought.
Maybe use new Tika().detect(bytes, filename);
2014-08-19 19:08:27 +00:00
avraham2@gmail.com
19e2918aca
Change the MIME type parsing to use Tika's MediaType.
...
I want to identify the media type:
MediaType mediaType = MediaType.parse(contentType);
and then process it as follows:
1. Recurse through the media type's supertypes up to the root, comparing each one to the XML media type (or others).
2. If no match is found, check the aliases (for example, text/xml is an alias of application/xml, which is the more accurate form).
3. If there is still no match, it is a bad MediaType and an exception should be thrown.
2014-08-06 19:06:45 +00:00
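
The three steps in the commit message above can be sketched roughly as follows. This is not the project's implementation and does not use Tika: the two maps are hypothetical stand-ins for a media type registry, their entries are illustrative only, and the class and method names are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class MediaTypeCheck {
    static final String XML = "application/xml";

    // Hypothetical registry tables; a real registry would supply these.
    static final Map<String, String> SUPERTYPES = new HashMap<>();
    static final Map<String, String> ALIASES = new HashMap<>();
    static {
        SUPERTYPES.put("application/rss+xml", XML);
        SUPERTYPES.put("application/atom+xml", XML);
        ALIASES.put("text/xml", XML);
    }

    // Step 1: walk from the type up through its supertypes until the
    // target is found or the root is passed.
    static boolean walksUpTo(String type, String target) {
        for (String t = type; t != null; t = SUPERTYPES.get(t)) {
            if (t.equals(target)) {
                return true;
            }
        }
        return false;
    }

    static boolean isXml(String contentType) {
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (walksUpTo(type, XML)) {
            return true;
        }
        // Step 2: fall back to the alias table and retry from the canonical type.
        String canonical = ALIASES.get(type);
        return canonical != null && walksUpTo(canonical, XML);
    }

    // Step 3: reject anything that matched neither way.
    static void requireXml(String contentType) {
        if (!isXml(contentType)) {
            throw new IllegalArgumentException("Unsupported media type: " + contentType);
        }
    }
}
```

Checking supertypes before aliases follows the order given in the commit message; resolving the alias first would give the same result with these tables.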
avraham2@gmail.com
dc98cbd57c
Added myself (Avi Hayun) as a developer
2014-07-14 18:44:31 +00:00
lewis.mcgibbney@gmail.com
4793307adb
Issue 39: [Sitemaps] Add a convenience method to the parser that takes only a URL argument
2014-07-07 14:27:49 +00:00
lewis.mcgibbney@gmail.com
01e4feef8b
Issue 42: [Sitemaps] Add more JUnit tests
2014-07-01 05:11:27 +00:00
kkrugler_lists@transpac.com
59344e878a
Issue 37: Upgrade slf4j to v1.7.7
2014-06-24 02:49:01 +00:00
lewis.mcgibbney@gmail.com
94c3ed4068
Upgrade to JUnit v4 conventions in SiteMapParser
2014-05-29 21:02:16 +00:00
lewis.mcgibbney@gmail.com
cb71c5502a
Upgrade the Slf4j logging in SiteMaps
2014-05-29 20:28:37 +00:00
lewis.mcgibbney@gmail.com
25a317e5e6
[maven-release-plugin] prepare for next development iteration
2014-03-20 22:02:17 +00:00
lewis.mcgibbney@gmail.com
1131b36a3b
[maven-release-plugin] prepare release crawler-commons-0.4
2014-03-20 22:02:04 +00:00
lewis.mcgibbney@gmail.com
e1c264f1df
prepare CHANGES.txt for 0.4 release
2014-03-20 21:50:05 +00:00
lewis.mcgibbney@gmail.com
efaf0aec6c
update CHANGES.txt
2014-03-19 19:15:05 +00:00
kkrugler_lists@transpac.com
cc67a3d2c8
Merge patch for issue #13 from Lewis
2014-03-17 00:37:26 +00:00
lewis.mcgibbney@gmail.com
51b0593b75
Port all code changes to CHANGES.txt
2014-03-16 21:53:28 +00:00
kkrugler_lists@transpac.com
a6ac57e354
Issue 21: allow has higher precedence than disallow, if both rules are the same length
2014-03-14 00:02:38 +00:00
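
The tie-break described for issue 21 above can be illustrated with a small sketch. It assumes that rules are sorted longest prefix first and that the first matching rule wins; the commit message states only the allow-over-disallow tie-break, so that ordering, along with every class and method name here, is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RuleOrder {
    // Hypothetical rule type: a path prefix plus an allow/disallow flag.
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    // Longer prefixes sort first (assumed); on equal length, allow sorts
    // ahead of disallow, which is the tie-break from the commit message.
    static final Comparator<Rule> ORDER = (a, b) -> {
        if (a.prefix.length() != b.prefix.length()) {
            return Integer.compare(b.prefix.length(), a.prefix.length());
        }
        return Boolean.compare(b.allow, a.allow);
    };

    // The first matching rule in sorted order decides; no match means allowed.
    static boolean isAllowed(List<Rule> rules, String path) {
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(ORDER);
        for (Rule rule : sorted) {
            if (path.startsWith(rule.prefix)) {
                return rule.allow;
            }
        }
        return true;
    }
}
```

With this ordering, a disallow and an allow rule for the same prefix resolve to allow regardless of the order in which they were declared.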
kkrugler_lists@transpac.com
c1f050d33f
Add missing file from previous commit
2014-03-14 00:01:44 +00:00
kkrugler_lists@transpac.com
ea67b56e42
Add tests for wildcards (via alparslanavci), and sorting rules
2014-03-13 23:50:17 +00:00
kkrugler_lists@transpac.com
af74ccf44d
Add support for wildcards (via alparslanavci), and sorting rules
2014-03-13 23:49:49 +00:00
kkrugler_lists@transpac.com
300d6ebdb7
Roll in patch from Lewis for issue #23 ( http://code.google.com/p/crawler-commons/issues/detail?id=23 )
2014-01-24 21:16:38 +00:00
kkrugler_lists@transpac.com
dc8f241782
Fix up tests to match latest data file
2014-01-24 21:05:46 +00:00
kkrugler_lists@transpac.com
aa4d410223
Make setProcessed public, was implicitly package private
2014-01-24 20:51:33 +00:00
kkrugler_lists@transpac.com
dbae7e20df
Updated comments with a link to the actual Mozilla data file
2014-01-24 20:44:51 +00:00
kkrugler_lists@transpac.com
16e46b0d50
Added a few more suffixes
2014-01-24 20:44:31 +00:00
kkrugler_lists@transpac.com
a98bb030af
Updated to latest from http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
2014-01-24 20:44:12 +00:00
digitalpebble@googlemail.com
9b6bf65b1a
cleanup of ANT build remnants [lib and lib-ext]
2013-10-21 15:31:14 +00:00
digitalpebble@googlemail.com
816832b10b
[maven-release-plugin] prepare for next development iteration
2013-10-11 15:21:59 +00:00
digitalpebble@googlemail.com
1389cf0066
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-11 15:21:52 +00:00
digitalpebble@googlemail.com
2e08419852
Fixed scm info in pom
2013-10-11 15:20:53 +00:00
digitalpebble@googlemail.com
ee88e20e4a
[maven-release-plugin] rollback the release of crawler-commons-0.3
2013-10-11 15:18:50 +00:00
digitalpebble@googlemail.com
45975212ad
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-11 15:13:19 +00:00
digitalpebble@googlemail.com
464d5c7956
[maven-release-plugin] rollback the release of crawler-commons-0.3
2013-10-11 12:48:50 +00:00
digitalpebble@googlemail.com
6ed2b2da50
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-11 12:40:15 +00:00
digitalpebble@googlemail.com
315a208b95
re-trying the release
2013-10-11 12:38:24 +00:00
digitalpebble@googlemail.com
92fb22c2a3
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-11 11:42:18 +00:00
digitalpebble@googlemail.com
097a927868
[maven-release-plugin] rollback the release of crawler-commons-0.3
2013-10-11 11:35:34 +00:00
digitalpebble@googlemail.com
704bf5ba8b
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-11 11:06:20 +00:00
digitalpebble@googlemail.com
add77028cc
[maven-release-plugin] rollback the release of crawler-commons-0.3
2013-10-11 10:59:37 +00:00
digitalpebble@googlemail.com
c7554efdcb
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-11 10:58:00 +00:00
digitalpebble@googlemail.com
644254769e
[maven-release-plugin] rollback the release of crawler-commons-0.3
2013-10-11 10:48:40 +00:00
digitalpebble@googlemail.com
dea86d57ea
[maven-release-plugin] prepare for next development iteration
2013-10-11 10:46:23 +00:00
digitalpebble@googlemail.com
68106fd316
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-11 10:46:11 +00:00
digitalpebble@googlemail.com
baed790af1
upgraded version of Tika + reverted to 0.3-SNAPSHOT
2013-10-11 10:40:00 +00:00
digitalpebble@googlemail.com
ecdf47221e
[maven-release-plugin] prepare for next development iteration
2013-10-03 09:31:50 +00:00
digitalpebble@googlemail.com
4e2b0bac6f
[maven-release-plugin] prepare release crawler-commons-0.3
2013-10-03 09:31:44 +00:00