Julien Nioche
63a837d5d7
updated CHANGES.txt for #76
2015-05-06 10:10:25 +01:00
Julien Nioche
d22a9d0617
removed properties file + 1.6 compliant formatting
2015-05-06 09:33:31 +01:00
Julien Nioche
0a5d9d338a
maven-java-formatter-plugin
2015-04-30 13:52:44 +01:00
Ken Krugler
e42c268e03
Add news about project moving to GitHub
2015-04-22 07:59:43 -07:00
Ken Krugler
ee23e1fb0d
Update CHANGES.txt
2015-04-22 07:31:20 -07:00
Ken Krugler
798dc59839
Update CHANGES.txt
2015-04-22 07:30:57 -07:00
Ken Krugler
53375168ab
Update CHANGES.txt
2015-04-22 07:30:25 -07:00
Julien Nioche
1647b90f7d
Upgraded to Tika 1.8 fixes #71
2015-04-22 13:19:02 +01:00
Julien Nioche
a28a78c942
changed groupId to crawler-commons
2015-04-22 11:02:49 +01:00
Julien Nioche
2195fb7f7e
replaced references to code.google with github equivalents
2015-04-17 10:25:52 +01:00
Julien Nioche
76bc563b21
Merge pull request #72 from crawler-commons/mimetypeRegistry
...
Get mediaTypeRegistry with MediaTypeRegistry.getDefaultRegistry
2015-04-13 20:42:52 +01:00
Julien Nioche
dafbbdd2bf
SiteMapParser use UPPERCASE for static finals
2015-04-13 20:41:47 +01:00
Julien Nioche
152a2446c2
Get mediaTypeRegistry with MediaTypeRegistry.getDefaultRegistry; instantiate Tika from the start
2015-04-13 16:36:39 +01:00
Avi Hayun
e1c7955389
Updated links to point to GitHub
...
I have changed several links to point to GitHub instead of Google Code.
For now, I am pointing the CHANGES.TXT links for all releases to the latest version on master, since we don't have the code of the previous releases and the CHANGES.TXT file already contains the changes for every release, so it seems like a good compromise.
2015-04-13 12:54:06 +03:00
Avi Hayun
84cc3bf2d0
First stab at recreating front page like the original
2015-04-13 12:46:57 +03:00
Ken Krugler
a7ee1bfa84
Delete README
2015-04-09 08:29:18 -07:00
kkrugler_lists@transpac.com
10c4dfbd99
Update with info about issues 67 & 68
2015-03-27 22:12:14 +00:00
kkrugler_lists@transpac.com
7f7f915b0b
Issue 68: Case-sensitive path matching
2015-03-27 22:09:32 +00:00
kkrugler_lists@transpac.com
6eb1459345
Issue 67: time in lastMod string not extracted during parse
2015-03-27 21:15:43 +00:00
kkrugler_lists@transpac.com
b2a92ce442
Add note about issue 59 being fixed
2015-01-26 13:21:29 +00:00
kkrugler_lists@transpac.com
aeafa263e5
Make SimpleRobotRules serializable (issue #59)
2015-01-26 13:18:54 +00:00
avraham2@gmail.com
c8ef5e1083
Issue 65: [Sitemaps] Make SiteMapTool simpler by removing the Recursive flag
...
Adding the CHANGES file
2015-01-25 09:19:06 +00:00
avraham2@gmail.com
19bc879d91
Issue 65: [Sitemaps] Make SiteMapTool simpler by removing the Recursive flag
...
Fixed the NPE issue
Removed the recursive flag
Upgraded javadocs
2015-01-25 09:18:01 +00:00
digitalpebble@googlemail.com
92408e37d4
Issue 64: Upgraded to Tika 1.7 (jnioche)
2015-01-22 14:43:34 +00:00
digitalpebble@googlemail.com
4f8614c85e
Issue 32:[Robots] Resolve relative URL for sitemaps
2015-01-22 10:54:14 +00:00
digitalpebble@googlemail.com
8a0034c1f1
Issue 62:[Sitemaps] Add new parseSiteMap method
2015-01-21 08:59:01 +00:00
avraham2@gmail.com
34195de153
Issue 57: [Sitemaps] SiteMap should contain a list of SitemapUrls instead of a table of them
2015-01-12 10:53:20 +00:00
avraham2@gmail.com
546b9ff60e
Issue 51: Upgrade httpclient to the latest version
2015-01-12 10:30:30 +00:00
avraham2@gmail.com
823ea3221c
Issue 61: [Sitemaps] Sitemap Parser changes the processed flag unnecessarily
2014-11-25 12:06:53 +00:00
avraham2@gmail.com
bcc2c7fe26
Issue 56: [Sitemaps] SiteMap.setBaseUrl(...) lowercases the domain name, which shouldn't happen
2014-11-21 12:01:55 +00:00
avraham2@gmail.com
c8261cbbc4
Issue 55: fix setPriority
2014-10-26 11:48:57 +00:00
lewis.mcgibbney@gmail.com
8dda18a77c
Issue 50: Add Fetch Report to FetchedResult
2014-10-20 14:07:10 +00:00
lewis.mcgibbney@gmail.com
fa18129bcf
Issue 50: Add Fetch Report to FetchedResult
2014-10-19 18:59:15 +00:00
avraham2@gmail.com
87331c6bd6
2014-10-10 10:49:28 +00:00
lewis.mcgibbney@gmail.com
21de0241ef
Update pom.xml for dist management
2014-10-10 04:59:16 +00:00
lewis.mcgibbney@gmail.com
0e0146faf1
[maven-release-plugin] prepare for next development iteration
2014-10-10 04:40:39 +00:00
lewis.mcgibbney@gmail.com
aea0015d12
[maven-release-plugin] prepare release crawler-commons-0.5
2014-10-10 04:40:34 +00:00
lewis.mcgibbney@gmail.com
429db36c35
Update release plugin version
2014-10-10 04:23:06 +00:00
lewis.mcgibbney@gmail.com
7dfda7e46e
Update CHANGES ready for 0.5 release
2014-10-10 04:15:23 +00:00
kkrugler_lists@transpac.com
6fe3770889
Fix for issue #53 - handle spaces in comma-separated list of agent names
2014-10-04 16:32:12 +00:00
digitalpebble@googlemail.com
fef6d41ef8
Issue 45:[Sitemaps] Upgrade code after release of Tika v1.6
2014-09-24 13:43:09 +00:00
digitalpebble@googlemail.com
64530bc52b
Issue 48:Upgraded to Tika 1.6
2014-09-10 12:37:17 +00:00
avraham2@gmail.com
5823288428
Removed commented out code I wrongfully put there in the past
2014-08-19 19:14:00 +00:00
avraham2@gmail.com
983cce7c07
Issue 47: [Sitemaps] SiteMapParser Tika detection doesn't work well in some cases
...
new Tika().detect(URL) will solve the mentioned problem,
BUT it will cause our library to fetch the sitemap twice.
A better solution should be sought.
Maybe use new Tika().detect(bytes, filename);
2014-08-19 19:08:27 +00:00
avraham2@gmail.com
19e2918aca
Change the Mime type parsing to use Tika's MediaType.
...
I want to identify the media type:
MediaType mediaType = MediaType.parse(contentType);
And then process it as follows:
1. Recurse through the media type's supertypes until we reach the root, comparing each one to the XML media type (or others)
2. If not found, check the aliases (for example, text/xml is an alias of application/xml, which is the more accurate form)
3. If still not found, it is a bad media type and the exception should be thrown.
2014-08-06 19:06:45 +00:00
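The three-step media type check described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the class name `MediaTypeSketch`, the helper methods, and the small alias and supertype tables are all hypothetical stand-ins for Tika's `MediaType` and its registry, so that the sketch runs without any dependency.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; the tables below stand in for Tika's media type registry. */
public class MediaTypeSketch {
    static final String XML = "application/xml";
    // Assumed alias table: alias -> canonical type.
    static final Map<String, String> ALIASES = new HashMap<>();
    // Assumed supertype table: type -> direct supertype (no entry means root).
    static final Map<String, String> SUPERTYPES = new HashMap<>();

    static {
        ALIASES.put("text/xml", "application/xml");
        SUPERTYPES.put("application/rss+xml", "application/xml");
        SUPERTYPES.put("application/atom+xml", "application/xml");
        SUPERTYPES.put("application/xml", "text/plain");
        SUPERTYPES.put("text/plain", "application/octet-stream");
    }

    /** Strips parameters such as "; charset=UTF-8" and normalizes case. */
    static String baseType(String contentType) {
        if (contentType == null) {
            throw new IllegalArgumentException("Missing content type");
        }
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        if (slash <= 0 || slash == base.length() - 1) {
            throw new IllegalArgumentException("Malformed content type: " + contentType);
        }
        return base;
    }

    /** Walks from the given type up through its supertypes, looking for the target. */
    static boolean walksUpTo(String type, String target) {
        for (String t = type; t != null; t = SUPERTYPES.get(t)) {
            if (t.equals(target)) {
                return true;
            }
        }
        return false;
    }

    static boolean isXml(String contentType) {
        String type = baseType(contentType);
        // Step 1: the type itself or one of its supertypes is the XML type.
        if (walksUpTo(type, XML)) {
            return true;
        }
        // Step 2: the type is an alias whose canonical form leads to the XML type.
        String canonical = ALIASES.get(type);
        if (canonical != null && walksUpTo(canonical, XML)) {
            return true;
        }
        // Step 3: nothing matched, so reject it, as the commit message suggests.
        throw new IllegalArgumentException("Not an XML media type: " + contentType);
    }

    public static void main(String[] args) {
        System.out.println(isXml("text/xml; charset=UTF-8"));
        System.out.println(isXml("application/rss+xml"));
    }
}
```

With a real Tika registry, the supertype and alias lookups would come from the registry itself rather than from hand-written maps; the maps here only make the control flow visible.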
avraham2@gmail.com
dc98cbd57c
Added myself (Avi Hayun) as a developer
2014-07-14 18:44:31 +00:00
lewis.mcgibbney@gmail.com
4793307adb
Issue 39: [Sitemaps] Add a convenience method to the Parser with only a URL argument
2014-07-07 14:27:49 +00:00
lewis.mcgibbney@gmail.com
01e4feef8b
Issue 42 [Sitemaps] Add more JUnit tests
2014-07-01 05:11:27 +00:00
kkrugler_lists@transpac.com
59344e878a
Issue 37: Upgrade slf4j to v1.7.7
2014-06-24 02:49:01 +00:00
lewis.mcgibbney@gmail.com
94c3ed4068
Upgrade to JUnit v4 conventions in SiteMapParser
2014-05-29 21:02:16 +00:00