Ken Krugler | 49d71ad295 | 2016-11-28 08:45:09 -08:00
Issue #130
Remove unused HttpClient code & resources

Julien Nioche | c39f917340 | 2016-11-28 10:38:27 +00:00
Merge pull request #129 from crawler-commons/Remove_HTTP_fetcher_support_96
Issue #96

Ken Krugler | 5783046f4e | 2016-11-27 09:18:21 -08:00
Issue #96
Remove fetcher support

Julien Nioche | 1a6c1b0dce | 2016-11-24 10:07:27 +00:00
Released 0.7

Julien Nioche | 2df1374eb4 | 2016-11-21 14:49:47 +00:00
[maven-release-plugin] prepare for next development iteration

Julien Nioche | 7ef3599487 | 2016-11-21 14:49:43 +00:00
[maven-release-plugin] prepare release crawler-commons-0.7

Lewis John McGibbney | 99b85b4f42 | 2016-10-25 18:01:17 -07:00
Add license and badge to codebase

Julien Nioche | 9ea4f1b514 | 2016-09-30 12:33:06 +01:00
Added ref to #126
and changed presentation of issue number

Julien Nioche | d0c1221846 | 2016-09-30 12:28:08 +01:00
Merge pull request #126 from lewismc/ISSUE-125
Upgrade to JDK 1.8

Lewis John McGibbney | 18bbae908c | 2016-09-29 21:39:24 -07:00
Upgrade to JDK 1.8

Lewis John McGibbney | fc3378cb95 | 2016-09-29 21:36:05 -07:00
Merge branch 'master' into ISSUE-125

Julien Nioche | f4b76c70c6 | 2016-09-29 11:11:02 +01:00
Merge pull request #128 from echoboxapp/further-sitemapparser-method-visibility-changes
Further changes to SiteMapParser and AbstractSiteMap method visibilit…

Michael Lavelle | 58608c485d | 2016-09-29 09:07:45 +01:00
Further changes to SiteMapParser and AbstractSiteMap method visibility. The previous pull request to make certain methods protected did not cover all the required methods. The changes in this commit allow GoogleNews site maps to be added.

Lewis John McGibbney | a9eb31e9db | 2016-09-26 15:38:16 -07:00
Update Travis CI config to accommodate JDK 8

Lewis John McGibbney | 8814bed160 | 2016-09-26 15:20:39 -07:00
Upgrade to JDK 1.8

Julien Nioche | 36a4bd420e | 2016-09-21 14:59:34 +01:00
Updated CHANGES with 124

Julien Nioche | 145ff5ceaa | 2016-09-21 14:49:27 +01:00
Merge pull request #124 from echoboxapp/site-map-parser-protected-methods
Modifying parsing methods of SiteMapParser so they are protected rath…

Julien Nioche | 4625a358f2 | 2016-09-20 10:30:31 +01:00
Update CHANGES.txt
Added #117 and #113

Julien Nioche | 3d6d76d82d | 2016-09-20 10:28:15 +01:00
Merge pull request #120 from crawler-commons/117
Faster parsing of dates. Fixes #117

Michael Lavelle | b26f7fd6f9 | 2016-09-19 10:12:51 +01:00
Modifying parsing methods of SiteMapParser so they are protected rather than private

Julien Nioche | 4eec816179 | 2016-09-19 09:23:43 +01:00
Upgraded Tika core to 1.13, fixes #122

Julien Nioche | 5f997c37e4 | 2016-09-12 15:41:23 +01:00
Faster parsing of dates. Fixes #117

Ken Krugler | a3b2a587fa | 2016-07-28 15:16:43 -07:00
Merge pull request #119 from jmveramaiquez/hp-serialization-adjustement
Hp serialization adjustment

I | c24a297836 | 2016-07-28 18:19:05 +02:00
RobotRule inner class changed from protected to public static, for easy serialization with high-performance serializers such as Protostuff or Google Protocol Buffers.

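Assumed background for the change above (the commit does not spell it out): in Java, a non-static inner class has no standalone no-argument constructor, because its constructor implicitly takes the enclosing instance as a parameter. That complicates any tool that instantiates classes reflectively, whereas a static nested class behaves like a top-level class. The sketch below uses hypothetical class names (NestedClassDemo, InnerRule, StaticRule) and is not the actual RobotRule source.

```java
import java.lang.reflect.Constructor;

public class NestedClassDemo {

    // Hypothetical non-static inner class: each instance is bound to an
    // enclosing NestedClassDemo, so its implicit constructor takes one.
    protected class InnerRule {
        String prefix = "";
    }

    // Hypothetical static nested class: can be created on its own.
    public static class StaticRule {
        String prefix = "";
    }

    /** Returns true if the class declares a constructor with no parameters. */
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("InnerRule no-arg ctor:  " + hasNoArgConstructor(InnerRule.class));
        System.out.println("StaticRule no-arg ctor: " + hasNoArgConstructor(StaticRule.class));
        // The inner class's implicit constructor takes the enclosing instance.
        Constructor<?> c = InnerRule.class.getDeclaredConstructors()[0];
        System.out.println("InnerRule ctor params:  " + c.getParameterCount());
    }
}
```

Running main should report false for InnerRule, true for StaticRule, and one constructor parameter for InnerRule.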
I | 6882ff4103 | 2016-07-28 18:16:41 +02:00
Cleaned unused import.

I | c367bc2424 | 2016-07-28 18:16:07 +02:00
Added IntelliJ project files.

Julien Nioche | 8467517745 | 2016-07-06 15:39:11 +01:00
Fixed XML handler for sitemaps: it can now handle URLs split over multiple calls to characters()

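Background for the fix above: a SAX parser is allowed to deliver the text of a single element through several calls to characters() (entity references such as &amp; are a common split point), so a handler has to buffer text and only use it in endElement(). The Java sketch below shows that pattern with the JDK's built-in SAX parser; the names SitemapLocCollector and collectLocs are hypothetical, and this is not the crawler-commons handler itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SitemapLocCollector {

    /** Hypothetical handler: buffers all text between <loc> and </loc>. */
    static class LocHandler extends DefaultHandler {
        private final List<String> locs = new ArrayList<>();
        private final StringBuilder buffer = new StringBuilder();
        private boolean inLoc = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("loc".equals(qName)) {
                inLoc = true;
                buffer.setLength(0); // fresh buffer for each <loc>
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // One text node may arrive in several chunks: append, never overwrite.
            if (inLoc) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("loc".equals(qName)) {
                // Only now is the full URL text available.
                locs.add(buffer.toString().trim());
                inLoc = false;
            }
        }
    }

    /** Parses sitemap XML and returns the text of every <loc> element. */
    public static List<String> collectLocs(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LocHandler handler = new LocHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.locs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<urlset><url><loc>http://example.com/?a=1&amp;b=2</loc></url></urlset>";
        System.out.println(collectLocs(xml));
    }
}
```

A handler that simply assigned the latest characters() chunk could keep only a fragment of a URL like the one in main; buffering until endElement() avoids that regardless of how the parser splits the text.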
Julien Nioche | d200346510 | 2016-07-06 15:02:12 +01:00
Fixed SiteMapParserTest after merge with master

Julien Nioche | 81aefc118e | 2016-07-06 14:23:29 +01:00
Improved sitemap tests: check that the URLs returned correspond to the input

Julien Nioche | b9d51283b7 | 2016-07-06 11:43:23 +01:00
Renamed handler classes and put in sub package. Fixed some issues @kkrugler

Julien Nioche | feb2ef7f69 | 2016-07-06 10:18:04 +01:00
Merge branch 'master' into matt-deboer-master

Julien Nioche | 96bc1bbf6b | 2016-07-06 10:16:07 +01:00
Added test class and resource for #29; see #116

Julien Nioche | eb05248f40 | 2016-07-01 14:19:35 +01:00
Merged with master, fixed a couple of issues and applied formatting

Julien Nioche | 9f478e3e0f | 2016-06-30 11:56:36 +01:00
Update .travis.yml
Changes Travis email notifications to mailing list

Julien Nioche | 0d79acd0f2 | 2016-06-30 11:54:55 +01:00
Update README.md
Added badge for Travis to README

Julien Nioche | 805734f197 | 2016-06-30 11:49:02 +01:00
Added Travis file; fixes #109

Julien Nioche | 0775bb216e | 2016-06-30 11:45:08 +01:00
Fix license headers + applied formatting. Fixes #108

Julien Nioche | be52b770ff | 2016-06-30 11:11:49 +01:00
Rename package crawlercommons.url; fixes #107

Ken Krugler | 3ab00d736d | 2016-05-06 15:35:23 -07:00
Merge pull request #115 from aecio/fetcher-bug
Fixes bug introduced in pull request #98

Aecio Santos | 22ad611aef | 2016-05-04 19:50:33 -04:00
Fixes bug introduced in pull request #98
and adds the ability to configure a new timeout introduced in HttpClient 4.5.1

Ken Krugler | 51ce12bc78 | 2016-01-09 10:52:36 -08:00
Move news to bottom, add missing link to 0.6 docs

Ken Krugler | b5704684ff | 2015-12-30 22:14:21 -08:00
Clarify which method is preferred
It is generally better to call parseSiteMap without passing an explicit contentType, because web servers frequently report the wrong content type; let Tika detect it instead.

Ken Krugler | 31a6c80ea7 | 2015-12-30 22:03:49 -08:00
Fix sitemap extraction from robots.txt

Julien Nioche | a809f7abac | 2015-12-03 11:33:55 +00:00
Added organization and inception year to pom; changed details for jnioche

Julien Nioche | f3f34844d4 | 2015-12-02 10:30:54 +00:00
Deprecate fetcher classes #97

Julien Nioche | c1b3f4b086 | 2015-11-13 10:58:48 +00:00
Added URLFilter interface + BasicURLNormalizer borrowed from Nutch #106

Ken Krugler | 4c43c48ef7 | 2015-10-20 07:50:58 -07:00
Merged conflict with CHANGES.txt

Ken Krugler | cdb51a5c8c | 2015-10-20 07:49:19 -07:00
Merge branch 'aecio-master'

Ken Krugler | 940cbfd0e8 | 2015-10-20 07:48:51 -07:00
Merged with aecio

Aecio Santos | f2bf9300e6 | 2015-10-09 14:08:39 -04:00
Upgrades HttpClient to version 4.5.1 (fixes #84)
and stops ignoring test failures during the Maven build