Julien Nioche
|
7ef3599487
|
[maven-release-plugin] prepare release crawler-commons-0.7
|
2016-11-21 14:49:43 +00:00 |
|
Lewis John McGibbney
|
99b85b4f42
|
Add license and badge to codebase
|
2016-10-25 18:01:17 -07:00 |
|
Julien Nioche
|
9ea4f1b514
|
added ref to #126
and changed presentation of issue number
|
2016-09-30 12:33:06 +01:00 |
|
Julien Nioche
|
d0c1221846
|
Merge pull request #126 from lewismc/ISSUE-125
Upgrade to JDK 1.8
|
2016-09-30 12:28:08 +01:00 |
|
Lewis John McGibbney
|
18bbae908c
|
Upgrade to JDK 1.8
|
2016-09-29 21:39:24 -07:00 |
|
Lewis John McGibbney
|
fc3378cb95
|
Merge branch 'master' into ISSUE-125
|
2016-09-29 21:36:05 -07:00 |
|
Julien Nioche
|
f4b76c70c6
|
Merge pull request #128 from echoboxapp/further-sitemapparser-method-visibility-changes
Further changes to SiteMapParser and AbstractSiteMap method visibilit…
|
2016-09-29 11:11:02 +01:00 |
|
Michael Lavelle
|
58608c485d
|
Further changes to SiteMapParser and AbstractSiteMap method visibility. Previous pull request to change certain methods to protected did not cover all required methods. The changes in this commit allow for the addition of GoogleNews site maps
|
2016-09-29 09:07:45 +01:00 |
|
Lewis John McGibbney
|
a9eb31e9db
|
Update Travis CI config to accommodate JDK8
|
2016-09-26 15:38:16 -07:00 |
|
Lewis John McGibbney
|
8814bed160
|
Upgrade to JDK 1.8
|
2016-09-26 15:20:39 -07:00 |
|
Julien Nioche
|
36a4bd420e
|
Updated CHANGES with 124
|
2016-09-21 14:59:34 +01:00 |
|
Julien Nioche
|
145ff5ceaa
|
Merge pull request #124 from echoboxapp/site-map-parser-protected-methods
Modifying parsing methods of SiteMapParser so they are protected rath…
|
2016-09-21 14:49:27 +01:00 |
|
Julien Nioche
|
4625a358f2
|
Update CHANGES.txt
added #117 and #113
|
2016-09-20 10:30:31 +01:00 |
|
Julien Nioche
|
3d6d76d82d
|
Merge pull request #120 from crawler-commons/117
Faster parsing of dates. Fixes #117
|
2016-09-20 10:28:15 +01:00 |
|
Michael Lavelle
|
b26f7fd6f9
|
Modifying parsing methods of SiteMapParser so they are protected rather than private
|
2016-09-19 10:12:51 +01:00 |
|
Julien Nioche
|
4eec816179
|
Upgraded Tika core to 1.13, fixes #122
|
2016-09-19 09:23:43 +01:00 |
|
Julien Nioche
|
5f997c37e4
|
Faster parsing of dates. Fixes #117
|
2016-09-12 15:41:23 +01:00 |
|
Ken Krugler
|
a3b2a587fa
|
Merge pull request #119 from jmveramaiquez/hp-serialization-adjustement
Hp serialization adjustement
|
2016-07-28 15:16:43 -07:00 |
|
I
|
c24a297836
|
RobotRule inner class, changed from protected to public static, for easy serialization for high performance serializers like protoStuff or google protocol buffers.
|
2016-07-28 18:19:05 +02:00 |
|
I
|
6882ff4103
|
Cleaned unused import.
|
2016-07-28 18:16:41 +02:00 |
|
I
|
c367bc2424
|
Added Intellij project files.
|
2016-07-28 18:16:07 +02:00 |
|
Julien Nioche
|
81aefc118e
|
Improved sitemap tests : check that the URLs returned correspond to the input
|
2016-07-06 14:23:29 +01:00 |
|
Julien Nioche
|
96bc1bbf6b
|
Added test class and resource for #29; See #116
|
2016-07-06 10:16:07 +01:00 |
|
Julien Nioche
|
9f478e3e0f
|
Update .travis.yml
Changes Travis email notifications to mailing list
|
2016-06-30 11:56:36 +01:00 |
|
Julien Nioche
|
0d79acd0f2
|
Update README.md
Added badge for Travis to README
|
2016-06-30 11:54:55 +01:00 |
|
Julien Nioche
|
805734f197
|
Added Travis file; fixes #109
|
2016-06-30 11:49:02 +01:00 |
|
Julien Nioche
|
0775bb216e
|
Fix license headers + applied formatting. Fixes #108
|
2016-06-30 11:45:08 +01:00 |
|
Julien Nioche
|
be52b770ff
|
Rename package crawlercommons.url Fixes #107
|
2016-06-30 11:11:49 +01:00 |
|
Ken Krugler
|
3ab00d736d
|
Merge pull request #115 from aecio/fetcher-bug
Fixes bug introduced in pull request #98
|
2016-05-06 15:35:23 -07:00 |
|
Aecio Santos
|
22ad611aef
|
Fixes bug introduced in pull request #98
and adds ability to configure a new timeout introduced in httpclient
4.5.1
|
2016-05-04 19:50:33 -04:00 |
|
Ken Krugler
|
51ce12bc78
|
Move news to bottom, add missing link to 0.6 docs
|
2016-01-09 10:52:36 -08:00 |
|
Ken Krugler
|
b5704684ff
|
Clarify which method is preferred
Generally better to call parseSiteMap w/o passing an explicit
contentType, as web servers lie all the time - so let Tika figure it
out.
|
2015-12-30 22:14:21 -08:00 |
|
Ken Krugler
|
31a6c80ea7
|
Fix sitemap extraction from robots.txt
|
2015-12-30 22:03:49 -08:00 |
|
Julien Nioche
|
a809f7abac
|
Added organization and inception year to pom; changed details for jnioche
|
2015-12-03 11:33:55 +00:00 |
|
Julien Nioche
|
f3f34844d4
|
Deprecate fetcher classes #97
|
2015-12-02 10:30:54 +00:00 |
|
Julien Nioche
|
c1b3f4b086
|
Added URLFilter interface + BasicURLNormalizer borrowed from Nutch #106
|
2015-11-13 10:58:48 +00:00 |
|
Ken Krugler
|
4c43c48ef7
|
Merged conflict with CHANGES.txt
|
2015-10-20 07:50:58 -07:00 |
|
Ken Krugler
|
cdb51a5c8c
|
Merge branch 'aecio-master'
|
2015-10-20 07:49:19 -07:00 |
|
Ken Krugler
|
940cbfd0e8
|
Merged with aecio
|
2015-10-20 07:48:51 -07:00 |
|
Aecio Santos
|
f2bf9300e6
|
Upgrades httpclient to version 4.5.1 (fixes #84)
and do not ignore test failures during maven build
|
2015-10-09 14:08:39 -04:00 |
|
Julien Nioche
|
b7ccc8d1f1
|
Fixed test for domains #103
|
2015-10-07 10:21:28 +01:00 |
|
Julien Nioche
|
98316a51fc
|
issue #100 in CHANGES.txt
|
2015-10-06 18:48:58 +01:00 |
|
Julien Nioche
|
a7728c6733
|
Merge pull request #100 from DigitalPebble/tld_names
updated tld names from publicsuffix.org
|
2015-10-06 18:44:33 +01:00 |
|
Julien Nioche
|
9e93037e79
|
updated tld names from publicsuffix.org
|
2015-10-05 13:38:10 +01:00 |
|
Ken Krugler
|
9e9f5df884
|
Fixed up CHANGES.txt file
|
2015-09-15 07:57:13 -07:00 |
|
Julien Nioche
|
f0d71b4729
|
mentioned issue 89 in CHANGES
|
2015-09-15 11:38:40 +01:00 |
|
Lewis John McGibbney
|
f2e41af53c
|
Trivial commit to update CHANGES.txt for recent commits.
|
2015-09-14 22:40:04 -07:00 |
|
Ken Krugler
|
2c687d1bba
|
Roll in fix for issue #87 w/RSS 1.0 site maps
|
2015-09-11 15:16:12 -07:00 |
|
Ken Krugler
|
d08f396576
|
Tweaked Javadoc update from Michael Roeder
|
2015-09-11 11:49:20 -07:00 |
|
Ken Krugler
|
ba7f22c811
|
Merge branch 'MichaelRoeder-additionalJavadoc'
|
2015-09-11 11:23:07 -07:00 |
|