mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-09 23:56:04 +02:00
Commit Graph

226 Commits

Author SHA1 Message Date
Julien Nioche 7ef3599487 [maven-release-plugin] prepare release crawler-commons-0.7 2016-11-21 14:49:43 +00:00
Lewis John McGibbney 99b85b4f42 Add license and badge to codebase 2016-10-25 18:01:17 -07:00
Julien Nioche 9ea4f1b514 added ref to #126
and changed presentation of issue number
2016-09-30 12:33:06 +01:00
Julien Nioche d0c1221846 Merge pull request #126 from lewismc/ISSUE-125
Upgrade to JDK 1.8
2016-09-30 12:28:08 +01:00
Lewis John McGibbney 18bbae908c Upgrade to JDK 1.8 2016-09-29 21:39:24 -07:00
Lewis John McGibbney fc3378cb95 Merge branch 'master' into ISSUE-125 2016-09-29 21:36:05 -07:00
Julien Nioche f4b76c70c6 Merge pull request #128 from echoboxapp/further-sitemapparser-method-visibility-changes
Further changes to SiteMapParser and AbstractSiteMap method visibilit…
2016-09-29 11:11:02 +01:00
Michael Lavelle 58608c485d Further changes to SiteMapParser and AbstractSiteMap method visibility. Previous pull request to change certain methods to protected did not cover all required methods. The changes in this commit allow for the addition of GoogleNews site maps 2016-09-29 09:07:45 +01:00
Lewis John McGibbney a9eb31e9db Update TravisCi config to accommodate JDK8 2016-09-26 15:38:16 -07:00
Lewis John McGibbney 8814bed160 Upgrade to JDK 1.8 2016-09-26 15:20:39 -07:00
Julien Nioche 36a4bd420e Updated CHANGES with 124 2016-09-21 14:59:34 +01:00
Julien Nioche 145ff5ceaa Merge pull request #124 from echoboxapp/site-map-parser-protected-methods
Modifying parsing methods of SiteMapParser so they are protected rath…
2016-09-21 14:49:27 +01:00
Julien Nioche 4625a358f2 Update CHANGES.txt
added #117 and #113
2016-09-20 10:30:31 +01:00
Julien Nioche 3d6d76d82d Merge pull request #120 from crawler-commons/117
Faster parsing of dates. Fixes #117
2016-09-20 10:28:15 +01:00
Michael Lavelle b26f7fd6f9 Modifying parsing methods of SiteMapParser so they are protected rather than private 2016-09-19 10:12:51 +01:00
Julien Nioche 4eec816179 Upgraded Tika core to 1.13, fixes #122 2016-09-19 09:23:43 +01:00
Julien Nioche 5f997c37e4 Faster parsing of dates. Fixes #117 2016-09-12 15:41:23 +01:00
Ken Krugler a3b2a587fa Merge pull request #119 from jmveramaiquez/hp-serialization-adjustement
Hp serialization adjustement
2016-07-28 15:16:43 -07:00
I c24a297836 RobotRule inner class, changed from protected to public static, for easy serialization for high performance serializers like protoStuff or google protocol buffers. 2016-07-28 18:19:05 +02:00
I 6882ff4103 Cleaned unused import. 2016-07-28 18:16:41 +02:00
I c367bc2424 Added Intellij project files. 2016-07-28 18:16:07 +02:00
Julien Nioche 81aefc118e Improved sitemap tests : check that the URLs returned correspond to the input 2016-07-06 14:23:29 +01:00
Julien Nioche 96bc1bbf6b Added test class and resource for #29; See #116 2016-07-06 10:16:07 +01:00
Julien Nioche 9f478e3e0f Update .travis.yml
Changes Travis email notifications to mailing list
2016-06-30 11:56:36 +01:00
Julien Nioche 0d79acd0f2 Update README.md
Added badge for Travis to README
2016-06-30 11:54:55 +01:00
Julien Nioche 805734f197 Added Travis file; fixes #109 2016-06-30 11:49:02 +01:00
Julien Nioche 0775bb216e Fix license headers + applied formatting. Fixes #108 2016-06-30 11:45:08 +01:00
Julien Nioche be52b770ff Rename package crawlercommons.url Fixes #107 2016-06-30 11:11:49 +01:00
Ken Krugler 3ab00d736d Merge pull request #115 from aecio/fetcher-bug
Fixes bug introduced in pull request #98
2016-05-06 15:35:23 -07:00
Aecio Santos 22ad611aef Fixes bug introduced in pull request #98
and adds ability to configure a new timeout introduced in httpclient
4.5.1
2016-05-04 19:50:33 -04:00
Ken Krugler 51ce12bc78 Move news to bottom, add missing link to 0.6 docs 2016-01-09 10:52:36 -08:00
Ken Krugler b5704684ff Clarify which method is preferred
Generally better to call parseSiteMap w/o passing an explicit
contentType, as web servers lie all the time - so let Tika figure it
out.
2015-12-30 22:14:21 -08:00
Ken Krugler 31a6c80ea7 Fix sitemap extraction from robots.txt 2015-12-30 22:03:49 -08:00
Julien Nioche a809f7abac Added organization and inception year to pom; changed details for jnioche 2015-12-03 11:33:55 +00:00
Julien Nioche f3f34844d4 Deprecate fetcher classes #97 2015-12-02 10:30:54 +00:00
Julien Nioche c1b3f4b086 Added URLFilter interface + BasicURLNormalizer borrowed from Nutch #106 2015-11-13 10:58:48 +00:00
Ken Krugler 4c43c48ef7 Merged conflict with CHANGES.txt 2015-10-20 07:50:58 -07:00
Ken Krugler cdb51a5c8c Merge branch 'aecio-master' 2015-10-20 07:49:19 -07:00
Ken Krugler 940cbfd0e8 Merged with aecio 2015-10-20 07:48:51 -07:00
Aecio Santos f2bf9300e6 Upgrades httpclient to version 4.5.1 (fixes #84)
and do not ignore test failures during maven build
2015-10-09 14:08:39 -04:00
Julien Nioche b7ccc8d1f1 Fixed test for domains #103 2015-10-07 10:21:28 +01:00
Julien Nioche 98316a51fc issue #100 in CHANGES.txt 2015-10-06 18:48:58 +01:00
Julien Nioche a7728c6733 Merge pull request #100 from DigitalPebble/tld_names
updated tld names from publicsuffix.org
2015-10-06 18:44:33 +01:00
Julien Nioche 9e93037e79 updated tld names from publicsuffix.org 2015-10-05 13:38:10 +01:00
Ken Krugler 9e9f5df884 Fixed up CHANGES.txt file 2015-09-15 07:57:13 -07:00
Julien Nioche f0d71b4729 mentioned issue 89 in CHANGES 2015-09-15 11:38:40 +01:00
Lewis John McGibbney f2e41af53c Trivial commit to update CHANGES.txt for recent commits. 2015-09-14 22:40:04 -07:00
Ken Krugler 2c687d1bba Roll in fix for issue #87 w/RSS 1.0 site maps 2015-09-11 15:16:12 -07:00
Ken Krugler d08f396576 Tweaked Javadoc update from Michael Roeder 2015-09-11 11:49:20 -07:00
Ken Krugler ba7f22c811 Merge branch 'MichaelRoeder-additionalJavadoc' 2015-09-11 11:23:07 -07:00