mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-22 19:06:03 +02:00
Commit Graph

389 Commits

Author SHA1 Message Date
Aecio Santos f2bf9300e6 Upgrades httpclient to version 4.5.1 (fixes #84)
and do not ignore test failures during maven build
2015-10-09 14:08:39 -04:00
Julien Nioche b7ccc8d1f1 Fixed test for domains #103 2015-10-07 10:21:28 +01:00
Julien Nioche 98316a51fc issue #100 in CHANGES.txt 2015-10-06 18:48:58 +01:00
Julien Nioche a7728c6733 Merge pull request #100 from DigitalPebble/tld_names
updated tld names from publicsuffix.org
2015-10-06 18:44:33 +01:00
Julien Nioche 9e93037e79 updated tld names from publicsuffix.org 2015-10-05 13:38:10 +01:00
Ken Krugler 9e9f5df884 Fixed up CHANGES.txt file 2015-09-15 07:57:13 -07:00
Julien Nioche f0d71b4729 mentioned issue 89 in CHANGES 2015-09-15 11:38:40 +01:00
Lewis John McGibbney f2e41af53c Trivial commit to update CHANGES.txt for recent commits. 2015-09-14 22:40:04 -07:00
Ken Krugler 2c687d1bba Roll in fix for issue #87 w/RSS 1.0 site maps 2015-09-11 15:16:12 -07:00
Ken Krugler d08f396576 Tweaked Javadoc update from Michael Roeder 2015-09-11 11:49:20 -07:00
Ken Krugler ba7f22c811 Merge branch 'MichaelRoeder-additionalJavadoc' 2015-09-11 11:23:07 -07:00
MichaelRoeder e8f38fda03 Added a javadoc comment to the SimpleRobotRulesParser class explaining its behavior. 2015-09-10 13:03:10 +02:00
Ken Krugler 16e13bedc4 Improve Javadoc on robot name matching
And add a .gitignore
2015-09-08 16:12:31 -07:00
Lewis John McGibbney 24d24b3de1 Merge pull request #94 from lewismc/jdk1.7
Upgrade to Jdk1.7
2015-09-07 20:22:59 -04:00
Lewis John McGibbney d7ed6a742c Upgrade to Jdk1.7 - remove license header at pom.xml and improve logging implementations. 2015-09-07 15:20:00 -04:00
Lewis John McGibbney c385883ec3 Merge branch 'master' into jdk1.7 2015-09-07 14:17:29 -04:00
Avi Hayun 478a7d7240 Merge pull request #82 from crawler-commons/validSitemaps
[Sitemaps] Upgrade Valid / Legal / Strict SitemapUrls
2015-09-07 21:11:58 +03:00
Lewis John McGibbney ba5906ec40 Upgrade to JDK 1.7 compiler version and introduce Maven forbidden API's plugin 2015-09-06 13:55:26 -04:00
Lewis John McGibbney 827b073d12 Merge branch 'master' into validSitemaps 2015-08-26 09:50:26 -07:00
Julien Nioche f155148216 Upgraded Tika 1.10 #89 2015-08-20 15:35:38 +01:00
matt-deboer d0e1f1f124 Added docker-rest script (helpful on OSX) to re-create boot2docker vm and update env. vars in ~/.profile 2015-07-23 14:41:21 -07:00
matt-deboer d203f0d4ac Reworked sitemap parser to use SAX for optional parsing of partial docs.
Traded Stack for LinkedList for performance improvement.

Fix to getParentElement();
Added test for case referenced by issues #79 and #75.
2015-06-27 22:33:15 -07:00
Lewis John McGibbney cd06d834a6 Update README.md 2015-06-15 12:40:53 -07:00
Julien Nioche 0f24082dc0 Applied formatting with mvn java-formatter:format 2015-06-11 10:47:19 +01:00
Julien Nioche feb40af519 Applied formatting with mvn java-formatter:format 2015-06-11 10:45:06 +01:00
Julien Nioche 37c13c8465 Update README.md
added link to javadoc
2015-06-11 10:39:45 +01:00
Julien Nioche b77fa0052a Update README.md
Announcing 0.6 release
2015-06-11 10:37:52 +01:00
Chaiavi 5cf62ab7d5 Fix for Issue 60
SitemapUrls can be invalid when they are referenced in a sitemap whose directory is on a completely different path than the referenced SitemapUrl.

All as indicated here:
http://www.sitemaps.org/protocol.html#location

In order to clarify the validity aspect we need to upgrade the following:
1. Add a little more explanation as javadocs and as logs
2. Rename "Legal" (I think only one occurrence) to "valid" (in the parser)
3. Add to the Sitemap class a new method to get all *valid* SitemapUrls
4. When dropping a URL due to invalidity, a log should be shown; a URL shouldn't be dropped quietly.
2015-06-08 23:41:56 +03:00
Julien Nioche 504c207488 Added Julien's public key to KEYS 2015-06-04 10:50:14 +01:00
Lewis John McGibbney 9d45376336 Add KEYS file to CC 2015-06-01 21:55:49 -07:00
Julien Nioche 22206f3a43 [maven-release-plugin] prepare for next development iteration 2015-05-27 16:38:05 +01:00
Julien Nioche 39d076a13b [maven-release-plugin] prepare release crawler-commons-0.6 2015-05-27 16:38:01 +01:00
Julien Nioche 2394b6713a Removed tagBase from maven-release-plugin configuration 2015-05-27 16:36:05 +01:00
Julien Nioche ee4a936066 Revert "[maven-release-plugin] prepare release crawler-commons-0.6"
This reverts commit 3b09a9ba52.
2015-05-27 16:16:54 +01:00
Julien Nioche 3b09a9ba52 [maven-release-plugin] prepare release crawler-commons-0.6 2015-05-27 16:05:02 +01:00
Julien Nioche a41ab43c41 README Fixed URL for changes file release 0.5
was pointing to the 'live' file
2015-05-27 12:18:40 +01:00
Julien Nioche e8ec75e019 Reverted failed release + changed groupId 2015-05-27 12:16:18 +01:00
Julien Nioche d115f158b2 [maven-release-plugin] prepare for next development iteration 2015-05-26 10:58:35 +01:00
Julien Nioche 8328e554d4 [maven-release-plugin] prepare release crawler-commons-0.6 2015-05-26 10:58:31 +01:00
Julien Nioche 20861baf47 Issue 75: [Sitemaps] more robust parsing of XML elements (jnioche, kkrugler) 2015-05-22 11:08:21 +01:00
Julien Nioche 40731c3304 applied formatting with mvn java-formatter:format 2015-05-15 09:03:24 +01:00
Julien Nioche 8de545ccdc Merge pull request #78 from lewismc/CC-77
simplify pom file #77
2015-05-15 08:55:45 +01:00
Lewis John McGibbney e8065d5372 simplify pom file #77 2015-05-14 12:05:37 -07:00
Julien Nioche 47e30b5c22 Merge pull request #76 from crawler-commons/formatter
maven-java-formatter-plugin
2015-05-06 10:10:56 +01:00
Julien Nioche 63a837d5d7 updated CHANGES.txt for #76 2015-05-06 10:10:25 +01:00
Julien Nioche d22a9d0617 removed properties file + 1.6 compliant formatting 2015-05-06 09:33:31 +01:00
Julien Nioche 0a5d9d338a maven-java-formatter-plugin 2015-04-30 13:52:44 +01:00
Ken Krugler e42c268e03 Add news about project moving to GitHub 2015-04-22 07:59:43 -07:00
Ken Krugler ee23e1fb0d Update CHANGES.txt 2015-04-22 07:31:20 -07:00
Ken Krugler 798dc59839 Update CHANGES.txt 2015-04-22 07:30:57 -07:00