Lewis John McGibbney
d7ed6a742c
Upgrade to JDK 1.7 - remove license header from pom.xml and improve logging implementations.
2015-09-07 15:20:00 -04:00
Lewis John McGibbney
c385883ec3
Merge branch 'master' into jdk1.7
2015-09-07 14:17:29 -04:00
Avi Hayun
478a7d7240
Merge pull request #82 from crawler-commons/validSitemaps
[Sitemaps] Upgrade Valid / Legal / Strict SitemapUrls
2015-09-07 21:11:58 +03:00
Lewis John McGibbney
ba5906ec40
Upgrade to JDK 1.7 compiler version and introduce Maven forbidden APIs plugin
2015-09-06 13:55:26 -04:00
Lewis John McGibbney
827b073d12
Merge branch 'master' into validSitemaps
2015-08-26 09:50:26 -07:00
Julien Nioche
f155148216
Upgraded Tika 1.10 #89
2015-08-20 15:35:38 +01:00
matt-deboer
d0e1f1f124
Added docker-rest script (helpful on OSX) to re-create boot2docker vm and update env. vars in ~/.profile
2015-07-23 14:41:21 -07:00
matt-deboer
d203f0d4ac
Reworked sitemap parser to use SAX for optional parsing of partial docs.
Traded Stack for LinkedList for performance improvement.
Fix to getParentElement();
Added test for case referenced by issues #79 and #75.
2015-06-27 22:33:15 -07:00
Lewis John McGibbney
cd06d834a6
Update README.md
2015-06-15 12:40:53 -07:00
Julien Nioche
0f24082dc0
Applied formatting with mvn java-formatter:format
2015-06-11 10:47:19 +01:00
Julien Nioche
feb40af519
Applied formatting with mvn java-formatter:format
2015-06-11 10:45:06 +01:00
Julien Nioche
37c13c8465
Update README.md
added link to javadoc
2015-06-11 10:39:45 +01:00
Julien Nioche
b77fa0052a
Update README.md
Announcing 0.6 release
2015-06-11 10:37:52 +01:00
Chaiavi
5cf62ab7d5
Fix for Issue 60
SitemapUrls can be invalid when they are referenced in a sitemap whose directory is on a completely different path than the referenced SitemapUrl, as indicated here:
http://www.sitemaps.org/protocol.html#location
In order to clarify the validity aspect we need to upgrade the following:
1. Add a little more explanation as javadocs and as logs
2. Rename "Legal" (I think only one occurrence) to "valid" (in the parser)
3. Add to the Sitemap class a new method to get all *valid* SitemapUrls
4. When dropping a URL due to invalidity, a log should be shown; a URL shouldn't be dropped quietly.
2015-06-08 23:41:56 +03:00
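The location rule referenced in the Issue 60 message can be sketched in Java as follows. This is a minimal, hypothetical helper: the class SitemapLocationCheck and method isUnderSitemapLocation are illustrative names, not part of the crawler-commons API. It assumes the rule reduces to: the listed URL must share the sitemap's protocol, host and port, and its path must start with the sitemap's directory (so a sitemap at /catalog/sitemap.xml may list /catalog/... but not /images/...).

```java
import java.net.MalformedURLException;
import java.net.URL;

/**
 * Hypothetical helper (not the crawler-commons API) illustrating the
 * sitemaps.org location rule: a sitemap may only list URLs that share its
 * protocol, host and port and live at or below the sitemap's directory.
 */
public class SitemapLocationCheck {

    // URL.getPort() returns -1 when no port is given, so fall back to the protocol default.
    private static int effectivePort(URL url) {
        return url.getPort() == -1 ? url.getDefaultPort() : url.getPort();
    }

    public static boolean isUnderSitemapLocation(URL sitemapUrl, URL pageUrl) {
        if (!sitemapUrl.getProtocol().equalsIgnoreCase(pageUrl.getProtocol())
                || !sitemapUrl.getHost().equalsIgnoreCase(pageUrl.getHost())
                || effectivePort(sitemapUrl) != effectivePort(pageUrl)) {
            return false;
        }
        // Directory of the sitemap, e.g. "/catalog/" for "/catalog/sitemap.xml".
        String sitemapPath = sitemapUrl.getPath();
        String sitemapDir = sitemapPath.substring(0, sitemapPath.lastIndexOf('/') + 1);
        // Treat an empty page path ("http://example.com") as the root "/".
        String pagePath = pageUrl.getPath().isEmpty() ? "/" : pageUrl.getPath();
        // Path comparison is case-sensitive, as URL paths are.
        return pagePath.startsWith(sitemapDir);
    }

    // String convenience overload: unparseable URLs are treated as invalid.
    public static boolean isUnderSitemapLocation(String sitemapUrl, String pageUrl) {
        try {
            return isUnderSitemapLocation(new URL(sitemapUrl), new URL(pageUrl));
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sitemap = "http://example.com/catalog/sitemap.xml";
        System.out.println(isUnderSitemapLocation(sitemap, "http://example.com/catalog/item1.html")); // true
        System.out.println(isUnderSitemapLocation(sitemap, "http://example.com/images/logo.png"));    // false
    }
}
```

Per item 4 of the message, a real parser would log each URL this check rejects rather than silently dropping it; that logging is left out of the sketch.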
Julien Nioche
504c207488
Added Julien's public key to KEYS
2015-06-04 10:50:14 +01:00
Lewis John McGibbney
9d45376336
Add KEYS file to CC
2015-06-01 21:55:49 -07:00
Julien Nioche
22206f3a43
[maven-release-plugin] prepare for next development iteration
2015-05-27 16:38:05 +01:00
Julien Nioche
39d076a13b
[maven-release-plugin] prepare release crawler-commons-0.6
2015-05-27 16:38:01 +01:00
Julien Nioche
2394b6713a
Removed tagBase from maven-release-plugin configuration
2015-05-27 16:36:05 +01:00
Julien Nioche
ee4a936066
Revert "[maven-release-plugin] prepare release crawler-commons-0.6"
This reverts commit 3b09a9ba52.
2015-05-27 16:16:54 +01:00
Julien Nioche
3b09a9ba52
[maven-release-plugin] prepare release crawler-commons-0.6
2015-05-27 16:05:02 +01:00
Julien Nioche
a41ab43c41
README Fixed URL for changes file release 0.5
was pointing to the 'live' file
2015-05-27 12:18:40 +01:00
Julien Nioche
e8ec75e019
Reverted failed release + changed groupId
2015-05-27 12:16:18 +01:00
Julien Nioche
d115f158b2
[maven-release-plugin] prepare for next development iteration
2015-05-26 10:58:35 +01:00
Julien Nioche
8328e554d4
[maven-release-plugin] prepare release crawler-commons-0.6
2015-05-26 10:58:31 +01:00
Julien Nioche
20861baf47
Issue 75: [Sitemaps] more robust parsing of XML elements (jnioche, kkrugler)
2015-05-22 11:08:21 +01:00
Julien Nioche
40731c3304
applied formatting with mvn java-formatter:format
2015-05-15 09:03:24 +01:00
Julien Nioche
8de545ccdc
Merge pull request #78 from lewismc/CC-77
simplify pom file #77
2015-05-15 08:55:45 +01:00
Lewis John McGibbney
e8065d5372
simplify pom file #77
2015-05-14 12:05:37 -07:00
Julien Nioche
47e30b5c22
Merge pull request #76 from crawler-commons/formatter
maven-java-formatter-plugin
2015-05-06 10:10:56 +01:00
Julien Nioche
63a837d5d7
updated CHANGES.txt for #76
2015-05-06 10:10:25 +01:00
Julien Nioche
d22a9d0617
removed properties file + 1.6 compliant formatting
2015-05-06 09:33:31 +01:00
Julien Nioche
0a5d9d338a
maven-java-formatter-plugin
2015-04-30 13:52:44 +01:00
Ken Krugler
e42c268e03
Add news about project moving to GitHub
2015-04-22 07:59:43 -07:00
Ken Krugler
ee23e1fb0d
Update CHANGES.txt
2015-04-22 07:31:20 -07:00
Ken Krugler
798dc59839
Update CHANGES.txt
2015-04-22 07:30:57 -07:00
Ken Krugler
53375168ab
Update CHANGES.txt
2015-04-22 07:30:25 -07:00
Julien Nioche
1647b90f7d
Upgraded to Tika 1.8 fixes #71
2015-04-22 13:19:02 +01:00
Julien Nioche
a28a78c942
changed groupId to crawler-commons
2015-04-22 11:02:49 +01:00
Julien Nioche
2195fb7f7e
replaced references to code.google with github equivalents
2015-04-17 10:25:52 +01:00
Julien Nioche
76bc563b21
Merge pull request #72 from crawler-commons/mimetypeRegistry
Get mediaTypeRegistry with MediaTypeRegistry.getDefaultRegistry
2015-04-13 20:42:52 +01:00
Julien Nioche
dafbbdd2bf
SiteMapParser use UPPERCASE for static finals
2015-04-13 20:41:47 +01:00
Julien Nioche
152a2446c2
Get mediaTypeRegistry with MediaTypeRegistry.getDefaultRegistry; instantiate Tika from the start
2015-04-13 16:36:39 +01:00
Avi Hayun
e1c7955389
Updated links to point to GitHub
I have changed several links to point to GitHub instead of GoogleCode.
For now, I am pointing all releases of CHANGES.TXT to the latest at Master, as we don't have the code of the last releases, and actually the CHANGES.TXT file does contain all changes of all releases, so it seems like a good compromise for now...
2015-04-13 12:54:06 +03:00
Avi Hayun
84cc3bf2d0
First stab at recreating front page like the original
2015-04-13 12:46:57 +03:00
Ken Krugler
a7ee1bfa84
Delete README
2015-04-09 08:29:18 -07:00
kkrugler_lists@transpac.com
10c4dfbd99
Update with info about issues 67 & 68
2015-03-27 22:12:14 +00:00
kkrugler_lists@transpac.com
7f7f915b0b
Issue 68: Case-sensitive path matching
2015-03-27 22:09:32 +00:00
kkrugler_lists@transpac.com
6eb1459345
Issue 67: time in lastMod string not extracted during parse
2015-03-27 21:15:43 +00:00
kkrugler_lists@transpac.com
b2a92ce442
Add note about issue 59 being fixed
2015-01-26 13:21:29 +00:00