mirror of https://github.com/crawler-commons/crawler-commons synced 2024-05-09 23:56:04 +02:00

Readme.md Overhaul (#312)

Added Table-of-Contents
Removed issue tracking section
Added Maven installation
Added License
Avi Hayun 2021-08-09 09:00:06 +03:00 committed by GitHub
parent 9630f4c09c
commit 44304581bc
Signed by: GitHub
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 23 additions and 4 deletions

@@ -1,6 +1,7 @@
 Crawler-Commons Change Log
 Current Development 1.2-SNAPSHOT (yyyy-mm-dd)
+- Readme.MD Overhaul of TOC, Installation, License (Avi Hayun) #311
 - [BasicNormalizer] Normalize URL without a scheme (Avi Hayun, sebastian-nagel) #271
 - EffectiveTldFinder: upgrade public suffix list / Download latest effective_tld_names.dat during Maven build (Richard Zowalla) #295, #302
 - [BasicNormalizer] decode percent-encoded host names (sebastian-nagel) #303

@@ -3,7 +3,14 @@
 # Overview
-Crawler-Commons is a set of reusable Java components that implement functionality common to any web crawler. These components benefit from collaboration among various existing web crawler projects, and reduce duplication of effort.
+Crawler-Commons is a set of reusable Java components that implement functionality common to any web crawler.
+These components benefit from collaboration among various existing web crawler projects, and reduce duplication of effort.
+# Table of Contents
+- [Documentation](#user-documentation)
+- [Mailing List](#mailing-list)
+- [Installation](#installation)
+- [News](#news)
 # User Documentation
@@ -20,11 +27,19 @@ Crawler-Commons is a set of reusable Java components that implement functionalit
 There is a mailing list on [Google Groups](https://groups.google.com/forum/?fromgroups#!forum/crawler-commons).
-# Issue Tracking
+# Installation
-If you find an issue, please file a report [here](https://github.com/crawler-commons/crawler-commons/issues)
+Using Maven
+Add the following dependency to your pom.xml:
+~~~xml
+<dependency>
+    <groupId>com.github.crawler-commons</groupId>
+    <artifactId>crawler-commons</artifactId>
+    <version>1.1</version>
+</dependency>
+~~~
-# Crawler-Commons News
+# News
 ## 29th June 2020 - crawler-commons 1.1 released
@@ -115,3 +130,6 @@ See [Apache Nutch v2.2 Released](http://nutch.apache.org/#08+June+2013+-+Apache+
 This release improves robots.txt and sitemap parsing support.
 See the [CHANGES.txt](https://github.com/crawler-commons/crawler-commons/blob/master/CHANGES.txt) file included with the release for a full list of details.
+# License
+Published under [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0), see [LICENSE](https://github.com/crawler-commons/crawler-commons/blob/master/LICENSE)
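
As a reading aid for the Installation section added in this commit, here is a minimal sketch of where that `<dependency>` element would sit in a `pom.xml`. Only the crawler-commons coordinates come from the README change; the enclosing project coordinates (`com.example`, `my-crawler`, `0.1.0`) are hypothetical placeholders and not part of this commit.

~~~xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates for the consuming project (assumption, not from the commit) -->
  <groupId>com.example</groupId>
  <artifactId>my-crawler</artifactId>
  <version>0.1.0</version>

  <dependencies>
    <!-- Coordinates as given in the README's Installation section -->
    <dependency>
      <groupId>com.github.crawler-commons</groupId>
      <artifactId>crawler-commons</artifactId>
      <version>1.1</version>
    </dependency>
  </dependencies>
</project>
~~~

Maven reads dependencies from the `<dependencies>` element of the project, so the snippet from the README goes inside that element rather than at the top level of the file.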