Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
 
Pascal Essiembre 0adfef8b51 Can now send deletion requests to committers based on "rejected" events 2 days ago
.github Update codeql-analysis.yml 2 months ago
src Can now send deletion requests to committers based on "rejected" events 2 days ago
.gitignore Improved browser-based unit tests. 1 year ago
CHANGES.xml Can now send deletion requests to committers based on "rejected" events 2 days ago
LICENSE.txt Renamed license file. 1 year ago
README.md Milestone release. 7 months ago
TODO.txt Fixed NullPointerException when resolving sitemaps. 6 months ago
pom.xml New DOMLinkExtractor repeatable "extractSelector" and 2 months ago

README.md

Norconex HTTP Collector

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable. It can be used from the command line with file-based configuration on any OS, or embedded into Java applications using well-documented APIs.

Visit the web site for binary downloads and documentation:

https://opensource.norconex.com/collectors/http/