java web crawler
didyousayspiderman
this repo holds the source of the awesome web crawler written in java as part of NST lessons
building and running
build using
make build
run a test run using
make runtest
clean build files using
make clean
do all of the above at once using
make test
if you want, you can also run the program after building directly with java
java -classpath didyousayspiderman/out didyousayspiderman.crawler -u https://git.dotya.ml
flags
there are a couple of flags you can use to tweak the behaviour of the program
-f/--fileurls <./path/to/a/file/with/urls>: specify a path to file with URLs (one per line)
-v/--verbose: turn on verbose printing to stderr
-m/--maxdepth <maxdepthlevel>: the maximum level of recursive URL grabbing (starting with 0)
-u/--urllist <url,url,url>: takes a single URL or a list of comma-separated URLs
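
to make the -u and -m flags a bit more concrete, here is a minimal java sketch of how a comma-separated URL list and a depth limit could work together. it is not the actual crawler code: the class name FlagSketch, both method names and the in-memory links map (a stand-in for fetching a page and extracting its links) are hypothetical, and it assumes that depth 0 means "only the URLs given on the command line".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// hypothetical sketch, not the actual didyousayspiderman code
class FlagSketch {

    // split the value of -u/--urllist on commas, trimming and dropping blanks
    static List<String> parseUrlList(String arg) {
        List<String> urls = new ArrayList<>();
        for (String part : arg.split(",")) {
            String url = part.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    // breadth-first crawl: seeds are depth 0 (assumption), links found on a
    // depth-d page are depth d+1, and nothing deeper than maxDepth is visited.
    // `links` replaces real HTTP fetching so the sketch runs offline.
    static Set<String> crawl(List<String> seeds, Map<String, List<String>> links, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>(seeds);
        Deque<String> current = new ArrayDeque<>(visited);
        for (int depth = 0; depth < maxDepth && !current.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String url : current) {
                for (String link : links.getOrDefault(url, List.of())) {
                    // Set.add returns false for already seen URLs, so cycles are skipped
                    if (visited.add(link)) {
                        next.add(link);
                    }
                }
            }
            current = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // made-up link graph: a -> b, b -> c and back to a
        Map<String, List<String>> links = Map.of(
            "https://a.example", List.of("https://b.example"),
            "https://b.example", List.of("https://c.example", "https://a.example"));
        List<String> seeds = parseUrlList("https://a.example");
        System.out.println(crawl(seeds, links, 0)); // [https://a.example]
        System.out.println(crawl(seeds, links, 1)); // [https://a.example, https://b.example]
    }
}
```

going level by level keeps the depth check in one place (the loop condition), and the visited set doubles as the result and as protection against link cycles.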