
didyousayspiderman

this repo holds the source of the awesome web crawler written in java as part of NST lessons

building and running

build using

make build

do a test run using

make runtest

clean build files using

make clean

do all of the above at once using

make test

if you want, you can also run the program directly with java after building

java -classpath didyousayspiderman/out didyousayspiderman.crawler -u https://git.dotya.ml

flags

there are a few flags you can use to tweak the behaviour of the program

  • -f/--fileurls <./path/to/a/file/with/urls>
    specify a path to file with URLs (one per line)
  • -v/--verbose
    turn on verbose printing to stderr
  • -m/--maxdepth <maxdepthlevel>
    the maximum level of recursive URL grabbing (starting with 0)
  • -u/--urllist <url,url,url>
    takes a single URL or a list of comma-separated URLs