I just released a new version (0.6.1) of Spydey today.
Spydey is a link checker, basically, written in Python. I initially wrote it (a long time ago) to quickly find broken links on sites that I develop. Then I got interested in different strategies for choosing the order in which to traverse all the pages on a website. For more about that, see the README.
Or just download it and play with it! It's on PyPI, so it should be trivial to
install with your Python packaging tool of choice (I still use pip, e.g. pip
install spydey).
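To make the traversal-order idea mentioned above a bit more concrete, here is a minimal sketch of how the order in which pages get visited depends on how the crawl frontier is managed. This is not Spydey's code: the in-memory SITE dictionary, the crawl function and the "breadth"/"depth" strategy names are all made up for illustration, and a real checker would fetch pages over HTTP instead.

```python
from collections import deque

# Hypothetical in-memory "site": each page maps to the pages it links to.
SITE = {
    "/": ["/faq", "/blog"],
    "/faq": ["/apply"],
    "/blog": ["/blog/post-1", "/"],
    "/apply": [],
    "/blog/post-1": ["/faq"],
}


def crawl(start, strategy="breadth"):
    """Return pages in visit order.

    The strategy only decides which end of the frontier is popped:
    the oldest entry (breadth-first) or the newest (depth-first).
    """
    frontier = deque([start])
    seen = {start}  # pages already queued, so nothing is visited twice
    order = []
    while frontier:
        page = frontier.popleft() if strategy == "breadth" else frontier.pop()
        order.append(page)
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


if __name__ == "__main__":
    print(crawl("/", "breadth"))
    print(crawl("/", "depth"))
```

Both runs visit the same five pages, just in a different order. That matters when you want to hit a broken link as early as possible and stop there.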
Quick example
Are there any broken links on the public Recurse website? Let's find out!
MacBookAir spydey:(main)$ time spydey -r -p --traversal=pattern --stop-on-error https://www.recurse.com
INFO:spydey:1. 200 https://www.recurse.com
INFO:spydey:2. 200 https://www.recurse.com/faq
INFO:spydey:3. 200 https://www.recurse.com/blog
INFO:spydey:4. 200 https://www.recurse.com/hire
INFO:spydey:5. 200 https://www.recurse.com/login
INFO:spydey:6. 200 https://www.recurse.com/manual
INFO:spydey:7. 200 https://www.recurse.com/who
...
INFO:spydey:844. 200 https://www.recurse.com/apply?r=p17#sec-retreat-length
INFO:spydey:845. 200 https://www.recurse.com/apply?r=p67#sec-conversational-interview
real 0m51.754s
Nope, it's good!
What about this maker of fine effects pedals?
MacBookAir ~:$ spydey -r --stop-on-error https://www.wamplerpedals.com/
INFO:spydey:1. 200 https://www.wamplerpedals.com/
INFO:spydey:2. 200 https://www.wamplerpedals.com/products/
INFO:spydey:3. 200 https://www.wamplerpedals.com/products/c/distortion-overdrive/
INFO:spydey:4. 200 https://www.wamplerpedals.com/products/c/compression/
ERROR:spydey:5. 403 https://www.wamplerpedals.com/products/c/fuzz/
WARNING:spydey:Bailing out on first HTTP error
Screenshots