This is a simple web scraper in Python using [scrapy](https://docs.scrapy.org/) that writes all the markdown from https://basement.woodbine.nyc/ to disk.

Appending `/download` to the URL of any HedgeDoc page returns a text file containing that page's markdown. The scraper starts at the markdown version of the homepage and follows `[text](hyperlink)` style markdown links. Wiki pages that are not linked from any other page will not be found by this script.
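
The link-following step can be sketched roughly as below. This is a standalone illustration using only the standard library, not the spider code from this repo: the names `download_url`, `extract_page_links`, and `LINK_RE` are made up for the example, and the `/s/home` path in the usage note is a hypothetical page path.

```python
import re
from urllib.parse import urljoin, urlparse

BASE_URL = "https://basement.woodbine.nyc/"  # site named in this README

# Matches inline markdown links of the form [text](target). This is a
# simplification: it ignores reference-style links and nested brackets,
# and it also matches image links such as ![alt](src).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def download_url(page_url: str) -> str:
    """Return the URL that serves the raw markdown for a page."""
    return page_url.rstrip("/") + "/download"


def extract_page_links(markdown: str, base_url: str = BASE_URL) -> list[str]:
    """Collect same-host page URLs from [text](hyperlink) style links."""
    host = urlparse(base_url).netloc
    found = []
    for _text, target in LINK_RE.findall(markdown):
        # Resolve relative targets against the site root.
        absolute = urljoin(base_url, target)
        # Keep only links on the same host, without duplicates.
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

For example, `extract_page_links("[home](/s/home)")` yields one absolute URL on the wiki's host, and passing that URL to `download_url` gives the address to fetch next. Because pages are discovered only through links like these, a page that nothing links to is never reached.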

Run like this:

    $ python -m venv .venv
    $ source .venv/bin/activate
    $ pip install -r requirements
    $ scrapy crawl pages

The markdown output will appear in the `None/basement.woodbine.nyc` directory.