Installation
Requirements
Python 3.10+
pip
Install from PyPI
pip install websweep
By default, pip install websweep also installs google-re2 on supported
Python versions (3.10+). If google-re2 is unavailable, WebSweep falls back to regex.
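To see which case applies in your environment, you can check whether the google-re2 package can be imported (its import name is re2). The snippet below is a minimal sketch that uses only the standard library. It does not call any WebSweep API, and the helper name has_re2 is hypothetical.

```python
import importlib.util


def has_re2() -> bool:
    """Return True if the google-re2 package (imported as "re2") can be found.

    Hypothetical helper for inspecting your environment; not part of WebSweep.
    """
    # find_spec returns None when a top-level module is not installed
    return importlib.util.find_spec("re2") is not None


if __name__ == "__main__":
    if has_re2():
        print("google-re2 is available")
    else:
        print("google-re2 not found; WebSweep will use its regex fallback")
```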
WebSweep also installs lxml and uses it as the default HTML parser, which
speeds up parsing during crawling and extraction. If lxml is unavailable at
runtime, WebSweep falls back to Python's built-in html.parser.
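The parser fallback order can be checked the same way. This sketch assumes only what is stated above (prefer lxml, otherwise use the standard-library html.parser) and reports which of the two would be chosen. The function preferred_html_parser is a hypothetical illustration, not WebSweep's own selection code.

```python
import importlib.util


def preferred_html_parser() -> str:
    """Return "lxml" if it is installed, otherwise "html.parser".

    Hypothetical helper that mirrors the documented fallback order;
    it is not WebSweep's internal implementation.
    """
    # lxml is a third-party package; html.parser ships with Python
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


if __name__ == "__main__":
    print(preferred_html_parser())
```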
Verify installation:
websweep --version
Install from Source (Developers)
git clone https://github.com/sodascience/websweep.git
cd websweep
uv sync --group test --group docs --group dev
Run tests:
uv run pytest -q
Build docs:
uv run make docs