Convert your Pocket export or any URL list into organized PDFs for offline backup and archiving.
A simple Python tool to convert your Pocket export or any URL+label list into PDFs, organized by labels for easy offline backup.
Pocket is shutting down, so export your saved articles now and archive them as PDFs, sorted into folders by tag.
If an original page is inaccessible, the tool can fall back to the Wayback Machine.
- Parse Pocket CSV export to extract URLs and labels.
- Download web pages as PDFs.
- Organize PDFs in folders named by label.
  - Only the first tag is used as the folder name. (TODO: extend to multiple labels.)
- Fall back to the Wayback Machine archive if the original URL is inaccessible.
- Configurable PDF file naming.
  - PDF names combine title + domain + index to keep them unique.
- Tags may be separated by a comma (,) or a pipe (|).
- If the previous attempts fail, tries to download the URL directly. Some pages block indirect attempts, and some pages really no longer exist.
- Logs failed and questionable downloads to url_retrieval.log.
- Coloured output for easier orientation (BLUE = start, RED = failure, GREEN = success).
- Skips URLs that have already been downloaded, so you can delete PDFs that captured some kind of error page (e.g. medium.com trying to figure out whether you're human) and run the script again. Only the deleted ones will be retried.
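A minimal sketch of the parsing and folder-selection steps follows. It assumes the CSV has a header row with url, title and tags columns (the column names are an assumption about the export format), splits tags on either delimiter and keeps only the first tag for the folder. parse_export, split_tags, folder_for and the "untagged" fallback folder are hypothetical names, not the script's actual code.

```python
import csv
import io
import re
from typing import Iterable, List, Tuple

# Tags may be separated by "," or "|", as the feature list states.
TAG_SPLIT = re.compile(r"[,|]")


def split_tags(raw: str) -> List[str]:
    """Split a raw tag field into trimmed, non-empty tags."""
    return [t.strip() for t in TAG_SPLIT.split(raw or "") if t.strip()]


def folder_for(tags: List[str], default: str = "untagged") -> str:
    """Only the first tag picks the folder; untagged rows go to `default` (assumed name)."""
    return tags[0] if tags else default


def parse_export(lines: Iterable[str]) -> List[Tuple[str, str, List[str]]]:
    """Return (url, title, tags) for every row that has a URL.

    Assumes a header row with 'url', 'title' and 'tags' columns.
    """
    rows = []
    for row in csv.DictReader(lines):
        url = (row.get("url") or "").strip()
        if url:
            title = (row.get("title") or "").strip()
            rows.append((url, title, split_tags(row.get("tags") or "")))
    return rows


if __name__ == "__main__":
    sample = io.StringIO(
        "title,url,time_added,tags,status\n"
        "Example,https://example.com/a,1700000000,python|tools,unread\n"
    )
    for url, title, tags in parse_export(sample):
        print(folder_for(tags), title, url)
```

Splitting on both delimiters with one regular expression means a mixed field such as "a, b|c" still yields three tags.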
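Rendering a page to PDF with headless Chrome can be sketched as below. It relies on Chrome's --headless and --print-to-pdf command-line switches; chrome_pdf_command, render_pdf and the 60-second timeout are illustrative assumptions, and the real script may pass different options.

```python
import subprocess
from pathlib import Path
from typing import List


def chrome_pdf_command(chrome: str, url: str, out_file: Path) -> List[str]:
    """Build the command line that makes headless Chrome print `url` to `out_file`."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={out_file}",
        url,
    ]


def render_pdf(chrome: str, url: str, out_file: Path, timeout: int = 60) -> bool:
    """Run Chrome and report whether a non-empty PDF was written (hypothetical helper)."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    try:
        subprocess.run(
            chrome_pdf_command(chrome, url, out_file),
            check=True,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.SubprocessError, OSError):
        # Covers a non-zero exit, a timeout, or a missing Chrome binary.
        return False
    return out_file.exists() and out_file.stat().st_size > 0
```

Checking for a non-empty file, rather than trusting the exit code alone, gives the caller a simple signal for when to try the next fallback.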
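The Wayback Machine fallback could be built on the Internet Archive's public availability endpoint (https://archive.org/wayback/available), which returns JSON describing the closest archived snapshot. Whether the script uses this endpoint is an assumption; availability_query, snapshot_from_response and find_snapshot are hypothetical helpers, and the parsing is kept separate from the network call so it can be checked offline.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability-API query URL for `url`."""
    return f"{WAYBACK_API}?{urlencode({'url': url})}"


def snapshot_from_response(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL, or None if nothing usable is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest") or {}
    if closest.get("available") and closest.get("url"):
        return closest["url"]
    return None


def find_snapshot(url: str, timeout: int = 30) -> Optional[str]:
    """Query the Wayback Machine over the network (may raise on network errors)."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as resp:
        return snapshot_from_response(json.load(resp))
```

The returned snapshot URL can then be handed to the same PDF renderer used for the original page.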
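A file name built from title, domain and index might be produced as in this sketch. The sanitizing rules, the " - " separator, the 80-character title limit and the "untitled" placeholder are illustrative assumptions, not the script's actual scheme; pdf_name is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

# Characters that are unsafe in file names on Windows, macOS or Linux.
UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]+')


def pdf_name(title: str, url: str, index: int, max_title: int = 80) -> str:
    """Combine title, domain and row index into a filesystem-safe PDF name."""
    domain = urlparse(url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    # Replace unsafe runs, truncate, then trim dots/spaces/underscores at the ends.
    clean = UNSAFE.sub("_", title)[:max_title].strip(" ._") or "untitled"
    return f"{clean} - {domain} - {index}.pdf"
```

The index keeps names unique even when two saved items share a title and a domain.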
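Skipping finished downloads and logging failures might be combined roughly as follows: an item is attempted only if its PDF is missing, and failures go to url_retrieval.log through Python's standard logging module. setup_log, process, the status strings and the log format are assumptions for illustration.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("url_retrieval")


def setup_log(path: str = "url_retrieval.log") -> None:
    """Append warnings and errors to the retrieval log file."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.WARNING)


def process(url: str, target: Path, download: Callable[[str, Path], bool]) -> str:
    """Download `url` to `target` unless the PDF already exists."""
    if target.exists():
        return "skipped"  # deleting the PDF makes the next run retry it
    if download(url, target):
        return "ok"
    log.error("failed: %s", url)
    return "failed"
```

Because the existence check is the only state, no separate database is needed: deleting a bad PDF is enough to queue it for another attempt.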
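The colour coding can be done with plain ANSI escape codes, as in this sketch. The status names and helpers are hypothetical; older Windows consoles may need a helper library such as colorama, so the real script may do this differently.

```python
import sys

# ANSI SGR codes: 34 = blue, 31 = red, 32 = green, 0 = reset.
COLOURS = {"start": "\033[34m", "fail": "\033[31m", "ok": "\033[32m"}
RESET = "\033[0m"


def coloured(kind: str, message: str) -> str:
    """Wrap `message` in the ANSI colour for `kind` ('start', 'fail' or 'ok')."""
    return f"{COLOURS[kind]}{message}{RESET}"


def report(kind: str, message: str) -> None:
    """Print a status line, colouring it only when stdout is a terminal."""
    print(coloured(kind, message) if sys.stdout.isatty() else message)
```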
- Python 3.7+
- Chrome (used for headless rendering; wkhtmltopdf could be used instead)
- Clone the repository or download the script.
- Install Python dependencies:
pip install -r requirements.txt
- Run the script (the --chrome argument is optional):
python pocket_export_pdf.py --input getpocket_sample.csv --output ./pdf_getpocket [--chrome "/path/to/chrome"]
- Typical Chrome paths for --chrome:
  - Windows: "C:/Program Files/Google/Chrome/Application/chrome.exe"
  - macOS: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
  - Linux: usually just "google-chrome" or "chrome"
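If --chrome is omitted, a script could look for Chrome in the typical locations listed above. The sketch below is an assumption about how such a lookup might work (find_chrome is a hypothetical helper): it prefers an explicit path, then tries each candidate as a file path and, via shutil.which, as a command name on the PATH.

```python
import os
import shutil
from typing import Optional, Sequence

# Candidates taken from the typical locations listed above.
CANDIDATES = (
    "C:/Program Files/Google/Chrome/Application/chrome.exe",
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "google-chrome",
    "chrome",
)


def find_chrome(explicit: Optional[str] = None,
                candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first usable Chrome executable, trying an explicit --chrome value first."""
    for candidate in ([explicit] if explicit else []) + list(candidates):
        if os.path.isfile(candidate):
            return candidate
        found = shutil.which(candidate)  # resolves bare names like "google-chrome" via PATH
        if found:
            return found
    return None
```

Usage: find_chrome() returns None when none of the candidates is installed, which is a natural point to ask the user for --chrome.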