Web Pages To PDF

Convert your Pocket export or any URL list into organized PDFs for offline backup and archiving.

A simple Python tool to convert your Pocket export or any URL+label list into PDFs, organized by labels for easy offline backup.

Why?

Pocket is shutting down. Export your saved articles now and archive them as PDFs, sorted into folders by tag.
If the original page is inaccessible, the tool falls back to the Wayback Machine.
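
As an illustration of the Wayback Machine fallback, here is a minimal sketch that uses the public Wayback Machine availability API. The function names closest_snapshot and wayback_url are hypothetical, and the script's own lookup may work differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Wayback Machine.
WAYBACK_API = "https://archive.org/wayback/available"


def closest_snapshot(payload):
    """Return the closest snapshot URL from an API response dict, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest") or {}
    if closest.get("available") and closest.get("url"):
        return closest["url"]
    return None


def wayback_url(url, timeout=15):
    """Ask the availability API for a snapshot of `url` (network request)."""
    query = urlencode({"url": url})
    with urlopen(f"{WAYBACK_API}?{query}", timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

If wayback_url returns a snapshot URL, that URL can be rendered to PDF in place of the original; if it returns None, no archived copy was found.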

Features

Parse Pocket CSV export to extract URLs and labels.
Download web pages as PDFs.
Organize PDFs in folders named by label. Only the first tag is used as the folder name. (TODO: extend to multiple labels.)
Fallback to Wayback Machine archive if original URL is inaccessible.
Configurable PDF file naming. Names combine title + domain + index to keep them unique.
Tag delimiters are , and |.
Retries a direct download of the URL if previous attempts fail, since some pages block indirect attempts. Some pages simply no longer exist.
Logs unsuccessful and doubtful downloads into url_retrieval.log.
Coloured output for easier orientation (BLUE = started, RED = failed, GREEN = succeeded).
Skips URLs that have already been downloaded. You can therefore delete PDFs that contain an error (for example, medium.com showing an "are you human?" check) and run the tool again. Only the ones you deleted will be retried.
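
The tag, folder, naming, and skip rules above can be sketched roughly as follows. This is an illustration, not the script's actual code: the column names title, url and tags, the "untagged" default folder, the character replacement rule, and all function names are assumptions.

```python
import csv
import re
from pathlib import Path
from urllib.parse import urlparse

# Tags may be separated by ',' or '|'.
TAG_DELIMITERS = re.compile(r"[,|]")


def split_tags(raw):
    """Split a raw tag field on ',' or '|' and drop empty entries."""
    return [t.strip() for t in TAG_DELIMITERS.split(raw or "") if t.strip()]


def folder_for(raw_tags, default="untagged"):
    """Use only the first tag as the folder name ('untagged' is assumed)."""
    tags = split_tags(raw_tags)
    return tags[0] if tags else default


def pdf_name(title, url, index, max_len=80):
    """Build '<title>_<domain>_<index>.pdf' from filesystem-safe parts."""
    domain = urlparse(url).netloc.replace(":", "_")
    safe = re.sub(r"[^\w\-]+", "_", title).strip("_")[:max_len] or "untitled"
    return f"{safe}_{domain}_{index}.pdf"


def plan_downloads(rows, output_dir):
    """Yield (url, target_path) only for rows whose PDF does not exist yet."""
    for index, row in enumerate(rows, start=1):
        name = pdf_name(row.get("title") or "", row["url"], index)
        target = Path(output_dir) / folder_for(row.get("tags")) / name
        if not target.exists():
            yield row["url"], target
```

Rows can come from csv.DictReader over the export file. In this sketch the skip check relies on the index staying stable between runs, so the input file should keep the same row order.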

Requirements

Python 3.7+
Chrome (used for headless rendering; wkhtmltopdf could be used instead)
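
For orientation, headless rendering with Chrome can be sketched as below. The --headless, --disable-gpu and --print-to-pdf switches are standard Chrome command-line options; the function names and the timeout value are assumptions, and the script may invoke Chrome differently.

```python
import os
import subprocess


def chrome_pdf_command(chrome, url, pdf_path):
    """Build the argument list that asks headless Chrome to print a URL to PDF."""
    return [chrome, "--headless", "--disable-gpu",
            f"--print-to-pdf={pdf_path}", url]


def render_pdf(chrome, url, pdf_path, timeout=60):
    """Run Chrome and report whether a PDF file was produced."""
    try:
        result = subprocess.run(chrome_pdf_command(chrome, url, pdf_path),
                                capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # Chrome hung, or the binary could not be started at all.
        return False
    return result.returncode == 0 and os.path.exists(pdf_path)
```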

Installation

Clone the repository or download the script.
Install Python dependencies:

pip install -r requirements.txt

Usage

python pocket_export_pdf.py --input getpocket_sample.csv --output ./pdf_getpocket [--chrome "/path/to/chrome"]
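
A command line of this shape could be parsed with argparse as in the sketch below. Only the three flag names come from the usage line above; the help texts, the choice to make --input and --output required, and the build_parser name are assumptions.

```python
import argparse


def build_parser():
    """Create a parser for the --input, --output and optional --chrome flags."""
    parser = argparse.ArgumentParser(
        description="Convert a Pocket export or URL list into PDFs.")
    parser.add_argument("--input", required=True,
                        help="CSV file with URLs and labels")
    parser.add_argument("--output", required=True,
                        help="directory that receives the PDFs")
    parser.add_argument("--chrome", default=None,
                        help="path to the Chrome executable (optional)")
    return parser
```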

Path to Chrome

Windows: "C:/Program Files/Google/Chrome/Application/chrome.exe"
macOS: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
Linux: usually just "google-chrome" or "chrome"
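
When --chrome is not given, a default could be chosen per platform. The sketch below reuses the paths listed above; the DEFAULT_CHROME table and the default_chrome function are hypothetical and not part of the script.

```python
import platform

# Typical locations, keyed by the value platform.system() returns.
DEFAULT_CHROME = {
    "Windows": "C:/Program Files/Google/Chrome/Application/chrome.exe",
    "Darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "Linux": "google-chrome",
}


def default_chrome(system=None):
    """Return a likely Chrome path for the given (or current) platform."""
    system = system or platform.system()
    return DEFAULT_CHROME.get(system, "google-chrome")
```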
