Turn a list of URLs into styled PDFs with a few lines of Python. This tutorial covers single requests, batch processing, error handling, and scheduling for automated documentation and report workflows.
API access requires the Pro+ plan ($12/mo). Full API documentation is available at api.prettypdfprinter.com/docs.
Manual saving is fine for a few pages. But when you need to archive 50 pages of documentation, generate weekly reports from dashboards, or snapshot competitor pages on a schedule, automation is the answer.
Python combined with the Pretty PDF API gives you a scriptable pipeline that converts any URL or HTML into a professionally styled PDF. Instead of clicking through a browser extension one page at a time, you write a script once and run it whenever you need fresh PDFs.
Automation eliminates repetitive manual work, reduces errors from copy-paste workflows, and ensures consistent formatting across every document. Whether you are archiving documentation before a version update, generating compliance evidence on a schedule, or building a reporting pipeline that runs in CI, the pattern is the same: a list of URLs goes in, a folder of styled PDFs comes out.
The rest of this tutorial walks through the complete workflow — from a single API call to a production-ready batch script with error handling and scheduling.
You need Python 3.7 or later and the requests library. The setup takes less than a minute.
pip install requests
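If you want to keep the dependency isolated, the standard venv workflow works fine here (optional):
# Optional: install requests inside a virtual environment
python3 -m venv .venv
source .venv/bin/activate    # Windows (PowerShell): .venv\Scripts\Activate.ps1
pip install requests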
Never hardcode API keys in your scripts. Use an environment variable instead. Generate your API key from the Pretty PDF dashboard under Settings, then export it in your shell:
# Linux / macOS
export PRETTYPDF_API_KEY="your-api-key-here"
# Windows (PowerShell)
$env:PRETTYPDF_API_KEY = "your-api-key-here"
Create a Python file with the base configuration that every script in this tutorial will use:
import os
import requests
API_KEY = os.environ["PRETTYPDF_API_KEY"]
API_BASE = "https://api.prettypdfprinter.com/v1"
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}
If the PRETTYPDF_API_KEY environment variable is not set, the script will raise a KeyError immediately rather than failing silently later. This is intentional — you want to know right away if the key is missing.
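If you would rather fail with a readable message than a KeyError traceback, one small variation is to look the key up with os.environ.get() and exit with a hint — the scheduled script later in this tutorial uses the same pattern:
import os
import sys

API_KEY = os.environ.get("PRETTYPDF_API_KEY")
if not API_KEY:
    # Exit immediately with a clear message instead of a KeyError traceback
    sys.exit("PRETTYPDF_API_KEY is not set. Export it before running this script.")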
Start with the simplest case: convert one URL to a PDF and save it to disk.
The POST /v1/generate/url endpoint accepts a URL and returns a document record with a download link. Send the URL, then download the resulting PDF:
import os
import requests
API_KEY = os.environ["PRETTYPDF_API_KEY"]
API_BASE = "https://api.prettypdfprinter.com/v1"
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}
def convert_url_to_pdf(url, output_path, template="clean"):
    """Convert a single URL to PDF and save to disk."""
    # Step 1: Request PDF generation
    response = requests.post(
        f"{API_BASE}/generate/url",
        headers=HEADERS,
        json={
            "url": url,
            "template": template,
        },
    )
    response.raise_for_status()
    result = response.json()
    # Step 2: Download the generated PDF
    doc_id = result["id"]
    download_url = f"{API_BASE}/files/{doc_id}"
    pdf_response = requests.get(download_url, headers=HEADERS)
    pdf_response.raise_for_status()
    # Step 3: Save to disk
    with open(output_path, "wb") as f:
        f.write(pdf_response.content)
    print(f"Saved: {output_path} ({len(pdf_response.content)} bytes)")
    return result

# Usage
convert_url_to_pdf(
    "https://docs.python.org/3/tutorial/index.html",
    "python-tutorial.pdf",
    template="clean",
)
The API receives your URL, fetches the page server-side, runs the content extraction pipeline to strip navigation and ads, applies the template you specified, and renders the result to PDF with WeasyPrint. The response includes a document ID that you use to download the file. The entire round trip typically takes 2 to 5 seconds depending on page complexity.
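The only field the code above relies on is id. If you want to see what else your plan returns in the document record, you can dump the whole response — the exact fields beyond id are not listed here, so treat this as a quick inspection step rather than a schema:
import json

result = convert_url_to_pdf(
    "https://docs.python.org/3/tutorial/index.html",
    "python-tutorial.pdf",
)
# Pretty-print the document record returned by the API.
# Only "id" is used in this tutorial; other fields may vary by plan and endpoint version.
print(json.dumps(result, indent=2))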
Process a list of URLs with rate limiting, progress tracking, and meaningful filenames.
When you have a list of URLs to convert, iterate through them sequentially with a delay between requests to respect rate limits. The function below reads URLs from a list, generates a filename from each URL, and tracks progress:
import os
import re
import time
import requests
API_KEY = os.environ["PRETTYPDF_API_KEY"]
API_BASE = "https://api.prettypdfprinter.com/v1"
HEADERS = {
"X-API-Key": API_KEY,
"Content-Type": "application/json",
}
DELAY_BETWEEN_REQUESTS = 3 # seconds (safe for Pro+)
def url_to_filename(url):
    """Convert a URL to a safe filename."""
    name = url.split("//")[-1]
    name = re.sub(r"[^\w\-.]", "_", name)
    return name[:100] + ".pdf"

def batch_convert(urls, output_dir="pdfs", template="clean"):
    """Convert a list of URLs to PDFs with rate limiting."""
    os.makedirs(output_dir, exist_ok=True)
    results = {"success": [], "failed": []}
    for i, url in enumerate(urls, 1):
        print(f"[{i}/{len(urls)}] Processing: {url}")
        try:
            # Generate PDF
            response = requests.post(
                f"{API_BASE}/generate/url",
                headers=HEADERS,
                json={"url": url, "template": template},
            )
            response.raise_for_status()
            result = response.json()
            # Download PDF
            doc_id = result["id"]
            pdf_response = requests.get(
                f"{API_BASE}/files/{doc_id}",
                headers=HEADERS,
            )
            pdf_response.raise_for_status()
            # Save to disk
            filename = url_to_filename(url)
            filepath = os.path.join(output_dir, filename)
            with open(filepath, "wb") as f:
                f.write(pdf_response.content)
            print(f" Saved: {filepath}")
            results["success"].append(url)
        except requests.RequestException as e:
            print(f" Failed: {e}")
            results["failed"].append({"url": url, "error": str(e)})
        # Rate limit delay (skip after last URL)
        if i < len(urls):
            time.sleep(DELAY_BETWEEN_REQUESTS)
    print(f"\nDone: {len(results['success'])} succeeded, "
          f"{len(results['failed'])} failed")
    return results

# Usage
urls = [
    "https://docs.python.org/3/tutorial/index.html",
    "https://docs.python.org/3/library/os.html",
    "https://docs.python.org/3/library/pathlib.html",
    "https://docs.python.org/3/library/json.html",
]
batch_convert(urls, output_dir="python-docs")
For larger batches, keep your URLs in a text file (one per line) and read them in:
def load_urls(filepath):
    """Load URLs from a text file, one URL per line."""
    with open(filepath) as f:
        return [line.strip() for line in f if line.strip()]

urls = load_urls("urls.txt")
batch_convert(urls)
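For reference, urls.txt is just one URL per line; blank lines are skipped by load_urls. A minimal example:
https://docs.python.org/3/tutorial/index.html
https://docs.python.org/3/library/os.html
https://docs.python.org/3/library/pathlib.html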
A production script needs to handle rate limits, server errors, and network failures gracefully. This section adds retry logic to keep your batch running.
The function below handles three failure modes: HTTP 429 (rate limited) by waiting for the duration given in the Retry-After header, HTTP 5xx (server errors) by retrying with exponential backoff, and network failures (connection errors and timeouts) by retrying with the same backoff. Permanent client errors (4xx other than 429) are raised immediately so the caller can log and skip them.
import time
import requests
def request_with_retry(method, url, max_retries=3, **kwargs):
    """Make an HTTP request with retry logic for rate limits and errors."""
    for attempt in range(max_retries + 1):
        try:
            response = method(url, timeout=30, **kwargs)
            # Success
            if response.status_code < 400:
                return response
            # Rate limited: wait and retry
            if response.status_code == 429:
                retry_after = int(response.headers.get("Retry-After", 10))
                print(f" Rate limited. Waiting {retry_after}s...")
                time.sleep(retry_after)
                continue
            # Server error: retry with backoff
            if response.status_code >= 500:
                if attempt < max_retries:
                    wait = 2 ** attempt  # 1s, 2s, 4s
                    print(f" Server error {response.status_code}. "
                          f"Retrying in {wait}s...")
                    time.sleep(wait)
                    continue
            # Permanent client error (or server error after final attempt): do not retry
            response.raise_for_status()
        except requests.ConnectionError:
            if attempt < max_retries:
                wait = 2 ** attempt
                print(f" Connection error. Retrying in {wait}s...")
                time.sleep(wait)
                continue
            raise
        except requests.Timeout:
            if attempt < max_retries:
                wait = 2 ** attempt
                print(f" Timeout. Retrying in {wait}s...")
                time.sleep(wait)
                continue
            raise
    # Retries exhausted (e.g. persistent 429): surface the failure to the caller
    response.raise_for_status()
    return response
Replace direct requests.post() and requests.get() calls with request_with_retry():
# Instead of:
response = requests.post(f"{API_BASE}/generate/url", headers=HEADERS, json=payload)
# Use:
response = request_with_retry(
    requests.post,
    f"{API_BASE}/generate/url",
    headers=HEADERS,
    json=payload,
)
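Putting the pieces together, here is a sketch of the batch loop from earlier routed through request_with_retry. It reuses url_to_filename, HEADERS, API_BASE, and DELAY_BETWEEN_REQUESTS from the previous sections; the logic is unchanged apart from the retry wrapper:
def batch_convert_with_retry(urls, output_dir="pdfs", template="clean"):
    """batch_convert variant that sends every HTTP call through request_with_retry."""
    os.makedirs(output_dir, exist_ok=True)
    results = {"success": [], "failed": []}
    for i, url in enumerate(urls, 1):
        print(f"[{i}/{len(urls)}] Processing: {url}")
        try:
            # Generate the PDF, retrying on 429s, 5xx errors, and network failures
            response = request_with_retry(
                requests.post,
                f"{API_BASE}/generate/url",
                headers=HEADERS,
                json={"url": url, "template": template},
            )
            doc_id = response.json()["id"]
            # Download the result with the same retry behavior
            pdf_response = request_with_retry(
                requests.get,
                f"{API_BASE}/files/{doc_id}",
                headers=HEADERS,
            )
            filepath = os.path.join(output_dir, url_to_filename(url))
            with open(filepath, "wb") as f:
                f.write(pdf_response.content)
            results["success"].append(url)
        except requests.RequestException as e:
            # Permanent failures (or exhausted retries) are logged and skipped
            print(f" Failed permanently: {e}")
            results["failed"].append({"url": url, "error": str(e)})
        if i < len(urls):
            time.sleep(DELAY_BETWEEN_REQUESTS)
    return results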
With retry logic in place, transient failures no longer stop the entire batch. The script waits when rate limited, backs off on server errors, and only skips URLs that fail permanently.
Run your script on a schedule to automate recurring PDF generation. Archive documentation every Monday, snapshot dashboards every morning, or back up content before monthly releases.
A cron job runs in a minimal environment, so the script should be self-contained with absolute paths and logging to a file:
#!/usr/bin/env python3
"""Weekly documentation archival script.
Run via cron: 0 6 * * 1 /usr/bin/python3 /home/user/scripts/archive_docs.py
"""
import os
import sys
import time
import logging
from datetime import datetime
import requests
# Configuration
API_KEY = os.environ.get("PRETTYPDF_API_KEY")
if not API_KEY:
    sys.exit("PRETTYPDF_API_KEY environment variable not set")
API_BASE = "https://api.prettypdfprinter.com/v1"
HEADERS = {"X-API-Key": API_KEY, "Content-Type": "application/json"}
OUTPUT_DIR = "/home/user/archives/docs"
URL_FILE = "/home/user/scripts/doc_urls.txt"
DELAY = 3
# Logging
date_str = datetime.now().strftime("%Y-%m-%d")
log_file = f"/home/user/logs/archive_{date_str}.log"
os.makedirs(os.path.dirname(log_file), exist_ok=True)
logging.basicConfig(filename=log_file, level=logging.INFO,
                    format="%(asctime)s %(message)s")
def main():
    # Create dated output directory
    out_dir = os.path.join(OUTPUT_DIR, date_str)
    os.makedirs(out_dir, exist_ok=True)
    # Load URLs
    with open(URL_FILE) as f:
        urls = [line.strip() for line in f if line.strip()]
    logging.info(f"Starting archive of {len(urls)} URLs")
    success, failed = 0, 0
    for i, url in enumerate(urls, 1):
        try:
            resp = requests.post(f"{API_BASE}/generate/url",
                                 headers=HEADERS,
                                 json={"url": url, "template": "clean"},
                                 timeout=30)
            resp.raise_for_status()
            doc_id = resp.json()["id"]
            pdf = requests.get(f"{API_BASE}/files/{doc_id}",
                               headers=HEADERS, timeout=30)
            pdf.raise_for_status()
            filename = f"{i:03d}.pdf"
            filepath = os.path.join(out_dir, filename)
            with open(filepath, "wb") as f:
                f.write(pdf.content)
            logging.info(f"OK: {url} -> {filepath}")
            success += 1
        except Exception as e:
            logging.error(f"FAIL: {url} -> {e}")
            failed += 1
        if i < len(urls):
            time.sleep(DELAY)
    logging.info(f"Done: {success} succeeded, {failed} failed")

if __name__ == "__main__":
    main()
Edit your crontab with crontab -e and add the following line to run the script every Monday at 6:00 AM:
# Archive documentation every Monday at 6 AM
0 6 * * 1 PRETTYPDF_API_KEY="your-key" /usr/bin/python3 /home/user/scripts/archive_docs.py
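Alternatively, most cron implementations (Vixie cron and derivatives) accept plain VAR=value assignments at the top of the crontab, which keeps the key off the command line:
PRETTYPDF_API_KEY="your-key"
0 6 * * 1 /usr/bin/python3 /home/user/scripts/archive_docs.py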
On Windows, create a scheduled task that runs python C:\scripts\archive_docs.py on your preferred schedule. Set the PRETTYPDF_API_KEY environment variable in the task's environment settings or in your system environment variables.
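If you prefer the command line to the Task Scheduler UI, a schtasks command along these lines creates an equivalent weekly task — the task name and Python path below are placeholders, so adjust them for your machine:
schtasks /Create /TN "ArchiveDocs" /SC WEEKLY /D MON /ST 06:00 /TR "C:\Path\To\python.exe C:\scripts\archive_docs.py"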
Python automation with the Pretty PDF API fits a range of workflows where you need PDFs generated consistently and on schedule.
Point the script at your internal dashboards and reporting pages. Every Monday morning, the cron job fetches each URL, generates a styled PDF, and saves it to a shared drive or uploads it to Slack. Stakeholders get consistent, readable reports without anyone manually exporting pages.
Before a major version update, run the batch script against your documentation URLs to create a complete PDF archive of the current version. Store the snapshots alongside your release artifacts so you always have a historical record of what the docs said at each version.
Track competitor pricing pages, feature announcements, and landing pages by snapshotting them weekly. The dated output directories give you a timeline of changes. PDFs preserve the full layout and content in a format that is easy to share and reference in strategy meetings.
Regulatory and legal teams need proof of what was published on a specific date. Schedule the script to capture policy pages, terms of service, and public disclosures on a recurring basis. Each run produces timestamped PDFs that serve as evidence in audits and legal proceedings.
Before migrating a website to a new CMS or redesigning a site, run the batch script against every page to create a complete PDF backup. If anything goes wrong during migration, you have a full archive of the original content and layout for reference and recovery.
Check your Python version with python --version in your terminal. If you are using a virtual environment, make sure the environment is activated before installing dependencies.
Use requests.get() to download the generated file and write it to disk with open(filename, 'wb'). Make sure to use binary write mode ('wb') since PDFs are binary files. You can derive meaningful filenames from the URL or the document title returned in the API response metadata.
Get your API key, install requests, and turn any list of URLs into styled PDFs in minutes.