downloads cedarville publishing books as pdf


Cedarville Cybersecurity Textbook PDF Creator

Automated tool to download and convert the Cedarville "Invitation to Cybersecurity" textbook to PDF format.

Features

  • Downloads all 340 pages (SVG text layers + high-res WebP images)
  • Creates a true vector PDF with embedded custom fonts using Playwright's Print-to-PDF
  • Text is perfectly sharp at any zoom level and fully selectable/searchable
  • Fast: ~20-30 minutes total (download + PDF creation)
  • Final PDF: ~62 MB with 340 high-quality pages

Quick Start

./build.sh

That's it! The script will:

  1. Create a Python virtual environment
  2. Install dependencies
  3. Download all page layers (~10-15 min)
  4. Create the PDF (~10-15 min)
  5. Optionally add an OCR text layer with ocrmypdf (~30-60 min)

Manual Steps

If you prefer to run steps individually:

1. Set Up Environment

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m playwright install chromium

2. Download Layers

python download_layers.py

Downloads 340 pages:

  • SVG layers (text, vector graphics) → svg_layers/
  • High-res WebP images (1045x1350) → webp_highres/
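
The download step above can be sketched roughly as follows. This is a minimal illustration, not the contents of download_layers.py: the server URL, the URL layout, and the page_NNN file naming are all hypothetical placeholders, since this README does not state them. It skips files that already exist, so an interrupted run can be resumed.

```python
from pathlib import Path
from urllib.request import urlopen

TOTAL_PAGES = 340
# Hypothetical placeholder: the real server URL and path scheme are not given here.
BASE_URL = "https://example.invalid/book"


def layer_targets(page, base_url=BASE_URL):
    """Return (url, relative_path) pairs for one page's SVG and WebP layers."""
    name = f"page_{page:03d}"  # assumed naming scheme
    return [
        (f"{base_url}/svg/{name}.svg", Path("svg_layers") / f"{name}.svg"),
        (f"{base_url}/webp/{name}.webp", Path("webp_highres") / f"{name}.webp"),
    ]


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_all(fetch=http_get, total=TOTAL_PAGES, root=Path(".")):
    """Download every missing layer file; return the number of files written."""
    written = 0
    for page in range(1, total + 1):
        for url, rel_path in layer_targets(page):
            dest = Path(root) / rel_path
            if dest.exists():
                continue  # already downloaded, so reruns are cheap
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(fetch(url))
            written += 1
    return written
```

Passing the fetch function as a parameter keeps the path logic testable without network access.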

3. Create PDF

python create_pdf.py

Creates Invitation_to_Cybersecurity.pdf using Playwright's Print-to-PDF:

  • Each page rendered as vector PDF with embedded fonts
  • High-res WebP images as backgrounds
  • SVG text preserved as vectors (sharp at any zoom!)
  • Fully selectable and searchable text
  • Takes ~10-15 minutes
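
One way the layering described above could work is sketched below, assuming each page is an HTML box with the WebP as the bottom layer and the SVG stacked on top, printed through Playwright's `page.pdf()`. This is an illustrative sketch rather than the actual create_pdf.py: the helper names, the single-document layout, and the assumption that SVG and WebP files share the same stem are all hypothetical. Whether text stays selectable depends on how the SVG is embedded, so treat the `<object>` choice as an assumption to verify.

```python
from pathlib import Path

PAGE_W, PAGE_H = 1045, 1350  # pixel size of the high-res WebP images


def build_html(pairs):
    """Build one HTML document from (webp_uri, svg_uri) pairs, one box per page."""
    css = (
        f"@page {{ size: {PAGE_W}px {PAGE_H}px; margin: 0; }}"
        "body { margin: 0; }"
        f".page {{ position: relative; width: {PAGE_W}px; height: {PAGE_H}px;"
        " overflow: hidden; break-after: page; }"
        ".page:last-child { break-after: auto; }"
        ".page > * { position: absolute; inset: 0; width: 100%; height: 100%; }"
    )
    pages = "".join(
        f'<div class="page"><img src="{webp}" alt="">'
        f'<object type="image/svg+xml" data="{svg}"></object></div>'
        for webp, svg in pairs
    )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<style>{css}</style></head><body>{pages}</body></html>"
    )


def collect_pairs(root="."):
    """Pair each SVG with the WebP of the same stem (assumed naming) as file URIs."""
    root = Path(root).resolve()
    pairs = []
    for svg in sorted((root / "svg_layers").glob("*.svg")):
        webp = root / "webp_highres" / (svg.stem + ".webp")
        pairs.append((webp.as_uri(), svg.as_uri()))
    return pairs


def render_pdf(html_path, pdf_path):
    """Print a local HTML file to PDF with headless Chromium."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Load from a file URI so the local image references resolve.
        page.goto(Path(html_path).resolve().as_uri(), wait_until="load")
        page.pdf(path=str(pdf_path), print_background=True,
                 prefer_css_page_size=True)
        browser.close()
```

A typical call would write `build_html(collect_pairs())` to a temporary HTML file and pass that file to `render_pdf`. Rendering all 340 pages in one document may be memory-hungry; printing in batches and merging is an alternative.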

Requirements

  • Python 3.9+
  • macOS (tested on Apple Silicon)
  • Homebrew (only needed for the optional OCR step)

Output

  • Invitation_to_Cybersecurity.pdf - 340 pages, ~62 MB, true vector text with embedded custom fonts!

File Structure

cedrus/
├── build.sh                    # Main build script
├── requirements.txt            # Python dependencies
├── download_layers.py          # Download SVG + WebP
├── create_pdf.py              # Composite and create PDF
├── svg_layers/                # Downloaded SVG files
├── webp_highres/              # Downloaded WebP files
├── merged_pages/              # Temporary composited PNGs
└── Invitation_to_Cybersecurity.pdf
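
Before creating the PDF, it can help to confirm that both layer directories are complete. The check below is a small sketch that assumes files are named page_001 through page_340 with the matching extension; that naming is hypothetical and should be adjusted to whatever download_layers.py actually writes.

```python
from pathlib import Path


def missing_pages(directory, suffix, total=340):
    """Return page numbers that have no non-empty file page_NNN<suffix>."""
    directory = Path(directory)
    missing = []
    for page in range(1, total + 1):
        candidate = directory / f"page_{page:03d}{suffix}"  # assumed naming
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(page)
    return missing


if __name__ == "__main__":
    for folder, suffix in (("svg_layers", ".svg"), ("webp_highres", ".webp")):
        gaps = missing_pages(folder, suffix)
        print(f"{folder}: {len(gaps)} missing", gaps[:10])
```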

Troubleshooting

"Command not found: python3"

  • Install Python 3: brew install python3

"ocrmypdf not found"

  • The OCR step is optional. Install it with: brew install ocrmypdf

Fonts look wrong

  • The script uses Playwright (Chromium), which renders the embedded fonts correctly
  • If issues persist, check that the Playwright browser is installed: python -m playwright install chromium

Notes

  • Total time: ~20-30 minutes (without OCR)
  • With OCR: ~50-90 minutes total
  • Disk space needed: ~500 MB for temporary files
  • The script downloads from the official Cedarville publication server
  • Be patient - high-quality rendering takes time!