Downloads Cedarville publishing books as PDFs

Clone this repository

https://tangled.org/dunkirk.sh/cedrus
git@knot.dunkirk.sh:dunkirk.sh/cedrus


Cedarville Cybersecurity Textbook PDF Creator

An automated tool that downloads the Cedarville "Invitation to Cybersecurity" textbook and converts it to PDF.

Features

  • Downloads all 340 pages (SVG text layers + high-res WebP images)
  • Composites layers with proper font rendering
  • Creates high-quality PDF (1045x1350 pixels per page)
  • Optional: Add searchable text with OCR
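The pixel dimensions above fix the page's aspect ratio, but the physical page size depends on the DPI at which the image is placed. A minimal sketch of that arithmetic, assuming a DPI of 150 purely for illustration (the README states only the pixel size, and `page_size_points` is a hypothetical helper, not part of the repository):

```python
# Assumption: the DPI is illustrative; the README gives only pixel dimensions.
PAGE_WIDTH_PX = 1045
PAGE_HEIGHT_PX = 1350
POINTS_PER_INCH = 72  # PDF user-space units per inch


def page_size_points(width_px, height_px, dpi):
    """Return (width, height) in PDF points for an image placed at the given DPI."""
    scale = POINTS_PER_INCH / dpi
    return (width_px * scale, height_px * scale)


width_pt, height_pt = page_size_points(PAGE_WIDTH_PX, PAGE_HEIGHT_PX, dpi=150)
print(f"{width_pt:.1f} x {height_pt:.1f} pt")  # 501.6 x 648.0 pt
```

At 150 DPI this works out to roughly 7 x 9 inches; a different DPI changes the physical size but not the pixel content.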

Quick Start

./build.sh

That's it! The script will:

  1. Create a Python virtual environment
  2. Install dependencies
  3. Download all page layers (~10-15 min)
  4. Create the PDF (~8-10 min)
  5. Optionally add OCR for selectable text (~30-60 min)
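For readers who want to see the sequence spelled out, the non-OCR steps above can be sketched as a small Python driver. This is not the contents of build.sh; it only chains the commands listed under Manual Steps, and the helper names (`venv_python`, `build_steps`, `run_pipeline`) are hypothetical. It assumes a POSIX virtual environment layout (`venv/bin/python`), which matches the macOS requirement.

```python
import subprocess
import sys
from pathlib import Path

VENV_DIR = Path("venv")


def venv_python(venv_dir=VENV_DIR):
    """Path to the interpreter inside the virtual environment (POSIX layout)."""
    return str(venv_dir / "bin" / "python")


def build_steps(venv_dir=VENV_DIR):
    """Return the commands from the Manual Steps section, in order."""
    py = venv_python(venv_dir)
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],           # 1. create venv
        [py, "-m", "pip", "install", "-r", "requirements.txt"],  # 2. dependencies
        [py, "-m", "playwright", "install", "chromium"],         # 2. browser
        [py, "download_layers.py"],                              # 3. download layers
        [py, "create_pdf.py"],                                   # 4. create the PDF
    ]


def run_pipeline(steps):
    """Run each command in turn, stopping at the first failure."""
    for cmd in steps:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_steps())` from the repository root would run the steps in order; `check=True` makes a failed download abort the run before the PDF step starts.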

Manual Steps

If you prefer to run steps individually:

1. Set Up Environment

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m playwright install chromium

2. Download Layers

python download_layers.py

Downloads 340 pages:

  • SVG layers (text, vector graphics) → svg_layers/
  • High-res WebP images (1045x1350) → webp_highres/
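Because the download takes several minutes, it can help to confirm that both folders are complete before moving on. A minimal sketch, assuming files are named like `page_001.svg` and `page_001.webp`; the naming pattern and the `missing_pages` helper are hypothetical, so adjust them to whatever download_layers.py actually writes:

```python
from pathlib import Path

TOTAL_PAGES = 340  # page count stated in this README


def missing_pages(directory, suffix, total=TOTAL_PAGES, pattern="page_{:03d}"):
    """Return the page numbers that have no file in `directory`.

    `pattern` is a hypothetical naming scheme, not taken from the repository.
    """
    directory = Path(directory)
    return [
        n
        for n in range(1, total + 1)
        if not (directory / (pattern.format(n) + suffix)).is_file()
    ]


for folder, suffix in (("svg_layers", ".svg"), ("webp_highres", ".webp")):
    gaps = missing_pages(folder, suffix)
    print(f"{folder}: {len(gaps)} of {TOTAL_PAGES} pages missing")
```

A non-empty result for either folder suggests re-running the download before creating the PDF.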

3. Create PDF

python create_pdf.py

Composites the SVG and WebP layers of each page (intermediate PNGs are written to merged_pages/) and creates Invitation_to_Cybersecurity.pdf.

4. Add OCR (Optional)

brew install ocrmypdf
ocrmypdf Invitation_to_Cybersecurity.pdf Invitation_to_Cybersecurity_OCR.pdf

Creates a version with selectable/searchable text.
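If the OCR step is driven from Python rather than typed by hand, it can be skipped cleanly when `ocrmypdf` is not installed. A minimal sketch using the same command and file names as above; `ocr_command` and `add_ocr` are hypothetical helpers, not part of the repository:

```python
import shutil
import subprocess


def ocr_command(input_pdf, output_pdf):
    """Build the ocrmypdf command line shown in this section."""
    return ["ocrmypdf", input_pdf, output_pdf]


def add_ocr(
    input_pdf="Invitation_to_Cybersecurity.pdf",
    output_pdf="Invitation_to_Cybersecurity_OCR.pdf",
):
    """Run OCR if ocrmypdf is on PATH; return True if it ran, False if skipped."""
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf not found; skipping OCR (install with: brew install ocrmypdf)")
        return False
    subprocess.run(ocr_command(input_pdf, output_pdf), check=True)
    return True
```

Returning `False` instead of raising keeps the step optional, which matches how the README treats it.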

Requirements

  • Python 3.9+
  • macOS (tested on Apple Silicon)
  • Homebrew (for OCR step)

Output

  • Invitation_to_Cybersecurity.pdf - 340 pages, ~70-80 MB, high quality
  • Invitation_to_Cybersecurity_OCR.pdf - Same as above + searchable text (optional)

File Structure

cedrus/
├── build.sh                    # Main build script
├── requirements.txt            # Python dependencies
├── download_layers.py          # Download SVG + WebP
├── create_pdf.py               # Composite and create PDF
├── svg_layers/                 # Downloaded SVG files
├── webp_highres/               # Downloaded WebP files
├── merged_pages/               # Temporary composited PNGs
└── Invitation_to_Cybersecurity.pdf

Troubleshooting

"Command not found: python3"

  • Install Python 3: brew install python3

"ocrmypdf not found"

  • OCR step is optional. Install with: brew install ocrmypdf

Fonts look wrong

  • The script uses Playwright (Chromium), which properly renders embedded fonts
  • If issues persist, check that the Playwright browser is installed: python -m playwright install chromium

Notes

  • Total time: ~20-30 minutes (without OCR)
  • With OCR: ~50-90 minutes total
  • Disk space needed: ~500 MB temporary files
  • The script downloads from the official Cedarville publication server
  • Be patient - high-quality rendering takes time!
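The disk and time figures above can be checked before starting a run. A minimal sketch, assuming the ~500 MB estimate is interpreted as 500 MiB and using the per-step durations from the Quick Start list; `has_enough_space` and `total_minutes` are hypothetical helpers:

```python
import shutil

REQUIRED_MB = 500  # temporary-file estimate from this README


def has_enough_space(free_bytes, required_mb=REQUIRED_MB):
    """True if the free space covers the estimated temporary files."""
    return free_bytes >= required_mb * 1024 * 1024


def total_minutes(*step_ranges):
    """Sum (low, high) minute ranges into one overall (low, high) range."""
    low = sum(lo for lo, _ in step_ranges)
    high = sum(hi for _, hi in step_ranges)
    return (low, high)


free = shutil.disk_usage(".").free
print("Enough free space:", has_enough_space(free))
print(total_minutes((10, 15), (8, 10)))            # download + PDF: (18, 25)
print(total_minutes((10, 15), (8, 10), (30, 60)))  # with OCR: (48, 85)
```

The summed step estimates come out slightly below the totals quoted above, which presumably also allow for environment setup and dependency installation.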