python

Read PDF with Python

Extract text from PDF files using pdfplumber, with auto-install if the library is missing.

Peter Shaan

May 18, 2026


Install

pip install pdfplumber

Or let the script install it automatically (see below).
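The install-on-demand idea can also be factored into a small reusable helper. This is a minimal sketch, not part of the original script: ensure_package is a hypothetical name, and it assumes pip is available for the interpreter that runs the script.

```python
import importlib
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Import module_name, installing it with pip first if it is missing.

    ensure_package is a hypothetical helper for illustration; it is not
    part of pdfplumber or the standard library.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Install into the same interpreter that is running this script.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name, "-q"]
        )
        # Let the import system notice the newly installed package.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

With this helper, the import at the top of a script could become pdfplumber = ensure_package("pdfplumber"). Installing packages at run time is convenient for one-off scripts, but it may be unwelcome in shared or locked-down environments.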

Script

import sys

try:
    import pdfplumber
except ImportError:
    import subprocess
    subprocess.check_call([sys.executable, "-m", "pip", "install", "pdfplumber", "-q"])
    import pdfplumber

pdf_path = r"<path/to/your/file.pdf>"
out_path = r"<path/to/output.txt>"

with pdfplumber.open(pdf_path) as pdf:
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"Total pages: {len(pdf.pages)}\n")
        separator = "=" * 60
        # Number pages from 1 so the output matches what a PDF viewer shows.
        for i, page in enumerate(pdf.pages, start=1):
            f.write(f"\n{separator}\n")
            f.write(f"PAGE {i}\n")
            f.write(f"{separator}\n")
            # extract_text() yields nothing usable for pages without a text layer.
            text = page.extract_text()
            if text:
                f.write(text + "\n")
            else:
                f.write("[Empty page or image-only]\n")

print(f"Done! Output saved to: {out_path}")

Replace <path/to/your/file.pdf> and <path/to/output.txt> with your actual file paths.
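As an alternative to editing the paths in the file, they could be read from the command line. The sketch below is one possible approach, not part of the original script: parse_paths and the -o/--out option are hypothetical names, and the default output name (the PDF name with a .txt suffix) is an assumption.

```python
import argparse
from pathlib import Path


def parse_paths(argv=None):
    """Return (pdf_path, out_path) taken from command-line arguments.

    parse_paths is a hypothetical helper; argv=None means "use sys.argv".
    """
    parser = argparse.ArgumentParser(description="Extract text from a PDF file.")
    parser.add_argument("pdf_path", type=Path, help="PDF file to read")
    parser.add_argument(
        "-o", "--out", type=Path, default=None,
        help="output .txt file (default: PDF name with a .txt suffix)",
    )
    args = parser.parse_args(argv)
    # Assumed default: write next to the input, swapping the suffix for .txt.
    out_path = args.out if args.out is not None else args.pdf_path.with_suffix(".txt")
    return args.pdf_path, out_path
```

The two assignments in the script would then be replaced by pdf_path, out_path = parse_paths(), and the script run as, for example, python read_pdf.py report.pdf -o report.txt (read_pdf.py being whatever name the script was saved under).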

Notes

  • Pages that contain only images or scans have no text layer, so the script writes "[Empty page or image-only]" for them; recovering their text requires OCR
  • Output is saved as .txt with UTF-8 encoding
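  • pdfplumber usually handles PDFs with tables better than PyPDF2, because it also provides dedicated table extraction (page.extract_tables()) in addition to plain text extraction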
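To illustrate the last note, tables can be pulled out separately from the running text. The following is a minimal sketch under stated assumptions: write_tables and extract_tables_to_csv are hypothetical helper names, the CSV layout (page number, table number, then the cells) is an arbitrary choice, and how well tables are detected depends on the PDF.

```python
import csv


def write_tables(pages, csv_path):
    """Write every table found on the given pages to one CSV file.

    pages should behave like pdf.pages from pdfplumber: each page offers
    extract_tables(), which returns a list of tables, each table being a
    list of rows, each row a list of cells (strings or None).
    Returns the number of tables written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for page_number, page in enumerate(pages, start=1):
            for table in page.extract_tables():
                count += 1
                for row in table:
                    # Empty cells come back as None; write them as "".
                    cells = ["" if cell is None else cell for cell in row]
                    writer.writerow([page_number, count, *cells])
    return count


def extract_tables_to_csv(pdf_path, csv_path):
    """Open a PDF with pdfplumber and dump its tables to csv_path."""
    import pdfplumber  # imported here so write_tables works without it

    with pdfplumber.open(pdf_path) as pdf:
        return write_tables(pdf.pages, csv_path)
```

Calling extract_tables_to_csv(pdf_path, "tables.csv") after the main script would produce one row per table row, prefixed with the page and table numbers so the rows can be grouped again later.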
