python

Read PDF with Python

Extract text from PDF files using pdfplumber, with auto-install if the library is missing.

Peter Shaan

May 18, 2026


Install

pip install pdfplumber

Or let the script install it automatically (see below).

Script

import subprocess
import sys

# Import pdfplumber, installing it with pip on first run if it is missing.
try:
    import pdfplumber
except ImportError:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "pdfplumber", "-q"])
    import pdfplumber

pdf_path = r"<path/to/your/file.pdf>"
out_path = r"<path/to/output.txt>"

separator = "=" * 60

with pdfplumber.open(pdf_path) as pdf, open(out_path, "w", encoding="utf-8") as f:
    f.write(f"Total pages: {len(pdf.pages)}\n")
    for page_number, page in enumerate(pdf.pages, start=1):
        # Write a banner before each page so page boundaries stay visible.
        f.write(f"\n{separator}\nPAGE {page_number}\n{separator}\n")
        # Pages without a text layer yield no usable text here.
        text = page.extract_text()
        if text:
            f.write(text + "\n")
        else:
            f.write("[Empty page or image-only]\n")

print(f"Done! Output saved to: {out_path}")

Replace <path/to/your/file.pdf> and <path/to/output.txt> with your actual file paths.
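
If editing the paths inside the script becomes tedious, they could be read from the command line instead. The following is a minimal sketch using the standard-library argparse module. The function name parse_args, the -o/--out option, and the default of writing next to the PDF with a .txt suffix are assumptions for illustration, not part of the original script.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the PDF path and an optional output path (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Extract text from a PDF.")
    parser.add_argument("pdf_path", type=Path, help="PDF file to read")
    parser.add_argument(
        "-o", "--out", type=Path, default=None,
        help="output text file (default: the PDF path with a .txt suffix)",
    )
    args = parser.parse_args(argv)
    if args.out is None:
        # Assumed default: report.pdf -> report.txt in the same folder.
        args.out = args.pdf_path.with_suffix(".txt")
    return args


# Example call with an explicit argument list instead of sys.argv.
args = parse_args(["report.pdf"])
print(args.pdf_path, args.out)
```

In the script, pdf_path and out_path would then be taken from args.pdf_path and args.out rather than from the hard-coded placeholders.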

Notes

  • Pages that contain only images or scans have no extractable text; the script writes [Empty page or image-only] for them
  • Output is saved as .txt with UTF-8 encoding
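  • pdfplumber usually handles PDFs with tables better than PyPDF2: it tracks the position of each character and can extract tables directly with page.extract_tables()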
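
As a rough illustration of the point about tables, the sketch below uses pdfplumber's page.extract_tables(), which returns each detected table as a list of rows of cell strings (None for empty cells). The helper names table_to_tsv and extract_tables_as_tsv are hypothetical, and tab-separated output is only one possible choice.

```python
def table_to_tsv(table):
    """Turn one table (a list of rows of cell strings or None) into TSV text."""
    lines = []
    for row in table:
        # Empty cells come back as None; newlines inside a cell would break TSV.
        cells = ["" if cell is None else cell.replace("\n", " ") for cell in row]
        lines.append("\t".join(cells))
    return "\n".join(lines)


def extract_tables_as_tsv(pdf_path):
    """Yield (page_number, tsv_text) for every table pdfplumber detects."""
    import pdfplumber  # imported lazily so table_to_tsv works without it

    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                yield page_number, table_to_tsv(table)
```

Looping over extract_tables_as_tsv(pdf_path) gives one (page number, TSV text) pair per detected table, which can be written to a file in the same way as the page text. Detection quality depends on the PDF's layout, so results should be checked against the source document.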
