Install
pip install pdfplumber
Or let the script install it automatically (see below).
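The script's fallback catches ImportError and then installs the package. If you would rather check for the package before importing it, a small helper can do that first. The sketch below is an assumption and not part of the original script. ensure_package is a hypothetical name. It uses importlib.util.find_spec to test whether the module can be imported, and it calls pip through the current interpreter (sys.executable), as the script does.

```python
import importlib
import importlib.util
import subprocess
import sys
from typing import Optional


def ensure_package(module_name: str, pip_name: Optional[str] = None) -> bool:
    """Install a package with the current interpreter's pip if its module is missing.

    Returns True if pip was run, False if the module was already importable.
    pip_name is only needed when the PyPI name differs from the module name.
    """
    # find_spec returns None when no installed module matches the name
    if importlib.util.find_spec(module_name) is not None:
        return False
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name, "-q"]
    )
    # Clear cached finder data so the freshly installed package can be located
    importlib.invalidate_caches()
    return True
```

Call ensure_package("pdfplumber") before import pdfplumber. Running pip as sys.executable -m pip installs the package into the same environment that runs the script, which is not guaranteed with a bare pip command.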
Script
import sys

# Import pdfplumber, installing it quietly with the current interpreter's pip if it is missing
try:
    import pdfplumber
except ImportError:
    import subprocess
    subprocess.check_call([sys.executable, "-m", "pip", "install", "pdfplumber", "-q"])
    import pdfplumber

pdf_path = r"<path/to/your/file.pdf>"  # input PDF
out_path = r"<path/to/output.txt>"     # output text file

with pdfplumber.open(pdf_path) as pdf:
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"Total pages: {len(pdf.pages)}\n")
        separator = "=" * 60
        # Number pages from 1 so the headers match the PDF viewer
        for i, page in enumerate(pdf.pages, start=1):
            f.write(f"\n{separator}\nPAGE {i}\n{separator}\n")
            # extract_text() gives an empty result (None in some versions) when a page has no text layer
            text = page.extract_text()
            if text:
                f.write(text + "\n")
            else:
                f.write("[Empty page or image-only]\n")

print(f"Done! Output saved to: {out_path}")
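The script writes a fixed layout. It starts with the page count, then writes a banner for each page followed by that page's text or a placeholder. You can check this layout without a real PDF by separating the formatting from the PDF reading. The sketch below is a hypothetical refactor that produces the same text as the script. render_pages is an illustrative name. It expects one entry per page, in the form page.extract_text() returns.

```python
from typing import List, Optional

SEPARATOR = "=" * 60


def render_pages(page_texts: List[Optional[str]]) -> str:
    """Build the same output the script writes, from a list of per-page texts."""
    parts = [f"Total pages: {len(page_texts)}\n"]
    for number, text in enumerate(page_texts, start=1):
        # Banner block: blank line, separator, page number, separator
        parts.append(f"\n{SEPARATOR}\nPAGE {number}\n{SEPARATOR}\n")
        # None or "" both mean there was no extractable text on this page
        parts.append(text + "\n" if text else "[Empty page or image-only]\n")
    return "".join(parts)
```

With a PDF open, you could call it as render_pages([page.extract_text() for page in pdf.pages]) and write the result to out_path. This builds the whole output in memory. The original page-by-page writing uses less memory on very large PDFs.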
Replace <path/to/your/file.pdf> and <path/to/output.txt> with your actual file paths. The r prefix makes them raw strings, so Windows backslashes are kept as written.
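You can avoid editing the script for every file by reading the paths from the command line. The sketch below assumes this convention, which is not in the original. parse_args is a hypothetical helper. If you omit the output path, it defaults to the input name with a .txt extension, which is also an assumption.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the input PDF path and an optional output path from the command line."""
    parser = argparse.ArgumentParser(
        description="Extract text from a PDF into a UTF-8 text file."
    )
    parser.add_argument("pdf_path", type=Path, help="input PDF file")
    parser.add_argument(
        "out_path", type=Path, nargs="?", default=None,
        help="output .txt file (default: input name with .txt extension)",
    )
    args = parser.parse_args(argv)
    # Fall back to e.g. report.pdf -> report.txt when no output path is given
    if args.out_path is None:
        args.out_path = args.pdf_path.with_suffix(".txt")
    return args
```

Use args.pdf_path and args.out_path in place of the hard-coded pdf_path and out_path. You would then run something like python extract_pdf.py report.pdf, where the script name is hypothetical.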
Notes
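- Pages that contain only images or scans have no text layer, so nothing can be extracted from them. The script marks them as [Empty page or image-only]. Getting text from such pages requires OCR, which pdfplumber does not perform.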
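- Output is saved as a .txt file with UTF-8 encoding.
- pdfplumber tends to be more accurate than PyPDF2 on PDFs with tables, because it works from the position of every character, line, and rectangle on the page.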
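pdfplumber can also pull tables out directly with page.extract_tables(). This method returns a list of tables for each page. Each table is a list of rows, and each row is a list of cell strings, with None for empty cells. The sketch below flattens each table to tab-separated text. The TSV choice and both helper names (table_to_tsv, extract_tables_as_tsv) are hypothetical and not part of the original script.

```python
from typing import List, Optional


def table_to_tsv(table: List[List[Optional[str]]]) -> str:
    """Turn one extracted table into tab-separated lines; None cells become empty."""
    lines = []
    for row in table:
        # Replace tabs and newlines inside a cell so each row stays on one line
        cells = [
            "" if cell is None else cell.replace("\t", " ").replace("\n", " ")
            for cell in row
        ]
        lines.append("\t".join(cells))
    return "\n".join(lines)


def extract_tables_as_tsv(pdf_path: str) -> List[str]:
    """Return every table pdfplumber finds in the PDF, each rendered as TSV."""
    import pdfplumber  # imported here so table_to_tsv works without pdfplumber installed

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                results.append(table_to_tsv(table))
    return results
```

Table detection depends on the ruling lines and spacing in the PDF, so check the results against the original document.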