pdfid
pdfid.py is Didier Stevens’ lightweight PDF inspector (public domain). It scans a file for keywords associated with attack-prone features (/JS, /JavaScript, /AA, /OpenAction, /Launch) and reports how many times each occurs. The validation here does not ban those keys in arbitrary PDFs. Instead, it asserts that PDFs generated by phpdftk never contain them, so the library’s writer stays clear of the patterns malware authors exploit.
What it catches
- /JS and /JavaScript: embedded JavaScript that some PDF readers will execute
- /AA: additional-actions dictionaries that fire on document or page events
- /OpenAction: actions that fire automatically when the document opens
- /Launch: actions that launch external applications or scripts
A clean run reports zero counts for all five. Any non-zero count fails the assertion.
Installation
The Docker image is built locally from the project’s docker/ setup:
# Docker (recommended; no local install needed)
cd docker && docker compose build pdfid
# Local fallback (download Didier Stevens' script directly)
curl -O https://didierstevens.com/files/software/pdfid_v0_2_8.zip
unzip pdfid_v0_2_8.zip
chmod +x pdfid.py
mv pdfid.py /usr/local/bin/pdfid.py

How it works
The PdfIdValidationTrait provides two methods:
// Assert no suspicious indicators are present (markTestSkipped if pdfid is missing)
$this->assertPdfIdClean('/path/to/file.pdf');

// Get raw output for custom assertions
$output = $this->runPdfIdRaw('/path/to/file.pdf');

assertPdfIdClean():
- Tries Docker first via DockerToolRunner (image: phpdftk/pdfid)
- Falls back to a local script via ExternalToolLocator::find('pdfid.py')
- If neither is available, calls markTestSkipped()
- Parses the output table for the five suspicious indicators
- Fails if any indicator count is greater than zero, listing each violation in the assertion message
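The tool-resolution order in the list above (Docker, then a local script, then skip) can be sketched in Python. This is not the trait's PHP code: DockerToolRunner and ExternalToolLocator are phpdftk classes that are not reproduced here, and pdfid_command and docker_image_available are hypothetical stand-ins. Detecting "Docker available" with `docker image inspect` is an assumption about how such a probe might work. Returning None plays the role of the markTestSkipped() branch.

```python
import os
import shutil
import subprocess
from typing import Callable, Optional

IMAGE = "phpdftk/pdfid"  # image name from the project's docker/ setup


def docker_image_available(image: str = IMAGE) -> bool:
    """Hypothetical probe: docker CLI on PATH and the image already built locally."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(["docker", "image", "inspect", image], capture_output=True)
    return result.returncode == 0


def pdfid_command(
    pdf_path: str,
    image_available: Callable[[], bool] = docker_image_available,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[list[str]]:
    """Return the argv that scans pdf_path, or None when the test should be skipped."""
    abs_path = os.path.abspath(pdf_path)
    # 1. Docker first: mount the file's directory at /data, as in the manual command.
    if image_available():
        directory, name = os.path.split(abs_path)
        return ["docker", "run", "--rm", "-v", f"{directory}:/data", IMAGE, f"/data/{name}"]
    # 2. Local fallback: a pdfid.py executable found on PATH.
    local = which("pdfid.py")
    if local is not None:
        return [local, abs_path]
    # 3. Neither is available: the caller skips (markTestSkipped() in the PHP trait).
    return None
```

Passing the two probes in as parameters keeps the resolution order testable on a machine that has neither Docker nor pdfid installed.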
Trait source
tests/Support/PdfIdValidationTrait.php (Phpdftk\Tests\Support\PdfIdValidationTrait)
Test source
packages/pdf/core/tests/Conformance/Tier4PdfIdTest.php is the Tier 4 integration test that generates representative fixtures and asserts that each one passes pdfid’s clean check.
CI configuration
The pdfid image is built in the test and compliance jobs:
- name: Build pdfid and PDFBox Preflight images
  run: cd docker && docker compose build pdfid pdfbox-preflight

Manual usage
Scan a PDF directly:
# Docker
docker run --rm -v "$(pwd):/data" phpdftk/pdfid /data/file.pdf
# Local script
pdfid.py file.pdf

A typical clean output:
PDFiD 0.2.8 file.pdf
 PDF Header: %PDF-1.7
 obj                    7
 endobj                 7
 stream                 1
 endstream              1
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                0
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            0
 /Launch                0

Any non-zero count on /JS through /Launch is a finding.
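The parse-and-fail step that assertPdfIdClean() performs on a table like the one above can be sketched in Python. The names parse_counts and violations are hypothetical, and the row pattern assumes pdfid's one-keyword-per-line layout. pdfid can append a parenthesized count of hex-obfuscated names to a row (for example `1(1)`), so the sketch reads only the leading integer. Treating a missing row as a failure is a choice made in this sketch, so that a truncated or failed pdfid run cannot pass as clean.

```python
import re

# The five keywords this page treats as findings.
SUSPICIOUS = ("/JS", "/JavaScript", "/AA", "/OpenAction", "/Launch")

# One table row: a /Name, whitespace, then a count. Anything after the
# leading integer (such as a parenthesized obfuscation count) is ignored.
ROW = re.compile(r"^\s*(/\w+)\s+(\d+)")


def parse_counts(output: str) -> dict[str, int]:
    """Map each /Name row in pdfid's table to its integer count."""
    counts: dict[str, int] = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def violations(output: str) -> list[str]:
    """List every suspicious keyword that is non-zero or missing from the table."""
    counts = parse_counts(output)
    found = []
    for key in SUSPICIOUS:
        if key not in counts:
            found.append(f"{key}: missing from pdfid output")
        elif counts[key] > 0:
            found.append(f"{key}: {counts[key]}")
    return found


sample = """PDFiD 0.2.8 file.pdf
 PDF Header: %PDF-1.7
 /JS                    0
 /JavaScript            1
 /AA                    0
 /OpenAction            2
 /Launch                0
"""
print(violations(sample))  # ['/JavaScript: 1', '/OpenAction: 2']
```

An empty list means the file is clean; each string mirrors one entry in the assertion message.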