JHOVE
JHOVE is the Open Preservation Foundation’s format validator (LGPL 2.1). Its PDF-hul module (the “hul” suffix stands for Harvard University Library, where the module originated) assesses PDF format conformance. It returns a “Well-Formed and valid” status when the file’s structure is sound, and a “Not well-formed” or “Well-Formed, but not valid” status with explanatory messages when something is off.
What it catches
- Malformed file structure that QPDF’s --check may parse but JHOVE’s preservation-grade analyzer flags
- Cross-reference inconsistencies that fall short of “valid” archival format
- Trailer/header anomalies the preservation community treats as risk factors
- PDF version mismatches between header and structural features
- Stream encoding issues that compromise long-term readability
Installation
    # Docker (recommended, no local install needed)
    docker pull openpreserve/jhove

    # Or install locally as a fallback:
    brew install jhove            # macOS
    sudo apt-get install jhove    # Ubuntu/Debian

How it works
The JhoveValidationTrait provides two methods:
    // Assert a PDF is well-formed and valid (markTestSkipped if JHOVE missing)
    $this->assertJhoveValid('/path/to/file.pdf');

    // Get raw output for custom assertions
    $output = $this->runJhoveRaw('/path/to/file.pdf');

assertJhoveValid():
- Tries Docker first via DockerToolRunner (image: openpreserve/jhove)
- Falls back to a local binary via ExternalToolLocator::find('jhove')
- If neither is available, calls markTestSkipped() (test passes on existing assertions)
- Runs jhove -m PDF-hul -h xml <file> and parses the <status> element
- Asserts the status equals Well-Formed and valid
- On failure, includes the first 2 KB of JHOVE output in the assertion message
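The flow above can be approximated in shell for readers who want to reproduce it outside PHPUnit. This is a minimal sketch, not the trait's implementation: the function names `pick_runner`, `extract_status`, and `assert_valid` are hypothetical, and checking for a `docker` binary on PATH is only an assumed stand-in for what DockerToolRunner actually does.

```shell
#!/usr/bin/env bash
# Sketch of the decision flow; all function names here are hypothetical.

# Print which runner is available: "docker", "local", or "none".
# Assumption: a docker binary on PATH counts as "Docker available".
pick_runner() {
  if command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v jhove >/dev/null 2>&1; then
    echo local
  else
    echo none
  fi
}

# Read JHOVE XML on stdin and print the text of the first <status> element.
extract_status() {
  grep -oE '<status>[^<]+</status>' | head -n 1 | sed -E 's#</?status>##g'
}

# Read JHOVE XML on stdin; succeed only when the status is exactly
# "Well-Formed and valid". On failure, echo the first 2 KB to stderr.
assert_valid() {
  local output status
  output=$(cat)
  status=$(printf '%s\n' "$output" | extract_status)
  if [ "$status" = "Well-Formed and valid" ]; then
    return 0
  fi
  printf 'JHOVE status: %s\n' "${status:-<none>}" >&2
  printf '%s' "$output" | head -c 2048 >&2
  return 1
}
```

With a local install, `jhove -m PDF-hul -h xml file.pdf | assert_valid` would exit non-zero for anything other than a well-formed, valid file. A caller could treat `pick_runner` printing `none` as the cue to skip the check.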
Trait source
tests/Support/JhoveValidationTrait.php (Phpdftk\Tests\Support\JhoveValidationTrait)
Test source
packages/pdf/core/tests/Conformance/Tier4JhoveTest.php: Tier 4 integration test that generates fixture PDFs and runs each through JHOVE.
CI configuration
The JHOVE Docker image is pulled in the test and compliance jobs:

    - name: Pull JHOVE image
      run: docker pull openpreserve/jhove

Manual usage
Validate any PDF from the command line:
    # Single file (XML output)
    jhove -m PDF-hul -h xml docs/sample-pdfs/simple_text.pdf

    # Plain text output
    jhove -m PDF-hul docs/sample-pdfs/simple_text.pdf

    # Look only for the status line
    jhove -m PDF-hul -h xml file.pdf | grep -oE '<status>[^<]+</status>'
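To check a whole directory in one pass, the status-line command can be wrapped in a loop. The sketch below assumes a `jhove` executable on PATH; `report_statuses` is a hypothetical helper name, and the tab-separated output format is just one convenient choice.

```shell
#!/usr/bin/env bash
# Print "<status><TAB><file>" for every PDF under a directory.
# report_statuses is a hypothetical helper, not part of JHOVE.
report_statuses() {
  local dir=$1 file status
  find "$dir" -type f -name '*.pdf' | sort | while IFS= read -r file; do
    # Same extraction as the status-line command, with the tags stripped.
    status=$(jhove -m PDF-hul -h xml "$file" \
      | grep -oE '<status>[^<]+</status>' \
      | head -n 1 \
      | sed -E 's#</?status>##g')
    # Fall back to "unknown" if no <status> element was found.
    printf '%s\t%s\n' "${status:-unknown}" "$file"
  done
}
```

Running `report_statuses docs/sample-pdfs` would then list each file next to its status, which makes it easy to filter out anything that is not “Well-Formed and valid”.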