Apache PDFBox Preflight
Apache PDFBox is a long-established Java toolkit for PDF processing (Apache 2.0). Its preflight module is a dedicated PDF/A-1b validator — an independent second opinion to veraPDF for the same ISO 19005-1 conformance level. Running both in CI catches divergent interpretations of the spec that either implementation alone might miss.
What it catches
- PDF/A-1b violations that PDFBox’s reading of ISO 19005-1 flags but veraPDF accepts (and vice versa)
- Font embedding gaps that the PDFBox preflight rules detect
- Color space and OutputIntent mismatches under PDF/A-1b’s tighter constraints
- Structural anomalies (xref, page tree, streams) that pass general validators but fail archival rules
- XMP metadata packet errors specific to PDF/A-1 identification
Installation
The Docker image is built locally from the project’s docker/ setup:
# Docker (recommended; no local install needed)
cd docker && docker compose build pdfbox-preflight
# Local fallback (if you have the Apache PDFBox preflight CLI installed).
# Most distributions don't ship preflight as a standalone binary,
# so Docker is the practical path.
How it works
The PdfBoxPreflightValidationTrait provides two methods:
// Assert PDF/A-1b conformance (calls markTestSkipped() if Preflight is missing)
$this->assertPdfBoxPreflightValid('/path/to/file.pdf');
// Get raw output for custom assertions
$output = $this->runPdfBoxPreflightRaw('/path/to/file.pdf');
assertPdfBoxPreflightValid():
- Tries Docker first via DockerToolRunner (image: phpdftk/pdfbox-preflight)
- Falls back to a local binary via ExternalToolLocator::find('preflight')
- If neither is available, calls markTestSkipped()
- Runs the preflight container against the file and asserts exit code 0
- On failure, includes the first 2 KB of preflight output in the assertion message
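The fallback order above can be sketched in shell. This illustrates the logic only and is not the trait's PHP implementation. The function names, the `docker image inspect` availability check, the way the local `preflight` binary is invoked, and the exit code 77 for "skipped" are all assumptions made for this sketch.

```shell
#!/bin/sh
# Sketch of the runner selection described above (hypothetical helper names).

# Print which runner would be used: "docker", "local", or "none".
pick_preflight_runner() {
    # Assumption: Docker counts as available only if the image is already built.
    if command -v docker >/dev/null 2>&1 \
        && docker image inspect phpdftk/pdfbox-preflight >/dev/null 2>&1; then
        echo docker
    elif command -v preflight >/dev/null 2>&1; then
        echo local
    else
        echo none
    fi
}

# Validate one file. Returns 0 if valid, 77 if skipped, otherwise the
# validator's non-zero exit code.
validate_pdfa1b() {
    file=$1
    case "$(pick_preflight_runner)" in
        docker)
            dir=$(cd "$(dirname "$file")" && pwd)
            output=$(docker run --rm -v "$dir:/data" \
                phpdftk/pdfbox-preflight "/data/$(basename "$file")" 2>&1)
            status=$?
            ;;
        local)
            # Assumption: the local binary takes the file path as its only argument.
            output=$(preflight "$file" 2>&1)
            status=$?
            ;;
        none)
            echo "SKIPPED: no preflight runner available" >&2
            return 77
            ;;
    esac
    if [ "$status" -ne 0 ]; then
        # Mirror the trait's behaviour: show only the first 2 KB of output.
        printf '%s\n' "$output" | head -c 2048 >&2
    fi
    return "$status"
}
```

Calling `validate_pdfa1b /path/to/file.pdf` then yields 0 for a conforming file, 77 when no runner is available, and the validator's own exit code otherwise.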
Trait source
tests/Support/PdfBoxPreflightValidationTrait.php (Phpdftk\Tests\Support\PdfBoxPreflightValidationTrait)
Test source
packages/pdf/core/tests/Conformance/Tier4PdfBoxPreflightTest.php: Tier 4 integration test that generates PDF/A-1b fixtures and asserts that both veraPDF and PDFBox Preflight accept them.
CI configuration
The PDFBox Preflight image is built in the test and compliance jobs:
- name: Build pdfid and PDFBox Preflight images
  run: cd docker && docker compose build pdfid pdfbox-preflight
Manual usage
Validate a PDF/A-1b file via the Docker image directly:
docker run --rm -v "$(pwd):/data" phpdftk/pdfbox-preflight /data/file.pdf
echo "exit=$?"
# exit 0 means the file is valid PDF/A-1b
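To check several files in one go, the same `docker run` invocation can be wrapped in a loop. The following is a minimal sketch, assuming the image has already been built. `run_preflight` and `validate_all` are hypothetical helper names, and the read-only `:ro` mount is an addition made here, not part of the project's setup.

```shell
#!/bin/sh
# Sketch: validate several PDFs with the Docker image and summarise results.

# Run the container against a single file (hypothetical helper).
run_preflight() {
    # Mount the file's directory read-only and validate the file inside it.
    dir=$(cd "$(dirname "$1")" && pwd) || return 2
    docker run --rm -v "$dir:/data:ro" \
        phpdftk/pdfbox-preflight "/data/$(basename "$1")"
}

# Print PASS/FAIL per file; return non-zero if any file failed.
validate_all() {
    failed=0
    for pdf in "$@"; do
        if run_preflight "$pdf" >/dev/null 2>&1; then
            echo "PASS $pdf"
        else
            echo "FAIL $pdf"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For example, `validate_all fixtures/*.pdf` prints one line per file and exits non-zero if any file is rejected, which makes it usable as a CI step.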