
Apache PDFBox Preflight

Apache PDFBox is a long-established Java toolkit for PDF processing (Apache 2.0). Its preflight module is a dedicated PDF/A-1b validator — an independent second opinion to veraPDF for the same ISO 19005-1 conformance level. Running both in CI catches divergent interpretations of the spec that either implementation alone might miss.

  • PDF/A-1b violations that PDFBox’s reading of ISO 19005-1 flags but veraPDF accepts (and vice versa)
  • Font embedding gaps the PDFBox preflight rules detect
  • Color space and OutputIntent mismatches under PDF/A-1b’s tighter constraints
  • Structural anomalies (xref, page tree, streams) that pass general validators but fail archival rules
  • Metadata XMP packet errors specific to PDF/A-1 identification

The Docker image is built locally from the project’s docker/ setup:

# Docker (recommended — no local install needed)
cd docker && docker compose build pdfbox-preflight
# Local fallback (if you have the Apache PDFBox preflight CLI installed)
# Most distributions don't ship preflight as a standalone binary —
# Docker is the practical path.

The PdfBoxPreflightValidationTrait provides two methods:

// Assert PDF/A-1b conformance (calls markTestSkipped() if Preflight is unavailable)
$this->assertPdfBoxPreflightValid('/path/to/file.pdf');
// Get raw output for custom assertions
$output = $this->runPdfBoxPreflightRaw('/path/to/file.pdf');

assertPdfBoxPreflightValid():

  1. Tries Docker first via DockerToolRunner (image: phpdftk/pdfbox-preflight)
  2. Falls back to local binary via ExternalToolLocator::find('preflight')
  3. If neither is available, calls markTestSkipped()
  4. Runs Preflight (in the container, or via the local binary) against the file and asserts exit code 0
  5. On failure, includes the first 2 KB of preflight output in the assertion message
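The resolution order above can be sketched in shell for readers who want to reproduce it outside PHPUnit. This is an illustration, not the trait's actual implementation: the function name preflight_check, the skip code 77, and the way the local preflight binary is invoked (PDF path as its only argument) are all assumptions. Only the image name and the "exit code 0 means valid" rule come from this page.

```shell
#!/bin/sh
# Sketch only: Docker first, local binary second, otherwise skip.
# Assumptions: SKIP_CODE=77 is an arbitrary "tool unavailable" convention,
# and the local CLI is assumed to take the PDF path as its only argument.

PREFLIGHT_IMAGE="${PREFLIGHT_IMAGE:-phpdftk/pdfbox-preflight}"
SKIP_CODE=77

preflight_check() {
    file=$1
    # Resolve an absolute directory so the bind mount is unambiguous
    dir=$(cd "$(dirname "$file")" && pwd) || return 2
    name=$(basename "$file")

    if command -v docker >/dev/null 2>&1 &&
       docker image inspect "$PREFLIGHT_IMAGE" >/dev/null 2>&1; then
        # 1. Docker first: mount the file's directory at /data
        out=$(docker run --rm -v "$dir:/data" "$PREFLIGHT_IMAGE" "/data/$name" 2>&1)
        status=$?
    elif command -v preflight >/dev/null 2>&1; then
        # 2. Local fallback (hypothetical invocation)
        out=$(preflight "$dir/$name" 2>&1)
        status=$?
    else
        # 3. Neither available: report a skip, not a failure
        echo "SKIPPED: PDFBox Preflight not available"
        return "$SKIP_CODE"
    fi

    # 4. and 5. Exit code 0 means valid; otherwise print at most 2 KB of output
    if [ "$status" -ne 0 ]; then
        echo "INVALID (exit $status):"
        printf '%s\n' "$out" | head -c 2048
        return 1
    fi
    echo "VALID: $file"
}
```

Calling preflight_check /path/to/file.pdf returns 0 for a valid file, 1 for a rejected one, and 77 when no validator could be found, so a wrapper script can tell "skipped" apart from "failed".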

tests/Support/PdfBoxPreflightValidationTrait.php (Phpdftk\Tests\Support\PdfBoxPreflightValidationTrait)

packages/pdf/core/tests/Conformance/Tier4PdfBoxPreflightTest.php — Tier 4 integration test that generates PDF/A-1b fixtures and asserts both veraPDF and PDFBox Preflight accept them.

The PDFBox Preflight image is built in the test and compliance jobs:

.github/workflows/ci.yml
- name: Build pdfid and PDFBox Preflight images
  run: cd docker && docker compose build pdfid pdfbox-preflight

Validate a PDF/A-1b file via the Docker image directly:

docker run --rm -v "$(pwd):/data" phpdftk/pdfbox-preflight /data/file.pdf
echo "exit=$?"
# exit 0 means the file is valid PDF/A-1b
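To check a whole directory rather than a single file, the same command can be wrapped in a loop. The sketch below assumes the image has already been built locally; validate_dir is a hypothetical helper name, and the PASS/FAIL output format is made up for illustration.

```shell
#!/bin/sh
# Sketch: run the preflight image over every *.pdf in a directory.
# validate_dir is a hypothetical helper; only the image name and the
# "exit 0 means valid" rule come from the documentation above.

validate_dir() {
    dir=$(cd "$1" && pwd) || return 2
    failed=0
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # no matches: the glob stays literal
        name=$(basename "$pdf")
        if docker run --rm -v "$dir:/data" phpdftk/pdfbox-preflight \
               "/data/$name" >/dev/null 2>&1; then
            echo "PASS $name"
        else
            echo "FAIL $name"
            failed=$((failed + 1))
        fi
    done
    echo "failed=$failed"
    # Non-zero return if any file was rejected
    [ "$failed" -eq 0 ]
}
```

For example, validate_dir ./fixtures prints one PASS or FAIL line per PDF and returns non-zero if any file was rejected, which makes it usable as a CI gate.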