
Validation Suites

phpdftk integrates multiple enterprise-grade PDF validation tools organized into four tiers. Together they cover structural integrity, spec conformance, archival compliance, accessibility, and security.

| Tier | Focus | Tools | CI behavior |
|------|-------|-------|-------------|
| 1 | Core validation | QPDF, Arlington, veraPDF | Every push/PR |
| 2 | Corpus stress-testing | Poppler, QPDF, PDFium, PDFBox, veraPDF corpora | On demand |
| 3 | Accessibility | Matterhorn Protocol (via veraPDF) | On demand |
| 4 | Reference & security | JHOVE, PDF 2.0 examples, pdfid, PDFBox Preflight | On demand |

The Tier 1 suites (QPDF, Arlington, veraPDF) run automatically in CI on every push and pull request and are the primary quality gate.

The QPDF suite has 168 assertions across 46 test files. It validates xref tables, page trees, streams, linearization, and encryption structure.

Every integration test that generates a PDF calls assertQpdfValid() — if QPDF is available (Docker or local binary), the generated file is checked for structural correctness.

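use PHPUnit\Framework\TestCase;
use Phpdftk\Tests\Support\QpdfValidationTrait;

class MyTest extends TestCase
{
    use QpdfValidationTrait;

    public function testPdf(): void
    {
        // Write the generated file to a temporary location.
        $path = sys_get_temp_dir() . '/my-test.pdf';

        $writer = new PdfWriter();
        // ... build PDF ...
        $writer->save($path);

        // Structural check; runs only when QPDF is available.
        $this->assertQpdfValid($path);
    }
}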
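
How assertQpdfValid() is implemented is not shown here. As a rough sketch, a check of this kind can shell out to `qpdf --check` and interpret the exit code (qpdf exits 0 for no errors, 2 for errors, and 3 for warnings only). The function names `qpdfVerdict` and `checkWithQpdf` are hypothetical and not part of phpdftk, and the binary lookup assumes a POSIX shell.

```php
<?php

/**
 * Hypothetical helper (not phpdftk API): map a `qpdf --check` exit code
 * to a verdict. qpdf exits 0 on success, 2 on errors, 3 on warnings only.
 */
function qpdfVerdict(int $exitCode): string
{
    return match ($exitCode) {
        0 => 'valid',
        3 => 'warnings',
        default => 'invalid',
    };
}

/**
 * Hypothetical helper: run `qpdf --check` on a file.
 * Returns null when the binary cannot be found, so the caller can treat
 * the check as optional instead of failing the test.
 */
function checkWithQpdf(string $pdfPath, string $binary = 'qpdf'): ?string
{
    // Assumes a POSIX shell: `command -v` prints the path of a known command.
    $located = shell_exec('command -v ' . escapeshellarg($binary) . ' 2>/dev/null');
    if (!is_string($located) || trim($located) === '') {
        return null;
    }

    // Output is captured but ignored here; only the exit code is interpreted.
    exec(escapeshellarg($binary) . ' --check ' . escapeshellarg($pdfPath) . ' 2>&1', $output, $exitCode);

    return qpdfVerdict($exitCode);
}
```

Returning null for a missing binary is one way to model a check that passes silently when QPDF is unavailable; whether warnings should count as a pass is a policy choice left to the caller.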

The Arlington suite has 6 assertions across 5 core tests. It validates every dictionary in the generated PDF against the Arlington PDF Model, a machine-readable representation of all 613 dictionary types in the PDF specification.

Checks:

  • Required keys are present
  • Key names are valid for the dictionary type
  • Value types match the spec
  • Version constraints are satisfied
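
The four checks above can be illustrated with a small sketch. It assumes a simplified, hypothetical in-memory rule format (`required`, `types`, `since`) and a hypothetical function name `validateDictionary`; the real Arlington model is richer, and the `$pageModel` rules below are a hand-written stand-in rather than Arlington data.

```php
<?php

/**
 * Validate one PDF dictionary against a simplified, hypothetical rule set.
 *
 * @param array<string, mixed> $dict  Keys (without the leading slash) mapped to PHP values
 * @param array<string, array{required: bool, types: list<string>, since: string}> $model
 * @return list<string> Violations; empty when the dictionary conforms
 */
function validateDictionary(array $dict, array $model, string $pdfVersion): array
{
    $errors = [];

    // 1. Required keys are present.
    foreach ($model as $key => $rule) {
        if ($rule['required'] && !array_key_exists($key, $dict)) {
            $errors[] = "missing required key /$key";
        }
    }

    foreach ($dict as $key => $value) {
        // 2. Key names are valid for the dictionary type.
        if (!isset($model[$key])) {
            $errors[] = "unknown key /$key";
            continue;
        }
        $rule = $model[$key];

        // 3. Value types match (get_debug_type yields 'int', 'float', 'string', 'array', ...).
        if (!in_array(get_debug_type($value), $rule['types'], true)) {
            $errors[] = "wrong type for /$key";
        }

        // 4. Version constraints are satisfied.
        if (version_compare($pdfVersion, $rule['since'], '<')) {
            $errors[] = "/$key requires PDF {$rule['since']}";
        }
    }

    return $errors;
}

// Simplified stand-in for a page dictionary model; not actual Arlington data.
$pageModel = [
    'Type'     => ['required' => true,  'types' => ['string'],       'since' => '1.0'],
    'MediaBox' => ['required' => false, 'types' => ['array'],        'since' => '1.0'],
    'UserUnit' => ['required' => false, 'types' => ['int', 'float'], 'since' => '1.6'],
];

// A PDF 1.4 page that uses a 1.6 key and an undefined key.
$errors = validateDictionary(['Type' => 'Page', 'UserUnit' => 2.0, 'Bogus' => 1], $pageModel, '1.4');
```

Here `$errors` holds two entries, one for the version constraint on /UserUnit and one for the unknown /Bogus key.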

2 dedicated test classes validate PDF/A-1b and PDF/UA-1 output against veraPDF’s ISO 19005 and Matterhorn Protocol implementations.

# Run conformance tests (requires veraPDF Docker image)
mise run test -- --group verapdf

Tier 2 uses large collections of real-world and edge-case PDFs from major implementations to stress-test the reader’s error tolerance.

| Corpus | Source | Files | Focus |
|--------|--------|-------|-------|
| veraPDF | veraPDF/veraPDF-corpus | ~1,500 | Intentionally non-conformant PDF/A (negative testing) |
| Poppler | freedesktop.org/poppler | ~80 | Fonts, transparency, CJK, encryption, damaged files |
| QPDF | qpdf/qpdf | ~700 | Linearization, object/xref streams, encryption, recovery |
| PDFium | chromium/pdfium | ~300 | Rendering, JavaScript, XFA, annotations, CJK |
| PDFBox | apache/pdfbox | ~150 | Signatures, encryption, forms, fonts, incremental updates |

All corpus PDFs are parsed with PdfReader in lenient mode. Encrypted and intentionally malformed files are expected to throw; any exception outside those expected cases is a test failure.
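
The pass/fail rule for corpus files can be sketched as a small classifier. The exception classes and the `classifyParse` function are hypothetical stand-ins; the exceptions PdfReader actually throws are not listed here, and the parser is passed in as a callable so the sketch stays self-contained.

```php
<?php

// Hypothetical exception types standing in for whatever the reader throws.
final class EncryptedPdfException extends RuntimeException {}
final class MalformedPdfException extends RuntimeException {}

/**
 * Parse one corpus file and classify the outcome.
 *
 * @param callable(string): mixed $parse Stand-in for a lenient-mode parse call
 */
function classifyParse(callable $parse, string $file): string
{
    try {
        $parse($file);
        return 'parsed';
    } catch (EncryptedPdfException | MalformedPdfException $e) {
        // Known, tolerated failure modes for corpus files.
        return 'expected-failure';
    } catch (Throwable $e) {
        // Anything else (type errors, logic errors, ...) points at a reader bug.
        return 'unexpected-failure';
    }
}
```

A corpus test would then fail only on `'unexpected-failure'`, which keeps deliberately broken inputs from masking genuine reader bugs.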

# Initialize corpus submodules
git submodule update --init --depth 1 vendor-data/poppler-test vendor-data/qpdf vendor-data/pdfium vendor-data/pdfbox
# Run corpus tests
mise run test -- --group tier2

Tier 3 tests PDF/UA (Universal Accessibility) compliance via veraPDF’s ua1 profile. The tests exercise:

  • StructTreeRoot and MarkInfo presence
  • Document language (/Lang)
  • ViewerPreferences DisplayDocTitle
  • Annotation accessibility (/Contents alt text)
  • Tab order (/Tabs /S)

Both positive tests (tagged PDFs pass) and negative tests (missing tagging fails) are included.
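
As an illustration of what the catalog-level items in that list look for, the following sketch reports which of them are absent. It assumes the document catalog is available as a nested PHP array, which is a hypothetical representation, and `missingUaPrerequisites` is a hypothetical name. It is far weaker than veraPDF's ua1 profile and leaves out the page-level items (annotation /Contents and /Tabs).

```php
<?php

/**
 * Report which catalog-level PDF/UA prerequisites are absent.
 *
 * @param array<string, mixed> $catalog Document catalog as a nested array (hypothetical representation)
 * @return list<string> Names of missing entries; empty when all are present
 */
function missingUaPrerequisites(array $catalog): array
{
    $missing = [];

    if (!isset($catalog['StructTreeRoot'])) {
        $missing[] = 'StructTreeRoot';
    }
    if (!isset($catalog['MarkInfo'])) {
        $missing[] = 'MarkInfo';
    }
    // The document language must be a non-empty string.
    if (($catalog['Lang'] ?? '') === '') {
        $missing[] = 'Lang';
    }
    // DisplayDocTitle lives inside the ViewerPreferences dictionary.
    if (($catalog['ViewerPreferences']['DisplayDocTitle'] ?? false) !== true) {
        $missing[] = 'DisplayDocTitle';
    }

    return $missing;
}

// A tagged catalog (positive case) and an untagged one (negative case).
$tagged = [
    'StructTreeRoot'    => ['Type' => 'StructTreeRoot'],
    'MarkInfo'          => ['Marked' => true],
    'Lang'              => 'en-US',
    'ViewerPreferences' => ['DisplayDocTitle' => true],
];
$untagged = ['Lang' => 'en-US'];
```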

mise run test -- --group tier3

JHOVE is the Open Preservation Foundation’s format validator. It checks structure, xref integrity, stream lengths, font embedding, and metadata. The tests require a “Well-Formed and Valid” status.

The PDF 2.0 examples are 7 reference PDFs from the PDF Association that exercise PDF 2.0 features: page-level output intents, associated files, UTF-8 strings, and incremental saves.

pdfid is Didier Stevens’ security scanner. The tests assert zero counts for suspicious features in generated PDFs:

  • /JS and /JavaScript (embedded scripts)
  • /AA and /OpenAction (automatic actions)
  • /Launch (application execution)
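
To make the "zero counts" idea concrete, here is a naive keyword counter over raw PDF bytes. It is an approximation written for illustration, not pdfid's implementation: the function name `countSuspiciousNames` is hypothetical, and the sketch does not look inside compressed streams or decode hex-escaped names such as /J#53.

```php
<?php

/**
 * Count occurrences of suspicious PDF name tokens in raw file bytes.
 * Naive approximation: no stream decompression, no hex-escape decoding.
 *
 * @return array<string, int> Name token mapped to its occurrence count
 */
function countSuspiciousNames(string $pdfBytes): array
{
    $names = ['/JS', '/JavaScript', '/AA', '/OpenAction', '/Launch'];
    $counts = [];

    foreach ($names as $name) {
        // The negative lookahead stops /JS from matching inside a longer name.
        $pattern = '~' . preg_quote($name, '~') . '(?![A-Za-z0-9])~';
        $counts[$name] = (int) preg_match_all($pattern, $pdfBytes);
    }

    return $counts;
}

// A harmless catalog and one with an auto-running script.
$clean = '<< /Type /Catalog /Pages 1 0 R >>';
$risky = '<< /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >>';
```

A test in this style would assert that every count is zero for a generated file, for example `array_sum(countSuspiciousNames($bytes)) === 0`.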

PDFBox Preflight — PDF/A Cross-Validation


Apache PDFBox’s preflight module validates PDF/A-1b as a secondary cross-validator alongside veraPDF.

mise run test -- --group tier4

All validation tools use a Docker-first approach with local binary fallback.

# Build/pull all validation tool images
cd docker && docker compose build && docker compose pull
# Initialize submodules (Arlington model + corpora)
git submodule update --init

| Tool | If unavailable | Rationale |
|------|----------------|-----------|
| QPDF | Test passes silently | Bonus structural check; other assertions still validate |
| Arlington | Test skipped | Spec validation is the primary purpose |
| veraPDF | Test skipped | PDF/A tests are intentional opt-ins |
| JHOVE/pdfid/Preflight | Test skipped | Tier 4 tools are supplementary |
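
The Docker-first lookup and the fallback behavior can be sketched as two small functions. Everything here is hypothetical: the function names, the image name `example/qpdf`, and the assumption that the image accepts the binary name as its command. Availability probes are injected as callables so the logic can be exercised without Docker installed.

```php
<?php

/**
 * Hypothetical resolver: prefer a Docker image, fall back to a local binary.
 *
 * @param callable(): bool       $dockerAvailable Probe for a usable Docker setup
 * @param callable(string): bool $binaryAvailable Probe for a binary on PATH
 * @return list<string>|null Command prefix as an argument list, or null when neither is available
 */
function resolveToolCommand(string $binary, string $image, callable $dockerAvailable, callable $binaryAvailable): ?array
{
    if ($dockerAvailable()) {
        // Assumes the (hypothetical) image accepts the binary name as its command.
        return ['docker', 'run', '--rm', $image, $binary];
    }
    if ($binaryAvailable($binary)) {
        return [$binary];
    }

    return null;
}

/**
 * Outcome when a tool cannot be resolved, following the fallback table:
 * QPDF checks pass silently, every other tool causes a skip.
 */
function outcomeWhenUnavailable(string $tool): string
{
    return $tool === 'qpdf' ? 'pass' : 'skip';
}
```

In a PHPUnit test, a `'skip'` outcome would typically map to `markTestSkipped()`, while `'pass'` means returning early without an assertion failure.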
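
use PHPUnit\Framework\TestCase;
use Phpdftk\Tests\Support\QpdfValidationTrait;
use Phpdftk\Tests\Support\Arlington\ArlingtonValidationTrait;

class MyIntegrationTest extends TestCase
{
    use QpdfValidationTrait;
    use ArlingtonValidationTrait;

    public function testGeneratesPdf(): void
    {
        // Write the generated file to a temporary location.
        $path = sys_get_temp_dir() . '/my-integration-test.pdf';

        $writer = new PdfWriter();
        // ... build PDF ...
        $writer->save($path);

        // Structural check (QPDF), then spec check (Arlington).
        $this->assertQpdfValid($path);
        $this->assertArlingtonValid($path);
    }
}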