phpdftk API Documentation

PositionedTextExtractor
in package

Final: Yes

Extracts text with precise positioning from a PDF page.

Implements a full text state machine per ISO 32000-2 §9:

  • Tracks the current transformation matrix (CTM) via the cm operator
  • Tracks the text matrix (Tm) and text line matrix
  • Applies character spacing (Tc), word spacing (Tw), horizontal scaling (Tz), text leading (TL), and text rise (Ts)
  • Resolves glyph widths from font /Widths arrays, /W arrays (CID fonts), embedded font data, and standard font metrics (14 built-in fonts)
  • Computes per-span bounding boxes in user space coordinates
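To illustrate how the spacing parameters and glyph widths combine, here is a minimal sketch of the per-glyph horizontal displacement formula from ISO 32000 §9.4.4. The function name glyphAdvance() and its parameter names are hypothetical and are not part of the phpdftk API; the class may well compute this differently internally.

```php
<?php

/**
 * Horizontal displacement of one glyph in unscaled text space units:
 *   tx = ((w0 - Tj / 1000) * Tfs + Tc + Tw) * Th
 *
 * Hypothetical helper for illustration only; not a phpdftk function.
 */
function glyphAdvance(
    float $glyphWidth,      // glyph width in 1/1000 text units (e.g. a /Widths entry)
    float $fontSize,        // Tfs, set by the Tf operator
    float $charSpacing,     // Tc
    float $wordSpacing,     // Tw, only applied to the single-byte space (code 32)
    float $horizScalePct,   // Tz operand, a percentage (100 = no scaling)
    bool $isSpace,          // true when the character code is 32
    float $tjAdjust = 0.0   // numeric adjustment from a TJ array, in 1/1000 units
): float {
    $th = $horizScalePct / 100.0;           // Th is Tz expressed as a fraction
    $w0 = $glyphWidth / 1000.0;             // convert glyph units to text space
    $tw = $isSpace ? $wordSpacing : 0.0;    // word spacing only affects spaces

    return (($w0 - $tjAdjust / 1000.0) * $fontSize + $charSpacing + $tw) * $th;
}
```

Summing this displacement over the glyphs of a string gives the unscaled width of a span; the text matrix and CTM then map that width into user space.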

Each text-showing operator (Tj, TJ, ', ") produces one or more TextSpan objects with the computed position and dimensions.
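As a rough illustration of how a span's position can be derived, the sketch below builds the text rendering matrix described in ISO 32000 §9.4.4 (text state parameters × Tm × CTM) and reads the span origin from it. The helper names multiplyMatrices() and textRenderingMatrix() are assumptions made for this example, not phpdftk functions, and matrices are represented as six-element arrays [a, b, c, d, e, f] in the usual PDF order.

```php
<?php

/**
 * Multiply two PDF affine matrices given as [a, b, c, d, e, f].
 * The result applies $m1 first, then $m2 (PDF row-vector convention).
 * Hypothetical helper for illustration only.
 */
function multiplyMatrices(array $m1, array $m2): array
{
    [$a1, $b1, $c1, $d1, $e1, $f1] = $m1;
    [$a2, $b2, $c2, $d2, $e2, $f2] = $m2;

    return [
        $a1 * $a2 + $b1 * $c2,
        $a1 * $b2 + $b1 * $d2,
        $c1 * $a2 + $d1 * $c2,
        $c1 * $b2 + $d1 * $d2,
        $e1 * $a2 + $f1 * $c2 + $e2,
        $e1 * $b2 + $f1 * $d2 + $f2,
    ];
}

/**
 * Text rendering matrix: [Tfs*Th 0 0 Tfs 0 Trise] x Tm x CTM.
 * Hypothetical helper for illustration only.
 */
function textRenderingMatrix(
    float $fontSize,    // Tfs
    float $horizScale,  // Th as a fraction (Tz / 100)
    float $rise,        // Ts
    array $tm,          // current text matrix
    array $ctm          // current transformation matrix
): array {
    $params = [$fontSize * $horizScale, 0.0, 0.0, $fontSize, 0.0, $rise];

    return multiplyMatrices(multiplyMatrices($params, $tm), $ctm);
}

// Example: 12 pt text placed at (100, 700) on a page whose CTM scales by 2
// and translates by (10, 20).
$trm = textRenderingMatrix(12.0, 1.0, 0.0, [1, 0, 0, 1, 100, 700], [2, 0, 0, 2, 10, 20]);
$originX = $trm[4]; // x position of the span origin
$originY = $trm[5]; // y position of the span origin
```

The e and f components of the resulting matrix give the glyph origin after the CTM is applied, and the a and d components give the effective horizontal and vertical scale, from which a bounding box can be estimated.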

Table of Contents

Methods

__construct()  : mixed
extractFromPage()  : array<int, TextSpan>
Extract positioned text spans from a page dictionary.
