phpdftk API Documentation

PositionedTextExtractor
in package phpdftk

Final: Yes

Extracts text with precise positioning from a PDF page.

Implements a full text state machine per ISO 32000-2 §9:

- Tracks the current transformation matrix (CTM) via the cm operator
- Tracks the text matrix (Tm) and the text line matrix (Tlm)
- Applies character spacing (Tc), word spacing (Tw), horizontal scaling (Tz), text leading (TL), and text rise (Ts)
- Resolves glyph widths from font /Widths arrays, /W arrays (CID fonts), embedded font data, and standard font metrics (the 14 built-in fonts)
- Computes per-span bounding boxes in user space coordinates
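
The arithmetic behind the items above can be sketched with the formulas from the PDF specification. This is a minimal illustration, not the library's actual code: the function names matMul, textRenderingMatrix, and glyphAdvance are hypothetical, matrices are assumed to be six-element arrays [a, b, c, d, e, f], and horizontal scaling is assumed to be passed as a fraction (Tz / 100).

```php
<?php
declare(strict_types=1);

/**
 * Multiply two PDF matrices given as [a, b, c, d, e, f].
 * Returns $m x $n using the row-vector convention of the PDF specification.
 */
function matMul(array $m, array $n): array
{
    return [
        $m[0] * $n[0] + $m[1] * $n[2],
        $m[0] * $n[1] + $m[1] * $n[3],
        $m[2] * $n[0] + $m[3] * $n[2],
        $m[2] * $n[1] + $m[3] * $n[3],
        $m[4] * $n[0] + $m[5] * $n[2] + $n[4],
        $m[4] * $n[1] + $m[5] * $n[3] + $n[5],
    ];
}

/**
 * Text rendering matrix: [Tfs*Th 0 0 Tfs 0 Trise] x Tm x CTM.
 * Hypothetical helper; $hScale is Tz / 100.
 */
function textRenderingMatrix(float $fontSize, float $hScale, float $rise, array $tm, array $ctm): array
{
    $params = [$fontSize * $hScale, 0.0, 0.0, $fontSize, 0.0, $rise];
    return matMul(matMul($params, $tm), $ctm);
}

/**
 * Horizontal advance for one glyph in unscaled text space:
 * tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th.
 * $w0 is the glyph width divided by 1000; Tw applies only to the
 * single-byte space character (code 32), signalled here by $isSpace.
 */
function glyphAdvance(
    float $w0,
    float $fontSize,
    float $tc,
    float $tw,
    float $hScale,
    bool $isSpace,
    float $tjAdjust = 0.0
): float {
    $wordSpacing = $isSpace ? $tw : 0.0;
    return (($w0 - $tjAdjust / 1000.0) * $fontSize + $tc + $wordSpacing) * $hScale;
}

// Example: 12 pt text positioned at (72, 700) with an identity CTM.
$trm = textRenderingMatrix(12.0, 1.0, 0.0, [1.0, 0.0, 0.0, 1.0, 72.0, 700.0], [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]);
$advance = glyphAdvance(0.5, 12.0, 0.0, 0.0, 1.0, false);
```

Summing the advances of the glyphs in a string gives the width of a span along the baseline, and mapping the span's corners through the text rendering matrix gives one way to derive a bounding box.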

Each text-showing operator (Tj, TJ, ', ") produces one or more TextSpan objects with the computed position and dimensions.
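
The ' and " operators combine line movement with text showing, which the following sketch illustrates. It follows the operator definitions in the PDF specification and is an assumption about behaviour, not a copy of the library's internals: the class TextStateSketch and its members are hypothetical, and advancing Tm after the string is shown is deliberately left out.

```php
<?php
declare(strict_types=1);

final class TextStateSketch
{
    /** @var float[] text line matrix [a, b, c, d, e, f] */
    public array $tlm = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0];
    /** @var float[] text matrix [a, b, c, d, e, f] */
    public array $tm = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0];
    public float $leading = 0.0;      // set by TL
    public float $charSpacing = 0.0;  // set by Tc
    public float $wordSpacing = 0.0;  // set by Tw

    /** Td: translate the text line matrix by (tx, ty) and reset Tm to it. */
    public function td(float $tx, float $ty): void
    {
        [$a, $b, $c, $d, $e, $f] = $this->tlm;
        $this->tlm = [$a, $b, $c, $d, $tx * $a + $ty * $c + $e, $tx * $b + $ty * $d + $f];
        $this->tm = $this->tlm;
    }

    /** T*: move to the start of the next line, equivalent to "0 -TL Td". */
    public function nextLine(): void
    {
        $this->td(0.0, -$this->leading);
    }

    /**
     * ' operator: T* followed by Tj.
     * Returns the text-space origin at which the string starts;
     * the string itself is not measured in this sketch.
     */
    public function quote(string $text): array
    {
        $this->nextLine();
        return [$this->tm[4], $this->tm[5]];
    }

    /** " operator: set word spacing (aw) and character spacing (ac), then act like '. */
    public function doubleQuote(float $aw, float $ac, string $text): array
    {
        $this->wordSpacing = $aw;
        $this->charSpacing = $ac;
        return $this->quote($text);
    }
}

$state = new TextStateSketch();
$state->leading = 14.0;                         // 14 TL
$state->td(72.0, 700.0);                        // 72 700 Td
$first = $state->quote('Hello');                // (Hello) '
$second = $state->doubleQuote(2.0, 0.5, 'Hi');  // 2 0.5 (Hi) "
```

Because each of these operators first moves to a new line, a span produced by ' or " starts one leading below the previous line origin rather than at the current text position.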

__construct() : mixed
extractFromPage() : array<int, TextSpan>: Extract positioned text spans from a page dictionary.


    public __construct(ObjectResolver $resolver) : mixed

Extract positioned text spans from a page dictionary.


    public extractFromPage(PdfDictionary $page) : array<int, TextSpan>

Return values: array<int, TextSpan>