PositionedTextExtractor
Final: Yes
Extracts text with precise positioning from a PDF page.
Implements a full text state machine per ISO 32000-2 §9:
- Tracks the current transformation matrix (CTM) via the cm operator
- Tracks the text matrix (Tm) and text line matrix (Tlm)
- Applies character spacing (Tc), word spacing (Tw), horizontal scaling (Tz), text leading (TL), and text rise (Ts)
- Resolves glyph widths from font /Widths arrays, /W arrays (CID fonts), embedded font data, and metrics for the 14 standard (built-in) fonts
- Computes per-span bounding boxes in user-space coordinates
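The width-resolution and spacing rules above can be sketched as two small PHP functions. This is a minimal sketch under stated assumptions, not the class's implementation: `cidWidth()` and `glyphDisplacement()` are hypothetical names, and the /W array is assumed to be already decoded into plain PHP arrays of numbers. The displacement follows the horizontal text-space formula from ISO 32000-2, tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, with w0 passed in thousandths of a text-space unit.

```php
<?php

/**
 * Hypothetical helper: width of a CID (in thousandths of a text-space unit)
 * from a CIDFont /W array, falling back to /DW (default 1000) when the CID
 * is not listed. /W entries come in two forms:
 *   c [w1 w2 ...]    consecutive widths starting at CID c
 *   cFirst cLast w   one width for every CID in cFirst..cLast
 * Assumes $w is already decoded into PHP ints/floats/arrays.
 */
function cidWidth(array $w, int $cid, float $dw = 1000.0): float
{
    $i = 0;
    $n = count($w);
    while ($i + 1 < $n) {
        $first = (int) $w[$i];
        if (is_array($w[$i + 1])) {
            // Form 1: c [w1 w2 ...]
            $widths = $w[$i + 1];
            $offset = $cid - $first;
            if ($offset >= 0 && $offset < count($widths)) {
                return (float) $widths[$offset];
            }
            $i += 2;
        } else {
            // Form 2: cFirst cLast w (stop if the triple is truncated)
            if ($i + 2 >= $n) {
                break;
            }
            $last = (int) $w[$i + 1];
            if ($cid >= $first && $cid <= $last) {
                return (float) $w[$i + 2];
            }
            $i += 3;
        }
    }
    return $dw;
}

/**
 * Hypothetical helper: horizontal displacement tx in unscaled text space for one glyph.
 *   $w0  glyph width in thousandths (from /Widths, /W, or standard metrics)
 *   $tj  TJ adjustment in thousandths (0 when shown with Tj)
 *   $tfs font size, $tc character spacing (Tc)
 *   $tw  word spacing (Tw); the caller passes it only for single-byte code 32
 *   $th  horizontal scaling as a fraction (Tz / 100)
 */
function glyphDisplacement(float $w0, float $tj, float $tfs, float $tc, float $tw, float $th): float
{
    return (($w0 - $tj) / 1000.0 * $tfs + $tc + $tw) * $th;
}
```

For a simple font, the caller would take w0 from /Widths (indexed from /FirstChar) rather than from `cidWidth()`.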
Each text-showing operator (Tj, TJ, ', ") produces one or more TextSpan objects with the computed position and dimensions.
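One way to turn a TJ array into a single span with a user-space bounding box is sketched below. It is an illustration under stated assumptions, not the extractor's actual code: `matMul()` and `showTJ()` are hypothetical helpers, each shown string is assumed to be pre-decoded into glyph widths, word spacing (Tw) is ignored, and the ascent/descent defaults are placeholders for a font descriptor's /Ascent and /Descent values. Matrices use the PDF [a b c d e f] form with row vectors, so text space maps to user space through Tm × CTM.

```php
<?php

/** Hypothetical helper: multiply PDF matrices [a b c d e f] (row vectors, result = $m1 x $m2). */
function matMul(array $m1, array $m2): array
{
    [$a1, $b1, $c1, $d1, $e1, $f1] = $m1;
    [$a2, $b2, $c2, $d2, $e2, $f2] = $m2;
    return [
        $a1 * $a2 + $b1 * $c2,
        $a1 * $b2 + $b1 * $d2,
        $c1 * $a2 + $d1 * $c2,
        $c1 * $b2 + $d1 * $d2,
        $e1 * $a2 + $f1 * $c2 + $e2,
        $e1 * $b2 + $f1 * $d2 + $f2,
    ];
}

/**
 * Hypothetical helper: process one TJ array and return [bbox, newTm].
 * $elements: numbers are kerning adjustments (thousandths of a text-space unit);
 *   arrays are shown strings, pre-decoded to glyph widths in thousandths.
 * $ascent/$descent: placeholder font-descriptor values in glyph units.
 * bbox is [minX, minY, maxX, maxY] in user space. Word spacing is ignored.
 */
function showTJ(array $elements, array $tm, array $ctm, float $tfs, float $tc, float $th,
                float $trise, float $ascent = 800.0, float $descent = -200.0): array
{
    $tx = 0.0; // total advance along the baseline, in unscaled text space
    foreach ($elements as $el) {
        if (is_array($el)) {
            foreach ($el as $w0) {
                // Each glyph advances by (w0/1000 * Tfs + Tc) * Th
                $tx += (($w0 / 1000.0) * $tfs + $tc) * $th;
            }
        } else {
            // A positive adjustment moves the next glyph left (horizontal writing)
            $tx -= ($el / 1000.0) * $tfs * $th;
        }
    }
    $m = matMul($tm, $ctm); // text space -> user space
    $yLo = $trise + ($descent / 1000.0) * $tfs;
    $yHi = $trise + ($ascent / 1000.0) * $tfs;
    $xs = [];
    $ys = [];
    // Transform the four corners of the text-space box and take their extent
    foreach ([[0.0, $yLo], [$tx, $yLo], [0.0, $yHi], [$tx, $yHi]] as [$x, $y]) {
        $xs[] = $x * $m[0] + $y * $m[2] + $m[4];
        $ys[] = $x * $m[1] + $y * $m[3] + $m[5];
    }
    $newTm = matMul([1, 0, 0, 1, $tx, 0], $tm); // Tm' = [1 0 0 1 tx 0] x Tm
    return [[min($xs), min($ys), max($xs), max($ys)], $newTm];
}
```

Because consecutive advances are pure x-translations in text space, the sketch sums them and updates Tm once at the end; Tj, ', and " could reuse the same routine with a one-element array after any line move.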
Table of Contents
Methods
- __construct() : mixed
- extractFromPage() : array<int, TextSpan>
  Extract positioned text spans from a page dictionary.
Methods
__construct()
public
__construct(ObjectResolver $resolver) : mixed
Parameters
- $resolver : ObjectResolver
extractFromPage()
Extract positioned text spans from a page dictionary.
public
extractFromPage(PdfDictionary $page) : array<int, TextSpan>
Parameters
- $page : PdfDictionary