phpdftk API Documentation

TextExtractor
in package phpdftk

Final: Yes

Extracts text content from a PDF page by interpreting content stream operators.

Tracks text state (current font, position, spacing) and converts character codes to Unicode using:

Text positioning is used to insert spaces and newlines where the PDF moves the text cursor by significant amounts.
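The positioning rule above can be illustrated with a minimal sketch. It is not phpdftk's implementation: the function name joinPositionedRuns, the run array shape (x, y, width, text) and the two threshold values are assumptions chosen for illustration only. It shows one way a jump in cursor position could be turned into a space or a newline.

```php
<?php

/**
 * Joins positioned text runs into one string, inserting a space or a
 * newline where the cursor moves by more than a threshold.
 * Hypothetical helper for illustration; not part of phpdftk.
 *
 * @param array $runs     Runs in drawing order, each with keys
 *                        'x', 'y' (start position), 'width' (advance), 'text'.
 * @param float $spaceGap Assumed horizontal gap that counts as a word break.
 * @param float $lineGap  Assumed vertical move that counts as a line break.
 */
function joinPositionedRuns(array $runs, float $spaceGap = 2.0, float $lineGap = 5.0): string
{
    $out = '';
    $prev = null;

    foreach ($runs as $run) {
        if ($prev !== null) {
            // Vertical distance between this run and the previous one.
            $dy = abs($run['y'] - $prev['y']);
            // Horizontal gap between the end of the previous run and the start of this one.
            $dx = $run['x'] - ($prev['x'] + $prev['width']);

            if ($dy > $lineGap) {
                $out .= "\n";   // large vertical move: start a new line
            } elseif ($dx > $spaceGap) {
                $out .= ' ';    // large horizontal move on the same line: word break
            }
        }
        $out .= $run['text'];
        $prev = $run;
    }

    return $out;
}

$runs = [
    ['x' => 72.0,  'y' => 700.0, 'width' => 30.0, 'text' => 'Hello'],
    ['x' => 106.0, 'y' => 700.0, 'width' => 32.0, 'text' => 'world'],
    ['x' => 72.0,  'y' => 686.0, 'width' => 24.0, 'text' => 'Next'],
];

echo joinPositionedRuns($runs), "\n";
```

With the sample runs, the 4-unit horizontal gap between the first two runs exceeds the assumed space threshold, and the 14-unit vertical move before the third run exceeds the assumed line threshold. A real extractor would likely scale such thresholds by the current font size rather than use fixed values.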


    public __construct(ObjectResolver $resolver) : mixed

Extract text from a page dictionary.


    public extractFromPage(PdfDictionary $page) : string

$page : PdfDictionary. The page dictionary; it must contain /Contents and /Resources entries.

Return value: string, the text extracted from the page.
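A short usage sketch follows. It relies only on the two signatures documented here. The helper name extractAllPages and the form-feed page separator are assumptions made for illustration, and the arguments are left loosely typed because the library's namespaces are not shown in this reference.

```php
<?php

/**
 * Concatenates the text of several pages.
 * Hypothetical helper for illustration; not part of phpdftk.
 *
 * $extractor is expected to be a TextExtractor and each element of $pages
 * a PdfDictionary page dictionary with /Contents and /Resources.
 */
function extractAllPages(object $extractor, iterable $pages): string
{
    $texts = [];
    foreach ($pages as $page) {
        // extractFromPage() returns the page's text as a string.
        $texts[] = $extractor->extractFromPage($page);
    }

    // A form feed between pages is an arbitrary separator chosen for this sketch.
    return implode("\f", $texts);
}
```

Assuming $resolver is an ObjectResolver for the open document and $pages holds its page dictionaries, the call would be extractAllPages(new TextExtractor($resolver), $pages). How the resolver and the page dictionaries are obtained is not covered by this page.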