phpdftk API Documentation

TextExtractor

Final: Yes

Extracts text content from a PDF page by interpreting content stream operators.

Tracks the text state (current font, position, and spacing) and converts character codes to Unicode using the first available source, in this order:

  1. /ToUnicode CMap (if present on the font)
  2. /Encoding + /Differences (if present)
  3. WinAnsi → GlyphList fallback (for standard fonts)
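The lookup order above can be sketched as a chain of fallbacks. This is a minimal illustration, not phpdftk's implementation: the function name decodeCharCode and its parameters are hypothetical, and the /ToUnicode CMap, /Differences and glyph-list data are modeled as plain PHP arrays that a real extractor would first have to parse from the font dictionary.

```php
<?php

/**
 * Hypothetical sketch of the three-step character-code lookup.
 * All tables are plain arrays standing in for parsed font data;
 * they are NOT phpdftk types.
 */
function decodeCharCode(
    int $code,
    ?array $toUnicode,     // char code => Unicode string, from a /ToUnicode CMap
    ?array $differences,   // char code => glyph name, from /Encoding /Differences
    array $winAnsiToGlyph, // char code => glyph name under WinAnsiEncoding
    array $glyphList       // glyph name => Unicode string (glyph-list subset)
): string {
    // 1. A /ToUnicode CMap entry wins when the font provides one.
    if ($toUnicode !== null && isset($toUnicode[$code])) {
        return $toUnicode[$code];
    }
    // 2. /Differences remaps individual codes to glyph names.
    if ($differences !== null && isset($differences[$code])) {
        $name = $differences[$code];
        if (isset($glyphList[$name])) {
            return $glyphList[$name];
        }
    }
    // 3. Fallback: WinAnsi code -> glyph name -> Unicode.
    if (isset($winAnsiToGlyph[$code]) && isset($glyphList[$winAnsiToGlyph[$code]])) {
        return $glyphList[$winAnsiToGlyph[$code]];
    }
    return "\u{FFFD}"; // replacement character for codes nothing could map
}

$glyphList = ['A' => 'A', 'Euro' => "\u{20AC}", 'fi' => "\u{FB01}"];
$winAnsi = [0x41 => 'A', 0x80 => 'Euro'];

echo decodeCharCode(0x41, [0x41 => 'Z'], null, $winAnsi, $glyphList), "\n"; // Z (ToUnicode)
echo decodeCharCode(0x41, null, [0x41 => 'fi'], $winAnsi, $glyphList), "\n"; // ﬁ (Differences)
echo decodeCharCode(0x80, null, null, $winAnsi, $glyphList), "\n"; // € (WinAnsi fallback)
```

Returning U+FFFD for unmapped codes is one possible choice; an extractor could equally skip such codes.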

The extractor uses text positioning to insert spaces and newlines wherever the content stream moves the text cursor by a significant amount.
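A simplified version of this heuristic compares where each text run starts with where the previous run ended. The joinRuns function, the run array shape, and the 0.5 and 0.2 font-size ratios are illustrative assumptions; the thresholds phpdftk actually uses are not documented here.

```php
<?php

/**
 * Hypothetical sketch: join positioned text runs, inserting "\n" when the
 * baseline moves and " " when there is a horizontal gap. The ratios are
 * illustrative assumptions, not phpdftk's values.
 *
 * @param list<array{x: float, y: float, width: float, fontSize: float, text: string}> $runs
 */
function joinRuns(array $runs): string
{
    $out = '';
    $prev = null;
    foreach ($runs as $run) {
        if ($prev !== null) {
            $dy = abs($run['y'] - $prev['y']);
            $gap = $run['x'] - ($prev['x'] + $prev['width']);
            if ($dy > 0.5 * $prev['fontSize']) {
                $out .= "\n"; // baseline changed: start a new line
            } elseif ($gap > 0.2 * $prev['fontSize']) {
                $out .= ' ';  // noticeable horizontal jump: word break
            }
        }
        $out .= $run['text'];
        $prev = $run;
    }
    return $out;
}

$runs = [
    ['x' => 72.0,  'y' => 700.0, 'width' => 30.0, 'fontSize' => 12.0, 'text' => 'Hello'],
    ['x' => 106.0, 'y' => 700.0, 'width' => 32.0, 'fontSize' => 12.0, 'text' => 'world'],
    ['x' => 72.0,  'y' => 686.0, 'width' => 20.0, 'fontSize' => 12.0, 'text' => 'Next'],
];
echo joinRuns($runs); // prints "Hello world", a newline, then "Next"
```

Taking abs() of the vertical delta treats any baseline change as a line break regardless of direction; in PDF user space y grows upward, so the next line down usually has a smaller y.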

Table of Contents

Methods

__construct() : mixed
extractFromPage() : string
Extract text from a page dictionary.

Methods

extractFromPage()

Extract text from a page dictionary.

public extractFromPage(PdfDictionary $page) : string
Parameters
$page : PdfDictionary

The page dictionary; it must contain /Contents and /Resources entries.

Return values
string
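Both required entries matter because the text operators live in /Contents while the fonts they name are looked up through /Resources. The sketch below checks for the two entries and flattens /Contents, which the PDF specification allows to be either a single stream or an array of streams processed as if concatenated in order. It assumes a hypothetical plain-array page with already-decoded stream data instead of phpdftk's PdfDictionary, whose accessor methods are not shown in this reference; pageContentBytes is likewise a hypothetical name.

```php
<?php

/**
 * Hypothetical sketch using a plain PHP array in place of PdfDictionary.
 * Streams are represented as already-decoded strings.
 *
 * @param array<string, mixed> $page
 */
function pageContentBytes(array $page): string
{
    // Reject pages lacking either entry that text extraction depends on.
    foreach (['Contents', 'Resources'] as $key) {
        if (!array_key_exists($key, $page)) {
            throw new InvalidArgumentException("Page dictionary is missing /$key");
        }
    }
    $contents = $page['Contents'];
    if (is_string($contents)) {
        return $contents; // a single content stream
    }
    // An array of streams behaves as one stream; the separating newline
    // keeps the last token of one stream from fusing with the next.
    return implode("\n", $contents);
}

$page = ['Contents' => ['BT /F1 12 Tf', '(Hi) Tj ET'], 'Resources' => []];
echo pageContentBytes($page); // prints "BT /F1 12 Tf", a newline, then "(Hi) Tj ET"
```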
