phpdftk API Documentation

TextExtractor

Final: Yes

Extract text from PDFs — per page, full document, or with search.

Wraps PdfReader's text extraction with a friendly, toolkit-level API. All page numbers are 1-based.

Usage:

    $text = TextExtractor::open('report.pdf')->allPages();

    $results = TextExtractor::open('contract.pdf')->search('indemnification');
    foreach ($results as $match) {
        echo "Page {$match->pageNumber}: {$match->text}\n";
    }

Table of Contents

Methods

allPages()  : string
Extract text from all pages, joined by a separator.
allPagesWithPositions()  : array<int, array<int, TextSpan>>
Extract text with precise positioning from all pages.
contains()  : bool
Check if a text string appears anywhere in the document.
getPageCount()  : int
getReader()  : PdfReader
open()  : self
openString()  : self
page()  : string
Extract text from a single page.
pageWithPositions()  : array<int, TextSpan>
Extract text with precise positioning from a single page.
perPage()  : array<int, string>
Extract text per page.
search()  : TextSearchResults
Search for a text string across all pages.
searchPattern()  : TextSearchResults
Search for a regex pattern across all pages.

Methods

allPages()

Extract text from all pages, joined by a separator.

public allPages([string $separator = "\n\n" ]) : string
Parameters
$separator : string = "\n\n"
Return values
string
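
The $separator argument decides what sits between pages in the returned string. As a minimal sketch, assume a form feed is passed as the separator so that page boundaries can be recovered afterwards. splitPages() is a hypothetical helper, not part of phpdftk, and it assumes the extracted page text contains no form feed characters of its own.

```php
<?php
// Hypothetical helper (not part of phpdftk): splits text produced by
// allPages("\f") back into one string per page, keyed by 1-based page number.
// Assumes the page text itself contains no form feed characters.
function splitPages(string $allText, string $separator = "\f"): array
{
    $pages = explode($separator, $allText);

    // Re-key from 0-based list positions to 1-based page numbers.
    return array_combine(range(1, count($pages)), $pages);
}
```

With the library loaded, the input would come from TextExtractor::open('report.pdf')->allPages("\f"). When per-page text is the actual goal, perPage() returns that shape directly.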

allPagesWithPositions()

Extract text with precise positioning from all pages.

public allPagesWithPositions() : array<int, array<int, TextSpan>>
Return values
array<int, array<int, TextSpan>>

1-based page number => spans
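
The nested return shape can be summarised without touching any TextSpan members. The sketch below counts spans per page; spanCountsByPage() is a hypothetical helper, and it relies only on the array<int, array<int, TextSpan>> shape documented above.

```php
<?php
// Hypothetical helper (not part of phpdftk): counts the spans on each page,
// given the array<int, array<int, TextSpan>> shape.
function spanCountsByPage(array $spansByPage): array
{
    // array_map over a single array preserves its keys,
    // so the 1-based page numbers survive in the result.
    return array_map('count', $spansByPage);
}
```

With the library loaded, the input would come from allPagesWithPositions(). A page that reports zero spans may deserve a closer look, for example because it holds a scanned image with no text layer.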

contains()

Check if a text string appears anywhere in the document.

public contains(string $text) : bool
Parameters
$text : string
Return values
bool

getPageCount()

Return the number of pages in the document.

public getPageCount() : int
Return values
int

open()

Create an extractor from a PDF file path, with an optional password.

public static open(string $path[, string $password = '' ]) : self
Parameters
$path : string
$password : string = ''
Return values
self

openString()

Create an extractor from PDF bytes held in a string, with an optional password.

public static openString(string $pdfBytes[, string $password = '' ]) : self
Parameters
$pdfBytes : string
$password : string = ''
Return values
self

page()

Extract text from a single page.

public page(int $pageNumber) : string
Parameters
$pageNumber : int

1-based page number

Return values
string
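
This page does not say how page() reacts to a page number outside the document, so a caller may want to check the range first. The sketch below is a hypothetical guard built on the 1-based convention; isValidPageNumber() is not part of phpdftk.

```php
<?php
// Hypothetical guard (not part of phpdftk): true when $pageNumber is a valid
// 1-based page number for a document with $pageCount pages.
function isValidPageNumber(int $pageNumber, int $pageCount): bool
{
    return $pageNumber >= 1 && $pageNumber <= $pageCount;
}

// Intended use, assuming $extractor is a TextExtractor instance:
//
//     if (isValidPageNumber($n, $extractor->getPageCount())) {
//         $text = $extractor->page($n);
//     }
```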

pageWithPositions()

Extract text with precise positioning from a single page.

public pageWithPositions(int $pageNumber) : array<int, TextSpan>

Returns a list of TextSpan objects, each containing the text content, position (x, y in user space), dimensions (width, height), font size, and font name.

Parameters
$pageNumber : int

1-based page number

Return values
array<int, TextSpan>
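
One use of positioned spans is picking out large text such as headings. The sketch below filters spans by font size. The property names text and fontSize are assumptions: this page lists what a TextSpan contains but not its member names, so they may need adjusting. largeTextSpans() is a hypothetical helper, not part of phpdftk.

```php
<?php
// Hypothetical helper (not part of phpdftk): returns the text of every span
// whose font size is at least $minSize.
// ASSUMPTION: spans expose public `text` and `fontSize` properties; the real
// TextSpan member names are not shown on this page.
function largeTextSpans(array $spans, float $minSize): array
{
    $texts = [];
    foreach ($spans as $span) {
        if ($span->fontSize >= $minSize) {
            $texts[] = $span->text;
        }
    }

    return $texts;
}
```

With the library loaded, the input would come from pageWithPositions($pageNumber), and a threshold a little above the body font size is a reasonable starting point.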

perPage()

Extract text per page.

public perPage() : array<int, string>
Return values
array<int, string>

1-based page number => text
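
Because the keys are page numbers, the result can be scanned directly to report where something occurs. The sketch below does a case-insensitive substring check over that shape; pagesContaining() is a hypothetical helper for illustrating the return value, and the class's own search() method is the documented way to search.

```php
<?php
// Hypothetical helper (not part of phpdftk): returns the 1-based page numbers
// whose text contains $needle, ignoring case, given array<int, string> input.
function pagesContaining(array $textByPage, string $needle): array
{
    $hits = [];
    foreach ($textByPage as $pageNumber => $text) {
        if (stripos($text, $needle) !== false) {
            $hits[] = $pageNumber;
        }
    }

    return $hits;
}
```

With the library loaded, the input would come from TextExtractor::open('contract.pdf')->perPage().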


        