TextExtractor
Extract text from PDFs: per page, for the full document, or by search.
Wraps PdfReader's text extraction with a friendly, toolkit-level API. All page numbers are 1-based.
Usage:
    $text = TextExtractor::open('report.pdf')->allPages();

    $results = TextExtractor::open('contract.pdf')->search('indemnification');
    foreach ($results as $match) {
        echo "Page {$match->pageNumber}: {$match->text}\n";
    }
Table of Contents
Methods
- allPages() : string
- Extract text from all pages, joined by a separator.
- allPagesWithPositions() : array<int, array<int, TextSpan>>
- Extract text with precise positioning from all pages.
- contains() : bool
- Check if a text string appears anywhere in the document.
- getPageCount() : int
- getReader() : PdfReader
- open() : self
- openString() : self
- page() : string
- Extract text from a single page.
- pageWithPositions() : array<int, TextSpan>
- Extract text with precise positioning from a single page.
- perPage() : array<int, string>
- Extract text per page.
- search() : TextSearchResults
- Search for a text string across all pages.
- searchPattern() : TextSearchResults
- Search for a regex pattern across all pages.
Methods
allPages()
Extract text from all pages, joined by a separator.
public
allPages([string $separator = "\n\n" ]) : string
Parameters
- $separator : string = "\n\n"
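A distinctive separator makes page boundaries visible in the joined string. The following is a minimal sketch, assuming TextExtractor is autoloaded (its namespace is not shown in this reference). Both helper functions and the marker string are hypothetical and not part of the class:

```php
<?php
// Assumption: TextExtractor is autoloaded; its namespace is not shown in this reference.

/**
 * Hypothetical helper: join all pages with a visible marker
 * instead of the default "\n\n".
 */
function extractWithPageMarkers(string $path, string $marker = "\n----- page break -----\n"): string
{
    return TextExtractor::open($path)->allPages($marker);
}

/**
 * Hypothetical helper: split marker-joined text back into chunks.
 * Only reliable if the marker never occurs in the page text itself.
 */
function splitOnMarker(string $joined, string $marker): array
{
    return explode($marker, $joined);
}
```

When the page-to-text mapping matters, perPage() returns the same text already keyed by page number, which avoids the marker round trip.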
Return values
string
allPagesWithPositions()
Extract text with precise positioning from all pages.
public
allPagesWithPositions() : array<int, array<int, TextSpan>>
Return values
array<int, array<int, TextSpan>>: 1-based page number => spans
contains()
Check if a text string appears anywhere in the document.
public
contains(string $text) : bool
Parameters
- $text : string
Return values
bool
getPageCount()
public
getPageCount() : int
Return values
int
getReader()
public
getReader() : PdfReader
Return values
PdfReader
open()
public
static open(string $path[, string $password = '' ]) : self
Parameters
- $path : string
- $password : string = ''
Return values
self
openString()
public
static openString(string $pdfBytes[, string $password = '' ]) : self
Parameters
- $pdfBytes : string
- $password : string = ''
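openString() suits cases where the PDF is already in memory, for example after an upload or a database read. The following is a sketch under the assumption that TextExtractor is autoloaded. looksLikePdf() and extractFromBytes() are hypothetical helpers, not part of the class:

```php
<?php
// Assumption: TextExtractor is autoloaded; its namespace is not shown in this reference.

/**
 * Hypothetical sanity check: PDF files normally begin with the "%PDF-" header.
 * Passing this check does not prove the bytes are a valid PDF.
 */
function looksLikePdf(string $bytes): bool
{
    return strncmp($bytes, '%PDF-', 5) === 0;
}

/**
 * Hypothetical helper: extract all text from in-memory PDF bytes.
 */
function extractFromBytes(string $pdfBytes, string $password = ''): string
{
    if (!looksLikePdf($pdfBytes)) {
        throw new InvalidArgumentException('Input does not start with a PDF header.');
    }

    return TextExtractor::openString($pdfBytes, $password)->allPages();
}
```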
Return values
self
page()
Extract text from a single page.
public
page(int $pageNumber) : string
Parameters
- $pageNumber : int
  1-based page number
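Because page numbers are 1-based, a loop over the document runs from 1 to getPageCount() inclusive. The following is a minimal sketch. eachPage() is a hypothetical helper, and $extractor stands for any object exposing the getPageCount() and page() methods documented here:

```php
<?php
/**
 * Hypothetical helper: lazily yield page number => page text.
 */
function eachPage(object $extractor): Generator
{
    $count = $extractor->getPageCount();
    for ($n = 1; $n <= $count; $n++) {   // 1-based, inclusive upper bound
        yield $n => $extractor->page($n);
    }
}
```

A call such as foreach (eachPage(TextExtractor::open('report.pdf')) as $n => $text) then handles one page at a time.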
Return values
string
pageWithPositions()
Extract text with precise positioning from a single page.
public
pageWithPositions(int $pageNumber) : array<int, TextSpan>
Returns a list of TextSpan objects, each containing the text content, position (x, y in user space), dimensions (width, height), font size, and font name.
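Positioned spans can be reordered or filtered by geometry. The sketch below sorts spans into approximate reading order. The property names x, y, and text are assumptions, since this reference lists what a TextSpan contains but not its field names. It also assumes the default PDF user space, in which y increases upward, so a larger y is closer to the top of the page. sortReadingOrder() is a hypothetical helper:

```php
<?php
/**
 * Hypothetical helper: sort spans top-to-bottom, then left-to-right.
 * Assumes each span exposes public ->x and ->y properties (assumed names).
 * y is rounded so spans on nearly the same baseline count as one line.
 */
function sortReadingOrder(array $spans): array
{
    usort($spans, function ($a, $b) {
        // Descending rounded y (top of page first), then ascending x.
        return [round($b->y), $a->x] <=> [round($a->y), $b->x];
    });

    return $spans;
}
```

Rounding to whole units is a crude way to group lines; documents with tight line spacing or rotated text would need a different grouping rule.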
Parameters
- $pageNumber : int
  1-based page number
Return values
array<int, TextSpan>
perPage()
Extract text per page.
public
perPage() : array<int, string>
Return values
array<int, string>: 1-based page number => text
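The 1-based keys make it straightforward to report which pages need attention. As an illustration, the hypothetical helper below lists pages with no extractable text, which can indicate an image-only (scanned) page:

```php
<?php
/**
 * Hypothetical helper: return the page numbers whose text is empty or whitespace.
 *
 * @param array<int, string> $pages 1-based page number => text, as returned by perPage()
 * @return int[]
 */
function findBlankPages(array $pages): array
{
    $blank = [];
    foreach ($pages as $pageNumber => $text) {
        if (trim($text) === '') {
            $blank[] = $pageNumber;
        }
    }

    return $blank;
}
```

It would be called as findBlankPages(TextExtractor::open('scan.pdf')->perPage()), where the file name is a placeholder.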
search()
Search for a text string across all pages.
public
search(string $text) : TextSearchResults
Parameters
- $text : string
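Search results can be iterated, and each match exposes pageNumber and text, as in the usage example at the top of this page. The following small sketch tallies matches per page. countMatchesByPage() is a hypothetical helper, not part of the class:

```php
<?php
/**
 * Hypothetical helper: count matches per page.
 *
 * @param iterable $results matches exposing ->pageNumber, e.g. the result of search()
 * @return array<int, int> page number => number of matches, ordered by page
 */
function countMatchesByPage(iterable $results): array
{
    $counts = [];
    foreach ($results as $match) {
        $counts[$match->pageNumber] = ($counts[$match->pageNumber] ?? 0) + 1;
    }
    ksort($counts);

    return $counts;
}
```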
Return values
TextSearchResults
searchPattern()
Search for a regex pattern across all pages.
public
searchPattern(string $regex) : TextSearchResults
Parameters
- $regex : string
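This reference does not state the expected pattern syntax. The sketch below assumes a delimited PCRE pattern, as used by PHP's preg_* functions, and validates it locally before searching. isValidPcre() and findIsoDates() are hypothetical helpers, and the date pattern is only an illustration:

```php
<?php
// Assumption: TextExtractor is autoloaded; its namespace is not shown in this reference.
// Assumption: searchPattern() accepts a delimited PCRE pattern such as '/\d+/'.

/**
 * Hypothetical helper: check that a pattern compiles.
 * preg_match() returns false when the pattern is invalid.
 */
function isValidPcre(string $regex): bool
{
    return @preg_match($regex, '') !== false;
}

/**
 * Hypothetical helper: search a document for ISO-style dates (e.g. 2024-01-31).
 */
function findIsoDates(string $path)
{
    $regex = '/\b\d{4}-\d{2}-\d{2}\b/';
    if (!isValidPcre($regex)) {
        throw new InvalidArgumentException("Invalid pattern: $regex");
    }

    return TextExtractor::open($path)->searchPattern($regex);
}
```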