Extract PDF Data Free: No AI Costs with Power Automate

Collab365 Team · Editorial · Published Mar 30, 2026

At a Glance

Target Audience: Power Automate Developers, Business Process Analysts
Problem Solved: Manual PDF data extraction from invoices/contracts and expensive paid AI services ($500+/mo).
Use Case: Batch automating invoice PDF processing to pull numbers, dates, emails, line items into lists.

PDFs bury goldmines of data in every contract and invoice. But digging it out manually? That's hours down the drain. Paid AI extractors can easily run $500 a month or more. Skip the bill. Grab Power Automate Desktop (PAD). It's free, sits on your machine, and chews through files in bulk.

Head to the Power Automate site and hit the download button in the top right. It installs in minutes.

PAD turns your desktop into a bot factory. No cloud limits. Just point it at local PDFs and watch it rip out text, tables, even images. Repeatable. Fast. Yours to tweak.

Start simple. Text first. Drop in the Extract text from PDF action. Feed it a file. Boom, variable full of raw content. Copy-paste days over.

Tables next. The Extract tables from PDF action detects the grids in a file and outputs them as a list of tables. Indexes start at 0, so this grabs row 0, column "DESCRIPTION" from the second table: %ExtractedPDFTables[1].DataTable[0]['DESCRIPTION']%. Invoice lines? Yours in seconds.

Images too. Extract images from PDF pulls diagrams and charts clean. Pass them downstream, full res.

Data in hand, now mine it. Recognize entities in text snags dates, emails, URLs automatically. Pick one entity type per action and set the language to English.

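Stuck on wonky layouts? Parse text finds "INVOICE NO" and returns its position. Add 11 to that position (10 characters for the label plus 1 for the separator), then Get subtext reads the next 7 characters as the invoice number. Rule-based. Reliable as long as the layout stays fixed.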
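
PAD does this with three actions, but the arithmetic is easy to check outside the designer. Here is a minimal Python sketch of the same find, offset, and slice logic. The function name and the sample text are hypothetical, and the sample assumes a single character sits between the label and a 7-character number.

```python
def subtext_after_label(text, label="INVOICE NO", skip=11, length=7):
    """Mimic Parse text -> Increase variable -> Get subtext."""
    position = text.find(label)        # str.find returns -1 when the label is missing
    if position == -1:
        return None
    start = position + skip            # same idea as increasing Position by 11
    return text[start:start + length]  # same idea as reading 7 characters


# Hypothetical extracted text, not taken from the demonstration invoice.
sample = "ACME Ltd\nINVOICE NO 1002345\nDATE 30/03/2026"
print(subtext_after_label(sample))  # prints 1002345
```

If the number in your PDFs is longer or shorter than 7 characters, the fixed length will cut it off or pull in extra characters, which is where the regex route helps.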

Need patterns? Regex. Test on Regexstorm. Flip "Is Regular Expression" on in Parse text. The pattern (?<=INVOICE NO\r\n.*\r\n).+ uses a lookbehind to skip the "INVOICE NO" line and the line after it, then grabs the rest of the next line.
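
To see what that pattern selects, here is a rough Python equivalent. Python's re module rejects variable-width lookbehinds, so this sketch swaps the lookbehind for a capture group. It is an illustration of the matching logic, not the expression PAD runs. The function name and sample text are hypothetical.

```python
import re

# Capture-group rewrite of (?<=INVOICE NO\r\n.*\r\n).+ for Python's re module.
# [^\r\n] keeps each piece on a single line.
PATTERN = re.compile(r"INVOICE NO\r\n[^\r\n]*\r\n([^\r\n]+)")


def line_two_below_label(text):
    """Return the line two below the "INVOICE NO" label, or None if absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical extracted text with Windows line endings.
sample = "INVOICE NO\r\n1002345\r\nWebsite redesign\r\n"
print(line_two_below_label(sample))  # prints Website redesign
```

Because [^\r\n] stops at the carriage return, this sketch returns the line without a trailing \r. In the .NET flavor the dot matches \r, so the PAD match may carry one and need a trim.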

Tables deeper? Loop the list. %ExtractedPDFTables[1]% for the second one. Syntax clicks fast.

Here's a full flow. Paste into new PAD desktop flow. Grab this demonstration invoice, tweak the path. Run it. Watch variables fill on the right.

Folder.GetFiles Folder: $"C:\Users\jonm_\Collab365\Collab365 – Documents\Academy Events" FileFilter: "*.pdf" IncludeSubfolders: False FailOnAccessDenied: True SortBy1: Folder.SortBy.Name SortDescending1: False SortBy2: Folder.SortBy.NoSort SortDescending2: False SortBy3: Folder.SortBy.NoSort SortDescending3: False Files=> PDFFiles
LOOP FOREACH PDFCurrentItem IN PDFFiles
Pdf.ExtractTextFromPDF.ExtractText PDFFile: PDFCurrentItem DetectLayout: False ExtractedText=> ExtractedPDFText
# Get the items we can identify easily – email address, date of invoice
**REGION Entity Extraction
Text.RecognizeEntitiesInText Text: ExtractedPDFText Mode: Text.RecognizerMode.DateTime Language: Text.RecognizerLanguage.English RecognizedEntities=> varDate
Text.RecognizeEntitiesInText Text: ExtractedPDFText Mode: Text.RecognizerMode.Email Language: Text.RecognizerLanguage.English RecognizedEntities=> varContact
**ENDREGION
**REGION Parse And Get subtext
Text.ParseText.ParseForFirstOccurrence Text: ExtractedPDFText TextToFind: "INVOICE NO" StartingPosition: 0 IgnoreCase: False OccurrencePosition=> Position
Variables.IncreaseVariable Value: Position IncrementValue: 11
Text.GetSubtext.GetSubtext Text: ExtractedPDFText CharacterPosition: Position NumberOfChars: 7 Subtext=> varInvoiceNumber
**ENDREGION
**REGION Use RegEx
Text.ParseText.RegexParseForFirstOccurrence Text: ExtractedPDFText TextToFind: @"(?<=INVOICE NO\r\n.*\r\n).+" StartingPosition: 0 IgnoreCase: False Match=> varInvoiceName
**ENDREGION
**REGION Extract From a Table in PDF
Pdf.ExtractTablesFromPDF.ExtractTables PDFFile: PDFCurrentItem MultiPageTables: True SetFirstRowAsHeader: True ExtractedPDFTables=> ExtractedPDFTables
LOOP FOREACH CurrentTable IN ExtractedPDFTables
SET DESCRIPTION TO ExtractedPDFTables[1].DataTable[0]['DESCRIPTION']
END
**ENDREGION
END

Tweak it. Loop the table rows with a For each action over the table's DataTable.

Trim junk whitespace from each value with the Trim text action.

Build a clean list. Add a Create new list action before the loop, then Add item to list inside it.

Grab from it: %ListOfItems[0]['DESCRIPTION']%.
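
The loop, trim, and list steps fit together like this. It is a Python sketch of the logic only, assuming each table row behaves like a mapping of column name to cell text. The function name and sample rows are hypothetical.

```python
def build_clean_list(table_rows):
    """Loop the rows, trim every cell, and collect the results in a new list."""
    list_of_items = []                            # Create new list
    for row in table_rows:                        # For each row in the table
        trimmed = {column: str(value).strip()     # Trim text on every cell
                   for column, value in row.items()}
        list_of_items.append(trimmed)             # Add item to list
    return list_of_items


# Hypothetical rows with stray whitespace left over from extraction.
rows = [
    {"DESCRIPTION": "  Consulting \r", "AMOUNT": " 450.00"},
    {"DESCRIPTION": "Hosting  ", "AMOUNT": "30.00 "},
]
items = build_clean_list(rows)
print(items[0]["DESCRIPTION"])  # prints Consulting
```

Trimming before adding to the list means everything read from the list later is already clean.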

Processes like this saved my team days of work every week. No vendor lock-in.