Free PDF to JSON — tables, text and metadata

Extract JSON from PDF: tables, structured text or raw content.

Free PDF to JSON converter. Three modes: detect tables automatically, extract structured text with page layout, or get raw text per page. Powered by PDF.js — runs in your browser, no upload needed.

Your file never leaves your browser
Auto table detection
3 output modes
Metadata extraction
Always free
PDF to JSON Converter   100% client-side
Uses PDF.js (Mozilla), the engine behind Firefox's built-in PDF viewer. Works with digital PDFs only, not scanned images. No data is uploaded.

      
How to extract JSON from a PDF

Three steps: select your PDF, choose the output mode that fits your use case, then copy or download the result.

1. Select a digital PDF

Drop or select a PDF created by software — exported from Excel, Google Docs, a reporting tool or generated by an application. PDF.js reads the file in your browser and shows the page count and metadata immediately. Scanned PDFs (images of text) are not supported.

2. Choose output mode

Select Table extraction to detect tabular data and output an array of objects with column headers as keys. Choose Structured text to get pages with lines and coordinates — preserving reading order. Use Raw text for the simplest output: plain text per page as a JSON array.

3. Copy or download the JSON

Click Extract and copy to clipboard or download a .json file. The output is valid JSON ready for further processing — load it into a pipeline, flatten it with the JSON to CSV tool, or inspect it in the JSON viewer.
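
As a rough illustration of the "further processing" step, the sketch below flattens table-mode output into CSV text. It assumes each row is a flat object of string-like values; the function names are hypothetical and this is not the code behind the JSON to CSV tool.

```javascript
// Quote a single CSV field when needed (comma, quote or line break inside),
// doubling any embedded quotes.
function escapeCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Convert an array of row objects (assumed table-mode output) to CSV text.
function rowsToCsv(rows) {
  // Collect column names in first-seen order so ragged rows still line up.
  const columns = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const lines = [columns.map(escapeCsvField).join(",")];
  for (const row of rows) {
    // Missing keys become empty fields.
    lines.push(columns.map((c) => escapeCsvField(row[c])).join(","));
  }
  return lines.join("\r\n");
}
```

Calling `rowsToCsv(JSON.parse(downloadedJson))` on a downloaded table-mode file would give a string that can be saved with a .csv extension, provided the rows really are flat objects.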

Output modes explained

Three ways to extract content from a PDF — choose based on your data and use case.

Mode | Best for | Output shape
Table extraction | PDFs from Excel, reporting tools, financial exports | Array of objects where each row becomes {"Column A": "value", "Column B": "value"}. Column headers are taken from the first detected row. Multiple tables on a page produce multiple arrays.
Structured text | Documents, reports, articles where layout matters | Array of pages, each with an array of lines. Each line contains the text content and its approximate Y position on the page, which is useful for preserving reading order and detecting section boundaries.
Raw text | Simple text extraction, NLP pipelines, search indexing | Array of page objects with a single text field containing all text from that page concatenated. The simplest output: one object per page, suitable for feeding into text processing tools.
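
To make the structured and raw shapes concrete, here is a small sketch of how they relate. The field names (`page`, `lines`, `text`, `y`) and the sample values are assumptions for illustration; the page does not document the exact keys the converter emits.

```javascript
// Illustrative shapes only: all field names and values here are assumptions.
const structuredExample = [
  {
    page: 1,
    lines: [
      { text: "Quarterly report", y: 760 },
      { text: "Revenue grew in Q2.", y: 730 },
    ],
  },
];

// Collapse the structured shape (pages with lines) into the raw shape
// (one text field per page) by joining each page's lines in order.
function structuredToRaw(pages) {
  return pages.map((p) => ({
    page: p.page,
    text: p.lines.map((line) => line.text).join(" "),
  }));
}

const rawExample = structuredToRaw(structuredExample);
```

The point of the sketch is that raw text is a lossy view of structured text: the Y positions are dropped, so choose structured mode if they are needed later.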
When do you need PDF to JSON?

Common workflows where extracting JSON from a PDF is the first step.

Financial reports and statements

Bank statements, invoices, expense reports and financial summaries often arrive as PDF. Extracting to JSON lets you process the data programmatically — load it into a spreadsheet, import it into an accounting tool, or feed it to a data pipeline without manual re-entry.

Data exports from legacy systems

Older ERP, CRM and reporting systems often only export to PDF. Extracting the tabular data as JSON is the first step in migrating that data to a modern system — convert to JSON, then use the JSON to CSV tool to get a spreadsheet-ready file.

NLP and text processing

Research papers, legal documents and technical manuals in PDF format need text extraction before processing with NLP tools. Raw text mode extracts clean text per page — ready for tokenisation, embedding generation or keyword extraction.

Search and indexing

Building a search index over a document corpus requires extracting text from PDFs. Raw text mode produces a clean JSON structure with page-level text that can be indexed directly by Elasticsearch, Typesense or any full-text search engine.
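
As one possible way to index page-level text, the sketch below turns raw-text output into a newline-delimited body for the Elasticsearch bulk API. The `{ page, text }` input shape, the `doc_id` field, and the `indexName` and `docId` parameters are assumptions chosen for the example.

```javascript
// Build an Elasticsearch bulk-API body (NDJSON) from raw-text pages.
// Assumptions: pages look like { page, text }; indexName and docId are
// hypothetical values supplied by the caller.
function toBulkNdjson(pages, indexName, docId) {
  const lines = [];
  for (const p of pages) {
    // Action line: index this page under a stable per-page id.
    lines.push(JSON.stringify({ index: { _index: indexName, _id: `${docId}-p${p.page}` } }));
    // Source line: the fields to be indexed for this page.
    lines.push(JSON.stringify({ doc_id: docId, page: p.page, text: p.text }));
  }
  // The bulk API expects the body to end with a newline.
  return lines.join("\n") + "\n";
}
```

The resulting string would be sent to the `_bulk` endpoint with the `application/x-ndjson` content type. Indexing one document per page keeps search hits traceable to a page number.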

Limitations — what this tool cannot do

Be aware of these constraints before using the tool.

Scanned PDFs: If your PDF was created by scanning a physical document, it contains images — not text characters. PDF.js cannot extract text from images. You will need an OCR tool (like Adobe Acrobat, Tesseract or Google Document AI) to first convert the scanned images to text.
Table detection accuracy: Table extraction works by clustering text items that share the same vertical position. This works reliably for PDFs exported from spreadsheet software. PDFs from Word, web pages or poorly formatted reports may produce misaligned columns or merged cells that the detector cannot handle correctly.
Complex layouts: Multi-column page layouts (newspapers, magazines, academic papers with sidebars) confuse the reading order reconstruction. Raw text mode is the simplest fallback for these: it concatenates all text items in the order PDF.js returns them, though that order may still differ from the visual reading order.
Password-protected PDFs: Encrypted PDFs cannot be opened by PDF.js without the password. The converter will report an error — you will need to remove the password protection first using a tool like Adobe Acrobat or PDFtk.
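
The scanned-PDF limitation above can be checked for after extraction. The heuristic below flags pages that yielded almost no characters, which usually means the page is an image. It assumes raw-text output shaped like `[{ page, text }]`, and the `minChars` threshold is an arbitrary starting point, not a value used by the converter.

```javascript
// Heuristic: a page with (almost) no extractable characters is probably a
// scanned image. Assumes pages shaped like { page, text }; minChars is an
// arbitrary illustrative threshold.
function findLikelyScannedPages(pages, minChars = 10) {
  return pages
    // Ignore whitespace so pages holding only spaces or line breaks count as empty.
    .filter((p) => p.text.replace(/\s+/g, "").length < minChars)
    .map((p) => p.page);
}
```

If every page is flagged, the file most likely needs OCR before any of the three modes can produce useful output.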

PDF parsed in your browser. No upload.

JSONshift uses PDF.js, the open-source PDF engine developed by Mozilla and used in Firefox to render PDFs, to extract content directly in your browser. Your PDF file is never transmitted to any server. Close the tab and it's gone.
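
For readers who want to reproduce the in-browser approach, here is a minimal sketch of reading per-page text with PDF.js. It is not JSONshift's actual code. It assumes `pdfjsLib` is the loaded PDF.js library with its worker already configured and that `file` is a `File` from an input or drop event; the returned `{ pageCount, info, pages }` shape is hypothetical.

```javascript
// Join one page's PDF.js text items into a single string.
// Each item returned by getTextContent() carries its text in `str`.
function itemsToText(items) {
  return items
    .map((item) => item.str)
    .filter((s) => s.trim() !== "") // drop whitespace-only items
    .join(" ");
}

// Read every page of a PDF locally and return raw-text style output.
// Nothing here performs a network request with the file contents.
async function extractRawText(pdfjsLib, file) {
  const data = new Uint8Array(await file.arrayBuffer());
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const { info } = await pdf.getMetadata();
  const pages = [];
  for (let n = 1; n <= pdf.numPages; n++) { // PDF.js page numbers are 1-based
    const page = await pdf.getPage(n);
    const content = await page.getTextContent();
    pages.push({ page: n, text: itemsToText(content.items) });
  }
  return { pageCount: pdf.numPages, info, pages };
}
```

Passing `pdfjsLib` in as a parameter keeps the sketch independent of how the library is loaded (CDN script tag or bundler import).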

Table extraction uses coordinate clustering: text items that share the same vertical position (within a 3-point tolerance) are grouped into rows, and significant horizontal gaps between items are used to detect column boundaries. This approach works reliably for PDFs created by software with precise text positioning.
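
The coordinate clustering described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the converter's source: items are PDF.js-style objects with `str`, `width` and a `transform` array whose entries 4 and 5 hold x and y, the 3-point row tolerance comes from the text, and `MIN_COLUMN_GAP` is a guessed value for what counts as a "significant" gap.

```javascript
const Y_TOLERANCE = 3;     // from the description above
const MIN_COLUMN_GAP = 12; // assumption: illustrative threshold in points

// Group items into rows: items whose y is within tolerance of a row's
// first item are placed in that row.
function clusterRows(items) {
  // PDF y coordinates grow upward, so sort descending to read top to bottom.
  const sorted = [...items].sort((a, b) => b.transform[5] - a.transform[5]);
  const rows = [];
  for (const item of sorted) {
    const y = item.transform[5];
    const last = rows[rows.length - 1];
    if (last && Math.abs(last.y - y) <= Y_TOLERANCE) last.items.push(item);
    else rows.push({ y, items: [item] });
  }
  return rows;
}

// Split one row into cells: a wide horizontal gap starts a new cell,
// a narrow gap means the item continues the current cell.
function splitCells(rowItems) {
  const sorted = [...rowItems].sort((a, b) => a.transform[4] - b.transform[4]);
  const cells = [];
  let prevEnd = -Infinity;
  for (const item of sorted) {
    const x = item.transform[4];
    if (cells.length === 0 || x - prevEnd > MIN_COLUMN_GAP) cells.push(item.str);
    else cells[cells.length - 1] += " " + item.str;
    prevEnd = x + item.width;
  }
  return cells.map((c) => c.trim());
}

// Build row objects, using the first detected row as the column headers.
function itemsToTable(items) {
  const [header, ...body] = clusterRows(items).map((r) => splitCells(r.items));
  if (!header) return [];
  return body.map((cells) =>
    Object.fromEntries(header.map((name, i) => [name, cells[i] ?? ""]))
  );
}
```

Matching cells to headers by index is the simplest choice and breaks when a row has an empty cell. Matching each cell to the nearest header x position would be a more robust variant.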

Powered by PDF.js
Mozilla's open-source PDF engine, the same one built into Firefox. Handles complex PDF internals including embedded fonts, encoding tables and page transforms.
Coordinate-based table detection
Analyses text item positions (x, y, width) to detect rows and columns — no visual rendering needed. Works with PDFs that have no visible grid lines.
Honest about limitations
Scanned PDFs, complex layouts and encrypted files are clearly flagged — no false promises about extraction quality.
47 tools, always free
No file size limits, no watermarks, no account. Funded by non-intrusive display advertising only.
Frequently asked questions
Common questions about extracting JSON from PDF files.
How do I convert a PDF to JSON?
Select or drop your PDF, choose an output mode (table extraction, structured text or raw text), and click Extract. PDF.js reads the file in your browser and outputs the extracted content as JSON. Nothing is sent to a server.
Does the converter work with scanned PDFs?
No. Scanned PDFs contain images of text — PDF.js can only extract text from digital PDFs created by software. If your PDF was created by scanning a physical document, you need an OCR tool first (Adobe Acrobat, Tesseract, or Google Document AI) to extract the text.
What types of PDFs work best with table extraction?
PDFs exported from Excel, Google Sheets, LibreOffice Calc, Crystal Reports or similar reporting tools work best. These have precise text positioning that makes column detection reliable. PDFs from Word documents or web pages may produce less accurate results due to inconsistent text alignment.
What is the difference between the three output modes?
Table extraction detects rows and columns by coordinate analysis and outputs an array of objects — best for spreadsheet-style data. Structured text preserves page layout with lines and positions — best for documents where reading order matters. Raw text outputs plain text per page — simplest output, best for NLP and search indexing.
Is my PDF safe when using this converter?
Yes. PDF.js runs entirely in your browser. Your PDF file is never uploaded to any server. Open the Network inspector during conversion — you will see zero outbound data requests (except the one-time PDF.js CDN load on first use).
Is the PDF to JSON converter free?
Yes, completely free. No file size limits, no account required. JSONshift is funded by non-intrusive display advertising.