What hidden data can a PDF contain?
A PDF file is more than the text and images you see on screen. Embedded inside the file structure is a layer of metadata — information about the document's history, creation, and authors — that is invisible during normal reading but fully accessible to anyone who knows where to look. Before sharing a PDF externally, it's worth knowing exactly what data you're inadvertently sharing along with it.
Common hidden data found in PDFs includes:
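- Author name — the name of the person who created or last saved the document, usually taken from the authoring application's user profile or the OS user account at the time of creation
- Company name — the organization name set in the authoring software's installation or document properties, which some applications copy into the PDF as a custom field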
- Creation date and time — the exact timestamp when the document was first created, down to the second and timezone
- Modification date — when the document was last saved; files that were saved incrementally or that carry an XMP history can also record earlier saves
- Software and version — which application created the PDF (e.g. "Microsoft Word 16.0.14931", "Adobe InDesign 18.3") and the OS version
- GPS coordinates — images embedded in the PDF may contain EXIF location data from when they were photographed
- Tracked changes and comments — text deleted in a later save and reviewer annotations may still be present in the file even if they are not visible on screen
- Previous authors — if the document changed hands, earlier author names may be present in revision history
- Custom XMP properties — application-specific metadata fields that can contain anything the creating software chose to store
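Several of the fields above are stored in the PDF's document information dictionary as plain key/value pairs, so a rough first look needs nothing more than a byte search. The sketch below is a minimal illustration in Python, not a description of how the scanner works. `find_info_fields` and `parse_pdf_date` are hypothetical helper names, the sample bytes are invented, and the pattern only handles simple parenthesised strings in uncompressed objects, so hex-encoded or compressed entries are missed.

```python
import re
from datetime import datetime, timedelta, timezone

# Simple literal strings only: /Key (value). Nested parentheses, hex strings
# and compressed object streams are NOT handled by this sketch.
INFO_FIELD = re.compile(
    rb"/(Author|Creator|Producer|CreationDate|ModDate|Company)"
    rb"\s*\(((?:\\.|[^\\()])*)\)"
)

# PDF date strings look like D:20240315143022+02'00' (trailing parts optional).
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?"
)


def find_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Collect common information-dictionary entries found in raw PDF bytes."""
    fields = {}
    for key, raw in INFO_FIELD.findall(pdf_bytes):
        fields[key.decode("ascii")] = raw.decode("latin-1")
    return fields


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a datetime (timezone-aware if given)."""
    match = PDF_DATE.match(value)
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()
    offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
    if sign == "-":
        offset = -offset
    tz = timezone(offset) if sign else None
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0),
                    tzinfo=tz)


# Invented sample object, for illustration only.
sample = b"<< /Author (Jane Doe) /CreationDate (D:20240315143022+02'00') >>"
info = find_info_fields(sample)
print(info["Author"])                                   # Jane Doe
print(parse_pdf_date(info["CreationDate"]).isoformat()) # 2024-03-15T14:30:22+02:00
```

An empty result from a search like this does not mean the file is clean. It only means nothing was stored in the simplest form.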
How to scan a PDF for hidden data for free
- Open ihatepdf.cv/privacy-scanner — no sign-up required
- Select the PDF you want to check before sharing (it is opened locally in your browser, not uploaded)
- Click Scan for Privacy Risks
- Review the detected metadata — each field is listed with its value and a risk rating (Low / Medium / High)
- Choose which fields to strip, then download a clean copy — no watermark
The scanner reads the PDF structure entirely in your browser. Your file is never uploaded to any server.
Real-world privacy risks from PDF metadata
These aren't theoretical risks — PDF metadata has caused real problems:
- Revealed identity in anonymous submissions — academic papers, whistleblower documents, and anonymous legal filings have exposed their authors through metadata left in the PDF
- Exposed internal software stack — a PDF sent to a client or partner can reveal your internal tooling, version numbers, and infrastructure through creator application metadata
- GPS data from embedded photos — a contract or report with an embedded photo taken at your office can expose your precise address via EXIF coordinates
- Draft text in revision history — deleted or revised text that was never meant to be seen can persist in earlier revisions kept inside the file when the PDF was saved incrementally, where anyone who knows how to extract them can read it
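The revision-history risk can be made concrete. When a PDF is saved incrementally, each save appends new objects and a fresh `%%EOF` marker instead of rewriting the file, so the bytes up to an earlier marker are the earlier version of the document. The sketch below is a rough heuristic in Python, with hypothetical function names and an invented sample; a count above one is a hint rather than proof, since some optimized (linearized) files can also contain more than one marker.

```python
EOF_MARKER = b"%%EOF"


def revision_ends(pdf_bytes: bytes) -> list[int]:
    """Return the byte offset just past each %%EOF marker in the file."""
    ends = []
    pos = pdf_bytes.find(EOF_MARKER)
    while pos != -1:
        ends.append(pos + len(EOF_MARKER))
        pos = pdf_bytes.find(EOF_MARKER, pos + len(EOF_MARKER))
    return ends


def earlier_revisions(pdf_bytes: bytes) -> list[bytes]:
    """Return the file as it stood at every marker except the last one."""
    return [pdf_bytes[:end] for end in revision_ends(pdf_bytes)[:-1]]


# Invented two-revision sample: the second save is appended to the first.
sample = b"%PDF-1.7\n(first draft)\n%%EOF\n(second draft)\n%%EOF\n"
print(len(revision_ends(sample)))    # 2
print(earlier_revisions(sample)[0])  # b'%PDF-1.7\n(first draft)\n%%EOF'
```

Each slice returned by `earlier_revisions` could be written to disk and opened as its own PDF, which is why deleted text in an older revision counts as shared data.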
Who should scan their PDFs before sharing?
- Lawyers and legal professionals — filings and disclosures should not reveal drafting history or internal firm metadata
- Journalists and researchers — documents shared publicly or with sources should not contain identifying author information
- Businesses sending proposals or contracts — client-facing documents should not expose internal software, staff names, or creation timestamps
- Anyone sharing documents anonymously — job applications, whistleblower submissions, anonymous feedback, or any other context where identity should be protected
- Photographers embedding images in PDFs — location data in EXIF should be reviewed before sharing portfolios or client deliverables
How to remove PDF metadata
After the scanner identifies what metadata is present, select which fields to strip and download a clean version. The cleaned PDF looks identical on screen because the removed fields are not part of the page content. Alternatively, the PDF Flattener with the XMP metadata option enabled strips all embedded metadata at once, along with interactive elements.
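It can be worth double-checking a cleaned copy before sending it. The sketch below is one possible spot check in Python, under stated assumptions: `leftover_metadata` is a hypothetical helper, the key list is a small illustrative subset, and the search only sees uncompressed bytes, so an empty result is reassuring but not conclusive.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XMP_PACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)

# Illustrative subset of information-dictionary keys to look for.
INFO_KEYS = (b"/Author", b"/Creator", b"/Producer", b"/CreationDate", b"/ModDate")


def leftover_metadata(pdf_bytes: bytes) -> list[str]:
    """List metadata markers still visible in the raw bytes of a PDF."""
    found = [key.decode("ascii") for key in INFO_KEYS if key in pdf_bytes]
    if XMP_PACKET.search(pdf_bytes):
        found.append("XMP packet")
    return found


# Invented samples, for illustration only.
dirty = (b"<< /Author (Jane) >> "
         b"<?xpacket begin='' id='x'?><x:xmpmeta/><?xpacket end='w'?>")
clean = b"%PDF-1.7\n1 0 obj << /Type /Catalog >> endobj\n%%EOF"
print(leftover_metadata(dirty))  # ['/Author', 'XMP packet']
print(leftover_metadata(clean))  # []
```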
Frequently asked questions
Will stripping metadata change how the PDF looks?
No. Metadata is stored separately from the visible content. Removing it has no effect on the text, images, or layout of the document.
Does this work on password-protected PDFs?
Not directly. Remove the password first so the scanner can read the full file structure.
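Whether a file is password-protected can usually be spotted before scanning, because an encrypted PDF declares an `/Encrypt` entry in its trailer. The check below is a minimal sketch in Python with a hypothetical function name and invented sample trailers; it is a quick byte-level test, not a full parser.

```python
import re

# /Encrypt is followed either by an indirect reference ("9 0 R")
# or by an inline dictionary ("<< ... >>").
ENCRYPT_ENTRY = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain an /Encrypt trailer entry."""
    return ENCRYPT_ENTRY.search(pdf_bytes) is not None


# Invented trailers, for illustration only.
protected = b"trailer\n<< /Size 10 /Root 1 0 R /Encrypt 9 0 R >>"
plain = b"trailer\n<< /Size 10 /Root 1 0 R >>"
print(looks_encrypted(protected))  # True
print(looks_encrypted(plain))      # False
```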
Is my PDF uploaded to scan it?
No. The scanner reads and analyzes the PDF structure entirely in your browser. Your file never leaves your device.
Does the cleaned PDF have a watermark?
No. ihatepdf never adds watermarks to any output file.