Why convert a PDF to audio?
Reading long PDFs (research papers, legal contracts, textbooks, reports) is time-consuming and tiring on the eyes. Converting them to audio lets you listen while commuting, exercising, or doing other tasks. It's also essential for people with visual impairments or dyslexia, and helpful for anyone who absorbs information better by listening than by reading.
How to convert a PDF to audio for free
- Open ihatepdf.cv/pdf-to-audio in your browser
- Wait a moment for the neural TTS model to load (about 26 MB, cached after first use)
- Drop your PDF onto the upload area — text-based and scanned PDFs are both supported
- Click Generate All to start synthesis
- Press play to listen while synthesis continues in the background
- Optionally export the audio as a WAV file for offline listening
Everything runs locally in your browser. Your PDF is never uploaded to any server.
Does it work on scanned PDFs?
Yes. The tool uses Tesseract.js to run OCR (optical character recognition) on each page of a scanned PDF, extracting the text before passing it to the speech synthesiser. This means scanned books, photocopied documents, and image-only PDFs are all supported — not just PDFs created digitally.
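The article does not say how the tool decides which pages need OCR. One plausible approach, sketched below purely as an assumption, is to use a page's embedded text when there is enough of it and fall back to OCR otherwise. The names `page_text`, `run_ocr`, and the `MIN_TEXT_CHARS` threshold are hypothetical and are not taken from the tool.

```python
from typing import Callable

# Hypothetical threshold: pages with fewer characters than this are treated as scanned.
MIN_TEXT_CHARS = 20


def page_text(embedded_text: str, run_ocr: Callable[[], str]) -> tuple[str, str]:
    """Return (text, source) for one page, calling OCR only when it is needed."""
    # Collapse runs of whitespace so layout noise does not count as content.
    cleaned = " ".join(embedded_text.split())
    if len(cleaned) >= MIN_TEXT_CHARS:
        return cleaned, "text-layer"
    # No usable text layer: run the (stand-in) OCR callback instead.
    return " ".join(run_ocr().split()), "ocr"


digital = page_text("This page already has  selectable text.", lambda: "unused")
scanned = page_text("", lambda: "Scanned  page\ntext")
```

Passing OCR in as a callback keeps the expensive step lazy, so digitally created pages never pay for it.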
The neural TTS engine — how it works
The tool uses the Xenova/mms-tts-eng model — a quantised neural text-to-speech model that runs entirely in the browser via WebAssembly (ONNX Runtime). Three parallel web workers synthesise batches of text simultaneously, keeping generation fast even on long documents. The audio chain includes a high-pass filter, voice EQ, de-esser, gain control, and a dynamics compressor for natural-sounding output.
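To illustrate the batching idea, here is a minimal sketch in Python rather than browser code. A thread pool stands in for the three web workers, and `fake_synthesise` is a hypothetical placeholder for the real model. The sentence splitting rule and batch size are assumptions, not details of the tool.

```python
import re
from concurrent.futures import ThreadPoolExecutor

WORKER_COUNT = 3  # the article mentions three parallel workers


def split_into_batches(text: str, sentences_per_batch: int = 2) -> list[str]:
    """Split text into sentences, then group the sentences into small batches."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_batch])
        for i in range(0, len(sentences), sentences_per_batch)
    ]


def fake_synthesise(batch: str) -> list[float]:
    """Hypothetical stand-in for the TTS model: one silent sample per character."""
    return [0.0] * len(batch)


def synthesise_document(text: str) -> list[list[float]]:
    """Synthesise every batch on a pool of workers, keeping document order."""
    batches = split_into_batches(text)
    with ThreadPoolExecutor(max_workers=WORKER_COUNT) as pool:
        # map() yields results in input order even if workers finish out of order.
        return list(pool.map(fake_synthesise, batches))


audio = synthesise_document("One. Two! Three? Four.")
```

Keeping results in document order matters more than finishing order, because playback has to follow the text.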
Five voice styles
- Standard — clear and neutral. The default for most documents
- Deep — boosted bass frequencies for a rich, full voice. Good for long listening sessions
- Bright — emphasized high frequencies for forward presence. Helps with clarity in noisy environments
- Warm — smooth and mellow tone. Easy on the ears for extended listening
- Crisp — ultra-high clarity with boosted high shelf. Best for technical content where pronunciation clarity matters
Voice style changes apply instantly during playback — no need to re-generate.
Seamless playback — no gaps between sentences
A common problem with browser-based TTS tools is audible pauses between sentences as the audio engine switches between chunks. This tool solves that by concatenating all consecutive ready audio segments into a single buffer before playback begins. The result is gap-free, natural-sounding speech with no stuttering at sentence boundaries. The only pause occurs when playback reaches the synthesis frontier — where the model hasn't finished the next batch yet.
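The merging step described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: segments are modelled as lists of samples, a segment that is still being synthesised is `None`, and the function name `ready_run` is hypothetical.

```python
from typing import Optional, Sequence


def ready_run(
    segments: Sequence[Optional[list[float]]], start: int = 0
) -> tuple[list[float], int]:
    """Join consecutive ready segments beginning at `start`.

    Returns the merged samples and the index of the first segment that is
    not ready yet (the synthesis frontier).
    """
    merged: list[float] = []
    index = start
    while index < len(segments) and segments[index] is not None:
        merged.extend(segments[index])
        index += 1
    return merged, index


# Segment 2 is still being synthesised, so the first run stops there.
segments = [[0.1, 0.2], [0.3], None, [0.4]]
buffer, frontier = ready_run(segments)
```

Playing `buffer` as one piece avoids a chunk switch at every sentence boundary. Playback only has to wait when it reaches `frontier`.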
Playback controls
- Speed — 0.5× to 2× playback rate, adjustable while playing
- Volume — 0–100%, adjustable while playing
- Page navigation — jump to any page via the page grid or arrow buttons
- Keyboard shortcuts — Space (play/pause), ←/→ (page back/forward), Esc (stop)
Exporting audio as WAV
Once a page or the full document is synthesised, you can export the audio:
- This Page — exports the current page as a WAV file
- Full Doc — exports all synthesised pages as a single WAV file
- All Pages (ZIP) — exports each page as a separate WAV file, packaged in a ZIP
All exports are 16-bit PCM, 16 kHz mono WAV files. To convert them to MP3 for a smaller file size, use any free audio converter after downloading. ihatepdf never adds watermarks to exported audio files.
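For readers curious what that format looks like in practice, here is a minimal sketch that writes 16-bit PCM, 16 kHz mono audio with Python's standard `wave` module. It is not the tool's exporter. The `encode_wav` helper and the test tone are illustrative assumptions.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # Hz, as stated above
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
CHANNELS = 1          # mono


def encode_wav(samples: list[float]) -> bytes:
    """Encode float samples in [-1.0, 1.0] as a 16-bit PCM mono WAV file."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against out-of-range input
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()


# One second of a 440 Hz test tone (illustrative input, not real speech).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
wav_bytes = encode_wav(tone)
```

At this format, audio takes 32,000 bytes per second (16,000 samples × 2 bytes), or roughly 1.9 MB per minute, which is why converting long documents to MP3 can be worthwhile.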
How long does generation take?
On a modern laptop, synthesis runs roughly 2–5× faster than real-time — meaning a 10-minute audio output takes 2–5 minutes to generate. Generation speed depends on your device's CPU. You can start listening immediately after the first batch finishes (typically within 15–30 seconds) while the rest of the document synthesises in the background.
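The arithmetic behind that estimate is simple: generation time is audio length divided by the speed factor. The sketch below applies it to the 2× to 5× range quoted above. The function names are hypothetical, and actual speeds on your device may fall outside this range.

```python
def generation_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Minutes needed to synthesise `audio_minutes` of speech at a given real-time factor."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


def generation_range(
    audio_minutes: float, slowest: float = 2.0, fastest: float = 5.0
) -> tuple[float, float]:
    """Return (best case, worst case) generation time in minutes."""
    return (
        generation_minutes(audio_minutes, fastest),
        generation_minutes(audio_minutes, slowest),
    )


ten_minute_estimate = generation_range(10)  # (2.0, 5.0)
one_hour_estimate = generation_range(60)    # (12.0, 30.0)
```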
Frequently asked questions
Do I need to install any software?
No. The neural TTS model runs entirely in your browser using WebAssembly. No extensions, apps, or plugins are needed. The tool works in Chrome, Firefox, Edge, and Safari.
Can I use this on my phone?
Yes. The tool works on mobile browsers. Generation may be slower on phones due to limited CPU resources, but playback is fully functional.
Is there a page or file size limit?
There is no server-side limit, because nothing is uploaded. The only constraint is your device's memory. Very long PDFs (200+ pages) work fine but take more time to fully synthesise.
Does the audio have a watermark?
No. ihatepdf never adds watermarks or audio branding to any exported file.
Does it support languages other than English?
The current model (mms-tts-eng) is optimised for English. Other language models may be added in future updates.