How text-to-speech synthesis works

This tool uses the browser’s built-in Web Speech API (the SpeechSynthesis interface). The browser passes your text to a speech engine, usually the operating system’s TTS engine (Chrome’s Google voices are synthesized on Google’s servers instead). Depending on the voice, the engine uses concatenative or neural synthesis:

Concatenative TTS: stitches together recorded audio fragments. Sounds natural but may have audible joins at unusual words.
Neural TTS: uses a trained neural network to generate a waveform from phonemes. Sounds more natural and handles unusual words better.
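Here is a minimal TypeScript sketch of that hand-off. It assumes the standard speechSynthesis object and SpeechSynthesisUtterance constructor. The helper speakText and the small interfaces are hypothetical stand-ins for the DOM types. They let the sketch compile and run without a browser. This is not this tool’s own code.

```typescript
// Structural stand-ins for the DOM's SpeechSynthesisUtterance and
// SpeechSynthesis types, so the sketch compiles without the DOM lib.
interface UtteranceLike {
  text: string;
  lang: string;
  rate: number; // spec range 0.1 to 10; 1 is normal speed
  pitch: number; // spec range 0 to 2; 1 is the default pitch
  volume: number; // spec range 0 to 1
}
interface SynthLike {
  speak(utterance: UtteranceLike): void;
  cancel(): void;
}
type UtteranceCtor = new (text: string) => UtteranceLike;

// Wrap the text in an utterance and hand it to the speech engine.
function speakText(
  text: string,
  synth: SynthLike,
  Utterance: UtteranceCtor,
  lang = "en-US",
): UtteranceLike {
  synth.cancel(); // clear anything still queued from a previous call
  const utterance = new Utterance(text);
  utterance.lang = lang;
  utterance.rate = 1;
  utterance.pitch = 1;
  utterance.volume = 1;
  // No voice is set, so the browser falls back to a default voice,
  // typically one suited to `lang`.
  synth.speak(utterance);
  return utterance;
}
```

In a page you would call speakText("Hello", window.speechSynthesis, SpeechSynthesisUtterance). The function takes the synthesizer as a parameter instead of reading the global, so it is easy to test with a fake.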

Available voices

Available voices depend on your operating system and browser. macOS offers voices such as Alex and Samantha; Windows offers Microsoft David, Zira, and Hazel. Chrome adds its own Google voices (for example, "Google US English"), which are synthesized online. Installing additional languages in your system's language settings usually adds voices for those languages.
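A page reads the voice list from speechSynthesis.getVoices(). In some browsers, notably Chrome, that list can be empty until the voiceschanged event fires. The sketch below therefore waits for that event. The helper names, the VoiceInfo and VoiceSource interfaces, and the 2-second timeout are all assumptions for illustration.

```typescript
// The subset of SpeechSynthesisVoice fields this sketch reads.
interface VoiceInfo {
  name: string;
  lang: string; // BCP 47 tag such as "en-GB"
  localService: boolean; // false for network voices such as Chrome's Google voices
}
interface VoiceSource<V extends VoiceInfo> {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
}

// Resolve as soon as voices are available, or after timeoutMs with
// whatever the browser reports by then.
function loadVoices<V extends VoiceInfo>(synth: VoiceSource<V>, timeoutMs = 2000): Promise<V[]> {
  const initial = synth.getVoices();
  if (initial.length > 0) return Promise.resolve(initial);
  return new Promise<V[]>((resolve) => {
    synth.addEventListener("voiceschanged", () => resolve(synth.getVoices()));
    // Whichever fires first wins; later resolve calls are ignored.
    setTimeout(() => resolve(synth.getVoices()), timeoutMs);
  });
}

// Keep voices whose primary language subtag matches, so "en" matches
// both "en-US" and "en-GB". Accepts "-" or "_" as the separator.
function voicesForLanguage<V extends VoiceInfo>(voices: V[], lang: string): V[] {
  const primary = (tag: string) => tag.toLowerCase().split(/[-_]/)[0];
  return voices.filter((v) => primary(v.lang) === primary(lang));
}
```

For example, loadVoices(speechSynthesis).then((voices) => voicesForLanguage(voices, "en")) lists every English voice the browser exposes, once the list has loaded.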

Accessibility use cases

Multitasking: listen to long articles or documents while doing something else.
Proofreading: hearing your writing read aloud reveals awkward phrasing your eye skips over.
Language learning: hear how unfamiliar words are pronounced.
Reading support: assistance for users with dyslexia or low vision.

Voice selection tips

Browser TTS voices fall into two categories:

Concatenative voices: built from recorded speech fragments spliced together. Sound natural for short phrases but become robotic on unusual words.
Neural / expressive voices: generated by machine learning models, with noticeably more natural prosody, pacing, and intonation. Examples: "Microsoft Aria (Natural)" and "Microsoft Jenny (Natural)" on Windows; the Siri voices on macOS and iOS are also neural.
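The Web Speech API does not report whether a voice is neural. A SpeechSynthesisVoice only exposes fields such as name, lang, localService, and default. A page that wants to prefer neural voices therefore has to guess, for example from the voice name.

The sketch below is a hypothetical heuristic along those lines. It ranks voices in this order:
1. exact language match;
2. names containing "Natural" or "Neural";
3. local voices;
4. the browser's default voice.

The weights and keywords are assumptions, not a documented rule.

```typescript
// The subset of SpeechSynthesisVoice fields this heuristic reads.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

// Hypothetical keyword test: the name is the only hint the API gives.
const NEURAL_HINTS = /natural|neural/i;

// Primary language subtag, e.g. "en" for "en-GB" ("-" or "_" separator).
function primaryLang(tag: string): string {
  return tag.toLowerCase().split(/[-_]/)[0];
}

// Higher is better; -1 means the voice speaks a different language.
function scoreVoice(voice: VoiceInfo, lang: string): number {
  let score: number;
  if (voice.lang.toLowerCase() === lang.toLowerCase()) score = 6;
  else if (primaryLang(voice.lang) === primaryLang(lang)) score = 2;
  else return -1;
  if (NEURAL_HINTS.test(voice.name)) score += 3; // probably a neural voice
  if (voice.localService) score += 1; // works offline
  if (voice.default) score += 0.5; // the browser's own default
  return score;
}

// Generic so that passing real SpeechSynthesisVoice objects returns one.
function pickPreferredVoice<V extends VoiceInfo>(voices: V[], lang: string): V | undefined {
  let best: V | undefined;
  let bestScore = -1;
  for (const voice of voices) {
    const score = scoreVoice(voice, lang);
    if (score > bestScore) {
      best = voice;
      bestScore = score;
    }
  }
  return best;
}
```

Because the function returns one of the objects passed in, its result can be assigned directly to utterance.voice when the list came from getVoices().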

To install additional neural voices on Windows 11: Settings -> Accessibility -> Narrator -> Add natural voices. On macOS: System Settings -> Accessibility -> Spoken Content -> System Voice -> Manage Voices.

SSML support

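The Web Speech API specification allows an utterance's text to be either plain text or a complete, well-formed Speech Synthesis Markup Language (SSML) document. Where the browser and voice support it, SSML lets you control pauses, emphasis, and pronunciation. Example: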

<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The answer is <emphasis level="strong">forty-two</emphasis>.
  <break time="500ms"/> That's the meaning of life.
</speak>

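SSML support varies widely between browsers and voices: Chrome supports a subset, while Firefox and Safari support is limited. The specification requires engines to strip tags they do not support and speak the remaining text, but behavior still differs, so test on each target browser before relying on SSML.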
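The API offers no standard way to detect whether a browser honors SSML. One cautious approach has two parts:
- build well-formed SSML from escaped text;
- keep a plain-text fallback for browsers where the markup misbehaves.

The sketch below assumes SSML 1.1 markup. escapeXml, buildSsml, and ssmlToPlainText are hypothetical helpers. The fallback is a regex approximation, not a real XML parser.

```typescript
// Escape XML special characters so arbitrary user text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a complete SSML 1.1 document with a fixed pause between sentences.
function buildSsml(sentences: string[], pauseMs: number, lang = "en-US"): string {
  const body = sentences.map(escapeXml).join(`<break time="${pauseMs}ms"/>`);
  return (
    `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    body +
    "</speak>"
  );
}

// Naive fallback: turn breaks into spaces, drop all other tags, and decode
// only the entities escapeXml produces. Not a general XML parser.
function ssmlToPlainText(ssml: string): string {
  return ssml
    .replace(/<break\b[^>]*\/>/g, " ")
    .replace(/<[^>]*>/g, "")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&") // decode last so "&amp;lt;" becomes "&lt;", not "<"
    .replace(/\s+/g, " ")
    .trim();
}
```

On a browser known to handle SSML, the utterance text could be set to buildSsml(...). Elsewhere, the same string passed through ssmlToPlainText gives text that is safe to speak.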

Browser compatibility notes

Chrome (desktop): limits individual utterances to approximately 15 seconds. Longer text must be split into chunks. This tool handles chunking automatically.
iOS Safari: speech synthesis must be started by a user gesture such as a tap; it cannot be triggered programmatically on page load.
Firefox: voice availability varies by OS and Firefox version; some installations may show only one or two voices.
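The chunking and tap requirements above can be combined in one sketch:
- split text at sentence boundaries into chunks under a character limit;
- queue the chunks only from a click or tap handler.

speechSynthesis.speak() adds each utterance to a queue, so the chunks play in order.

Several parts of this are assumptions:
- The 200-character default is a guess meant to stay well under Chrome's roughly 15-second cutoff at typical speaking rates.
- Sentence splitting is naive, so abbreviations such as "Dr." may be treated as sentence ends.
- chunkText, splitLongSentence, and speakOnTap are hypothetical names, not this tool's implementation.

```typescript
// Structural stand-ins for the DOM types, so the sketch compiles without the DOM lib.
interface UtteranceLike {
  text: string;
}
interface SynthLike {
  speak(utterance: UtteranceLike): void;
  cancel(): void;
}
type UtteranceCtor = new (text: string) => UtteranceLike;
interface ClickTarget {
  addEventListener(type: "click", listener: () => void): void;
}

// Break one over-long sentence at word boundaries; hard-split any single
// word that is itself longer than maxChars.
function splitLongSentence(sentence: string, maxChars: number): string[] {
  const pieces: string[] = [];
  let line = "";
  for (const word of sentence.split(/\s+/)) {
    for (let i = 0; i < word.length; i += maxChars) {
      const part = word.slice(i, i + maxChars);
      if (line && line.length + 1 + part.length > maxChars) {
        pieces.push(line);
        line = "";
      }
      line = line ? `${line} ${part}` : part;
    }
  }
  if (line) pieces.push(line);
  return pieces;
}

// Split text into chunks of at most maxChars characters, preferring
// sentence boundaries (".", "!" or "?" followed by whitespace).
function chunkText(text: string, maxChars = 200): string[] {
  if (maxChars < 1) throw new RangeError("maxChars must be at least 1");
  const sentences = text
    .replace(/([.!?])\s+/g, "$1\n") // mark sentence ends; "3.14" is left intact
    .split(/\n+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const pieces =
      sentence.length <= maxChars ? [sentence] : splitLongSentence(sentence, maxChars);
    for (const piece of pieces) {
      if (current && current.length + 1 + piece.length > maxChars) {
        chunks.push(current);
        current = "";
      }
      current = current ? `${current} ${piece}` : piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Queue the chunks from inside a click/tap handler, because iOS Safari only
// lets speech start from a user gesture.
function speakOnTap(
  button: ClickTarget,
  getText: () => string,
  synth: SynthLike,
  Utterance: UtteranceCtor,
  maxChars = 200,
): void {
  button.addEventListener("click", () => {
    synth.cancel(); // stop any earlier reading before starting a new one
    for (const chunk of chunkText(getText(), maxChars)) {
      synth.speak(new Utterance(chunk)); // speak() appends to the queue
    }
  });
}
```

In a page, the button would be a real element, for example one returned by document.querySelector. The synthesizer and constructor arguments would be speechSynthesis and SpeechSynthesisUtterance.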