How to Create an AI Audiobook (Full Workflow)
From manuscript to finished audiobook: a complete guide to producing audiobooks with AI text-to-speech, including ACX/Audible specs.
The Traditional Audiobook Problem
Producing an audiobook the traditional way is a significant investment. A 10-hour audiobook (roughly 80,000 words) requires 40+ hours of studio recording time, with editing on top of that. Professional narrators charge $200 to $400 per finished hour, putting the narration cost between $2,000 and $4,000 for a 10-hour novel. Add studio rental, mastering, and quality control, and you can easily reach $5,000+.
For indie authors who have already spent months writing their book, that cost is often prohibitive. Many books never become audiobooks simply because the economics do not work for titles that might sell a few hundred copies.
AI text-to-speech changes the math. With Murmur, you can generate a full audiobook on your Mac for the cost of the software ($49). The process takes hours instead of weeks, and you can iterate, re-generate, and update chapters at will.
Step 1: Prepare Your Manuscript
Good audiobook production starts with good source text. Your manuscript needs specific preparation before TTS generation:
- Remove all formatting. Bold, italic, headers, and footnotes should be stripped. The TTS engine reads plain text.
- Split into chapter files. One text file per chapter keeps your project organized and makes regeneration easier.
- Handle dialogue tags explicitly. Instead of relying on quotation marks alone, ensure "said" tags are present so the voice engine can adjust delivery.
- Spell out numbers, abbreviations, and special characters. "$2.5M" should become "two and a half million dollars." "Dr." should become "Doctor."
- Add pause markers. A blank line between paragraphs creates a natural pause in most TTS engines. For scene breaks, add "[pause]" or simply leave extra blank lines.
- Remove front matter that does not belong in audio: table of contents, dedication pages (unless you want them narrated), and any visual elements like maps or diagrams.
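The preparation steps above can be partly automated. The following is a minimal sketch, not a Murmur feature: it assumes the manuscript is a single Markdown-style text file, that each chapter begins with a line such as "Chapter 1", and that the abbreviation table and the `chapter_NN.txt` file naming (both hypothetical) suit your book. Numbers and currency still need a manual pass.

```python
import re
from pathlib import Path

# Hypothetical abbreviation table; extend it for your own manuscript.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "Mrs.": "Missus"}

# Assumption: every chapter starts with a line like "Chapter 1" or "Chapter Two".
CHAPTER_HEADING = re.compile(r"^chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE)


def strip_formatting(text: str) -> str:
    """Remove simple Markdown emphasis and heading markers, keeping the words."""
    text = re.sub(r"(\*\*|__|\*)(.+?)\1", r"\2", text)        # bold / italic
    text = re.sub(r"^#+[ \t]*", "", text, flags=re.MULTILINE)  # heading markers
    return text


def expand_abbreviations(text: str) -> str:
    """Replace each abbreviation with its spoken form (whole-word matches only)."""
    for short, spoken in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(short), spoken, text)
    return text


def split_chapters(text: str) -> list[str]:
    """Split on chapter headings; anything before the first heading is dropped."""
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    if not starts:
        return [text.strip()]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def prepare(manuscript: str, out_dir: Path) -> list[Path]:
    """Clean the manuscript and write one plain-text file per chapter."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cleaned = expand_abbreviations(strip_formatting(manuscript))
    paths = []
    for number, chapter in enumerate(split_chapters(cleaned), start=1):
        path = out_dir / f"chapter_{number:02d}.txt"
        path.write_text(chapter + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `prepare(Path("book.md").read_text(encoding="utf-8"), Path("chapters"))` would write the cleaned chapter files into a `chapters` folder. Because text before the first chapter heading is discarded, keep a separate file for any front matter you do want narrated.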
Step 2: Choose Your Narrator Voice
For an audiobook, voice consistency is everything. Listeners will spend 6 to 12 hours with this voice. It needs to be pleasant, clear, and sustainable across long passages. In Murmur, audition voices with a full page of text, not just a sentence. A voice that sounds great for 10 seconds might become fatiguing after 10 minutes.
For non-fiction, choose a voice with authority and clarity. Kokoro voices excel here because they maintain consistent pacing and tone. For fiction, you want more expressiveness. Fish Audio S2 Pro produces the most natural prosody, handling dialogue and description shifts with genuine nuance. Chatterbox adds emotional range that works well for dramatic fiction.
Voice Cloning Option
If you want the audiobook in your own voice (author-narrated books sell well), Murmur's voice cloning feature lets you create a voice profile from a 10-second recording. The clone captures your pitch, timbre, and speaking rhythm. It will not be an exact replica, but it will be recognizably you. This gives you the personal touch of self-narration without the 40+ hours of recording.
Step 3: Generate Chapter by Chapter
Work through your book one chapter at a time. For each chapter: paste the text, verify the voice and speed settings match your previous chapters, generate, and listen to the first minute. If it sounds right, export and move to the next chapter. If a section sounds off (mispronunciation, odd pacing), edit that portion of the script and regenerate just that chapter.
For a 10-hour audiobook (about 80,000 words), expect generation to take anywhere from under an hour to a few hours on Apple Silicon hardware, depending on the engine. Kokoro generates fastest (30 to 45 seconds per 1,500 words). Fish Audio S2 Pro takes longer (2 to 3 minutes per 1,500 words) but produces more polished output. Choose based on your quality requirements.
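To turn the per-engine speeds into a rough total for your own word count, a back-of-the-envelope calculation is enough. This sketch assumes generation time scales linearly with word count, which real hardware may not match exactly; the timings are the ones quoted above.

```python
WORDS_PER_BATCH = 1_500  # the batch size the quoted timings refer to

# (fastest, slowest) seconds per 1,500 words, from the figures in this guide.
ENGINE_SECONDS = {
    "Kokoro": (30, 45),
    "Fish Audio S2 Pro": (120, 180),
}


def estimate_minutes(word_count: int, engine: str) -> tuple[float, float]:
    """Return the (low, high) generation time in minutes for a manuscript."""
    batches = word_count / WORDS_PER_BATCH
    fast, slow = ENGINE_SECONDS[engine]
    return batches * fast / 60, batches * slow / 60


for engine in ENGINE_SECONDS:
    low, high = estimate_minutes(80_000, engine)
    print(f"{engine}: {low:.0f} to {high:.0f} minutes")
```

For an 80,000-word book this works out to roughly 27 to 40 minutes with Kokoro and roughly 107 to 160 minutes with Fish Audio S2 Pro, before any regeneration of problem chapters.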
Step 4: Review and Polish
Listen to each chapter fully. Mark timestamps where you hear issues: mispronunciations, awkward pauses, unnatural emphasis. For most chapters, the output will be clean. For problem sections, adjust the script text (rephrase a sentence, add a comma for a pause, spell out a tricky word) and regenerate that chapter.
After all chapters pass review, normalize the volume across all files. Audio editing tools like Audacity (free) can batch-process volume normalization. This ensures chapter 1 is not noticeably louder or quieter than chapter 20.
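If you want to plan the adjustments before opening an editor, the arithmetic is simple. The sketch below assumes you have already measured each chapter's RMS and peak level in dB (for example, in Audacity). The -20 dB target is an assumption chosen to sit inside the ACX RMS window listed in Step 5, and the gain is capped so the adjusted peak stays at or below -3 dB.

```python
TARGET_RMS_DB = -20.0   # assumed target, inside the -23 to -18 dB ACX window
PEAK_CEILING_DB = -3.0  # ACX peak limit


def plan_gain(levels: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each chapter to the gain (in dB) that moves its RMS toward the target.

    `levels` maps a chapter name to (rms_db, peak_db) as measured in an editor.
    The gain is limited so that peak_db + gain never exceeds the peak ceiling.
    """
    plan = {}
    for chapter, (rms_db, peak_db) in levels.items():
        gain = TARGET_RMS_DB - rms_db          # gain needed to hit the RMS target
        headroom = PEAK_CEILING_DB - peak_db   # largest gain the peak allows
        plan[chapter] = round(min(gain, headroom), 2)
    return plan


measured = {"chapter_01": (-24.5, -6.0), "chapter_02": (-19.0, -2.5)}
print(plan_gain(measured))
```

In this example chapter_01 would need +4.5 dB to reach the target but only has 3 dB of peak headroom, so the plan stops at +3.0 dB. A chapter that is capped this way may need a limiter or compressor pass in your editor rather than plain gain.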
Step 5: Meet ACX/Audible Technical Specs
If you plan to distribute through ACX (Amazon's audiobook platform for Audible), your files must meet specific technical requirements:
- Format: MP3 at 192 kbps or higher, CBR (constant bit rate), with a 44.1 kHz sample rate. ACX takes MP3 uploads, so treat WAV files as editing masters.
- Each chapter must be a separate file, named sequentially.
- Peak volume: must not exceed -3dB.
- RMS (average volume): between -23dB and -18dB.
- Noise floor: below -60dB.
- Each file must have 0.5 to 1 second of room tone (silence) at the beginning and 1 to 5 seconds at the end.
- Opening credits file: must include the book title, author name, and narrator credit.
- Closing credits file: must include an end-of-book announcement.
Murmur exports clean WAV files that meet the sample rate requirement; keep them as editing masters and encode the finished files to MP3 for ACX upload. Volume normalization and room tone can be added in Audacity using its built-in tools. The ACX Check plugin for Audacity can verify peak, RMS, and noise floor levels before submission.
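For a quick scripted sanity check alongside ACX Check, peak and RMS levels can be measured with the Python standard library alone. This is a minimal sketch that assumes 16-bit PCM WAV input; it measures all channels together, does not measure the noise floor or room tone, and its results may differ slightly from the figures ACX's own tooling reports.

```python
import array
import math
import sys
import wave


def measure_wav(path: str) -> tuple[float, float]:
    """Return (peak_db, rms_db) relative to full scale for a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no audio frames")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale

    def to_db(value: float) -> float:
        return 20 * math.log10(value) if value > 0 else float("-inf")

    return to_db(peak), to_db(rms)


def meets_acx_levels(peak_db: float, rms_db: float) -> bool:
    """Check the peak and RMS limits listed above (noise floor not included)."""
    return peak_db <= -3.0 and -23.0 <= rms_db <= -18.0
```

A typical use is to loop over the chapter files, call `measure_wav` on each, and list any file for which `meets_acx_levels` returns `False` before running the full ACX Check pass.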
Timeline Comparison
Traditional audiobook recording: 40+ hours for a 10-hour book. AI generation with Murmur: a few hours of generation plus review time.
The timeline difference is dramatic. A traditional 10-hour audiobook requires 40+ hours of recording, 20+ hours of editing, and weeks of back-and-forth with narrators and engineers. With AI generation, you can go from manuscript to finished audiobook in a weekend. The actual generation takes a few hours. The bulk of your time goes to review and quality checking, which is the part that should take time.
Turn your manuscript into an audiobook.
Generate a full audiobook on your Mac. Chapter by chapter, one consistent voice, no studio required. $49, one time.
macOS 14+ · Apple Silicon required · 7-day refund policy