Transcripts are text versions of audio and video content that include not only spoken dialogue but also descriptions of important sounds, visual information, and context. Unlike captions, which are synchronized with the media, transcripts are standalone documents that provide complete access to multimedia content.
For many users—particularly those who are deafblind—transcripts are the only way to access audio and video content. This comprehensive guide covers when transcripts are needed, what to include, and how to present them effectively.
What Are Transcripts?
Transcripts go beyond simple dialogue transcription. A complete transcript includes:
- All spoken words: Dialogue, narration, and speech
- Speaker identification: Who is speaking at any given time
- Sound descriptions: Important audio cues, sound effects, and background sounds
- Visual descriptions: Actions, expressions, on-screen text, and visual context
- Context: Tone, mood, and atmosphere conveyed through audio or visual means
Think of a transcript as a complete written representation of the multimedia experience—everything a viewer sees and hears, translated into text.
Who Benefits from Transcripts?
People who are deaf or hard of hearing: Transcripts provide access to audio content, including sound effects and music descriptions that captions may not include.
People who are deafblind: Transcripts are the only way deafblind users can access multimedia content. They read transcripts using refreshable braille displays.
Non-native speakers: Transcripts help people learning a language understand content at their own pace, look up unfamiliar words, and improve comprehension.
People in quiet environments: Users in libraries, meetings, or other sound-sensitive locations can read transcripts instead of playing audio.
People with cognitive disabilities: Some users process written information better than audio or video, or need to review content multiple times.
People with slow internet connections: Transcripts load faster than video or audio files and use minimal bandwidth.
Search engines and SEO: Transcripts make multimedia content searchable and indexable, improving discoverability.
Everyone: Transcripts allow skimming, searching for specific information, quoting, translating, and referencing content quickly.
When Are Transcripts Required?
Prerecorded Audio-Only Content
Required: Text transcripts must accompany all prerecorded audio-only content (podcasts, audio recordings, audio lectures).
The transcript should include:
- All dialogue and narration
- Speaker identification
- Significant sounds and sound effects
- Contextual descriptions
Placement: Transcripts must be easily accessible, placed on the same page as the audio player or linked prominently near the audio.
Prerecorded Video-Only Content
Required: Prerecorded video-only content must include either audio descriptions or a text transcript. A transcript is the better choice.
The transcript should describe:
- All visual actions and events
- Text that appears on screen
- Facial expressions and body language
- Scene changes and visual transitions
- Graphics, charts, or diagrams
Why transcripts are better: Audio descriptions help blind users who can hear, but transcripts help both blind users and deafblind users. Transcripts are the more inclusive option.
Prerecorded Multimedia (Video + Audio)
Recommended (WCAG Level AAA): Provide transcripts for all prerecorded multimedia content.
While not required at WCAG Level AA, transcripts are essential for deafblind users who cannot access captions (visual) or audio descriptions (auditory). Transcripts are also valuable for:
- Searchability
- Translation
- Referencing specific content
- Users who prefer reading
Best practice: Provide both captions (for synchronization with video) and transcripts (for deafblind access and other benefits).
Live Content
Optional: Transcripts can be provided after live content ends if the recording is made available.
Producing a transcript during a live stream is difficult and typically not required, but adding one when the recording is archived is helpful.
What to Include in Transcripts
Dialogue and Speech
Verbatim for scripted content: When content follows a script, transcripts must include all words exactly as spoken, including intentional filler words (“um,” “uh”) that are part of the script.
NARRATOR: Well, um, let me tell you about, uh, accessibility.
Verbatim for unscripted content (with discretion): For unscripted content like interviews, broadcasts, or documentaries, transcripts should be verbatim but may omit excessive filler words that impair readability:
Verbatim:
INTERVIEWER: So, um, tell me about, you know, your experience.
GUEST: Well, uh, I started working in accessibility, like, about five years ago.
Acceptable (edited for readability):
INTERVIEWER: Tell me about your experience.
GUEST: I started working in accessibility about five years ago.
Use discretion to balance accuracy with readability.
Speaker Identification
Format: Use speaker names or roles in ALL CAPS followed by a colon, then the spoken text in mixed case:
SARAH: I love this project.
INTERVIEWER: What makes it special?
DR. CHEN: The research shows promising results.
NARRATOR: Three years later...
When to identify speakers:
- At the beginning of each speaking turn
- When speakers change
- When it’s not obvious who is speaking
- For off-screen voices
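The speaker-label format above is simple enough to generate in code, which helps keep labels consistent across a long transcript. The following is a minimal sketch, not a standard API: the function name `formatTurn` and its parameters are hypothetical, and the optional `note` argument stands for a delivery or expression label such as "smiling".

```javascript
// Hypothetical helper: builds one speaking turn as "NAME [note]: text".
// The name is upper-cased; the spoken text is left in mixed case.
function formatTurn(speaker, text, note) {
  const label = speaker.trim().toUpperCase();
  const delivery = note ? ` [${note.trim()}]` : "";
  return `${label}${delivery}: ${text.trim()}`;
}

// Formats a list of turns, labelling the speaker at the start of every turn.
function formatTurns(turns) {
  return turns
    .map((turn) => formatTurn(turn.speaker, turn.text, turn.note))
    .join("\n");
}

console.log(formatTurn("Dr. Chen", "The research shows promising results."));
// DR. CHEN: The research shows promising results.
console.log(formatTurn("John", "That's the best news I've heard all day.", "smiling"));
// JOHN [smiling]: That's the best news I've heard all day.
```

Labelling every turn, rather than only when the speaker changes, is a deliberate choice here: it costs little and avoids ambiguity for readers who jump into the middle of a transcript.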
Visual Information
Describe important visual events: Include descriptions of visual information that contributes to understanding:
[Sarah enters the room carrying a laptop]
SARAH: I have the presentation ready.
[She connects the laptop to the projector]
On-screen text: Include any text that appears visually:
[Title card: "Five Years Later"]
[Text on screen: "Contact us at info@example.com"]
Actions and expressions: Note significant facial expressions, gestures, or body language:
JOHN [smiling]: That's the best news I've heard all day.
MARY [shaking her head]: I don't think so.
Sound Descriptions
Background sounds: Describe important background sounds in [brackets] or (parentheses):
[Phone rings]
ALEX: Excuse me, I need to take this.
[Door closes]
[Footsteps fading away]
Sound effects: Note significant sound effects that contribute to meaning:
[Thunder rumbles]
[Glass shatters]
[Applause]
[Laughter]
Music: Identify music when it’s relevant, including title and artist if known:
[MUSIC: "Bohemian Rhapsody" by Queen]
[Upbeat jazz music plays]
[Ominous background music]
Tone and delivery: When relevant, note how something is said:
MANAGER [sarcastically]: Oh, that's just wonderful.
CHILD [whispering]: Is it safe to come out now?
SPEAKER [shouting]: Everyone needs to evacuate!
Music and Lyrics
Identify music: When music is significant to the content, identify it:
[MUSIC: "Happy Birthday" sung by the group]
Include relevant lyrics: When lyrics contribute to meaning, include them with a singing indicator:
[Singing]: ♪ We wish you a Merry Christmas ♪
Or:
[GROUP SINGING]: Happy birthday to you, happy birthday to you
Special Formatting
Whispered or mouthed speech: Indicate the manner of delivery:
SARAH [whispering]: Don't let them know we're here.
JOHN [mouthing silently]: Are you okay?
Off-screen speech: Indicate when speech comes from off-screen:
VOICE [off-screen]: Is anyone there?
Or:
[Through telephone]: Hello, can you hear me?
Inaudible or unclear speech: When speech can’t be understood, note it neutrally:
SPEAKER: We need to... [unclear] ...by tomorrow.
[Inaudible conversation in background]
Avoid judgmental terms like “unintelligible” or “incoherent babbling.”
Strong language: Retain strong language as it appears in the audio:
CHARACTER: This is [BLEEP] ridiculous!
SPEAKER: What the h--- is going on?
If audio is bleeped, use [BLEEP]. If partially muted, use dashes or ellipses.
Formatting Guidelines
Use punctuation for emphasis: Let punctuation convey tone and emphasis rather than adding descriptive text:
Good:
SARAH: I can't believe it!
Less effective:
SARAH [excitedly]: I can't believe it!
Use descriptive labels only when punctuation isn’t sufficient.
Maintain suspense: Don’t reveal information before it’s presented in the audio or video:
Bad:
[John will reveal that he's the killer]
DETECTIVE: Who did it?
Good:
DETECTIVE: Who did it?
JOHN: I did.
[Everyone gasps in shock]
Use consistent formatting:
- ALL CAPS for speaker names and for label keywords such as MUSIC or BLEEP
- Mixed case for dialogue and narration
- [Brackets] or (parentheses) for sound descriptions
- Colons after speaker names
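Because these conventions are regular, a transcript line can be classified mechanically, which is useful for styling descriptions differently from speech. The sketch below assumes the conventions listed above (an ALL CAPS name, an optional bracketed note, a colon, then the text) and is only one possible approach. The names `parseLine`, `SPEECH`, and `DESCRIPTION` are hypothetical.

```javascript
// A whole line wrapped in [brackets] or (parentheses) is a description.
const DESCRIPTION = /^[\[(](.+)[\])]$/;
// "NAME: text" or "NAME [note]: text", with the name in ALL CAPS.
const SPEECH = /^([A-Z][A-Z0-9 .'-]*?)(?:\s*\[([^\]]+)\])?:\s+(.*)$/;

// Hypothetical helper: classifies one transcript line.
function parseLine(line) {
  const trimmed = line.trim();
  const description = trimmed.match(DESCRIPTION);
  if (description) {
    return { type: "description", text: description[1] };
  }
  const speech = trimmed.match(SPEECH);
  if (speech) {
    return {
      type: "speech",
      speaker: speech[1],
      note: speech[2] || null,
      text: speech[3],
    };
  }
  // Anything else (for example a heading) is passed through unchanged.
  return { type: "other", text: trimmed };
}

console.log(parseLine("[Phone rings]").type);
// description
console.log(parseLine("SARAH [whispering]: Don't let them know we're here.").note);
// whispering
```

Descriptions are tested first so that a line such as [Title card: "Five Years Later"] is not mistaken for speech because of its colon.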
Methods of Presenting Transcripts
Directly on the Page
Best for accessibility: Place the transcript on the same page as the media player, directly below or above the video or audio.
Advantages:
- Immediately accessible
- No navigation required
- Improves SEO (search engines index the content)
- Easy to find and use
Implementation:
<video controls>
  <source src="video.mp4" type="video/mp4" />
</video>
<section>
  <h2>Transcript</h2>
  <div class="transcript">
    <!-- Transcript content here -->
  </div>
</section>
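If the transcript is stored as plain text lines, the markup for the transcript container can be generated rather than written by hand. The following is a rough sketch under that assumption. The function names `escapeHtml` and `renderTranscript` are hypothetical, and the output reuses the `transcript` class from the markup above.

```javascript
// Escapes characters that have special meaning in HTML so that
// transcript text cannot break or inject markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps each non-empty transcript line in a paragraph.
function renderTranscript(lines) {
  const paragraphs = lines
    .filter((line) => line.trim() !== "")
    .map((line) => `  <p>${escapeHtml(line.trim())}</p>`);
  return ['<div class="transcript">', ...paragraphs, "</div>"].join("\n");
}

console.log(renderTranscript([
  "[Opening music plays]",
  "SARAH: Hello, and welcome to our course on web accessibility.",
]));
```

One paragraph per line keeps the text easy to navigate with a screen reader or braille display, and real text in the page remains indexable by search engines.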
Link to Separate Page
Good for long transcripts: Provide a prominent link to a separate transcript page.
Advantages:
- Doesn’t clutter the media page
- Better for very long transcripts
- Can be printed or shared separately
Implementation:
<video controls>
  <source src="video.mp4" type="video/mp4" />
</video>
<p>
  <a href="/video-transcript">Read full transcript</a>
</p>
Important: Make the link prominent and place it immediately next to or below the media player.
Expandable/Collapsible Section
Good compromise: Use an expandable section that keeps transcript nearby but hidden by default.
Advantages:
- Keeps page clean
- Makes transcript easy to find
- No page navigation required
Implementation:
<video controls>
  <source src="video.mp4" type="video/mp4" />
</video>
<details>
  <summary>Show transcript</summary>
  <div class="transcript">
    <!-- Transcript content here -->
  </div>
</details>
Interactive Transcripts
Best user experience: Interactive transcripts allow users to click any sentence to jump to that point in the video or audio.
Advantages:
- Excellent for navigation
- Makes content searchable within the page
- Improves SEO with timestamps
- Enhances user experience for everyone
Features:
- Click any sentence to jump to that moment
- Highlight current sentence as media plays
- Search within the transcript
- Share links to specific timestamps
Implementation: Requires JavaScript to sync transcript with media playback. Many media players offer plugins or built-in support for interactive transcripts.
Example platforms with interactive transcripts:
- YouTube (automatic transcript feature)
- Vimeo
- Custom implementations with libraries like Able Player
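For a custom implementation, the core of the synchronization is small. The sketch below is illustrative only and is not how any of the platforms above work internally. It assumes a hypothetical list of cues, each with a `start` time in seconds and a `text` string, sorted by start time. The names `findActiveCue` and `attachTranscript` and the `active` class are also assumptions. It relies on the standard `timeupdate` event and `currentTime` property of HTML media elements.

```javascript
// Returns the index of the cue being spoken at `time`,
// or -1 if playback has not reached the first cue.
// Assumes `cues` is sorted by ascending start time.
function findActiveCue(cues, time) {
  let active = -1;
  for (let i = 0; i < cues.length; i++) {
    if (cues[i].start <= time) {
      active = i;
    } else {
      break;
    }
  }
  return active;
}

// Browser-only wiring: renders each cue as a button, seeks on click,
// and highlights the current cue as the media plays.
function attachTranscript(media, container, cues) {
  const buttons = cues.map((cue) => {
    const button = document.createElement("button");
    button.type = "button";
    button.textContent = cue.text;
    button.addEventListener("click", () => {
      media.currentTime = cue.start; // jump to the start of this sentence
      media.play();
    });
    container.appendChild(button);
    return button;
  });

  media.addEventListener("timeupdate", () => {
    const active = findActiveCue(cues, media.currentTime);
    buttons.forEach((button, i) => {
      button.classList.toggle("active", i === active);
    });
  });
}
```

Buttons are used for the clickable sentences so that they are reachable and operable from the keyboard without extra work. In a page, `attachTranscript` would be called once with the video or audio element, the transcript container, and the cue list.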
Transcript Example
VIDEO TRANSCRIPT: "Introduction to Web Accessibility"
[Opening music plays]
[Title card: "Web Accessibility 101"]
SARAH: Hello, and welcome to our course on web accessibility.
[Sarah appears on screen in a bright office, sitting at a desk]
SARAH: My name is Sarah Chen, and I'll be your instructor for this course.
[Slide appears: "What is Web Accessibility?"]
SARAH: Web accessibility means making websites and applications that everyone can use, regardless of disability.
[Phone rings in the background]
SARAH [laughing]: Sorry about that!
[She silences the phone]
SARAH: Where was I? Oh yes, accessibility benefits everyone, not just people with disabilities.
[Slide appears with four icons representing different disabilities]
SARAH: Throughout this course, we'll explore how people with vision, hearing, motor, and cognitive disabilities use the web.
[Upbeat music plays]
SARAH: Let's get started!
[End card: "Next lesson: Screen Readers 101"]
[Music fades out]
Transcript Quality Checklist
Before publishing transcripts, verify:
- All dialogue is captured accurately
- Speakers are identified consistently
- Important sounds are described
- Visual information is included
- Music and lyrics are noted when relevant
- Formatting is consistent throughout
- No spoilers or premature information reveals
- Punctuation conveys emphasis appropriately
- Tone and delivery are noted when important
- Transcript is easy to find and access
- Formatting uses mixed case for dialogue, ALL CAPS for speaker names and label keywords
- Sound descriptions use [brackets] or (parentheses)
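A few of the formatting items in this checklist can be checked automatically before a human review. The following sketch is a rough heuristic, not a complete validator: `lintTranscript` is a hypothetical name, it only checks bracket balance and speaker-label capitalization, and it may flag ordinary sentences that happen to start with a short phrase followed by a colon. Accuracy, completeness, and spoilers still need a person.

```javascript
// Hypothetical helper: returns a list of formatting problems found in a
// transcript, one message per problem. An empty list means no issues found.
function lintTranscript(text) {
  const issues = [];
  text.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return;
    const lineNumber = index + 1;

    // Every opening bracket on a line should be closed on that line.
    const opens = (line.match(/\[/g) || []).length;
    const closes = (line.match(/\]/g) || []).length;
    if (opens !== closes) {
      issues.push(`Line ${lineNumber}: unbalanced brackets`);
    }

    // A leading "Name:" or "Name [note]:" label should be in ALL CAPS.
    const label = line.match(/^([A-Za-z][A-Za-z .'-]*?)(?:\s*\[[^\]]*\])?:\s/);
    if (label && label[1] !== label[1].toUpperCase()) {
      issues.push(`Line ${lineNumber}: speaker label "${label[1]}" is not in ALL CAPS`);
    }
  });
  return issues;
}

console.log(lintTranscript("SARAH: Hello.\nSarah: Hi again.\n[Phone rings"));
// [
//   'Line 2: speaker label "Sarah" is not in ALL CAPS',
//   'Line 3: unbalanced brackets'
// ]
```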
Conclusion
Transcripts are essential accessibility features that provide complete access to multimedia content for all users, particularly those who are deafblind. While captions work alongside audio and video, transcripts stand alone as comprehensive text documents that capture every element of the multimedia experience.
Key takeaways:
- Transcripts must include dialogue, sounds, and visual descriptions
- Required for prerecorded audio-only content
- Recommended for all multimedia content (essential for deafblind users)
- Should be easy to find—placed on the same page or clearly linked
- Use consistent formatting with speaker names in ALL CAPS
- Include context through sound and visual descriptions
- Consider interactive transcripts for the best user experience
Remember: Transcripts benefit everyone, not just users with disabilities. They improve SEO, enable searching, support translation, and allow users to access content in various contexts and at their own pace. By providing quality transcripts, you make your content more accessible, discoverable, and valuable to all users.
Creating transcripts requires effort, but the impact is profound—you’re ensuring equal access to information and entertainment for millions of users worldwide.
Resources
Transcript creation tools:
- Rev.com - Professional transcription services
- Otter.ai - AI-powered transcription
- YouTube - Automatic transcripts (edit for accuracy)
- Trint - Automated transcription with editing tools
Interactive transcript solutions:
- Able Player - Open-source accessible media player with interactive transcripts
- Video.js - Open-source video player that can be extended with custom transcript solutions
Accessibility guidelines: