Captions for Video and Audio Content

Captions are essential for making audio and video content accessible. They are text versions of the spoken content and other meaningful sounds, displayed in sync with the audio, so that all users, regardless of hearing ability, can follow your multimedia content.

This guide covers how to create accessible captions that meet WCAG standards and are comfortable to read.

Who Benefits from Captions?

Captions help far more people than you might expect:

People who are deaf or hard of hearing: Captions provide access to audio content they otherwise couldn’t perceive.

Non-native speakers: Captions help people learning a language follow along and improve comprehension.

People in sound-sensitive environments: Users in libraries, offices, or public spaces can watch videos without disturbing others.

People in noisy environments: Captions make content accessible on buses, trains, or busy locations where audio is difficult to hear.

People who process information better visually: Some users simply prefer reading along while listening to reinforce understanding.

Search engines: Captions provide searchable text content, improving SEO and content discoverability.

Captions vs. Subtitles

While often used interchangeably, captions and subtitles serve different purposes:

Captions:

  • Designed for people who are deaf or hard of hearing
  • Include spoken dialogue and sound descriptions
  • Describe important audio information (music, sound effects, speaker identification)
  • Assume the viewer cannot hear the audio

Subtitles:

  • Designed for people who can hear but don’t understand the language
  • Provide translation of dialogue only
  • Assume the viewer can hear sound effects and music
  • Focus solely on spoken words

For accessibility, you need captions, not just subtitles.

Open Captions vs. Closed Captions

Open Captions

Open captions are permanently embedded (“burned in”) in the video image during production. They’re always visible and cannot be turned off.

Advantages:

  • Work with any media player
  • Guaranteed to be visible
  • No need for caption file support
  • Ideal for presentations and conferences

Disadvantages:

  • Cannot be turned off
  • Cannot be customized by users
  • Require separate video files for different languages

Closed Captions

Closed captions are separate text files synchronized with the video. Users can toggle them on and off.

Advantages:

  • Users control visibility
  • Can be customized (size, color, position)
  • Support multiple languages with one video file
  • Can be updated without re-encoding video

Disadvantages:

  • Require media player support
  • Depend on proper file formatting
  • May not display if files are missing or incompatible

Recommendation: Use closed captions for online videos where users have player controls. Use open captions for presentations, conferences, or situations where you can’t guarantee caption support.

What to Include in Captions

Required Elements (WCAG 2.1 AA)

Verbatim dialogue for scripted content: Include all spoken words exactly as said, including filler words like “um” and “uh.” This preserves the intended delivery and timing.

All off-screen speech: Capture any dialogue spoken by people not visible on screen.

Speaker identification: Identify speakers when:

  • It’s not obvious who’s speaking
  • The speaker is off-screen
  • Multiple people are speaking
  • Context doesn’t make it clear

Background sounds: Describe important sounds using [brackets] or (parentheses):

  • [phone ringing]
  • [door slams]
  • [applause]

No spoilers: Don’t reveal information before it’s meant to be known. Captions should maintain the same suspense and pacing as the audio.

Conventional spelling: Use standard spelling, not phonetic. Write “going to” not “gonna” unless phonetic spelling is essential to meaning.

Sound descriptions: Describe sounds by what they are, not what causes them:

  • Good: [thunder]
  • Avoid: [storm approaching]

Punctuation for emphasis: Use proper punctuation to convey tone. Avoid adding extra words to explain emphasis.

Music identification: Identify music by title and artist when appropriate and relevant to content.

Important lyrics: Include relevant lyrics verbatim, set off with music notes (♪):

  • ♪ Happy birthday to you ♪

Inaudible speech: When speech is unclear, note it neutrally:

  • [inaudible]
  • [unclear]

Avoid judgmental terms like “unintelligible.”

Whispered or mouthed speech: Indicate the manner of speaking:

  • (whispering): I can’t believe it
  • (mouthing): Are you okay?

Retain strong language: Don’t censor captions unless the audio is also censored. Match the content’s intended audience.

Discretionary Elements

Unscripted content: For live broadcasts, documentaries, and interviews, captions should be verbatim but may omit excessive filler words that impair readability. Balance accuracy with ease of reading.

Visual Presentation Guidelines

Text Display

Line limits:

  • Maximum 3 lines per caption
  • Prevents blocking important visual content
  • Improves reading speed and comprehension

Character limits:

  • Maximum 32 characters per line
  • Ensures readability at standard viewing distances
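
The line and character limits above are easy to check automatically. The following is a minimal sketch rather than a definitive tool: the function name `check_caption_layout` and the constants are hypothetical, and it assumes each caption is a plain string with one displayed line per newline.

```python
MAX_LINES = 3   # line limit recommended in this guide
MAX_CHARS = 32  # per-line character limit recommended in this guide


def check_caption_layout(caption: str) -> list[str]:
    """Return a list of layout problems for one caption (empty if none)."""
    problems = []
    lines = caption.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS:
            problems.append(
                f"line {number} has {len(line)} characters (max {MAX_CHARS})"
            )
    return problems


# A two-line caption within the limits, then an unbroken long sentence.
print(check_caption_layout("I never thought\nI'd see you again."))
print(check_caption_layout(
    "The meeting was postponed because the CEO was traveling."
))
```

A check like this only catches length problems. Whether a break falls at a logical point still needs a human read.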

Line breaks: Insert breaks at logical points between phrases, not in the middle of a phrase:

Good:
I never thought
I'd see you again.

Bad:
I never thought I'd
see you again.

Sentence breaks: Split longer sentences at natural grammatical boundaries. Start the new line before a conjunction or preposition rather than after it, and keep an article on the same line as its noun:

Good:
The meeting was postponed
because the CEO was traveling.

Bad:
The meeting was postponed because
the CEO was traveling.

Typography

Case: Use mixed case (standard capitalization) for better readability. Avoid all caps except for emphasis or titles when appropriate.

Font:

  • Default to sans-serif fonts
  • Normal weight (bold is harder to read)
  • Avoid decorative or script fonts

Titles: Use quotation marks and standard capitalization for titles:

  • “The Great Gatsby”
  • “Bohemian Rhapsody”

Emphasis:

  • Use italics for emphasis when punctuation isn’t sufficient
  • Use ALL CAPS sparingly for strong emphasis only
  • Prioritize punctuation over formatting

Color and Contrast

Default styling:

  • Black background with white text
  • Provides maximum contrast and readability

Minimum contrast: WCAG requires a contrast ratio of at least 3:1 between text and background for large text (18pt or larger, or 14pt bold) and 4.5:1 for smaller text.
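
A contrast ratio can be computed rather than estimated by eye. The sketch below follows the relative luminance and contrast ratio definitions from WCAG 2.x. It assumes colors are given as 8-bit sRGB tuples, and the function names are illustrative only.

```python
def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an 8-bit sRGB color, per the WCAG 2.x definition."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple[int, int, int],
                   background: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# The default caption styling: white text on a black background.
ratio = contrast_ratio((255, 255, 255), (0, 0, 0))
print(f"{ratio:.1f}:1")
```

White on black yields the highest ratio this formula can produce, which is why it is a safe default. Semi-transparent caption backgrounds need more care, because the effective background color then depends on the video frame behind the text.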

Color for meaning: Never use color alone to convey information. Always provide text-based alternatives for users who are colorblind.

Customization: Media players should allow users to customize caption appearance (color, size, font, background).

Timing and Synchronization

Display duration:

  • Minimum 2 seconds on screen
  • Allow 0.3 seconds per word when possible
  • Add extra time for unfamiliar or complex words
  • Add extra time when visuals are complex or busy
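
The first two duration rules combine into a simple lower bound. The sketch below is a rough estimate under stated assumptions: the name `min_display_seconds` is hypothetical, words are counted by splitting on whitespace, and the extra time for complex words or busy visuals is left to editorial judgment.

```python
MIN_SECONDS = 2.0       # minimum time on screen from this guide
SECONDS_PER_WORD = 0.3  # reading allowance per word from this guide


def min_display_seconds(caption: str) -> float:
    """Estimate the shortest comfortable display time for a caption."""
    word_count = len(caption.split())
    return max(MIN_SECONDS, word_count * SECONDS_PER_WORD)


# Short captions fall back to the 2-second floor; longer ones scale by word count.
for text in ("[door slams]",
             "The meeting was postponed because the CEO was traveling."):
    print(f"{min_display_seconds(text):.1f}s  {text}")
```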

Synchronization: Captions must be precisely synchronized with audio, appearing when words are spoken.

Silent intervals:

  • Remove captions after 4-5 seconds of silence
  • Leave 1.5 seconds minimum between captions to prevent jerky visual effects
  • Note periods of silence when visuals suggest important audio: [silence]

Positioning: Position captions to avoid obscuring:

  • On-screen text
  • Faces and expressions
  • Important visual information

Caption File Formats

Caption files contain all spoken words, sound descriptions, and time codes indicating when captions should appear.

Basic Formats

SubRip (.srt):

  • Most widely supported format
  • Simple text-based format
  • Limited styling options
  • Good compatibility across players

SubViewer (.sbv, .sub):

  • Basic format with simple timing
  • Limited player support

LRC (.lrc):

  • Common for music lyrics
  • Basic timing support

Advanced Formats

WebVTT (.vtt):

  • Modern web standard
  • Supports styling and positioning
  • Allows user customization
  • Supported by all major browsers and a growing number of players
  • Recommended for web content
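
To show what a WebVTT file contains, here is a minimal sketch that builds one from a list of cues. The helper names and the `(start, end, text)` tuple layout are assumptions made for illustration. The output uses only the basic parts of the format: the `WEBVTT` header, `HH:MM:SS.mmm` timestamps separated by `-->`, and a blank line between cues.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Build WebVTT file content from (start_seconds, end_seconds, text) cues."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Example cues with a sound description and a two-line caption.
print(build_webvtt([
    (1.0, 3.0, "[phone ringing]"),
    (3.5, 6.0, "I never thought\nI'd see you again."),
]))
```

Styling, positioning, and speaker markup are deliberately left out. Consult the WebVTT specification before relying on those features.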

SAMI (.smi, .sami):

  • Supports styling
  • Good for multiple languages
  • Declining support

TTML (.ttml):

  • Advanced styling capabilities
  • XML-based format
  • Good for broadcast standards

Format Recommendations

Always provide WebVTT when possible:

  • Lets browsers apply the caption preferences users set in their operating system
  • Settings apply consistently across browsers and videos
  • Future-proof format with growing adoption

Provide multiple formats when necessary:

  • Different players support different formats
  • Include .srt for maximum compatibility
  • Include .vtt for modern features and user customization
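
When both formats are needed, a basic .srt file can often be converted to .vtt mechanically, because the two differ mainly in the file header and the decimal separator in timestamps. The sketch below assumes a simple, well-formed SubRip input. The function name is hypothetical, and it does not translate SubRip styling tags or handle malformed files.

```python
import re

# SubRip timestamps use a comma before the milliseconds: 00:00:01,000
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert simple SubRip text to WebVTT text."""
    converted = []
    # Drop a leading byte order mark and surrounding blank lines, if present.
    for line in srt_text.lstrip("\ufeff").strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten, so commas in dialogue survive.
            line = SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


srt_example = """1
00:00:01,000 --> 00:00:03,000
[phone ringing]

2
00:00:03,500 --> 00:00:06,000
Well, I never thought
I'd see you again.
"""
print(srt_to_webvtt(srt_example))
```

The numeric counters from the .srt file are kept, since WebVTT accepts them as cue identifiers. Review the converted file in the target player before publishing.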

Caption Quality Checklist

Before publishing your captions, verify:

  • All dialogue is captured verbatim (or appropriately edited for unscripted content)
  • Speaker identification is clear when needed
  • Important sounds are described
  • Music and lyrics are identified when relevant
  • Captions don’t reveal information prematurely
  • Timing is synchronized with audio
  • Line breaks occur at logical points
  • No line exceeds 32 characters
  • No more than 3 lines display at once
  • Captions remain on screen at least 2 seconds
  • Text uses mixed case with proper punctuation
  • Contrast ratio meets 3:1 minimum
  • Captions don’t obscure important visuals
  • File format is appropriate for your media player

Conclusion

Effective captions make your audio and video content accessible to millions of users who rely on them. While creating quality captions requires attention to detail, the impact is significant—you’re ensuring equal access to information, entertainment, and education.

Key takeaways:

  • Captions benefit many users beyond those who are deaf or hard of hearing
  • Use closed captions for online content with player controls
  • Include all dialogue, speaker identification, and important sounds
  • Follow visual presentation guidelines for readability
  • Provide captions in WebVTT format when possible
  • Test captions for timing, accuracy, and synchronization

Remember: Captions aren’t just a compliance checkbox—they’re a fundamental accessibility feature that improves the experience for all users. Invest the time to create quality captions, and you’ll create content that truly works for everyone.

Resources

For creating and editing caption files:

  • Amara - Free online caption editor
  • YouTube Studio - Automatic captions with editing tools
  • Subtitle Edit - Free desktop software for Windows
  • Aegisub - Advanced subtitle editing software

For caption validation:

  • WebVTT Validator - Check file format correctness
  • Caption testing with screen readers
  • User testing with people who rely on captions