Minh/speech transcription tutorial #1807


Open · minh-hoque wants to merge 19 commits into main

Conversation

minh-hoque

Summary

This pull request adds a new end-to-end tutorial, examples/Speech_transcription_methods.ipynb, that compares four ways to convert speech to text with OpenAI tools (a minimal sketch of the first two methods follows the list):

  1. Audio /transcriptions endpoint (single request)
  2. Audio /transcriptions with streaming
  3. Realtime API over WebSocket
  4. Agents SDK with the new speech_to_text tool
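
For orientation, here is a minimal sketch of the first two methods. It assumes the openai Python SDK v1.x, an OPENAI_API_KEY in the environment, and a hypothetical sample file path; the notebook contains the complete, tested helper code.

```python
# Minimal sketch of methods 1 and 2 (not the notebook's exact helpers).
# Assumes: openai Python SDK v1.x, OPENAI_API_KEY set, and a hypothetical
# sample file at examples/data/sample_audio_files/sample.wav.
from openai import OpenAI

client = OpenAI()

# 1. Single-request transcription via the /audio/transcriptions endpoint.
with open("examples/data/sample_audio_files/sample.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # "whisper-1" also works for batch transcription
        file=audio_file,
    )
print(result.text)

# 2. Streaming transcription: pass stream=True and read incremental text events.
#    (Event/field names below follow the SDK's transcript.text.* stream events.)
with open("examples/data/sample_audio_files/sample.wav", "rb") as audio_file:
    stream = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
        stream=True,
    )
    for event in stream:
        if event.type == "transcript.text.delta":
            print(event.delta, end="", flush=True)
```

Methods 3 and 4 require a persistent session (a WebSocket connection or an Agents SDK run), so they are easiest to follow in the notebook itself.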

The notebook walks through the trade-offs, provides helper functions, and benchmarks each approach on several sample files.
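
For readers skimming the PR, the benchmarking idea reduces to timing each method over the same files. A hypothetical helper along those lines (not the notebook's actual code) might look like:

```python
# Hypothetical timing helper illustrating the benchmarking pattern; the
# notebook's real helpers and sample files may differ.
import time
from statistics import mean

def benchmark(transcribe_fn, audio_paths, runs=3):
    """Return the mean wall-clock latency (seconds) of transcribe_fn per file."""
    results = {}
    for path in audio_paths:
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            transcribe_fn(path)  # any of the four methods, wrapped as f(path) -> text
            timings.append(time.perf_counter() - start)
        results[path] = mean(timings)
    return results
```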
To support the tutorial we also add:

  • Sample audio clips (examples/data/sample_audio_files/…)
  • Explanatory diagrams (Mermaid source in examples/mermaid/… and rendered PNGs in examples/imgs/…)
  • Updates to .gitignore (ignore large/temporary audio)

Together, these assets give Cookbook readers a practical, runnable reference for choosing the right transcription workflow.

Motivation

High-quality speech transcription is a common requirement for chatbots, call analysis, meeting notes, and real-time assistants. OpenAI now offers multiple APIs and SDK features for this, but the differences (latency, code patterns, streaming vs. batch, etc.) are not obvious to newcomers.

This tutorial:

  • Shows concrete, runnable examples for every current method
  • Highlights pros / cons and performance considerations
  • Provides reusable helper code and diagrams to speed up adoption

Adding this content makes the Cookbook a more complete guide for developers integrating speech capabilities.

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

@danial-openai (Contributor) left a comment:

Some tiny nits but otherwise LGTM. Thanks for writing this up!

@minh-hoque (Author)

> Some tiny nits but otherwise LGTM. Thanks for writing this up!

Thank you for the review @danial-openai! Changes have been pushed.
