Minh/speech transcription tutorial #1807


Open · minh-hoque wants to merge 19 commits into main

Conversation

minh-hoque

Summary

This pull request adds a new end-to-end tutorial, examples/Speech_transcription_methods.ipynb, that compares four ways to convert speech to text with OpenAI tools (a minimal sketch of the first two methods follows the list):

  1. Audio /transcriptions endpoint (single request)
  2. Audio /transcriptions with streaming
  3. Realtime API over WebSocket
  4. Agents SDK with the new speech_to_text tool
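
For orientation, here is a minimal sketch of the first two methods. It assumes the openai Python SDK v1.x, an OPENAI_API_KEY in the environment, and a hypothetical sample file path; the notebook contains the complete, tested helper code.

```python
# Minimal sketch of methods 1 and 2 (not the notebook's exact helpers).
# Assumes: openai Python SDK v1.x, OPENAI_API_KEY set, and a hypothetical
# sample file at examples/data/sample_audio_files/sample.wav.
from openai import OpenAI

client = OpenAI()

# 1. Single-request transcription via the /audio/transcriptions endpoint.
with open("examples/data/sample_audio_files/sample.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # "whisper-1" also works for batch transcription
        file=audio_file,
    )
print(result.text)

# 2. Streaming transcription: pass stream=True and read incremental text events.
#    (Event/field names below follow the SDK's transcript.text.* stream events.)
with open("examples/data/sample_audio_files/sample.wav", "rb") as audio_file:
    stream = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
        stream=True,
    )
    for event in stream:
        if event.type == "transcript.text.delta":
            print(event.delta, end="", flush=True)
```

Methods 3 and 4 require a persistent session (a WebSocket connection or an Agents SDK run), so they are easiest to follow in the notebook itself.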

The notebook walks through the trade-offs, provides helper functions, and benchmarks each approach on several sample files.
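
For readers skimming the PR, the benchmarking idea reduces to timing each method over the same files. A hypothetical helper along those lines (not the notebook's actual code) might look like:

```python
# Hypothetical timing helper illustrating the benchmarking pattern; the
# notebook's real helpers and sample files may differ.
import time
from statistics import mean

def benchmark(transcribe_fn, audio_paths, runs=3):
    """Return the mean wall-clock latency (seconds) of transcribe_fn per file."""
    results = {}
    for path in audio_paths:
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            transcribe_fn(path)  # any of the four methods, wrapped as f(path) -> text
            timings.append(time.perf_counter() - start)
        results[path] = mean(timings)
    return results
```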
To support the tutorial we also add:

  • Sample audio clips (examples/data/sample_audio_files/…)
  • Explanatory diagrams (Mermaid source in examples/mermaid/… and rendered PNGs in examples/imgs/…)
  • Updates to .gitignore (ignore large/temporary audio)

Together, these assets give Cookbook readers a practical, runnable reference for choosing the right transcription workflow.

Motivation

High-quality speech transcription is a common requirement for chatbots, call analysis, meeting notes, and real-time assistants. OpenAI now offers multiple APIs and SDK features for this, but the differences (latency, code patterns, streaming vs. batch, etc.) are not obvious to newcomers.

This tutorial:

  • Shows concrete, runnable examples for every current method
  • Highlights pros / cons and performance considerations
  • Provides reusable helper code and diagrams to speed up adoption

Adding this content makes the Cookbook a more complete guide for developers integrating speech capabilities.

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

@danial-openai (Contributor) left a comment:

Some tiny nits but otherwise LGTM. Thanks for writing this up!

@minh-hoque (Author)

> Some tiny nits but otherwise LGTM. Thanks for writing this up!

Thank you for the review @danial-openai! Changes have been pushed.
