Arabic-first voice AI

The voice AI layer
for the Arabic internet.

Native-quality Arabic text-to-speech, tuned by native speakers, deployed across four regions.

Start for free · Hear the voices
No card. 30 seconds to first synthesis.
113 ms · Hakim Saree' v1.3 · TTS
90 ms · Hakim Arab v2 · STT
99.97% · 30-day uptime

Hear the range

Twelve voices. Fifteen Arabic dialects. One model family.

Tap a voice. We stream a six-second sample from the same endpoint your app will call. No sign-up.

Layla

Modern Standard · Narrative

Transcript
Hi, I'm Layla. A Modern Standard Arabic voice, calm and assured, built for narration, audiobooks and e-learning.

How we stack up

Built for Arabic, benchmarked against the world.

We ship voice infrastructure for the Arabic internet, so the comparison that matters runs on Arabic content, not English-tuned demos. The numbers below were measured against the latest production tier each vendor offers, during launch week.

15
Arabic dialects shipped
MSA + 14 regional varieties with named voices
113 ms
TTS time-to-first-byte, p50
Arabic content from the region you pick, production checkpoint
leads
Word error rate on Arabic STT
Arabic-LibriSpeech eval set, MSA plus four dialects
Hakim versus the leading TTS, STT, and voice-cloning APIs on Arabic content.
Azure (Cognitive Services) | Hakim (Built for Arabic) | ElevenLabs (Multilingual v2 + Flash) | Google (Cloud Text-to-Speech) | OpenAI (tts-1 + Whisper) | Fish Audio (fish-speech v1.5)
Arabic excellence
Arabic dialects with named voices
Counted from each vendor's TTS catalogue, not generic 'Arabic' fallbacks.
3 | 15 | 1 | 2 | 1 | 1
Arabic STT word error rate
Lower is better. 7-hour Arabic-LibriSpeech eval, MSA + four dialects.
Trails | Leads | Trails | Competitive
Voice cloning quality on Arabic
How the vendor handles a 30-second Arabic sample, accent, register, dialect.
Enterprise | Native | English-tuned | Limited | No consent gate
Performance
TTS time-to-first-byte on Arabic
p50 first-audio-chunk from prompt-submit. EU-Frankfurt, 1 Gbps link, January 2026.
350 ms | 113 ms | 320 ms | 380 ms | 700 ms | 280 ms
Streaming TTS
WebSocket | Chunked | Chunked | Chunked | Chunked | Partial
STT billing granularity
Per-second billing means short clips don't round up to a full minute.
Per-second | Per-second | Per 15 seconds | Per minute
Platform & trust
OpenAI-compatible REST API
Drop-in replacement for /v1/audio/speech and /v1/audio/transcriptions.
No | Yes | No | No | Native | Partial
Data residency regions
Where your audio is processed and stored, not just billed.
EU only | GCC + EU | US only | EU only | US only | US / CN
Voice-clone consent gate
An on-platform consent step before a custom voice is callable in production.
Enterprise | Yes | Yes | No | No

Hakim leads on 6 of 9 dimensions and ties on 3 more.

Methodology + sources

Numbers from each vendor's public documentation and our 2026-Q1 internal benchmarks against their highest-quality production tier (ElevenLabs Flash, Azure Neural HD, Google Studio, OpenAI tts-1, Fish Audio real-time). Arabic STT WER measured on a 7-hour Arabic-LibriSpeech evaluation set covering MSA plus four major dialects. Vendor brand names are the trademarks of their respective owners.
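
To make the STT billing granularity row concrete, here is a minimal sketch of how rounding changes the billable length of a short clip. The function name `billableSeconds` and the 7-second clip are illustrative assumptions, not part of the Hakim API or any vendor's published billing logic; the sketch only models rounding up to the next whole billing unit.

```typescript
// Billable seconds for a clip under a given billing granularity.
// granularitySeconds = 1 models per-second billing, 15 models
// per-15-second billing, and 60 models per-minute billing.
function billableSeconds(clipSeconds: number, granularitySeconds: number): number {
  if (clipSeconds < 0 || granularitySeconds <= 0) {
    throw new RangeError("clipSeconds must be >= 0 and granularitySeconds > 0");
  }
  // Round the clip up to the next whole billing unit.
  return Math.ceil(clipSeconds / granularitySeconds) * granularitySeconds;
}

// Hypothetical example: a 7-second voice note.
const clipSeconds = 7;
const perSecond = billableSeconds(clipSeconds, 1);
const per15Seconds = billableSeconds(clipSeconds, 15);
const perMinute = billableSeconds(clipSeconds, 60);

console.log({ perSecond, per15Seconds, perMinute });
// → { perSecond: 7, per15Seconds: 15, perMinute: 60 }
```

For workloads dominated by short utterances, the rounding step rather than the headline rate tends to decide the bill, which is why the row is worth checking against your own clip-length distribution.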

Try it yourself

Type a line. Hear Hakim read it back.

Direct call to the same production pipeline your app will use. 5 free generations per day, no sign-up, no card.

Tip: short, complete sentences produce the most natural prosody.

5 free generations per day per visitor.

01 / 05

Voice is the next interface.

Most of the world communicates in speech. AI, until now, has assumed everyone types.

The web has grown to five billion people. Comfortable typing, in your first language, on a keyboard that doesn't fight you, has not kept pace. The gap between 'has a phone' and 'can type comfortably' is where voice belongs.

People online with limited typing fluency: 3.4B (approx., global)

Sources: ITU global internet estimate, Ethnologue first-language speakers, regional typing-literacy surveys. Full references in the launch thesis post.

02 / 05

Arabic isn't one language.

It's Modern Standard Arabic plus fourteen regional dialects that are only partly mutually intelligible, with sounds that no English-first model was trained on.

Then there's the diacritic system that most real-world text omits, and a script that is written right to left and joins letters within words. 'Arabic TTS' built on an English-first stack mispronounces common names by the second sentence. We built around that, not past it.

Supported dialects
MSA
Khaleeji
Egyptian
Levantine
Maghrebi
Iraqi
Sudanese
Yemeni
Najdi
Sounds English doesn't have
ع · /ʕ/ · voiced pharyngeal
ح · /ħ/ · voiceless pharyngeal
خ · /x/ · velar fricative
ق · /q/ · uvular stop
ص · /sˤ/ · emphatic s

Supported range refers to current coverage in Hakim Saree' v1.3 TTS and Hakim Arab v2 STT; see the models page for the full capability matrix.

03 / 05

How the Hakim TTS family is built.

Arabic-first TTS engineered for streaming speed. Hakim Saree' v1.3, tuned by native speakers across 15 dialects. The first byte of streaming audio lands in roughly 113 ms p95.

Hakim Saree' (سريع, *swift*) is our streaming-speed tier: the endpoint tuned for the lowest first-byte latency. It reads Arabic script and diacritics natively, handles English inside an Arabic sentence without switching voices, and exposes the same endpoint whether you pick a region in the EU or the GCC. One API, one authentication primitive, the same billing model everywhere.

113 ms
Time-to-first-audio p95
Rolling 15-second window across live regions
15
Arabic dialects
MSA + 14 regional dialects
30+
Languages
Non-Arabic supported range
4
Live regions
GCC + EU

Every voice is recorded and QA'd with a native speaker before it ships. Evaluation methodology and the full three-tier capability matrix live on the models page.
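
If you want to compare our latency figures with your own measurements, it helps to pin down what p50 and p95 mean. Below is a minimal sketch using the nearest-rank percentile definition; that definition is our assumption for illustration, not a statement of how the dashboard computes its figures. The sample values are hypothetical, not measured data, and `percentile` is an illustrative helper name.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new RangeError("need at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// HYPOTHETICAL time-to-first-audio samples in milliseconds from one window.
const windowSamplesMs = [80, 90, 95, 100, 105, 110, 120, 130, 150, 400];

const p50 = percentile(windowSamplesMs, 50);
const p95 = percentile(windowSamplesMs, 95);

console.log({ p50, p95 });
// → { p50: 105, p95: 400 }
```

A single slow request barely moves p50 but dominates p95 in a small window, so compare like with like when you benchmark.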

04 / 05

How Hakim Arab v2 is built.

Arabic-first streaming STT. Dialect-aware, code-switch-aware, interactive latency, first token in roughly 90 ms p95.

Hakim Arab is the half of the stack that turns voice into text. It is trained for the sentences Arabic speakers actually say: 15 Arabic dialects, English phrases inside an Arabic sentence treated as first-class input, and proper nouns, email addresses, and numbers preserved through the transcript. 'Send the موعد to ahmed@hakim.ai' is exactly the kind of sentence that breaks English-first STT; Hakim Arab reads it straight through.

Listen · Khaleeji sample

A Khaleeji voice Hakim Arab v2 is designed to transcribe. Play, then imagine the same accuracy going the other way.

Reem · Khaleeji · narrative register
90 ms
Streaming latency p95
First-token from speech-in
15
Arabic dialects
Available today
30+
Languages
Hakim Arab v2 supported range

No accuracy or word-error-rate figure is published on this page; we keep that discussion on the models page, where the evaluation protocol is documented.

05 / 05

Your audio stays where you chose.

Four live regions. GCC (UAE, KSA, Qatar) plus EU Frankfurt. You pick where your audio is processed; it doesn't leave that region.

Your data stays where your customers and regulators expect it to stay. Pick a region when you create a project, and every request (synthesis, transcription, storage) is handled inside that region. No silent cross-border routing, no surprises in your audit trail.

Frankfurt · EU · live
Doha · GCC · live
Dubai · GCC · live
Riyadh · GCC · live
Our compliance posture

Compliant, not certified: every badge links to the exact control mapping on our security page. SOC 2 Type II audit is in progress; the Type I report is available on request.

What you can build

Three Arabic-first voice models. One API.

Text-to-speech, speech-to-text, and voice cloning, all native-quality across Arabic dialects and 30+ languages, all behind the same authentication and billing primitives.

TTS · Hakim Fast v1.3

Text to speech, Arabic-first.

Arabic-first streaming TTS at 113 ms p95 time-to-first-audio. MSA plus 14 regional Arabic dialects and 30+ other languages out of the box, served from your choice of four live regions.

See TTS capabilities

STT · Hakim Arab v2

Speech to text, dialect-aware.

Arabic-first streaming STT. Fifteen Arabic dialects, English and Arabic in the same sentence, 90 ms p95 first token, delivered from your choice of four live regions.

See STT capabilities

Voice cloning

Clone any voice instantly.

Record ten seconds and get a production voice that follows the original speaker's dialect and register. Cloning is instant, and custom voices share the same inference stack as Hakim Fast v1.3: same latency, same quality bar.

See voice cloning

Ship in minutes

Five lines to a voice.

Drop-in SDKs for Node and Python, plus plain cURL and browser-JS snippets. Every example hits the same pipeline the paid API uses; there is no separate staging path.

  • Hakim Fast v1.3 · default TTS model · streaming, low-latency, 15 dialects
  • Reem · Khaleeji preset · one of twelve voices shipped with every account
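// Node SDK example. It uses top-level await, so run it as an ES module.
import { Hakim } from '@hakim/sdk-node';

// Read the API key from the environment rather than hard-coding it.
const client = new Hakim({ apiKey: process.env.HAKIM_API_KEY });

// Synthesize an Arabic greeting with the Reem (Khaleeji) preset voice.
const audio = await client.audio.speech.create({
  model: 'hakim-fast-v1',
  voice: 'reem-khaleeji',
  input: 'أهلاً وسهلاً بكم في حكيم.',
  format: 'mp3',
});

// Save the returned audio to disk.
await audio.writeToFile('hello.mp3');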
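
If your stack already speaks the OpenAI audio API, the same call can be sketched without the SDK. Treat this as a sketch under stated assumptions: the base URL is a placeholder (this page does not state the API host), and the `Authorization: Bearer` header and the `response_format` field follow the OpenAI `/v1/audio/speech` convention rather than anything documented here. `buildSpeechRequest` is a hypothetical helper name.

```typescript
// Body of an OpenAI-style speech request. Field names follow the OpenAI
// /v1/audio/speech convention and are ASSUMED to carry over unchanged.
interface SpeechRequestBody {
  model: string;
  voice: string;
  input: string;
  response_format: string;
}

// Everything needed to send the request with any HTTP client.
interface PreparedRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// PLACEHOLDER host: replace with the base URL from your dashboard.
const BASE_URL = "https://api.example.invalid";

// Build the request without sending it, so it can be inspected or logged.
function buildSpeechRequest(apiKey: string, body: SpeechRequestBody): PreparedRequest {
  return {
    url: `${BASE_URL}/v1/audio/speech`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed OpenAI-style bearer auth
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

const speechRequest = buildSpeechRequest("YOUR_API_KEY", {
  model: "hakim-fast-v1",
  voice: "reem-khaleeji",
  input: "أهلاً وسهلاً بكم في حكيم.",
  response_format: "mp3",
});

console.log(speechRequest.method, speechRequest.url);
```

To send it, pass the pieces to `fetch(speechRequest.url, { method: speechRequest.method, headers: speechRequest.headers, body: speechRequest.body })` and read the response as binary audio.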

Built for enterprise

Compliant, not certified, and happy to show our work.

We operate to SOC 2 Type II, GDPR Article 28 processor, EU AI Act Article 50, UAE PDPL, and KSA PDPL requirements. The SOC 2 Type II audit is in progress; the Type I report is available on request. Every badge on this page links to the exact control mapping on the security page.

99.9%
Uptime SLO
Rolling 30-day target
113 ms
TTS p95 across regions
Rolling 24 hours, all live regions
4
Live data-residency regions
UAE · KSA · Qatar · EU-Frankfurt

Our compliance posture today. No badge on this page implies a completed auditor attestation unless the security page's control mapping says so explicitly.

Read the security page

Pricing

Pricing that scales with you.

Free while you prototype. Pay only for what you use once you ship. Enterprise terms on request, with data residency, custom retention, and signed BAAs where they matter.

Prototype

Free

$0 / mo

Build against the same production pipeline the paid tiers use. Zero card, zero commitment.

  • 10,000 credits every month
  • Full 12-voice preset catalogue · Playground
  • Community support
Start for free

No card required. Upgrade any time from the dashboard.

Ship it

Creator

$19 / mo

The anchor plan for individual developers and small teams going to production.

  • 500,000 credits every month
  • All presets plus 30 custom voices · overage billing
  • Email support, 24-hour target response

Custom

Enterprise

Custom/mo

Usage-based pricing, a signed master services agreement (MSA), data-processing addenda, and the residency posture your legal team asked for.

  • Signed DPA and MSA · custom retention · BAA on request
  • Pick any of 4 live regions · dedicated tenants on contract
  • Unlimited voices · unlimited keys · on-prem optional
  • Slack Connect · dedicated CSM · 24/7 sev-1

One more thing

Your users are waiting for a voice that sounds like them.

Five thousand free characters today. Twelve voices and every dialect we support. Thirty seconds to your first synthesis.

Start for free
No card. Upgrade when, and if, you grow out of the free tier.