Blog
Thesis pieces, release write-ups, and engineering notes, written by the people building Hakim.
How to evaluate Arabic TTS, STT, and voice cloning APIs without the marketing claims. Eleven criteria that actually matter, what to ignore, and how the major providers stack up across MSA and the regional dialect tree.
Building Arabic-language voice agents for the Gulf, Egypt, and the Levant. STT dialect coverage, sub-120 ms TTS, interrupt handling, and the four mistakes that sink voice-agent rollouts in MENA.
AI dubbing isn't just translation plus TTS. A breakdown of the seven-stage pipeline, where dialect choice changes everything, lip-sync constraints, and how studios actually integrate AI dubbing into existing post-production workflows.
Khaleeji isn't one dialect; it's a family. A primer on Najdi, Hejazi, Emirati, Kuwaiti, Qatari, and Bahraini varieties; what voice AI actually needs to handle them; and what 'covered' looks like in production.
Most voice AI treats Arabic as an afterthought. Here's why we rebuilt the stack with MSA and fifteen dialects as the default, not the fallback.
Behind the sub-120 ms TTFB number: chunked autoregressive decoding, WebSocket transport semantics, mid-stream cancel, and the architectural choices that separate a streaming TTS from a batch TTS labelled streaming.
A deep dive on the latency budget: model surgery, custom inference, and the four-region topology that gets a synthesis call to your user in 113 milliseconds.
Fifteen Arabic dialects and six MENA language tiers, evaluated on regional corpora, not press releases. How we measure it, and how we plan to grow it.
What you're actually paying for: per-character vs. per-second, free-tier traps, voice-cloning amortisation, dialect surcharges, and the hidden costs that show up at 100M characters per month.
The EU AI Act entered into force on 1 August 2024, and its first obligations have applied since 2 February 2025. What changed for voice cloning specifically: transparency disclosures, deepfake labelling, the GPAI obligations, and how the EU regime intersects with the GCC's identity-misuse statutes.
Frankfurt, UAE, KSA, Qatar: four full deployments from day one. Why the hub-and-spoke model doesn't work for Arabic voice AI, and what we built instead.
Eighteen months head-down, four founders still on the stack, a production TTS + STT model family, and four live regions. Today we come out of stealth. Here's the arc.
Most MENA-targeted products are bilingual by default. How to design a voice layer that handles AR/EN code-switching gracefully: language detection, prosody at the boundary, voice catalogue ergonomics, and the LLM prompt patterns that don't break.
Quality is a single axis. Voice cloning is a four-axis problem: quality, consent, traceability, revocation. Here's how we built each of them, and what we refuse to do.
A retrospective on the nine weeks of decoder surgery, the Beirut-Cairo-Riyadh corpus collection, and the boring infra work that separated a notebook demo from a production API.