Speech-to-Text at 130× real-time, multilingual Text-to-Speech, and tier-1 Voice Cloning — same cluster, same transparent pricing, one unified balance.
Transcription + synthesis + voice cloning. Same API, same billing, same owned cluster.
Speech-to-text at 130× real-time
Multilingual generic voices
Tier-1 speaker cloning, 17 languages
Six real workflows where Orchard replaces expensive transcription and synthesis APIs, or fragmented service stacks — all on one API.
Transcribe WhatsApp, Telegram or live call audio and feed the context to your LLM. Low latency, controlled cost per minute.
Transcribe podcasts, meetings, interviews or thousands of files a day. No rate limits on paid plans, no per-file caps.
Turn customer calls into text + automatic insights for your team. Speaker diarization, ideal for call centers and QA.
Clone real voices for videos, ads and automations. Same consistent voice across hundreds of assets.
Audio → Transcript → Summary → Action with your LLM of choice. Webhooks, retries and batching native to the API.
Build your own conversational assistant or branded voice agent. Text-to-speech in the voice you define, in the language you need.
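The automation workflow above (Audio → Transcript → Summary → Action) can be sketched as three chained steps, each wrapped in retry-with-exponential-backoff. This is a minimal client-side sketch under stated assumptions: `transcribe`, `summarize` and `act` are hypothetical stand-ins, not Orchard's actual API. The retry policy (3 attempts, delays doubling from 0.5 s) is an arbitrary illustration. If you rely on the API's native webhooks and retries, a client-side retry layer like this may be unnecessary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call fn; on an exception, wait and retry, doubling the delay from base_delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Hypothetical stand-ins: replace with real calls to Orchard and your LLM of choice.
def transcribe(audio_path: str) -> str:
    return f"transcript of {audio_path}"


def summarize(transcript: str) -> str:
    return f"summary: {transcript[:40]}"


def act(summary: str) -> dict:
    return {"action": "create_ticket", "body": summary}


def run_pipeline(audio_path: str) -> dict:
    """Audio -> Transcript -> Summary -> Action, retrying each step independently."""
    transcript = with_retries(lambda: transcribe(audio_path))
    summary = with_retries(lambda: summarize(transcript))
    return with_retries(lambda: act(summary))
```

Retrying per step rather than per pipeline means a transient failure in the LLM call doesn't force the audio to be transcribed (and billed) again.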
60× real-time sustained on average: 1 hour of audio in about 1 minute.
$0.00042/min on the Pro plan. Simple plans, no surprise costs.
Industry-standard API. Existing SDKs work without changes — migrate in minutes.
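A quick way to sanity-check the speed and price figures above against your own volume. This sketch assumes the Pro rate is a flat per-minute charge and that 60× is the sustained speed; plan allowances and overage rules aren't described here. `processing_minutes` and `cost_usd` are illustrative helpers, not part of Orchard's API.

```python
from decimal import Decimal

# USD per audio minute on the Pro plan, taken from the figure above.
PRO_PRICE_PER_MIN = Decimal("0.00042")


def processing_minutes(audio_minutes: float, speed_factor: float = 60.0) -> float:
    """Wall-clock minutes needed to transcribe audio at a given ×-real-time speed."""
    return audio_minutes / speed_factor


def cost_usd(audio_minutes: int, price_per_min: Decimal = PRO_PRICE_PER_MIN) -> Decimal:
    """Linear per-minute cost; Decimal avoids float rounding on tiny unit prices."""
    return price_per_min * audio_minutes
```

`processing_minutes(60)` returns `1.0`, and 10,000 hours (600,000 minutes) comes to `cost_usd(600_000) == Decimal("252")`. At the 130× headline figure, the same hour would take roughly 28 seconds.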
Press Cmd+Shift+8, speak, paste at cursor. Works with Cursor, Claude Code, Copilot and any editor.
Install on Marketplace
Builders, indie hackers and audio-first teams across Latin America, the US and Europe ship faster on Orchard's pay-as-you-go speech stack.
Start free · Upgrade as your volume grows
500 min on signup
≈ 500K chars TTS · 1 cloned voice
Try all 3 products. No card required.
1,500 min/month
≈ 1.5M chars TTS · 3 cloned voices
Coffee-money tier. STT + TTS + Clone Voice share the balance.
15,000 min/month
≈ 15M chars TTS · 10 cloned voices
Bots and small SaaS. All 3 products on shared balance.
60,000 min/month
≈ 60M chars TTS · 50 cloned voices
Production volume. All 3 products on shared balance.
Higher volume?
Optional diarization · Custom SLA · Dedicated capacity
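Every tier pairs N STT minutes with roughly N × 1,000 TTS characters (500 min ≈ 500K chars, up to 60,000 min ≈ 60M chars). That suggests both products draw down one balance at about 1,000 characters per minute. The sketch below models such a shared balance under that inferred ratio. `SharedBalance` and the exact conversion are assumptions for illustration, not Orchard's published billing rules. Cloned-voice limits are left out because they appear to be a per-tier count rather than a balance deduction.

```python
from dataclasses import dataclass

# Inferred from the tiers (500 min ≈ 500K chars, 60,000 min ≈ 60M chars); an assumption.
CHARS_PER_MINUTE = 1_000


@dataclass
class SharedBalance:
    """Hypothetical model: one balance, counted in minutes, shared by STT and TTS."""
    minutes: float

    def _debit(self, amount: float) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if amount > self.minutes:
            raise ValueError("insufficient balance")
        self.minutes -= amount

    def charge_stt(self, audio_minutes: float) -> None:
        # Transcription consumes the balance minute for minute.
        self._debit(audio_minutes)

    def charge_tts(self, characters: int) -> None:
        # Synthesis is converted to minutes at the inferred character ratio.
        self._debit(characters / CHARS_PER_MINUTE)
```

On a 1,500-minute plan, transcribing 1,000 minutes and synthesizing 250,000 characters would leave 250 minutes under this model.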
12 voices across 7 languages. Same cluster as Transcribe, one unified balance — your plan covers both products.
We founded Orchard with one purpose: to reshape the voice infrastructure industry. As heavy consumers of voice infrastructure ourselves, we kept hitting the same gaps in the market: price, volume and concurrency. That is exactly where we decided to differentiate, and it's why we built three core verticals: STT, TTS and Voice Cloning.
Our strongest product today is batch STT, and we're going for the global #1 spot. We back that up with three hard metrics: the cheapest minute on the market, a word error rate (WER) competitive with the best engines in the segment, and a real-time factor (RTF) that sustains high volume and massive concurrency without throttling. That combination of quality, speed and price isn't on offer anywhere else.
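WER and RTF are standard metrics, so you can measure both on your own audio. A minimal sketch: WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words, and RTF as processing time divided by audio duration, so 60× real-time corresponds to an RTF of about 0.017. This is the textbook definition, not necessarily how Orchard computes its own figures. Real evaluations usually normalize casing and punctuation first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: edit distances from the reference prefix
    # processed so far to every prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion (reference word missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or free if the words match
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; lower is faster."""
    return processing_seconds / audio_seconds
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` gives 1/6, about 0.167 (one deletion), and `real_time_factor(60, 3600)` gives 1/60 for an hour of audio processed in one minute.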
In parallel, our TTS is becoming the default foundation for voice agents, voice assistants and conversational products, a segment growing at double digits as more and more products go voice-first.
Voice Cloning is the bet we're most excited about for what's next. It already works well for current use cases, and we're investing heavily in the pipeline: capturing prosody, rhythm and emotion with a precision that generic voice cloning will never reach. The goal: when a customer uploads 30 seconds of audio, the model reproduces not only the timbre of their voice but the way they speak.