VOICE AI PLATFORM

Audio AI infrastructure
transcription + synthesis in one API.

Speech-to-Text at 130× real-time, multilingual Text-to-Speech, and tier-1 Voice Cloning — same cluster, same transparent pricing, one unified balance.

Try free · View plans · Quickstart

THREE PRODUCTS · ONE STACK

Audio AI infrastructure ready for production.

Transcription + synthesis + voice cloning. Same API, same billing, same owned cluster.

PRODUCTION

Speech to Text

Speech-to-text at 130× real-time

OpenAI API compatible
10 languages — Spanish included
Diarization opt-in (Starter+)
Free tier: 500 min/month

Try STT

LIVE

Text to Speech

Multilingual generic voices

12 voices across 7 languages
EN, ES, PT, FR, DE, IT, HI
Same balance — pay once, use all 3 products
Latency < 2s on CPU

Try TTS

NEW

Clone Voice

Tier-1 speaker cloning, 17 languages

Record 6-60 s → synthesize in your voice
Premium engine for ES · Multilingual coverage for the rest
Same balance as STT + TTS
Per-plan voice library (1-50+)

Try Clone Voice

USE CASES · WHAT DEVS BUILD ON ORCHARD

Where Orchard fits.

Six real workflows where Orchard replaces expensive transcription and synthesis APIs, or fragmented service stacks — all on one API.

Workflow 01

Conversational AI agents

Transcribe WhatsApp, Telegram or live call audio and feed the context to your LLM. Low latency, controlled cost per minute.

Workflow 02

Bulk audio processing

Transcribe podcasts, meetings, interviews or thousands of files a day. No rate limits on paid plans, no per-file caps.

Workflow 03

Call analysis & support

Turn customer calls into text + automatic insights for your team. Speaker diarization, ideal for call centers and QA.

Workflow 04

Voice cloning for marketing

Clone real voices for videos, ads and automations. Same consistent voice across hundreds of assets.

Workflow 05

Automated pipelines

Audio → Transcript → Summary → Action with your LLM of choice. Webhooks, retries and batching native to the API.

Workflow 06

Custom voice assistants

Build your own conversational assistant or branded voice agent. Text-to-speech in the voice you define, in the language you need.

LIVE ARCHITECTURE

STT · TTS · Clone · LLM

Voice notes in seconds. Podcasts in minutes.

60× real-time, sustained average. 1 hour of audio in about 1 minute.
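As a rough back-of-the-envelope check on the speed figures, the sketch below converts a real-time multiple into wall-clock time. It assumes pure processing time only (no upload, queueing or network overhead), and the function name is illustrative, not part of any SDK.

```python
def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to transcribe audio at `speedup` times real-time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


ONE_HOUR = 3600.0  # seconds of audio

# 60x is the sustained average quoted above; 130x is the headline figure.
for speedup in (60, 130):
    seconds = processing_seconds(ONE_HOUR, speedup)
    print(f"{speedup}x real-time: 1 h of audio in {seconds:.1f} s")
```

At the sustained 60× average, an hour of audio takes 60 seconds; at the 130× headline figure it takes under half a minute.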

10× cheaper than the competition

$0.00042/min on Pro plan. Simple plans, no surprise costs.

Drop-in replacement

Industry-standard API. Existing SDKs work without changes — migrate in minutes.
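To make "drop-in" concrete, here is a minimal sketch of an OpenAI-style transcription request built with the Python standard library only. The base URL, API key and model name are placeholders (assumptions, not documented Orchard values); the endpoint path follows OpenAI's `/audio/transcriptions` convention. The request is built but not sent.

```python
import urllib.request
import uuid

# Placeholders: substitute the real base URL, key and model name from your account.
BASE_URL = "https://api.example.com/v1"
API_KEY = "sk-placeholder"
MODEL = "whisper-1"  # OpenAI's model name, used here only as a stand-in


def build_transcription_request(audio: bytes, filename: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to an OpenAI-style transcription endpoint."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field: model
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{MODEL}\r\n'.encode(),
        # Form field: file (header, then raw bytes)
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcription_request(b"fake-audio-bytes", "note.wav")
print(req.get_method(), req.full_url)
```

With an OpenAI-compatible SDK, the equivalent migration is usually just pointing the client at a different base URL and key; the rest of the calling code stays the same.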

Voice dictation in VS Code

Press Cmd+Shift+8, speak, paste at cursor. Works in Cursor, Claude Code, Copilot — and any editor.

Install on Marketplace

Global community

1.5K+ users

25+ countries

Builders, indie hackers and audio-first teams across Latin America, the US and Europe ship faster on Orchard's pay-as-you-go speech stack.

🇦🇷 AR · 🇮🇳 IN · 🇺🇸 US · 🇩🇪 DE · 🇬🇧 GB · 🇨🇦 CA · 🇵🇱 PL · 🇺🇦 UA · 🇪🇸 ES · 🇧🇷 BR · 🇦🇪 AE · 🇿🇦 ZA · 🇮🇱 IL · 🇷🇴 RO · 🇳🇱 NL · 🇰🇷 KR · 🇮🇹 IT · 🇫🇷 FR · 🇲🇽 MX · 🇨🇴 CO · 🇨🇱 CL · 🇵🇪 PE · 🇹🇷 TR · 🇯🇵 JP · 🇦🇺 AU

Pricing

Simple plans, no surprises.

Start free · Upgrade as your volume grows

Free

500 min on signup

≈ 500K chars TTS · 1 cloned voice

Try all 3 products. No card required.

500 minutes on signup
Automatic monthly refill
1 concurrent request
Community support

Start free

Hobby

$1/month

1,500 min/month

≈ 1.5M chars TTS · 3 cloned voices

Coffee-money tier. STT + TTS + Clone Voice share the balance.

500 minutes on signup
Webhooks + SRT/VTT
1 concurrent request
Billed annually

Popular

Starter

$10/month

15,000 min/month

≈ 15M chars TTS · 10 cloned voices

Bots and small SaaS. All 3 products on shared balance.

Webhooks + SRT/VTT
3 concurrent requests
Email support

Pro

$25/month

60,000 min/month

≈ 60M chars TTS · 50 cloned voices

Production volume. All 3 products on shared balance.

Priority queue
10 concurrent requests
Python + Node SDK

Higher volume?

Optional diarization · Custom SLA · Dedicated capacity
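The effective per-minute price of each paid plan can be worked out from the cards above. This is a small sketch: the prices and minute allowances are copied from the plans, and the 1 minute ≈ 1,000 TTS characters ratio is an assumption read off the "≈ chars TTS" lines, not an official conversion rate.

```python
# Plan figures copied from the pricing cards: (USD per month, minutes per month).
PLANS = {
    "Hobby": (1.00, 1_500),
    "Starter": (10.00, 15_000),
    "Pro": (25.00, 60_000),
}

# Assumption inferred from the cards (e.g. 500 min ~ 500K chars).
TTS_CHARS_PER_MINUTE = 1_000


def cost_per_minute(price_usd: float, minutes: int) -> float:
    """Effective USD cost of one minute if the full monthly allowance is used."""
    return price_usd / minutes


for name, (price, minutes) in PLANS.items():
    per_min = cost_per_minute(price, minutes)
    per_million_chars = per_min * 1_000_000 / TTS_CHARS_PER_MINUTE
    print(f"{name}: ${per_min:.5f}/min, about ${per_million_chars:.2f} per 1M TTS chars")
```

The Pro figure rounds to the $0.00042/min quoted earlier on the page. These are best-case numbers: they assume the whole monthly allowance is consumed.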

LIVE

Text-to-Speech. Shipped.

12 voices across 7 languages. Same cluster as Transcribe, one unified balance — your plan covers both products.

12 voices · 7 languages
Unified balance with STT
Same pricing model
Latency < 2s

Generate audio

About us

Audio infrastructure at scale.

We founded Orchard with one purpose: to reshape the Voice Infrastructure industry. As heavy consumers ourselves, we kept hitting the same gaps in the market — exactly where we decided to differentiate: price, volume and concurrency. That's why we built three core verticals: STT, TTS and Voice Cloning.

Our strongest product today is batch STT, and we're going for the global #1 spot. We back that up on three metrics: the cheapest minute on the market, a word error rate (WER) competitive with the best engines in the segment, and a real-time factor (RTF) that sustains high volume and massive concurrency without throttling. That combination of quality, speed and price isn't on offer anywhere else.

In parallel, our TTS is consolidating as the default base for voice agents, voice assistants and conversational products — a segment growing double digits as every product turns voice-first.

Voice Cloning is the bet we're most excited about for what's next. It already works well for current use cases, and we're investing heavily in the pipeline: capturing prosody, rhythm and emotion with a precision that generic voice cloning will never reach. The goal: when a customer uploads 30 seconds of audio, the model reproduces not just the timbre of their voice but the way they speak.


By Mateo Bustamante · Ramiro Alvarez
