VOICE AI PLATFORM

Audio AI infrastructure
transcription + synthesis in one API.

Speech-to-Text at 130× real-time, multilingual Text-to-Speech, and tier-1 Voice Cloning — same cluster, same transparent pricing, one unified balance.

THREE PRODUCTS · ONE STACK

Audio AI infrastructure ready for production.

Transcription + synthesis + voice cloning. Same API, same billing, same owned cluster.

PRODUCTION

Speech to Text

Speech-to-text at 130× real-time

  • OpenAI API compatible
  • 10 languages — Spanish included
  • Diarization opt-in (Starter+)
  • Free tier: 500 min/month

LIVE

Text to Speech

Multilingual generic voices

  • 12 voices across 7 languages
  • EN, ES, PT, FR, DE, IT, HI
  • Same balance — pay once, use all 3 products
  • Latency < 2s on CPU

NEW

Clone Voice

Tier-1 speaker cloning, 17 languages

  • Record 6-60 s → synthesize in your voice
  • Premium engine for ES · Multilingual coverage for the rest
  • Same balance as STT + TTS
  • Per-plan voice library (1-50+)

USE CASES · WHAT DEVS BUILD ON ORCHARD

Where Orchard fits.

Six real workflows where Orchard replaces expensive transcription and synthesis APIs, or fragmented service stacks — all on one API.

Workflow 01

Conversational AI agents

Transcribe WhatsApp, Telegram or live call audio and feed the context to your LLM. Low latency, controlled cost per minute.

Workflow 02

Bulk audio processing

Transcribe podcasts, meetings, interviews or thousands of files a day. No rate limits on paid plans, no per-file caps.

Workflow 03

Call analysis & support

Turn customer calls into text + automatic insights for your team. Speaker diarization, ideal for call centers and QA.

Workflow 04

Voice cloning for marketing

Clone real voices for videos, ads and automations. Same consistent voice across hundreds of assets.

Workflow 05

Automated pipelines

Audio → Transcript → Summary → Action with your LLM of choice. Webhooks, retries and batching native to the API.
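The Audio → Transcript → Summary → Action chain can be sketched as plain function composition. Everything named here (`with_retries`, `run_pipeline`, and the three step callables) is hypothetical and is not Orchard's API. The page says retries are native to the API, so the client-side retry wrapper below is only an illustration of the idea, assuming each step may fail transiently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(step: Callable[[], T], attempts: int = 3, base_delay: float = 0.0) -> T:
    """Run a step, retrying with exponential backoff when it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")

def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # hypothetical STT call
    summarize: Callable[[str], str],      # hypothetical LLM call
    act: Callable[[str], str],            # hypothetical downstream action
) -> str:
    """Audio -> Transcript -> Summary -> Action."""
    transcript = with_retries(lambda: transcribe(audio))
    summary = with_retries(lambda: summarize(transcript))
    return act(summary)

# Stub steps for demonstration: the transcriber fails once, then succeeds.
calls = {"n": 0}

def flaky_transcribe(audio: bytes) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient failure")
    return "call me back tomorrow"

result = run_pipeline(
    b"fake-audio",
    flaky_transcribe,
    lambda text: text.upper(),
    lambda summary: f"ticket: {summary}",
)
```

Passing the steps in as callables keeps the sketch independent of any particular STT or LLM client.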

Workflow 06

Custom voice assistants

Build your own conversational assistant or branded voice agent. Text-to-speech in the voice you define, in the language you need.

LIVE ARCHITECTURE
STT · TTS · Clone · LLM

Voice notes in seconds. Podcasts in minutes.

60× real-time sustained average. 1 hour of audio in about 1 minute.
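As a quick check of what these speed figures mean, processing time is the audio duration divided by the real-time multiple. The function name below is illustrative, and the 130× value is the headline figure quoted elsewhere on this page rather than a guaranteed rate.

```python
def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock seconds to transcribe audio at a given multiple of real time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor

one_hour = 3600
sustained = processing_seconds(one_hour, 60)    # 60x sustained average
headline = processing_seconds(one_hour, 130)    # 130x headline figure
```

At 60× an hour of audio takes 60 seconds; at 130× it takes a little under 28 seconds.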

10× cheaper than the competition

$0.00042/min on Pro plan. Simple plans, no surprise costs.

Drop-in replacement

Industry-standard API. Existing SDKs work without changes — migrate in minutes.
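For reference, the OpenAI-style transcription call is a multipart POST to `/audio/transcriptions` with `file` and `model` fields and a bearer token. The sketch below builds such a request with the standard library and does not send it. `BASE_URL` and `MODEL` are placeholders: this page does not state the actual base URL or model name.

```python
import urllib.request
import uuid

# Hypothetical placeholders, not values published on this page.
BASE_URL = "https://api.example.com/v1"
MODEL = "your-stt-model"

def build_transcription_request(
    audio_bytes: bytes, filename: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST in the OpenAI transcription shape."""
    boundary = uuid.uuid4().hex
    model_part = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f'{MODEL}\r\n'
    ).encode()
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode()
    closing = f'\r\n--{boundary}--\r\n'.encode()
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/transcriptions",
        data=model_part + file_header + audio_bytes + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

req = build_transcription_request(b"\x00\x01", "note.wav", "sk-test")
# urllib.request.urlopen(req) would send it; omitted because the URL is a placeholder.
```

If the compatibility claim holds, an existing OpenAI SDK client constructed with a custom `base_url` should produce an equivalent request, which is what makes the migration a configuration change instead of a rewrite.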

Voice dictation in VS Code

Press Cmd+Shift+8, speak, paste at cursor. Works in Cursor, Claude Code, Copilot — and any editor.

Install on Marketplace

Global community

1.5K+ users
25+ countries

Builders, indie hackers and audio-first teams across Latin America, the US and Europe ship faster on Orchard's pay-as-you-go speech stack.

🇦🇷 AR · 🇮🇳 IN · 🇺🇸 US · 🇩🇪 DE · 🇬🇧 GB · 🇨🇦 CA · 🇵🇱 PL · 🇺🇦 UA · 🇪🇸 ES · 🇧🇷 BR · 🇦🇪 AE · 🇿🇦 ZA · 🇮🇱 IL · 🇷🇴 RO · 🇳🇱 NL · 🇰🇷 KR · 🇮🇹 IT · 🇫🇷 FR · 🇲🇽 MX · 🇨🇴 CO · 🇨🇱 CL · 🇵🇪 PE · 🇹🇷 TR · 🇯🇵 JP · 🇦🇺 AU

Pricing

Simple plans, no surprises.

Start free · Upgrade as your volume grows

Free
$0

500 min on signup

≈ 500K chars TTS · 1 cloned voice

Try all 3 products. No card required.

  • 500 minutes on signup
  • Automatic monthly refill
  • 1 concurrent request
  • Community support

Hobby
$1/month

1,500 min/month

≈ 1.5M chars TTS · 3 cloned voices

Coffee-money tier. STT + TTS + Clone Voice share the balance.

  • 500 minutes on signup
  • Webhooks + SRT/VTT
  • 1 concurrent request
  • Billed annually

Popular
Starter
$10/month

15,000 min/month

≈ 15M chars TTS · 10 cloned voices

Bots and small SaaS. All 3 products on shared balance.

  • Webhooks + SRT/VTT
  • 3 concurrent requests
  • Email support

Pro
$25/month

60,000 min/month

≈ 60M chars TTS · 50 cloned voices

Production volume. All 3 products on shared balance.

  • Priority queue
  • 10 concurrent requests
  • Python + Node SDK
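The plan figures above can be turned into an effective per-minute price and a TTS character equivalent. This is a sketch based only on the numbers listed on this page: the 1,000 characters per minute ratio is inferred from "500 min ≈ 500K chars TTS" and is approximate, not a published conversion rate.

```python
# (monthly price in USD, included minutes) as listed in the plans above
PLANS = {
    "Hobby": (1.0, 1_500),
    "Starter": (10.0, 15_000),
    "Pro": (25.0, 60_000),
}

# Inferred from "500 min ≈ 500K chars TTS"; an approximation, not an official rate.
CHARS_PER_MINUTE = 1_000

def cost_per_minute(plan: str) -> float:
    """Effective USD per minute if the full monthly allowance is used."""
    price, minutes = PLANS[plan]
    return price / minutes

def tts_chars(minutes: int) -> int:
    """Approximate TTS characters covered by a balance of STT minutes."""
    return minutes * CHARS_PER_MINUTE

pro_rate = round(cost_per_minute("Pro"), 5)
starter_rate = round(cost_per_minute("Starter"), 5)
free_chars = tts_chars(500)
```

The Pro result matches the $0.00042/min figure quoted on this page, and it only applies when the whole 60,000-minute allowance is consumed.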

Higher volume?

Optional diarization · Custom SLA · Dedicated capacity

LIVE

Text-to-Speech. Shipped.

12 voices across 7 languages. Same cluster as Transcribe, one unified balance — your plan covers both products.

  • 12 voices · 7 languages
  • Unified balance with STT
  • Same pricing model
  • Latency < 2s

Generate audio

About us

Audio infrastructure at scale.

We founded Orchard with one purpose: to reshape the Voice Infrastructure industry. As heavy consumers ourselves, we kept hitting the same gaps in the market — exactly where we decided to differentiate: price, volume and concurrency. That's why we built three core verticals: STT, TTS and Voice Cloning.

Our strongest product today is batch STT, and we're going for the global #1 spot. We back that up on three metrics: the cheapest minute on the market, a WER competitive with the best engines in the segment, and an RTF that sustains high volume and massive concurrency without throttling. That combination of quality, speed and price isn't on offer anywhere else.

In parallel, our TTS is consolidating as the default base for voice agents, voice assistants and conversational products — a segment growing double digits as every product turns voice-first.

Voice Cloning is the bet we're most excited about for what's next. It already works well for today's use cases, and we're investing heavily in the pipeline: capturing prosody, rhythm and emotion with a precision that generic voice cloning does not reach. The goal: when a customer uploads 30 seconds of audio, the model reproduces not only their timbre but the way they speak.

By Mateo Bustamante · Ramiro Alvarez