Base URL: https://api.somya.ai. In the examples below, replace YOUR_API_KEY with your own key (see Authentication).

1. Get an API key

Sign in to the Playground, open API keys, and create one. Copy the key value immediately: it is returned only once.

2. Synthesize speech (TTS)

Send text to POST /v1/speech/synthesize. The response streams as newline-delimited JSON (application/x-ndjson) — one record per audio chunk — so you can begin playback before synthesis finishes.
curl -N -X POST https://api.somya.ai/v1/speech/synthesize \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "text": "Namaste! Aapka swaagat hai." }'
text is required and must be 1–2500 characters long. voice and ref_audio are optional; if you omit voice, the default voice is used. To pick a specific voice (and its language), pass a voice slug from GET /v1/voices. See Text to Speech for the streaming format and voice details.
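
The same request can be sketched in Python using only the standard library. The endpoint, headers, and the 1–2500 character limit come from the text above. The function names are illustrative, and nothing is assumed about the fields inside each record, since the streaming format is documented under Text to Speech.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "https://api.somya.ai"


def build_synthesize_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/speech/synthesize request."""
    # The quickstart states that text must be 1-2500 characters.
    if not 1 <= len(text) <= 2500:
        raise ValueError("text must be 1-2500 characters")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/speech/synthesize",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def iter_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line."""
    for raw in lines:
        line = raw.strip()
        if line:
            yield json.loads(line)


def stream_synthesis(text: str, api_key: str) -> Iterator[dict]:
    """Send the request and yield records as they arrive."""
    request = build_synthesize_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        # An HTTPResponse iterates line by line, which matches NDJSON framing.
        yield from iter_ndjson(response)
```

Because stream_synthesis is a generator, a caller can hand each record to an audio player as soon as it is parsed instead of waiting for the full response.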

Authentication header

You can authenticate with either header — they’re equivalent:
X-API-Key: YOUR_API_KEY
# or
Authorization: Bearer YOUR_API_KEY
Prefer X-API-Key for server-to-server calls with a SomyaLabs API key.
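
As a small illustration, a client could keep the choice of header in one place. This is a sketch only: the helper name auth_headers and its scheme argument are hypothetical, and only the two header forms shown above come from the documentation.

```python
from typing import Dict


def auth_headers(api_key: str, scheme: str = "x-api-key") -> Dict[str, str]:
    """Return the header for one of the two documented authentication styles."""
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown scheme: {scheme!r}")
```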

3. Transcribe audio (ASR)

Send an audio file to POST /v1/speech/transcriptions as multipart/form-data under the audio field:
curl -X POST https://api.somya.ai/v1/speech/transcriptions \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "audio=@recording.wav"
The response uses the standard envelope, with the transcript under data.text:
{ "success": true, "data": { "text": "namaste aapka swaagat hai" } }
See Speech to Text for formats and limits.
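
For clients that do not shell out to curl, the upload body and the response handling might look like the following standard-library sketch. The audio field name and the success / data.text envelope come from the text above. The function names, the application/octet-stream part type, and the way a failed response is reported are assumptions, since the error envelope is described under Errors.

```python
import json
import uuid
from typing import Tuple


def encode_audio_form(filename: str, audio: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data under the 'audio' field.

    Returns (body, value for the Content-Type request header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        # Assumption: a generic binary type is acceptable for the file part.
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + audio + tail, f"multipart/form-data; boundary={boundary}"


def extract_transcript(response_body: str) -> str:
    """Return data.text from the envelope, or raise if success is not true."""
    envelope = json.loads(response_body)
    if envelope.get("success") is not True:
        # The error fields are not guessed at here; the whole payload is surfaced.
        raise RuntimeError(f"transcription failed: {envelope}")
    return envelope["data"]["text"]
```

The body and header value returned by encode_audio_form can be passed to any HTTP client as the request data and Content-Type header, alongside the X-API-Key header.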

Next steps

API Reference

Every endpoint, generated live from the OpenAPI spec.

Errors

Error envelope, codes, and how to handle non-200 responses.