Speech to Text (Vyasa)

Vyasa is SomyaLabs’ automatic speech recognition (ASR) model, built for Indian languages, accents, and real-world audio. Transcribe audio with POST /v1/speech/transcriptions.

Request

Send the audio as multipart/form-data under the audio field:

curl -X POST https://api.somya.ai/v1/speech/transcriptions \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "audio=@recording.wav"
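
For callers who are not using curl, the same upload can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_transcription_request` and `transcribe` are hypothetical, and the part's Content-Type is guessed from the file extension, which the endpoint may or may not require. Only the URL, the X-API-Key header, and the audio field name come from the request above.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://api.somya.ai/v1/speech/transcriptions"


def build_transcription_request(filename, audio_bytes, api_key, boundary=None):
    """Build (but do not send) a multipart/form-data POST with one `audio` part."""
    boundary = boundary or uuid.uuid4().hex
    # Assumption: guess the part's Content-Type from the extension, else use a generic type.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + audio_bytes + tail,
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(path, api_key):
    """Upload the file at `path` and return the decoded JSON response."""
    path = Path(path)
    request = build_transcription_request(path.name, path.read_bytes(), api_key)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("recording.wav", "YOUR_API_KEY")` would send the same request as the curl command. Building the request separately from sending it keeps the multipart body easy to inspect without touching the network.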

Response

The standard envelope, with the transcript under data.text:

{
  "success": true,
  "data": { "text": "namaste aapka swaagat hai" }
}
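
Reading the transcript out of the envelope could look like the sketch below. The function name `extract_transcript` is hypothetical, and the shape of a failed response is an assumption: this page only shows the success case, so the code treats anything other than "success": true with a string at data.text as an error.

```python
import json


def extract_transcript(body):
    """Return data.text from a JSON response body, or raise ValueError."""
    envelope = json.loads(body)
    if not isinstance(envelope, dict):
        raise ValueError(f"unexpected response: {envelope!r}")
    # Assumption: a failed request does not carry "success": true.
    if envelope.get("success") is not True:
        raise ValueError(f"transcription did not succeed: {envelope!r}")
    data = envelope.get("data")
    if not isinstance(data, dict) or not isinstance(data.get("text"), str):
        raise ValueError(f"response has no data.text: {envelope!r}")
    return data["text"]
```

Checking the success flag before reading data.text stops a failed request from being mistaken for an empty transcript.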

Audio formats & limits

The endpoint accepts common audio containers and codecs (e.g. WAV, MP3, WebM/Opus, M4A). Oversized or unsupported uploads are rejected with one of these status codes:

413 — the audio file is too large.
415 — the audio format isn’t supported.

The exact maximum file size and the full list of accepted formats depend on the deployment. If you receive a 413 or 415, downsample or convert the audio to 16 kHz mono WAV (the most broadly supported input) or check the API Reference. See Errors for handling.
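
Before uploading, a client can check locally whether a file is already 16 kHz mono WAV. The sketch below uses Python's standard wave module, which only parses uncompressed PCM WAV files; the function name `is_16khz_mono_wav` is hypothetical, and the check says nothing about the deployment's size limit.

```python
import wave


def is_16khz_mono_wav(source):
    """Return True if `source` (a path string or binary file object) is a 16 kHz mono WAV."""
    try:
        with wave.open(source, "rb") as wav:
            return wav.getframerate() == 16000 and wav.getnchannels() == 1
    except (wave.Error, EOFError):
        # Not a WAV file the wave module can parse (e.g. MP3, WebM, or truncated data).
        return False
```

If this returns False, one option is to convert the file with an external tool before uploading, for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav`.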
