Text to Speech (Panini)

Panini is SomyaLabs’ text-to-speech model. It streams natural, low-latency audio via POST /v1/speech/synthesize.

Request

Field	Type	Required	Notes
`text`	string	yes	1–2500 characters to synthesize.
`voice`	string	no	A voice slug from `GET /v1/voices`. Omit to use the default voice.
`ref_audio`	string	no	Reference audio for voice cloning / style conditioning. See below.

```json
{ "text": "Namaste! Aapka swaagat hai.", "voice": "priya" }
```
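A minimal client-side sketch of the request body rules in the table above: `text` must be 1–2500 characters, and `voice` and `ref_audio` are optional. The helper name `build_payload` is hypothetical. It also assumes the server counts characters as Unicode code points (what Python's `len()` returns); Indic scripts with combining marks could be counted differently, so treat the check as a pre-flight guard, not the authoritative limit.

```python
from typing import Optional

MAX_TEXT_CHARS = 2500  # documented limit: 1-2500 characters


def build_payload(text: str, voice: Optional[str] = None,
                  ref_audio: Optional[str] = None) -> dict:
    """Build a JSON body for POST /v1/speech/synthesize (hypothetical helper)."""
    # Assumption: "characters" means Unicode code points, as counted by len().
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters, got {len(text)}")
    payload = {"text": text}
    # Omitting voice (or sending "") selects the default voice, so drop empty values.
    if voice:
        payload["voice"] = voice
    # ref_audio encoding is environment-specific; it is passed through unchanged.
    if ref_audio is not None:
        payload["ref_audio"] = ref_audio
    return payload
```

Dropping an empty `voice` instead of sending `""` keeps the body minimal; both forms select the default voice.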

Supported languages

Panini supports 15 languages:

Language	Code	Language	Code	Language	Code
Assamese	`as`	Hindi	`hi`	Nepali	`ne`
Bengali	`bn`	Kannada	`kn`	Odia	`or`
Dogri	`doi`	Maithili	`mai`	Punjabi	`pa`
English	`en`	Malayalam	`ml`	Tamil	`ta`
Gujarati	`gu`	Marathi	`mr`	Telugu	`te`

Voices

You select a language by choosing a voice: voice is the request parameter, and there is no separate language field. The named voices below cover seven of the 15 supported languages:

Voice (slug)	Language	Gender
`omkar`	Marathi	Male
`ananya`	Odia	Female
`sravani`	Telugu	Female
`arjun`	Hindi	Male
`simran`	Punjabi	Female
`madhuri`	Marathi	Female
`priya`	Hindi	Female
`kiran`	Kannada	Male
`teja`	Telugu	Male
`meera`	Malayalam	Female

Omit voice (or send "") to use the default voice.

Pass the slug (e.g. kiran) as voice. The name panini refers to the model, not a voice. For the live, account-specific list, call GET /v1/voices:

```bash
curl https://api.somya.ai/v1/voices -H "X-API-Key: YOUR_API_KEY"
```
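Because language is chosen through the voice, it can help to keep a local lookup from language code to voice slug. The sketch below is a static copy of the two tables above; `LANGUAGES`, `NAMED_VOICES`, and `voices_for` are hypothetical names. An empty result means no named voice is listed here for that language, so check the live GET /v1/voices list (whose response schema is not documented on this page) before falling back to the default voice.

```python
# Static copy of the "Supported languages" table: code -> language name.
LANGUAGES = {
    "as": "Assamese", "bn": "Bengali", "doi": "Dogri", "en": "English",
    "gu": "Gujarati", "hi": "Hindi", "kn": "Kannada", "mai": "Maithili",
    "ml": "Malayalam", "mr": "Marathi", "ne": "Nepali", "or": "Odia",
    "pa": "Punjabi", "ta": "Tamil", "te": "Telugu",
}

# Static copy of the "Voices" table: slug -> (language code, gender).
NAMED_VOICES = {
    "omkar": ("mr", "Male"),
    "ananya": ("or", "Female"),
    "sravani": ("te", "Female"),
    "arjun": ("hi", "Male"),
    "simran": ("pa", "Female"),
    "madhuri": ("mr", "Female"),
    "priya": ("hi", "Female"),
    "kiran": ("kn", "Male"),
    "teja": ("te", "Male"),
    "meera": ("ml", "Female"),
}


def voices_for(code: str) -> list[str]:
    """Return the named voice slugs for a language code, sorted by name."""
    if code not in LANGUAGES:
        raise KeyError(f"unsupported language code: {code!r}")
    return sorted(slug for slug, (lang, _) in NAMED_VOICES.items() if lang == code)
```

For example, `voices_for("hi")` gives the two Hindi voices, while `voices_for("mai")` is empty because no Maithili voice is named in the table.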

Streaming response (NDJSON)

The endpoint responds with application/x-ndjson — one JSON object per line:

```json
{"chunk_b64": "<base64-encoded WAV>", "is_final": false}
{"chunk_b64": "<base64-encoded WAV>", "is_final": false}
{"chunk_b64": "<base64-encoded WAV>", "is_final": true}
```

Field	Type	Meaning
`chunk_b64`	string	Base64 of a complete, self-contained WAV (RIFF header + PCM): 24 kHz, 16-bit, mono.
`is_final`	boolean	`false` for incremental chunks; the record with `true` carries the full utterance as a single WAV.
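A minimal sketch of decoding one NDJSON record with only the standard library: base64-decode `chunk_b64`, open the bytes with the `wave` module, and check the format against the table above (24 kHz, 16-bit, mono). The function name `decode_record` is hypothetical; raising on a format mismatch is a design choice, not documented API behavior.

```python
import base64
import io
import json
import wave

# Documented chunk format: 24 kHz sample rate, 16-bit (2-byte) samples, 1 channel.
EXPECTED_FORMAT = (24_000, 2, 1)


def decode_record(line: bytes) -> tuple[bytes, bool]:
    """Parse one NDJSON line and return (raw PCM frames, is_final)."""
    rec = json.loads(line)
    wav_bytes = base64.b64decode(rec["chunk_b64"])
    # Each chunk is a complete WAV, so the wave module can read it directly.
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        fmt = (w.getframerate(), w.getsampwidth(), w.getnchannels())
        if fmt != EXPECTED_FORMAT:
            raise ValueError(f"unexpected WAV format (rate, width, channels) = {fmt}")
        pcm = w.readframes(w.getnframes())
    return pcm, bool(rec.get("is_final", False))
```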

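Two ways to consume it:

- Stream playback: decode each partial chunk (is_final: false) as it arrives and play the chunks back-to-back to start audio almost immediately. Because every chunk is a self-contained WAV, join the decoded PCM frames rather than the raw WAV bytes, and skip the final record, which repeats the whole utterance.
- Whole file: ignore the partials and keep the is_final: true chunk; it is the complete WAV, ideal for saving or replay.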
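The stream-playback approach can be sketched as a generator that yields raw PCM from each partial chunk and stops at the final record. It assumes, per the field table, that the partial chunks are consecutive segments of the utterance and that the final record duplicates them. The names `iter_partial_pcm` and `pcm_to_wav` are hypothetical, and actual audio output is left to whatever player you use.

```python
import base64
import io
import json
import wave
from typing import Iterable, Iterator


def iter_partial_pcm(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield raw PCM from each partial chunk, stopping at the final record."""
    for line in lines:
        if not line:  # skip blank keep-alive lines
            continue
        rec = json.loads(line)
        if rec.get("is_final"):
            # The final record repeats the whole utterance; playing it would duplicate audio.
            break
        wav_bytes = base64.b64decode(rec["chunk_b64"])
        # Strip each chunk's RIFF header so consecutive chunks join seamlessly.
        with wave.open(io.BytesIO(wav_bytes), "rb") as w:
            yield w.readframes(w.getnframes())


def pcm_to_wav(frames: Iterable[bytes], rate: int = 24_000) -> bytes:
    """Wrap concatenated 16-bit mono PCM frames in a single WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        for pcm in frames:
            w.writeframes(pcm)
    return buf.getvalue()
```

In a live client you would pass `resp.iter_lines()` from a streaming `requests` response and hand each yielded buffer to an audio output stream (for example, a hypothetical `player.write(pcm)` from your audio library of choice); `pcm_to_wav` is useful for checking that the stitched partials sound the same as the final WAV.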

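Python: save the final WAV

```python
import base64
import json

import requests

# Stream the NDJSON response and keep only the final record (the complete WAV).
with requests.post(
    "https://api.somya.ai/v1/speech/synthesize",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={"text": "Namaste!", "voice": "kiran"},
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()  # fail fast on 4xx/5xx instead of parsing an error body
    final_wav = None
    for line in resp.iter_lines():
        if not line:  # skip blank keep-alive lines
            continue
        rec = json.loads(line)
        if rec.get("is_final"):
            final_wav = base64.b64decode(rec["chunk_b64"])
            break  # the final record carries the full utterance

if final_wav is None:
    raise RuntimeError("stream ended without an is_final record")

with open("speech.wav", "wb") as f:
    f.write(final_wav)
```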

Reference audio

ref_audio conditions synthesis on a reference sample (voice cloning / style transfer). For reusable custom voices, upload one via POST /v1/voices and then pass its slug as voice.

The accepted ref_audio encoding (e.g. URL vs. base64) and custom-voice upload requirements are environment-specific — check the API Reference for the current schema before using it in production.
