Panini is SomyaLabs’ text-to-speech model. It streams natural,
low-latency audio via POST /v1/speech/synthesize.
Request
| Field | Type | Required | Notes |
|---|---|---|---|
| text | string | yes | 1–2500 characters to synthesize. |
| voice | string | no | A voice slug from GET /v1/voices. Omit to use the default voice. |
| ref_audio | string | no | Reference audio for voice cloning / style conditioning. See Reference audio below. |
Example request body:

```json
{ "text": "Namaste! Aapka swaagat hai.", "voice": "kiran" }
```
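
The field constraints listed for the request can be checked client-side before any network call. The following is a minimal sketch, not part of the SomyaLabs API: `build_payload` is a hypothetical helper, and it assumes the 1–2500 limit counts Python string characters (Unicode code points), which the documentation does not specify.

```python
def build_payload(text, voice=None, ref_audio=None):
    """Build a request body for POST /v1/speech/synthesize.

    Hypothetical helper: it enforces only the documented field
    constraints and assumes the 1-2500 limit counts code points.
    """
    if not isinstance(text, str) or not 1 <= len(text) <= 2500:
        raise ValueError("text must be a string of 1-2500 characters")
    payload = {"text": text}
    # Optional fields are left out entirely when unset, so the server
    # falls back to its defaults (for example, the default voice).
    if voice:
        payload["voice"] = voice
    if ref_audio:
        payload["ref_audio"] = ref_audio
    return payload


print(build_payload("Namaste! Aapka swaagat hai.", voice="kiran"))
```

Rejecting over-long text locally avoids a round trip, but the server remains the authority on what it accepts.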
Supported languages
Panini supports 15 languages:
| Language | Code | Language | Code | Language | Code |
|---|---|---|---|---|---|
| Assamese | as | Hindi | hi | Nepali | ne |
| Bengali | bn | Kannada | kn | Odia | or |
| Dogri | doi | Maithili | mai | Punjabi | pa |
| English | en | Malayalam | ml | Tamil | ta |
| Gujarati | gu | Marathi | mr | Telugu | te |
Voices
You select a language by choosing a voice: `voice` is the request
parameter, and there is no separate language field. The named voices are:
| Voice (slug) | Language | Gender |
|---|---|---|
| omkar | Marathi | Male |
| ananya | Odia | Female |
| sravani | Telugu | Female |
| arjun | Hindi | Male |
| simran | Punjabi | Female |
| madhuri | Marathi | Female |
| priya | Hindi | Female |
| kiran | Kannada | Male |
| teja | Telugu | Male |
| meera | Malayalam | Female |
Omit `voice` (or send `""`) to use the default voice.
Otherwise, pass the slug (e.g. `kiran`) as `voice`. Note that `panini` is the model name, not
a voice. For the live, account-specific list, call
`GET /v1/voices` (`curl https://api.somya.ai/v1/voices -H "X-API-Key: YOUR_API_KEY"`).
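
Because the language is implied by the voice, a client that starts from a language needs a lookup over the voice list. Below is a minimal sketch that uses only the named voices listed in this section. `NAMED_VOICES` and `voices_for` are hypothetical names. A real client would probably refresh the list from `GET /v1/voices`, but that response schema is not described here, so the sketch does not parse it.

```python
# (slug, language, gender) rows copied from the named-voices table.
NAMED_VOICES = [
    ("omkar", "Marathi", "Male"),
    ("ananya", "Odia", "Female"),
    ("sravani", "Telugu", "Female"),
    ("arjun", "Hindi", "Male"),
    ("simran", "Punjabi", "Female"),
    ("madhuri", "Marathi", "Female"),
    ("priya", "Hindi", "Female"),
    ("kiran", "Kannada", "Male"),
    ("teja", "Telugu", "Male"),
    ("meera", "Malayalam", "Female"),
]


def voices_for(language, gender=None):
    """Return slugs of named voices for a language, optionally filtered by gender."""
    return [
        slug
        for slug, lang, gen in NAMED_VOICES
        if lang.lower() == language.lower()
        and (gender is None or gen.lower() == gender.lower())
    ]


print(voices_for("Hindi"))           # ['arjun', 'priya']
print(voices_for("Telugu", "male"))  # ['teja']
```

An empty result means no named voice covers that language in this list; the account-specific list may contain more.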
Streaming response (NDJSON)
The endpoint responds with `application/x-ndjson`, one JSON object per line:

```json
{"chunk_b64": "<base64-encoded WAV>", "is_final": false}
{"chunk_b64": "<base64-encoded WAV>", "is_final": false}
{"chunk_b64": "<base64-encoded WAV>", "is_final": true}
```
| Field | Type | Meaning |
|---|---|---|
| chunk_b64 | string | Base64 of a complete, self-contained WAV (RIFF header + PCM): 24 kHz, 16-bit, mono. |
| is_final | boolean | false for incremental chunks; the record with true carries the full utterance as a single WAV. |
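
Since every `chunk_b64` value is a self-contained WAV, Python's standard-library `wave` module can read it directly. The sketch below decodes one value and reports its format and duration. `describe_chunk` and `make_silent_wav_b64` are hypothetical helpers, and the sample chunk is generated locally as a stand-in rather than taken from the API.

```python
import base64
import io
import wave


def describe_chunk(chunk_b64):
    """Decode a chunk_b64 value; return (sample_rate, bits, channels, seconds)."""
    wav_bytes = base64.b64decode(chunk_b64)
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    return rate, bits, channels, seconds


def make_silent_wav_b64(seconds=0.5, rate=24000):
    """Build a silent 24 kHz, 16-bit, mono WAV locally (stand-in for a real chunk)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return base64.b64encode(buf.getvalue()).decode("ascii")


print(describe_chunk(make_silent_wav_b64()))  # (24000, 16, 1, 0.5)
```

A check like this is a cheap way to confirm that a received chunk matches the documented 24 kHz, 16-bit, mono format before handing it to an audio device.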
Two ways to consume it:
- Stream playback: decode each `chunk_b64` as it arrives and play the chunks back-to-back to start audio almost immediately.
- Whole file: ignore the partial chunks and keep the `is_final: true` record; it is the complete WAV, ideal for saving or replay.
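
Both consumption modes can be served by a single pass over the stream. The sketch below is one possible approach, not an official client: `iter_audio` is a hypothetical generator that yields the raw PCM of each partial chunk (what a player would be fed) and the final WAV once. It assumes the `is_final` record repeats audio already delivered in the partial chunks, so that record should not be queued for playback again.

```python
import base64
import io
import json
import wave


def iter_audio(lines):
    """Yield ("partial", pcm_bytes) per incremental chunk and ("final", wav_bytes) once.

    `lines` is any iterable of NDJSON lines (bytes or str),
    for example resp.iter_lines() from a streamed requests response.
    """
    for line in lines:
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        wav_bytes = base64.b64decode(rec["chunk_b64"])
        if rec.get("is_final"):
            # Complete utterance as one WAV: save it, do not replay it.
            yield "final", wav_bytes
        else:
            # Strip the RIFF header so partial chunks can be played back-to-back.
            with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
                yield "partial", wav.readframes(wav.getnframes())
```

A player would write each `"partial"` payload to an audio output opened at 24 kHz, 16-bit, mono, while a file-oriented client would ignore partials and write the `"final"` bytes to disk unchanged.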
Python — save the final WAV
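```python
import base64
import json

import requests

resp = requests.post(
    "https://api.somya.ai/v1/speech/synthesize",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={"text": "Namaste!", "voice": "kiran"},
    stream=True,
)
resp.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body

final_wav = None
for line in resp.iter_lines():
    if not line:
        continue  # skip blank lines
    rec = json.loads(line)
    if rec.get("is_final"):
        # The final record carries the whole utterance as a single WAV.
        final_wav = base64.b64decode(rec["chunk_b64"])

if final_wav is None:
    raise RuntimeError("stream ended without an is_final record")

with open("speech.wav", "wb") as f:
    f.write(final_wav)
```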
Reference audio
ref_audio conditions synthesis on a reference sample (voice cloning / style
transfer). For reusable custom voices, upload one via POST /v1/voices and then
pass its slug as voice.
The accepted `ref_audio` encoding (e.g. URL vs. base64) and the custom-voice upload
requirements are environment-specific. Check the API Reference for the current schema
before using either in production.