Panini is SomyaLabs’ text-to-speech model. It streams natural, low-latency audio via POST /v1/speech/synthesize.

Request

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| text | string | yes | 1–2500 characters to synthesize. |
| voice | string | no | A voice slug from GET /v1/voices. Omit to use the default voice. |
| ref_audio | string | no | Reference audio for voice cloning / style conditioning. See below. |

Example request body:
{ "text": "Namaste! Aapka swaagat hai.", "voice": "kiran" }
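
The field rules above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: build_payload is a hypothetical helper name, and it assumes the 1–2500 limit counts Python string characters (the server may count differently).

```python
def build_payload(text, voice=None, ref_audio=None):
    """Build a request body for POST /v1/speech/synthesize (hypothetical helper)."""
    # Assumption: the documented 1-2500 limit is measured in characters.
    if not 1 <= len(text) <= 2500:
        raise ValueError("text must be 1-2500 characters long")
    payload = {"text": text}
    # Optional fields are left out when empty so the server applies its defaults.
    if voice:
        payload["voice"] = voice
    if ref_audio:
        payload["ref_audio"] = ref_audio
    return payload
```

Calling build_payload("Namaste! Aapka swaagat hai.", voice="kiran") produces the same body as the JSON example above.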

Supported languages

Panini supports 15 languages:
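| Language | Code | Language | Code | Language | Code |
| --- | --- | --- | --- | --- | --- |
| Assamese | as | Hindi | hi | Nepali | ne |
| Bengali | bn | Kannada | kn | Odia | or |
| Dogri | doi | Maithili | mai | Punjabi | pa |
| English | en | Malayalam | ml | Tamil | ta |
| Gujarati | gu | Marathi | mr | Telugu | te |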

Voices

You select a language by choosing a voice. voice is the request parameter; there is no separate language field. The named voices:
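| Voice (slug) | Language | Gender |
| --- | --- | --- |
| omkar | Marathi | Male |
| ananya | Odia | Female |
| sravani | Telugu | Female |
| arjun | Hindi | Male |
| simran | Punjabi | Female |
| madhuri | Marathi | Female |
| priya | Hindi | Female |
| kiran | Kannada | Male |
| teja | Telugu | Male |
| meera | Malayalam | Female |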
Omit voice (or send "") to use the default voice.
Pass the slug (e.g. kiran) as voice. panini is the model name, not a voice. For the live, account-specific list call GET /v1/voices (curl https://api.somya.ai/v1/voices -H "X-API-Key: YOUR_API_KEY").
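
Because language follows from the voice, a client that starts from a language needs a lookup from language to slug. The sketch below hard-codes the named voices from the table above as a static snapshot; NAMED_VOICES and voices_for_language are illustrative names, not part of the API, and the live list from GET /v1/voices should take precedence where it differs.

```python
# Snapshot of the named voices listed in this page: slug -> (language, gender).
NAMED_VOICES = {
    "omkar": ("Marathi", "Male"),
    "ananya": ("Odia", "Female"),
    "sravani": ("Telugu", "Female"),
    "arjun": ("Hindi", "Male"),
    "simran": ("Punjabi", "Female"),
    "madhuri": ("Marathi", "Female"),
    "priya": ("Hindi", "Female"),
    "kiran": ("Kannada", "Male"),
    "teja": ("Telugu", "Male"),
    "meera": ("Malayalam", "Female"),
}


def voices_for_language(language):
    """Return the named voice slugs for a language, in table order."""
    wanted = language.strip().lower()
    return [slug for slug, (lang, _gender) in NAMED_VOICES.items()
            if lang.lower() == wanted]
```

For example, voices_for_language("Hindi") returns ["arjun", "priya"]. A language with no named voice in this snapshot returns an empty list.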

Streaming response (NDJSON)

The endpoint responds with application/x-ndjson — one JSON object per line:
{"chunk_b64": "<base64-encoded WAV>", "is_final": false}
{"chunk_b64": "<base64-encoded WAV>", "is_final": false}
{"chunk_b64": "<base64-encoded WAV>", "is_final": true}
| Field | Type | Meaning |
| --- | --- | --- |
| chunk_b64 | string | Base64 of a complete, self-contained WAV (RIFF header + PCM): 24 kHz, 16-bit, mono. |
| is_final | boolean | false for incremental chunks; the record with true carries the full utterance as a single WAV. |
Two ways to consume it:
  • Stream playback — decode each chunk_b64 as it arrives and play the chunks back-to-back to start audio almost immediately.
  • Whole file — ignore partials and keep the is_final: true chunk; it’s the complete WAV, ideal for saving or replay.
Python — save the final WAV
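import base64
import json

import requests

# Stream the NDJSON response so records can be parsed as they arrive.
resp = requests.post(
    "https://api.somya.ai/v1/speech/synthesize",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={"text": "Namaste!", "voice": "kiran"},
    stream=True,
    timeout=60,  # arbitrary client-side limit; tune for your use case
)
resp.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

final_wav = None
for line in resp.iter_lines():
    if not line:
        continue  # skip blank lines
    rec = json.loads(line)
    if rec.get("is_final"):
        # The final record carries the full utterance as a single WAV.
        final_wav = base64.b64decode(rec["chunk_b64"])

if final_wav is None:
    raise RuntimeError("stream ended without an is_final record")

with open("speech.wav", "wb") as f:
    f.write(final_wav)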
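
Python (sketch): unpack incremental chunks for streaming playback

The stream-playback path described above can be sketched with the standard library alone. Each incremental record is a self-contained WAV, so the header has to be stripped before the audio is queued back-to-back. iter_pcm_chunks is a hypothetical helper name; it accepts any iterable of NDJSON lines (for instance resp.iter_lines()) and leaves the actual audio output to whatever playback library you use.

```python
import base64
import io
import json
import wave


def iter_pcm_chunks(ndjson_lines):
    """Yield (wav_params, pcm_bytes) for each incremental (non-final) record."""
    for line in ndjson_lines:
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        if rec.get("is_final"):
            # The final record repeats the whole utterance as one WAV,
            # so it is skipped when the partial chunks are being played.
            continue
        wav_bytes = base64.b64decode(rec["chunk_b64"])
        # Parse the RIFF header and return only the raw PCM frames.
        with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
            yield wav.getparams(), wav.readframes(wav.getnframes())
```

Each yielded pcm_bytes value can be written to an audio output stream opened with the reported sample rate, sample width, and channel count (documented above as 24 kHz, 16-bit, mono).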

Reference audio

ref_audio conditions synthesis on a reference sample (voice cloning / style transfer). For reusable custom voices, upload one via POST /v1/voices and then pass its slug as voice.
The accepted ref_audio encoding (e.g. URL vs. base64) and custom-voice upload requirements are environment-specific — check the API Reference for the current schema before using it in production.