feat: Add ModelsLab text-to-speech provider#5165

Open
adhikjoshi wants to merge 2 commits into Mintplex-Labs:master from adhikjoshi:feat/modelslab-tts-provider

Conversation

@adhikjoshi

Description

Adds ModelsLab as a text-to-speech provider in AnythingLLM's TTS settings.

Closes #5164


What is ModelsLab?

ModelsLab is an AI API platform offering affordable text-to-speech, image generation, video, and more. Their TTS API costs $0.0047 per generation with multi-language support and multiple voice presets.

API docs: https://docs.modelslab.com/text-to-speech/overview


What this PR adds

A new modelslab option in the Text-to-Speech Preferences settings panel, following the same pattern as the existing ElevenLabs and OpenAI-compatible providers.

Files changed

| File | Change |
| --- | --- |
| server/utils/TextToSpeech/modelslab/index.js | NEW: Provider class with ttsBuffer() + async polling |
| server/utils/TextToSpeech/index.js | Add modelslab case |
| server/utils/helpers/updateENV.js | Add env key mappings + validator |
| server/models/systemSettings.js | Expose settings to frontend |
| frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx | NEW: Settings form |
| frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx | Add to provider list |
| frontend/src/media/ttsproviders/modelslab.png | NEW: Provider logo |
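
For readers unfamiliar with the provider pattern, registering a new case might look roughly like the sketch below. The `getTTSProvider` function, the `TTS_PROVIDER` variable, and the stub classes are assumptions made for illustration; the actual code in server/utils/TextToSpeech/index.js may be organized differently.

```javascript
// Stub provider classes standing in for the real implementations (hypothetical).
class OpenAiTTS {
  name = "openai";
}
class ModelsLabTTS {
  name = "modelslab";
}

// Hypothetical selector: returns a provider instance for a provider key.
function getTTSProvider(provider = process.env.TTS_PROVIDER || "openai") {
  switch (provider) {
    case "openai":
      return new OpenAiTTS();
    case "modelslab": // the new case this PR describes
      return new ModelsLabTTS();
    default:
      throw new Error(`Unsupported TTS provider: ${provider}`);
  }
}
```

Throwing on an unknown key is one possible design; an implementation could equally fall back to a default provider.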

New environment variables

TTS_MODELSLAB_API_KEY=your_api_key  # required
TTS_MODELSLAB_VOICE_ID=en_us_001    # optional (default: en_us_001)
TTS_MODELSLAB_LANGUAGE=english      # optional (default: english)

How to test

  1. Get an API key from https://modelslab.com/dashboard/api-keys
  2. In AnythingLLM → Settings → Text-to-Speech → select ModelsLab
  3. Enter your API key, select a voice and language
  4. Save and trigger a TTS response in any chat

Checklist

  • Follows existing TTS provider pattern (ttsBuffer() interface)
  • Handles both sync (status: success) and async (status: processing) API responses
  • Proper env var naming conventions (TTS_MODELSLAB_*)
  • Settings UI matches existing provider forms
  • Added to supportedTTSProvider validator
  • No hardcoded API keys or unnecessary dependencies
  • Error handling with graceful null return on failure
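
The sync/async handling in the checklist can be sketched as a small dispatch on the response `status`. The status values come from this description and the polling code under review; the `classifyResponse` helper and the `id` field name are assumptions, not the PR's actual code.

```javascript
// Decide what to do with a generation response based on its `status` field.
// The `id` field used for the poll request is an assumed name.
function classifyResponse(data) {
  if (data.status === "success" && Array.isArray(data.output) && data.output.length > 0) {
    // Audio is ready: the first output entry is the URL to download.
    return { action: "download", url: data.output[0] };
  }
  if (data.status === "processing") {
    // Audio is still being generated: poll with the request id.
    return { action: "poll", requestId: String(data.id) };
  }
  // Anything else is treated as a failure so the caller can return null.
  return { action: "fail", reason: data.message || "Unknown error" };
}
```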

Adds ModelsLab (https://modelslab.com) as a TTS provider option in AnythingLLM.

ModelsLab offers affordable AI APIs including text-to-speech at $0.0047 per
generation with support for multiple English voice variants and languages.

Changes:
- server/utils/TextToSpeech/modelslab/index.js: New provider class with
  async polling support for ModelsLab's TTS API
- server/utils/TextToSpeech/index.js: Register 'modelslab' provider case
- server/utils/helpers/updateENV.js: Add env key mappings + validator
- server/models/systemSettings.js: Expose ModelsLab settings to frontend
- frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx: Settings UI
- frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx: Add to provider list
- frontend/src/media/ttsproviders/modelslab.png: Provider logo

Env vars:
- TTS_MODELSLAB_API_KEY (required)
- TTS_MODELSLAB_VOICE_ID (optional, default: en_us_001)
- TTS_MODELSLAB_LANGUAGE (optional, default: english)

Closes #(issue)
@timothycarambat added the Integration Request label Mar 10, 2026
Member

@timothycarambat left a comment

I cannot comment on an image, but the provider icon may render poorly because it is so small. Every provider image should be homogeneous in size and background:

  • full white BG
  • 330x330

PNG vs. JPG doesn't matter much.

this.apiKey = process.env.TTS_MODELSLAB_API_KEY;
this.voice = process.env.TTS_MODELSLAB_VOICE_ID ?? ModelsLabTTS.DEFAULT_VOICE;
this.language = process.env.TTS_MODELSLAB_LANGUAGE ?? "english";
this.speed = parseFloat(process.env.TTS_MODELSLAB_SPEED ?? "1");
Member

TTS_MODELSLAB_SPEED is a property here but is not modifiable by the user via the UI or ENV
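
If the speed setting is kept, note that `parseFloat` returns `NaN` for a non-numeric string. A guarded parse might look like the sketch below; `parseSpeed` is a hypothetical helper, and the fallback of 1 simply mirrors the default in the snippet above.

```javascript
// Hypothetical helper: parse a speed setting, falling back to `fallback`
// when the value is missing, non-numeric, or not positive.
function parseSpeed(raw, fallback = 1) {
  const value = parseFloat(raw ?? "");
  return Number.isFinite(value) && value > 0 ? value : fallback;
}
```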

Comment on lines +58 to +79
async #pollForResult(requestId, maxAttempts = 20) {
  const fetchUrl = "https://modelslab.com/api/v6/voice/fetch";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await new Promise((r) => setTimeout(r, 3000));
    const response = await fetch(fetchUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),
    });
    const data = await response.json();
    if (data.status === "success" && data.output?.length > 0) {
      return await this.#fetchUrl(data.output[0]);
    }
    if (data.status === "error") {
      this.#log("Poll error:", data.message || data.messege || "Unknown error");
      return null;
    }
    this.#log(`Polling attempt ${attempt + 1}/${maxAttempts}...`);
  }
  this.#log("Timed out waiting for audio generation.");
  return null;
}
Member

There is no async/await for the HTTP request - you have to poll? This seems like a large error surface since a provider failure to process the job can lead to retrying until it dies to timeouts. Are there any docs around this endpoint?

3s flat is an approach, but an exp backoff might make more sense here? I am not sure what the performance is like for this provider to return audio
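
A minimal sketch of the exponential backoff idea, under stated assumptions: `backoffDelay`, `pollWithBackoff`, the 1s base, the 15s cap, and the `{ done, value }` return shape of `checkOnce` are all illustrative choices rather than anything taken from the PR or the ModelsLab docs.

```javascript
// Delay before the given poll attempt: baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 15000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `checkOnce(attempt)` until it reports `done` or attempts run out.
// `checkOnce` resolves to { done: true, value } on success or hard failure
// (value may be null), or { done: false } while the job is still processing.
// `sleep` is injectable so the loop can be exercised without real timers.
async function pollWithBackoff(checkOnce, options = {}) {
  const { maxAttempts = 8, sleep = defaultSleep } = options;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await sleep(backoffDelay(attempt));
    const { done, value } = await checkOnce(attempt);
    if (done) return value;
  }
  return null; // gave up: the job never finished within maxAttempts
}
```

With these defaults the waits are 1s, 2s, 4s, 8s, then 15s four times, so a job that never completes is abandoned after about 75 seconds of waiting, while a provider-reported error stops the loop immediately.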

const response = await fetch(fetchUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),
Member

The API key is sent as a body param and not an Authorization Header?
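
One practical consequence of body-based keys is that printing a request payload for debugging also prints the key. The sketch below mirrors the request shape from the snippet above and adds a redaction step before logging; `buildPollRequest` and `redactForLog` are hypothetical helper names, not part of the PR.

```javascript
// Build fetch() options for a poll request with the key in the JSON body,
// mirroring the snippet under review.
function buildPollRequest(apiKey, requestId) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ key: apiKey, request_id: String(requestId) }),
  };
}

// Return a copy of the request options that is safe to log:
// the `key` field in the body is replaced with a placeholder.
function redactForLog(init) {
  const body = JSON.parse(init.body);
  return { ...init, body: JSON.stringify({ ...body, key: "[redacted]" }) };
}
```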

@@ -0,0 +1,123 @@
const https = require("https");
const http = require("http");
const { URL } = require("url");
Member

URL should be available globally, no need to import
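
As a quick illustration of the point above, the following runs in current Node.js releases without any `require("url")`; the `isHttps` helper is just an example name.

```javascript
// URL is a global in current Node.js releases, so no import is needed.
function isHttps(url) {
  return new URL(url).protocol === "https:";
}
```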

Comment on lines +39 to +50
#fetchUrl(url) {
  return new Promise((resolve, reject) => {
    const parsedUrl = new URL(url);
    const transport = parsedUrl.protocol === "https:" ? https : http;
    transport.get(url, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks)));
      res.on("error", reject);
    }).on("error", reject);
  });
}
Member

Is this necessary? Could just return an ArrayBuffer in this case I think

/**
 * Fetches a URL and returns the response body as a Buffer.
 * @param {string} url
 * @returns {Promise<Buffer>}
 */
async #fetchUrl(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Failed to fetch audio: ${response.statusText}`);
  const arrayBuffer = await response.arrayBuffer();
  return Buffer.from(arrayBuffer);
}

Comment on lines +1 to +22
const MODELSLAB_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
  { value: "en_us_006", label: "English (US) - Voice 2" },
  { value: "en_us_007", label: "English (US) - Voice 3" },
  { value: "en_us_009", label: "English (US) - Voice 4" },
  { value: "en_us_010", label: "English (US) - Voice 5" },
  { value: "en_uk_001", label: "English (UK) - Voice 1" },
  { value: "en_uk_003", label: "English (UK) - Voice 2" },
  { value: "en_au_001", label: "English (AU) - Voice 1" },
  { value: "en_au_002", label: "English (AU) - Voice 2" },
];

const MODELSLAB_LANGUAGES = [
  { value: "english", label: "English" },
  { value: "spanish", label: "Spanish" },
  { value: "french", label: "French" },
  { value: "german", label: "German" },
  { value: "italian", label: "Italian" },
  { value: "portuguese", label: "Portuguese" },
  { value: "polish", label: "Polish" },
  { value: "hindi", label: "Hindi" },
];
Member

Is it possible to have language=French but the voice be English (UK) - Voice 2? I am not sure if that kind of combination is possible.

Additionally, do we have any insight into how often voices are updated or added? This list will not be actively maintained by the team, so it can go out of date quickly.

If there is a way to pull from a GET /voice/models or something similar, rendering that dynamic list to the user would be best so it's always current.
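
A rough sketch of the dynamic-list idea with a static fallback. Everything about the remote side is hypothetical: no voice-listing endpoint is confirmed in this thread, so `listUrl` and the `{ voices: [{ id, name }] }` response shape are placeholders, and `loadVoices` is an illustrative name.

```javascript
// Static fallback, abbreviated from the list under review.
const FALLBACK_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
  { value: "en_uk_001", label: "English (UK) - Voice 1" },
];

// Load voices from a listing endpoint, falling back to the static list on
// any failure. The response shape ({ voices: [{ id, name }] }) is assumed.
// `fetchImpl` is injectable so the function can be tested without a network.
async function loadVoices(listUrl, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(listUrl);
    if (!response.ok) return FALLBACK_VOICES;
    const data = await response.json();
    if (!Array.isArray(data.voices) || data.voices.length === 0) return FALLBACK_VOICES;
    return data.voices.map((v) => ({ value: v.id, label: v.name }));
  } catch {
    return FALLBACK_VOICES;
  }
}
```

Keeping the static list as a fallback means the settings form still renders when the listing request fails, at the cost of possibly showing stale entries.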

@timothycarambat added blocked and removed Integration Request labels Mar 10, 2026
@adhikjoshi
Author

Thanks for the review @timothycarambat! Let me address your comments:

  1. TTS_MODELSLAB_SPEED - You're right, it's defined but not exposed in the UI. I can add it to the frontend options if you'd like, or remove it from the backend. Let me know your preference.

  2. Polling approach - Yes, ModelsLab uses an async pattern where you submit the request and poll for the result. This is documented at https://docs.modelslab.com/voice-cloning. I'll implement exponential backoff (1s, 2s, 4s...) instead of fixed 3s intervals - that's a good improvement suggestion.

  3. API key in body - Correct, ModelsLab uses key-in-body auth (key in JSON body) rather than Bearer token. This is their API design - I can add a note in the docs if helpful.

  4. URL import - Good catch! I'll remove the explicit import and use the global URL.

  5. #fetchUrl simplification - Yes, I can simplify this to use native fetch with ArrayBuffer. I'll update.

  6. Language/Voice combination - Looking at the ModelsLab API, the language parameter affects how the voice is interpreted. You could theoretically set language=French but use an English voice, but the quality would vary. This is a ModelsLab API behavior, not something I can change in the integration.

  7. Dynamic voice list - That's a great suggestion! I'll check if ModelsLab has a GET endpoint for available voices. If not, we could add a refresh button to manually reload the list.

Which changes would you like me to prioritize? Should I push fixes for #2, #4, #5 first?

@adhikjoshi
Author

Thanks for the review @timothycarambat! Let me address your comments:

  1. TTS_MODELSLAB_SPEED - You are right, it is defined but not exposed in the UI. I can add it to the frontend options if you would like, or remove it from the backend. Let me know your preference.

  2. Polling approach - Yes, ModelsLab uses an async pattern where you submit the request and poll for the result. This is documented at https://docs.modelslab.com/voice-cloning. I will implement exponential backoff (1s, 2s, 4s...) instead of fixed 3s intervals - that is a good improvement suggestion.

  3. API key in body - Correct, ModelsLab uses key-in-body auth (key in JSON body) rather than Bearer token. This is their API design.

  4. URL import - Good catch! I will remove the explicit import and use the global URL.

  5. #fetchUrl simplification - Yes, I can simplify this to use native fetch with ArrayBuffer. I will update.

  6. Language/Voice combination - Looking at the ModelsLab API, the language parameter affects how the voice is interpreted. You could theoretically set language=French but use an English voice, but the quality would vary.

  7. Dynamic voice list - That is a great suggestion! I will check if ModelsLab has a GET endpoint for available voices. If not, we could add a refresh button to manually reload the list.

Which changes would you like me to prioritize? Should I push fixes for #2, #4, #5 first?

@adhikjoshi
Author

I have pushed fixes addressing your review comments:

  1. Removed URL import - now uses global URL
  2. Simplified #fetchUrl - now uses native fetch with ArrayBuffer (as you suggested)
  3. Exponential backoff - polling now starts at 1s and increases by 1s up to 5s max
  4. Removed unused requires - cleaned up unused https/http/URL imports

The PR has been updated with these changes. Let me know if there are any other adjustments needed!
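
The schedule described in item 3 (start at 1s, add 1s per attempt, cap at 5s) is a linear ramp rather than a doubling one. The sketch below prints both schedules side by side for comparison; the function names, and reusing the same 5s cap for the doubling variant, are assumptions made for illustration.

```javascript
// Linear ramp as described in item 3: 1s, 2s, 3s, ... capped at maxMs.
const linearDelay = (attempt, stepMs = 1000, maxMs = 5000) =>
  Math.min(stepMs * (attempt + 1), maxMs);

// Doubling (exponential) schedule with the same cap: 1s, 2s, 4s, ...
const doublingDelay = (attempt, baseMs = 1000, maxMs = 5000) =>
  Math.min(baseMs * 2 ** attempt, maxMs);

const attempts = [0, 1, 2, 3, 4, 5];
console.log(attempts.map((a) => linearDelay(a)));   // [ 1000, 2000, 3000, 4000, 5000, 5000 ]
console.log(attempts.map((a) => doublingDelay(a))); // [ 1000, 2000, 4000, 5000, 5000, 5000 ]
```

With a cap as low as 5s the two schedules differ only on the third and fourth attempts, so the choice matters more if the cap is raised.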
