feat: Add ModelsLab text-to-speech provider #5165
adhikjoshi wants to merge 2 commits into Mintplex-Labs:master
Adds ModelsLab (https://modelslab.com) as a TTS provider option in AnythingLLM. ModelsLab offers affordable AI APIs including text-to-speech at $0.0047 per generation with support for multiple English voice variants and languages.
Changes:
- server/utils/TextToSpeech/modelslab/index.js: New provider class with async polling support for ModelsLab's TTS API
- server/utils/TextToSpeech/index.js: Register 'modelslab' provider case
- server/utils/helpers/updateENV.js: Add env key mappings + validator
- server/models/systemSettings.js: Expose ModelsLab settings to frontend
- frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx: Settings UI
- frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx: Add to provider list
- frontend/src/media/ttsproviders/modelslab.png: Provider logo
Env vars:
- TTS_MODELSLAB_API_KEY (required)
- TTS_MODELSLAB_VOICE_ID (optional, default: en_us_001)
- TTS_MODELSLAB_LANGUAGE (optional, default: english)
Closes #(issue)
timothycarambat
left a comment
I cannot comment on an image, but the icon for the provider may render poorly at such a small size. Every provider image should be homogeneous in size and background:
- full white BG
- 330x330
png/jpg doesn't matter so much.
this.apiKey = process.env.TTS_MODELSLAB_API_KEY;
this.voice = process.env.TTS_MODELSLAB_VOICE_ID ?? ModelsLabTTS.DEFAULT_VOICE;
this.language = process.env.TTS_MODELSLAB_LANGUAGE ?? "english";
this.speed = parseFloat(process.env.TTS_MODELSLAB_SPEED ?? "1");
TTS_MODELSLAB_SPEED is a property here but is not modifiable by the user via the UI or ENV
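If the speed setting were exposed to users, the raw value would probably need validating, since parseFloat returns NaN for non-numeric input. A minimal sketch of that idea follows; the helper name parseSpeed and the fallback of 1 are assumptions for illustration, not part of the PR.

```javascript
// Hypothetical helper (not in the PR): parse a user-supplied speed value and
// fall back to a default when it is missing, non-numeric, or not positive.
function parseSpeed(raw, fallback = 1) {
  const value = Number.parseFloat(raw ?? "");
  return Number.isFinite(value) && value > 0 ? value : fallback;
}

// Example: read the ENV value the constructor above already references.
const speed = parseSpeed(process.env.TTS_MODELSLAB_SPEED);
```

Keeping the parsing in one small function would let the same check back both an ENV validator and a UI field, should either be added.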
async #pollForResult(requestId, maxAttempts = 20) {
  const fetchUrl = "https://modelslab.com/api/v6/voice/fetch";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await new Promise((r) => setTimeout(r, 3000));
    const response = await fetch(fetchUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),
    });
    const data = await response.json();
    if (data.status === "success" && data.output?.length > 0) {
      return await this.#fetchUrl(data.output[0]);
    }
    if (data.status === "error") {
      this.#log("Poll error:", data.message || data.messege || "Unknown error");
      return null;
    }
    this.#log(`Polling attempt ${attempt + 1}/${maxAttempts}...`);
  }
  this.#log("Timed out waiting for audio generation.");
  return null;
}
There is no async/await for the HTTP request - you have to poll? This seems like a large error surface since a provider failure to process the job can lead to retrying until it dies to timeouts. Are there any docs around this endpoint?
3s flat is an approach, but an exp backoff might make more sense here? I am not sure what the performance is like for this provider to return audio
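For reference, the exponential backoff idea could look roughly like the sketch below. It is not the PR's implementation: the names backoffDelay and pollWithBackoff, the 1s base, the 15s cap, and the shape of the check callback ({ status, value }) are all assumptions chosen for illustration.

```javascript
// Delay before a given 0-based attempt: baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 15000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Polls until `check` reports success, reports an error, or attempts run out.
// `check` is a hypothetical async callback resolving to { status, value },
// where status is "success", "error", or anything else for "still processing".
// `sleep` is injectable so the loop can be tested without real timers.
async function pollWithBackoff(check, opts = {}) {
  const {
    maxAttempts = 8,
    baseMs = 1000,
    maxMs = 15000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await sleep(backoffDelay(attempt, baseMs, maxMs));
    const { status, value } = await check(attempt);
    if (status === "success") return value;
    if (status === "error") return null; // terminal failure: stop polling
  }
  return null; // gave up after maxAttempts
}
```

With these defaults the waits are 1s, 2s, 4s, 8s, then 15s for each remaining attempt, so a fast job returns sooner than with a flat 3s interval while a slow or stuck job generates fewer requests.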
const response = await fetch(fetchUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ key: this.apiKey, request_id: String(requestId) }),
The API key is sent as a body param and not an Authorization Header?
@@ -0,0 +1,123 @@
const https = require("https");
const http = require("http");
const { URL } = require("url");
URL should be available globally, no need to import
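A quick illustration of that point: in current Node.js versions the WHATWG URL class is a global, so parsing works without requiring the url module. The example URL is a placeholder.

```javascript
// No require("url") needed: URL is a global in Node.js.
const parsed = new URL("https://example.com/audio/output.wav");
console.log(parsed.protocol); // "https:"
```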
#fetchUrl(url) {
  return new Promise((resolve, reject) => {
    const parsedUrl = new URL(url);
    const transport = parsedUrl.protocol === "https:" ? https : http;
    transport.get(url, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks)));
      res.on("error", reject);
    }).on("error", reject);
  });
}
Is this necessary? Could just return an ArrayBuffer in this case I think
/**
 * Fetches a URL and returns the response body as a Buffer.
 * @param {string} url
 * @returns {Promise<Buffer>}
 */
async #fetchUrl(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Failed to fetch audio: ${response.statusText}`);
  const arrayBuffer = await response.arrayBuffer();
  return Buffer.from(arrayBuffer);
}

const MODELSLAB_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
  { value: "en_us_006", label: "English (US) - Voice 2" },
  { value: "en_us_007", label: "English (US) - Voice 3" },
  { value: "en_us_009", label: "English (US) - Voice 4" },
  { value: "en_us_010", label: "English (US) - Voice 5" },
  { value: "en_uk_001", label: "English (UK) - Voice 1" },
  { value: "en_uk_003", label: "English (UK) - Voice 2" },
  { value: "en_au_001", label: "English (AU) - Voice 1" },
  { value: "en_au_002", label: "English (AU) - Voice 2" },
];

const MODELSLAB_LANGUAGES = [
  { value: "english", label: "English" },
  { value: "spanish", label: "Spanish" },
  { value: "french", label: "French" },
  { value: "german", label: "German" },
  { value: "italian", label: "Italian" },
  { value: "portuguese", label: "Portuguese" },
  { value: "polish", label: "Polish" },
  { value: "hindi", label: "Hindi" },
];
Is it possible to have language=French but the voice be English (UK) - Voice 2? I am not sure if that kind of combination is possible.
Additionally, do we have any insight into how often voices are updated or added? This list will not be actively maintained by the team, so it can get out of date quickly.
If there is a way to pull from a GET /voice/models or something and render the dynamic list to the user, that would be best so it's always current.
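A rough sketch of the dynamic-list approach, with a static fallback for when the lookup fails. Everything here is hypothetical: whether ModelsLab exposes a voice-listing endpoint is not confirmed in this thread, so the lookup is passed in as a function, and the id and name fields on its result are assumed shapes.

```javascript
// Static fallback, abbreviated from the hard-coded list in this PR.
const FALLBACK_VOICES = [
  { value: "en_us_001", label: "English (US) - Voice 1" },
  { value: "en_uk_001", label: "English (UK) - Voice 1" },
];

// `listVoices` is a hypothetical async function that would call the provider's
// voice-listing endpoint (if one exists) and resolve to [{ id, name }, ...].
// On any failure or empty result, the static fallback list is returned.
async function loadVoices(listVoices, fallback = FALLBACK_VOICES) {
  try {
    const voices = await listVoices();
    if (!Array.isArray(voices) || voices.length === 0) return fallback;
    return voices.map((v) => ({
      value: String(v.id),
      label: String(v.name ?? v.id),
    }));
  } catch {
    return fallback;
  }
}
```

Passing the lookup in as a parameter keeps the UI code independent of the actual endpoint, so the settings panel would still render from the fallback if the provider is unreachable.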
Thanks for the review @timothycarambat! Let me address your comments:
Which changes would you like me to prioritize? Should I push fixes for #2, #4, #5 first?
… removed unused imports
I have pushed fixes addressing your review comments:
The PR has been updated with these changes. Let me know if there are any other adjustments needed!
Description
Adds ModelsLab as a text-to-speech provider in AnythingLLM's TTS settings.
Closes #5164
What is ModelsLab?
ModelsLab is an AI API platform offering affordable text-to-speech, image generation, video, and more. Their TTS API costs $0.0047 per generation with multi-language support and multiple voice presets.
API docs: https://docs.modelslab.com/text-to-speech/overview
What this PR adds
A new modelslab option in the Text-to-Speech Preferences settings panel, following the same pattern as the existing ElevenLabs and OpenAI-compatible providers.
Files changed
- server/utils/TextToSpeech/modelslab/index.js: ttsBuffer() + async polling
- server/utils/TextToSpeech/index.js: modelslab case
- server/utils/helpers/updateENV.js
- server/models/systemSettings.js
- frontend/src/components/TextToSpeech/ModelsLabOptions/index.jsx
- frontend/src/pages/GeneralSettings/AudioPreference/tts.jsx
- frontend/src/media/ttsproviders/modelslab.png
New environment variables
How to test
Checklist
- … ttsBuffer() interface)
- … status: success) and async (status: processing) API responses
- … TTS_MODELSLAB_*)
- … supportedTTSProvider validator
- … null return on failure