CH·42 AI & neural audio
verified 2026-06-19

Stable Audio 3 active

Stability AI · version not listed · added 2026-05-29 · verified 2026-06-19

[Use when]

You need to generate high-quality audio and music from text prompts, or edit and inpaint existing audio recordings at scale.

Homepage: stability.ai
Engines: Standalone
License: Custom
Pricing: Freemium
Last verified: 2026-06-19
Added: 2026-05-29

about

Stable Audio 3 is a state-of-the-art generative audio platform built on diffusion transformers and the SAME (Semantic-Acoustic Music Encoder) autoencoder. It supports three core workflows: text-to-audio generation from natural language prompts, audio-to-audio editing with prompt-guided style transfer, and precise inpainting or continuation of specific regions within existing recordings.

The platform offers multiple model sizes. The Small models (433M parameters) run on CPU, with no GPU required, and handle lightweight music and SFX generation up to 120 seconds. The Medium model (1.4B parameters) runs on GPU and delivers higher-quality output up to 380 seconds. On modern hardware, generation time for multi-second outputs is measured in milliseconds. The SAME autoencoder works with stereo 44.1 kHz audio and uses 256-dimensional latents, balancing reconstruction fidelity with generative tractability.
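The size and duration limits above can be turned into a quick selection helper. The following is a minimal sketch, not part of any Stability AI SDK: the `ModelTier` class, the tier names, and the `pick_tier` rule (prefer Medium whenever a GPU is available) are all hypothetical. Only the parameter counts, duration limits, and the 44.1 kHz stereo format come from the description. The fp16 weight estimate assumes 2 bytes per parameter and ignores activations and the autoencoder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTier:
    """Hypothetical record for one model size (not an official API type)."""
    name: str
    params: int        # parameter count quoted in the description
    max_seconds: int   # maximum clip length quoted in the description
    needs_gpu: bool


# Figures taken from the description; the tier names are illustrative.
SMALL = ModelTier("small", 433_000_000, 120, needs_gpu=False)
MEDIUM = ModelTier("medium", 1_400_000_000, 380, needs_gpu=True)


def pick_tier(duration_s: float, has_gpu: bool) -> Optional[ModelTier]:
    """Return a tier that can produce a clip of the requested length.

    Assumed policy: use Medium when a GPU is present, otherwise fall
    back to Small. Returns None when no tier fits the request.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if has_gpu and duration_s <= MEDIUM.max_seconds:
        return MEDIUM
    if duration_s <= SMALL.max_seconds:
        return SMALL
    return None


def fp16_weight_bytes(tier: ModelTier) -> int:
    """Rough weight footprint, assuming 2 bytes per parameter (fp16)."""
    return tier.params * 2


def pcm_samples(duration_s: float, sample_rate: int = 44_100, channels: int = 2) -> int:
    """Total samples in a decoded clip (stereo 44.1 kHz by default)."""
    return int(duration_s * sample_rate) * channels
```

For example, `pick_tier(200, has_gpu=False)` returns `None` under this policy, because a 200-second clip exceeds the Small limit and Medium is assumed to need a GPU.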

Stable Audio 3 includes LoRA fine-tuning support for personalization, variable-length generation to avoid wasting compute on unused latents, and broad hardware compatibility including CUDA, TensorRT, and Apple Silicon via CoreML. Note: the open-weight models are released under the Stability AI Community License (free for research and for commercial use below a revenue threshold), while the largest model is available via API only. The released models are open-weight, not OSI-approved open source.
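To illustrate why variable-length generation matters, the sketch below estimates how much of a fixed generation window would be padding when a shorter clip is requested. It is back-of-the-envelope arithmetic, not a description of the model's internals. The function name is hypothetical, and the assumption that cost grows linearly with the number of latents is a simplification.

```python
def padded_fraction(requested_s: float, window_s: float) -> float:
    """Fraction of a fixed-length window that is unused padding.

    Assumes the window is always generated in full and that cost is
    proportional to window length (a simplifying assumption).
    """
    if not 0 < requested_s <= window_s:
        raise ValueError("requested_s must be in (0, window_s]")
    return 1.0 - requested_s / window_s


# A 30-second clip generated in a fixed 120-second window:
print(padded_fraction(30, 120))  # 0.75
```

Under these assumptions, three quarters of the work on a 30-second request would go to latents that are discarded, which is the waste variable-length generation is meant to avoid.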