Raveler
Wwise
Raveler is a Wwise plugin that runs RAVE (Realtime Audio Variational autoEncoder) models for real-time timbre transfer via neural audio synthesis in game audio contexts. The plugin provides direct integration of trained RAVE models into Wwise effect chains, enabling neural processing of game audio with adjustable latent space manipulation.
The plugin exposes controls for model performance parameters including latent noise injection, prior sampling, and dry/wet mixing. It offers direct manipulation of up to 8 latent dimensions with bias and scaling controls, all of which can be bound to RTPCs for dynamic runtime control. Buffer settings allow balancing between audio quality and latency based on project requirements.
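The latent controls described above can be pictured with a small NumPy sketch. This is not Raveler's code: the function names, the array layout (latent dimensions by frames), and the linear dry/wet crossfade are all assumptions made for illustration. Only the figure of up to 8 adjustable dimensions comes from the description.

```python
import numpy as np

NUM_CONTROLLED_DIMS = 8  # from the description: up to 8 latent dimensions


def adjust_latents(z, scale, bias, noise_amount=0.0, rng=None):
    """Scale and bias the first few latent dimensions, then add optional noise.

    z has shape (latent_dims, frames). This layout is an assumption.
    """
    z = np.array(z, dtype=np.float64, copy=True)  # leave the caller's array untouched
    scale = np.asarray(scale, dtype=np.float64)
    bias = np.asarray(bias, dtype=np.float64)
    n = min(len(scale), len(bias), NUM_CONTROLLED_DIMS, z.shape[0])
    z[:n] = z[:n] * scale[:n, None] + bias[:n, None]
    if noise_amount > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        z += noise_amount * rng.standard_normal(z.shape)
    return z


def dry_wet_mix(dry, wet, mix):
    """Linear crossfade: mix=0 returns the dry signal, mix=1 the processed one."""
    mix = float(np.clip(mix, 0.0, 1.0))
    return (1.0 - mix) * np.asarray(dry) + mix * np.asarray(wet)
```

In a game, each `scale`, `bias`, and `mix` value would be the kind of parameter an RTPC drives at runtime; here they are plain function arguments.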
Based on the RAVE VST project, Raveler brings research-grade neural audio synthesis techniques into production game audio workflows through Wwise's standard plugin architecture. Note: the core is released under CC BY-NC 4.0 (non-commercial), which restricts use in commercial products.
N
NoiseBandNet
Standalone
NoiseBandNet is a neural network architecture for synthesizing controllable sound effects using filterbanks. It provides multiple control schemes: automatic extraction using loudness and spectral centroid, loudness-only control for loudness transfer between sounds, and user-defined control parameters drawn directly on spectrograms. The system uses a DDSP-inspired approach with learned filter banks, allowing real-time parameter manipulation and amplitude randomization for variations.
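To make the automatic control scheme concrete, here is a minimal sketch of per-frame loudness and spectral centroid extraction in NumPy. It is an illustration, not NoiseBandNet's own feature extractor: the function name, the frame and hop sizes, the Hann window, and the use of RMS level in dB as a loudness measure are all assumptions.

```python
import numpy as np


def frame_features(audio, sample_rate, frame_size=1024, hop=256):
    """Return (loudness_db, centroid_hz), one value per analysis frame."""
    audio = np.asarray(audio, dtype=np.float64)
    if len(audio) < frame_size:
        raise ValueError("audio must contain at least one full frame")
    window = np.hanning(frame_size)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    n_frames = 1 + (len(audio) - frame_size) // hop
    loudness_db = np.empty(n_frames)
    centroid_hz = np.empty(n_frames)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_size]
        # RMS level in dB stands in for loudness (an assumption, not a perceptual model).
        rms = np.sqrt(np.mean(frame ** 2))
        loudness_db[i] = 20.0 * np.log10(rms + 1e-9)
        # Spectral centroid: magnitude-weighted mean frequency of the frame.
        mag = np.abs(np.fft.rfft(frame * window))
        centroid_hz[i] = np.sum(freqs * mag) / (np.sum(mag) + 1e-9)
    return loudness_db, centroid_hz
```

Two curves like these, resampled to the model's control rate, are the sort of signal that could be transferred from one sound to another in the loudness-transfer workflow.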
The tool includes training workflows for custom sound effect datasets and inference notebooks demonstrating loudness transfer, amplitude randomization for stereo generation, and custom control curve synthesis. Users can train models on their own sound libraries and define control parameters through an interactive labeling interface that displays waveforms and spectrograms.
Implemented in PyTorch, NoiseBandNet outputs controllable synthesis parameters that can be manipulated post-training without retraining, making it suitable for adaptive sound design and procedural audio generation in interactive contexts.
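The idea of shaping noise bands with control curves, and randomizing band amplitudes to get variations, can be sketched as follows. This is a simplified stand-in under stated assumptions: it uses fixed FFT brick-wall bands instead of learned filters, NumPy instead of PyTorch, and hypothetical function names. It should not be read as the project's implementation.

```python
import numpy as np


def make_noise_bands(n_samples, sample_rate, band_edges_hz, rng):
    """Split white noise into band-limited components using FFT masks."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    bands = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        spectrum = np.fft.rfft(rng.standard_normal(n_samples))
        spectrum[(freqs < lo) | (freqs >= hi)] = 0.0  # keep only [lo, hi)
        bands.append(np.fft.irfft(spectrum, n=n_samples))
    return np.stack(bands)  # shape: (n_bands, n_samples)


def synthesize(bands, band_amps, jitter_db=0.0, rng=None):
    """Weight each band by its amplitude envelope and sum to one channel.

    jitter_db > 0 applies a random per-band gain, giving a new variation
    of the same sound on every call.
    """
    gains = np.ones(bands.shape[0])
    if jitter_db > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        gains = 10.0 ** (rng.uniform(-jitter_db, jitter_db, bands.shape[0]) / 20.0)
    return np.sum(bands * band_amps * gains[:, None], axis=0)
```

Calling `synthesize` twice with different random gains gives two decorrelated variations, which is one way to read "amplitude randomization for stereo generation": one call per channel.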
Stable Audio 3
Stable Audio 3 is a state-of-the-art generative audio platform built on diffusion transformers and the SAME (Semantic-Acoustic Music Encoder) autoencoder. It supports three core workflows: text-to-audio generation from natural language prompts, audio-to-audio editing with prompt-guided style transfer, and precise inpainting or continuation of specific regions within existing recordings.
The platform offers multiple model sizes. The Small models (433M parameters) run on CPU alone and handle lightweight music and SFX generation up to 120 seconds, while the Medium model (1.4B parameters) delivers higher-quality output up to 380 seconds on GPU. On modern hardware, multi-second outputs are generated in milliseconds. The SAME autoencoder uses 256-dimensional latents and produces stereo 44.1 kHz output, balancing reconstruction fidelity with generative tractability.
Stable Audio 3 includes LoRA fine-tuning support for personalization, variable-length generation to avoid wasting compute on unused latents, and broad hardware compatibility including CUDA, TensorRT, and Apple Silicon via CoreML. Note: the open-weight models are released under the Stability AI Community License (free for research and for commercial use below a revenue threshold), while the largest model is available via API only. The released models are open-weight, not OSI-approved open source.