Raveler
Wwise Raveler is a Wwise plugin that runs RAVE (Realtime Audio Variational autoEncoder) models for real-time neural timbre transfer in game audio. It integrates trained RAVE models directly into Wwise effect chains, so game audio can be processed by the model while its latent space is manipulated through adjustable controls.
The plugin exposes controls for model performance parameters, including latent noise injection, prior sampling, and dry/wet mixing. Up to 8 latent dimensions can be manipulated directly through bias and scaling controls, all of which can be bound to RTPCs for dynamic runtime control. Buffer settings trade audio quality against latency to suit project requirements.
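The latent controls described above can be illustrated with a small sketch. This is not the plugin's code: the function names `shape_latents` and `dry_wet_mix`, the order in which scale, bias, and noise are applied, and the linear crossfade are all assumptions chosen for illustration.

```python
import random

def shape_latents(latents, scales, biases, noise_amount=0.0, rng=None):
    """Apply per-dimension scale and bias, then optional Gaussian noise.

    latents: one value per latent dimension for a single frame.
    scales, biases: controls for the leading dimensions (the plugin
    exposes up to 8); remaining dimensions pass through unchanged.
    The scale-then-bias-then-noise order is an assumption.
    """
    rng = rng or random.Random(0)
    controlled = min(len(scales), len(biases))
    shaped = []
    for i, z in enumerate(latents):
        if i < controlled:
            z = z * scales[i] + biases[i]
        if noise_amount > 0.0:
            z += noise_amount * rng.gauss(0.0, 1.0)
        shaped.append(z)
    return shaped

def dry_wet_mix(dry, wet, mix):
    """Linear crossfade: mix=0.0 is fully dry, mix=1.0 is fully wet."""
    mix = min(max(mix, 0.0), 1.0)
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]
```

In a game, values such as `scales`, `biases`, and `mix` would be the ones driven by RTPCs, so a gameplay parameter could push a latent dimension or fade the neural signal in and out.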
Based on the RAVE VST project, Raveler brings research-grade neural audio synthesis techniques into production game audio workflows through Wwise's standard plugin architecture. Note: the core is released under CC BY-NC 4.0 (non-commercial), which restricts use in commercial products.
Neural Acoustic Fields
Neural Acoustic Fields (NAF) is a research implementation that models acoustic propagation in physical scenes as a continuous implicit function. By treating sound propagation as a linear time-invariant system, NAF learns to map any emitter-listener location pair to a neural impulse response that can be applied to arbitrary audio sources.
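Because the scene is treated as linear and time-invariant, applying an impulse response to a dry source amounts to a convolution. The sketch below shows that step only; the impulse response is a hand-written stand-in for whatever the trained field would predict, and `apply_impulse_response` is a hypothetical helper name, not part of the NAF code.

```python
def apply_impulse_response(dry, impulse_response):
    """Direct-form linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out

# Stand-in impulse response: direct sound followed by one quieter echo.
example_ir = [1.0, 0.0, 0.5]
wet = apply_impulse_response([1.0, 2.0, 3.0], example_ir)
```

For realistic impulse-response lengths an FFT-based convolution would be the usual choice; the nested loop is used here only to keep the arithmetic visible.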
The system enables continuous spatial audio rendering for listeners at any position in a scene, including novel locations not seen during training. NAF learns magnitude-only representations (using random phase similar to Image2Reverb) and demonstrates how acoustic structure emerges as a byproduct of learning spatial sound propagation. The learned representations can also improve visual learning tasks with sparse views.
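To make the magnitude-only idea concrete, here is a minimal sketch of turning a one-sided magnitude spectrum into a real time-domain frame by attaching random phase. It is a generic illustration under stated assumptions (a single frame, uniform random phase, a naive DFT), not the NAF implementation; all function names are hypothetical.

```python
import cmath
import math
import random

def random_phase_frame(magnitudes, rng):
    """Build a real frame of length N from N//2 + 1 magnitude bins."""
    half = len(magnitudes) - 1
    if half < 1:
        raise ValueError("need at least two magnitude bins")
    n = 2 * half
    spectrum = [0j] * n
    for k, mag in enumerate(magnitudes):
        # DC and Nyquist bins must stay real for a real-valued signal.
        phase = 0.0 if k in (0, half) else rng.uniform(-math.pi, math.pi)
        spectrum[k] = cmath.rect(mag, phase)
        if 0 < k < half:
            spectrum[n - k] = spectrum[k].conjugate()  # Hermitian symmetry
    # Naive inverse DFT; fine for a short illustrative frame.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def dft_magnitudes(frame):
    """One-sided DFT magnitudes, used here to check the round trip."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
```

Different random seeds give different waveforms with the same magnitude spectrum, which is the property a magnitude-only representation relies on.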
This is research code from a NeurIPS 2022 paper, providing training and evaluation pipelines for learning acoustic fields from 3D scene data. It includes baseline comparisons against codec-based interpolation methods (AAC-LC, Opus) and tools for analyzing spectral accuracy, T60 error, and learned feature representations.
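For readers unfamiliar with the T60 metric, the sketch below estimates reverberation time from an impulse response using Schroeder backward integration and a line fit between -5 dB and -25 dB, extrapolated to 60 dB of decay. This is a common textbook procedure; whether the paper's evaluation uses exactly these fit limits is an assumption, and the function names are hypothetical.

```python
import math

def schroeder_decay_db(impulse_response):
    """Energy decay curve in dB, via backward integration of squared samples."""
    remaining = 0.0
    curve = []
    for sample in reversed(impulse_response):
        remaining += sample * sample
        curve.append(remaining)
    curve.reverse()
    if not curve or curve[0] <= 0.0:
        raise ValueError("impulse response is empty or silent")
    total = curve[0]
    return [10.0 * math.log10(c / total) if c > 0.0 else float("-inf") for c in curve]

def estimate_t60(impulse_response, sample_rate, start_db=-5.0, end_db=-25.0):
    """Least-squares line fit on the decay curve, extrapolated to -60 dB."""
    decay = schroeder_decay_db(impulse_response)
    points = [(i / sample_rate, d) for i, d in enumerate(decay) if end_db <= d <= start_db]
    if len(points) < 2:
        raise ValueError("not enough decay range to fit")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_d = sum(d for _, d in points) / len(points)
    slope = sum((t - mean_t) * (d - mean_d) for t, d in points) / sum(
        (t - mean_t) ** 2 for t, _ in points
    )
    return -60.0 / slope  # slope is in dB per second and negative

def t60_error_percent(predicted, reference):
    """Relative T60 error as a percentage of the reference value."""
    return 100.0 * abs(predicted - reference) / reference
```

A predicted and a ground-truth impulse response can each be passed through `estimate_t60`, and the two results compared with `t60_error_percent`.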
Stable Audio 3
Stable Audio 3 is a state-of-the-art generative audio platform built on diffusion transformers and the SAME (Semantic-Acoustic Music Encoder) autoencoder. It supports three core workflows: text-to-audio generation from natural language prompts, audio-to-audio editing with prompt-guided style transfer, and precise inpainting or continuation of specific regions within existing recordings.
The platform offers multiple model sizes. The Small models (433M parameters) run on CPU without a GPU and handle lightweight music and SFX generation up to 120 seconds, while the Medium model (1.4B parameters) delivers higher-quality output up to 380 seconds on GPU. Generation time for multi-second outputs is measured in milliseconds on modern hardware. The SAME autoencoder produces stereo 44.1 kHz output from 256-dimensional latents, balancing reconstruction fidelity with generative tractability.
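The figures above translate into rough resource numbers with simple arithmetic. The sketch below is back-of-the-envelope only: the bytes-per-parameter value is an assumption (the precision the weights ship in is not stated here), and the estimate covers weights alone, not activations or the autoencoder.

```python
def weights_size_gb(num_params, bytes_per_param):
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

def stereo_sample_count(seconds, sample_rate=44_100, channels=2):
    """Total number of audio samples across all channels."""
    return seconds * sample_rate * channels

# Assumed precisions: 2 bytes (16-bit floats) and 4 bytes (32-bit floats).
small_fp16 = weights_size_gb(433_000_000, 2)     # about 0.87 GB
medium_fp16 = weights_size_gb(1_400_000_000, 2)  # about 2.8 GB
medium_fp32 = weights_size_gb(1_400_000_000, 4)  # about 5.6 GB

max_small_output = stereo_sample_count(120)   # 10,584,000 samples
max_medium_output = stereo_sample_count(380)  # 33,516,000 samples
```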
Stable Audio 3 includes LoRA fine-tuning support for personalization, variable-length generation to avoid wasting compute on unused latents, and broad hardware compatibility including CUDA, TensorRT, and Apple Silicon via CoreML. Note: the open-weight models are released under the Stability AI Community License (free for research and for commercial use below a revenue threshold), while the largest model is available via API only. The released models are open-weight, not OSI-approved open source.