Neural Acoustic Fields (NAF) is a research implementation that models acoustic propagation in physical scenes as a continuous implicit function. By treating sound propagation as a linear time-invariant system, NAF learns to map any emitter-listener location pair to a neural impulse response that can be applied to arbitrary audio sources.
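Because the scene is treated as a linear time-invariant system, applying an impulse response to a source signal amounts to a convolution. The following is a minimal, dependency-free sketch of that step only. The function name `apply_impulse_response` and the toy signals are hypothetical and are not taken from the NAF codebase; practical code would more likely use an FFT-based convolution from NumPy or SciPy.

```python
def apply_impulse_response(dry, impulse_response):
    """Convolve a dry signal with an impulse response (direct form, O(N*M))."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, x in enumerate(dry):
        if x == 0.0:
            continue  # silent samples contribute nothing
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Hypothetical toy data: two clicks and a 3-tap "room" with a single echo.
dry = [1.0, 0.0, 0.0, 0.5]
ir = [1.0, 0.0, 0.25]
wet = apply_impulse_response(dry, ir)
print(wet)
```

Each input sample produces a scaled, delayed copy of the impulse response, and the copies sum. That superposition is what allows one predicted response per emitter-listener pair to be reused for any source audio.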
The system enables continuous spatial audio rendering for listeners at any position in a scene, including novel locations not seen during training. NAF learns magnitude-only representations (using random phase similar to Image2Reverb) and demonstrates how acoustic structure emerges as a byproduct of learning spatial sound propagation. The learned representations can also improve visual learning tasks with sparse views.
This is research code from a NeurIPS 2022 paper, providing training and evaluation pipelines for learning acoustic fields from 3D scene data. It includes baseline comparisons against codec-based interpolation methods (AAC-LC, Opus) and tools for analyzing spectral accuracy, T60 error, and learned feature representations.
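For readers unfamiliar with the T60 metric mentioned above: T60 is the time it takes for reverberant energy to decay by 60 dB. One standard way to estimate it from an impulse response is Schroeder backward integration followed by a line fit. The sketch below assumes that approach and a fit over the -5 dB to -25 dB range; the function names, the fit range, and the synthetic impulse response are illustrative assumptions, not the evaluation code shipped with NAF.

```python
import math


def energy_decay_curve_db(ir):
    """Schroeder backward integration of the squared IR, normalised to 0 dB."""
    energy = 0.0
    edc = [0.0] * len(ir)
    for i in range(len(ir) - 1, -1, -1):
        energy += ir[i] * ir[i]
        edc[i] = energy
    if not edc or edc[0] == 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(e / edc[0]) if e > 0.0 else float("-inf")
            for e in edc]


def estimate_t60(ir, sample_rate, start_db=-5.0, end_db=-25.0):
    """Fit a line to the decay curve between start_db and end_db, extrapolate to -60 dB."""
    edc = energy_decay_curve_db(ir)
    pts = [(i / sample_rate, d) for i, d in enumerate(edc)
           if end_db <= d <= start_db]
    if len(pts) < 2:
        raise ValueError("not enough decay range to fit")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


def t60_percent_error(predicted, reference):
    """Relative T60 error in percent."""
    return 100.0 * abs(predicted - reference) / reference


# Synthetic exponentially decaying IR with a known T60 of 0.5 s (assumed test data).
sample_rate = 8000
true_t60 = 0.5
decay = math.log(1000.0) / true_t60  # amplitude drops by a factor of 1000 (60 dB)
ir = [math.exp(-decay * n / sample_rate) for n in range(sample_rate)]

estimated = estimate_t60(ir, sample_rate)
error_pct = t60_percent_error(estimated, true_t60)
print(f"estimated T60: {estimated:.3f} s, error: {error_pct:.2f}%")
```

On measured or predicted responses the decay curve is noisier than in this synthetic case, so the choice of fit range has a noticeable effect on the estimate.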
NoiseBandNet
NoiseBandNet is a neural network architecture for synthesizing controllable sound effects using filterbanks. It provides multiple control schemes: automatic extraction using loudness and spectral centroid, loudness-only control for loudness transfer between sounds, and user-defined control parameters drawn directly on spectrograms. The system uses a DDSP-inspired approach with learned filterbanks, allowing real-time parameter manipulation and amplitude randomization for variations.
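To make the automatically extracted controls concrete, the sketch below computes a loudness value and a spectral centroid for a single audio frame. It is a dependency-free illustration under stated assumptions: loudness is approximated by RMS level in dB (the project may use a perceptually weighted measure instead), the DFT is a naive O(N²) loop, and the function names are hypothetical rather than NoiseBandNet's own.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid_hz(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, report 0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def loudness_db(frame, floor=1e-10):
    """RMS level in dB, used here as a simple stand-in for loudness."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, floor))


# Assumed test frame: a 1 kHz sine at amplitude 0.5, exactly 8 cycles in 64 samples.
sample_rate = 8000
frame = [0.5 * math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]

centroid = spectral_centroid_hz(frame, sample_rate)
loudness = loudness_db(frame)
print(f"centroid: {centroid:.1f} Hz, loudness: {loudness:.2f} dB")
```

Computed frame by frame over a recording, these two values form time-varying control curves: one tracks how loud the sound is, the other roughly how bright it is.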
The tool includes training workflows for custom sound effect datasets and inference notebooks demonstrating loudness transfer, amplitude randomization for stereo generation, and custom control curve synthesis. Users can train models on their own sound libraries and define control parameters through an interactive labeling interface that displays waveforms and spectrograms.
Implemented in PyTorch, NoiseBandNet exposes synthesis control parameters that can be manipulated at inference time without retraining, making it suitable for adaptive sound design and procedural audio generation in interactive contexts.
Stable Audio 3 is a state-of-the-art generative audio platform built on diffusion transformers and the SAME (Semantic-Acoustic Music Encoder) autoencoder. It supports three core workflows: text-to-audio generation from natural language prompts, audio-to-audio editing with prompt-guided style transfer, and precise inpainting or continuation of specific regions within existing recordings.
The platform offers multiple model sizes: Small models (433M params) run on CPU with no GPU required for lightweight music and SFX generation up to 120 seconds, while the Medium model (1.4B params) delivers higher quality output up to 380 seconds on GPU. On modern hardware, generating multi-second outputs takes on the order of milliseconds. The SAME autoencoder decodes 256-dimensional latents to stereo 44.1 kHz audio, balancing reconstruction fidelity with generative tractability.
Stable Audio 3 includes LoRA fine-tuning support for personalization, variable-length generation to avoid wasting compute on unused latents, and broad hardware compatibility including CUDA, TensorRT, and Apple Silicon via CoreML. Note: the open-weight models are released under the Stability AI Community License (free for research and for commercial use below a revenue threshold), while the largest model is available via API only. The release is therefore open-weight, not OSI-approved open source.
barelyMusician is a real-time music engine designed for interactive systems that generates and performs musical sounds programmatically with sample-accurate timing. It provides a modern C/C++ API for creating instruments, performers, and musical tasks that can be sequenced and synchronized precisely.
The engine supports procedural note control, tempo management, looping performers, and task-based event scheduling. It processes audio synchronously and is designed for integration into real-time audio applications where predictable timing and low latency are critical.
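Sample-accurate timing generally means that musical positions (in beats) are converted to exact sample offsets, so an event can fire in the middle of an audio block rather than at its boundary. The sketch below illustrates that general idea in Python. It is not the barelyMusician API (which is C/C++); every name here is hypothetical, and the tempo, sample rate, and block size are assumed values.

```python
def beats_to_samples(beats, tempo_bpm, sample_rate):
    """Convert a position in beats to an absolute sample index."""
    seconds = beats * 60.0 / tempo_bpm
    return round(seconds * sample_rate)


def events_in_block(events, block_start, block_size, tempo_bpm, sample_rate):
    """Return (offset_within_block, name) pairs for events landing in this block."""
    hits = []
    for beat, name in events:
        pos = beats_to_samples(beat, tempo_bpm, sample_rate)
        if block_start <= pos < block_start + block_size:
            hits.append((pos - block_start, name))
    return sorted(hits)


# Assumed setup: 120 BPM, 48 kHz, 512-sample audio blocks.
tempo_bpm, sample_rate, block_size = 120.0, 48000, 512
events = [(0.0, "note_on"), (0.5, "note_off"), (1.0, "note_on")]

# The half-beat event falls at sample 12000, inside the block starting at 11776.
hits = events_in_block(events, 11776, block_size, tempo_bpm, sample_rate)
print(hits)
```

An audio callback built this way would render up to each returned offset, apply the event, and then continue rendering the rest of the block, which keeps event timing independent of the buffer size.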
The project includes native plugins for Unity and Godot, a VST instrument plugin, and builds for multiple platforms including Windows, macOS, Linux, Android, WebAssembly, and embedded hardware (Daisy). It offers an alternative to asset-based approaches when you need fully generative or algorithmically controlled musical content.