Synthetic data enables context-aware bioacoustic sound event detection.

We propose a training methodology for foundation models that improves their in-context learning capabilities for bioacoustic signal processing. Using synthetically generated training data and a domain-randomization pipeline, we construct diverse acoustic scenes with temporally strong (onset/offset) labels. Our model significantly outperforms previous methods, and it is available via an API to support conservation and biodiversity monitoring.
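To make the domain-randomization idea concrete, here is a minimal sketch of how such a pipeline might mix short event clips into a background recording at random times and gains while recording temporally strong labels. The function name `make_scene` and all parameter choices (sample rate, gain range) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_scene(events, background, sr=16000, scene_sec=10.0, rng=None):
    """Mix short event clips ("pseudovox") into a background at random
    positions and gains; return the scene plus (onset, offset) labels
    in seconds. Hypothetical sketch, not the paper's actual pipeline."""
    rng = rng or np.random.default_rng(0)
    n = int(scene_sec * sr)
    scene = background[:n].copy()
    labels = []
    for ev in events:
        gain = rng.uniform(0.5, 1.0)           # randomized loudness
        start = int(rng.integers(0, n - len(ev)))  # randomized placement
        scene[start:start + len(ev)] += gain * ev
        labels.append((start / sr, (start + len(ev)) / sr))
    return scene, sorted(labels)
```

A real pipeline would also randomize factors such as background type, reverberation, and event density, but the core output is the same: audio paired with exact event boundaries.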

Examples: Synthetic Training Audio

In each example, the first thirty seconds are the support audio and the final ten seconds are the query.

Examples: FASD13 Benchmark

Newly released bioacoustic datasets included in the FASD13 benchmark.

Cluster Examples

Example sounds from pseudovox clusters used in synthetic audio generation.
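One plausible way to form such clusters is to embed each pseudovox clip and group the embeddings, so that a synthetic scene can draw same-cluster events to act as a single "class". The sketch below uses a plain k-means loop on precomputed embeddings; the function name `kmeans_clusters` and the choice of k-means itself are assumptions for illustration, not the paper's stated clustering method.

```python
import numpy as np

def kmeans_clusters(embeddings, k=4, iters=20, seed=0):
    """Assign each pseudovox embedding to one of k clusters via
    plain k-means. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = embeddings[rng.choice(len(embeddings), k, replace=False)]
    assign = np.zeros(len(embeddings), dtype=int)
    for _ in range(iters):
        # Distance from every embedding to every center.
        d = np.linalg.norm(embeddings[:, None] - centers[None], axis=-1)
        assign = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = embeddings[assign == j].mean(axis=0)
    return assign
```

Clips sharing a cluster label can then be split between the support and query portions of a synthetic scene, giving the model consistent positive examples of the same sound type.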