
Beehive Acoustic Simulator

Pick one of 10 colony states, hit play, watch what the Pi's microphone would capture, what a simple classifier would conclude — and see the equations driving each piece of the pipeline.

What this is: a conceptual demo. The audio is synthesized to match the frequency profile of each state from the source guide — it is not a recording of real bees. The classifier is a small rule-based heuristic on the live FFT, not a trained ML model. It exists to make the pipeline tangible: real audio in → real preprocessing → real (visual) features → a decision.

1 · Colony state

2 · What the mic captures

Waveform (time domain)

Frequency snapshot (0–4 kHz)

Spectrogram (scrolling, 0–4 kHz on the vertical axis)

Spectrogram colour scale: silent · low · mid · high

3 · What the classifier sees

200–600 Hz energy
400–800 Hz energy
1–4 kHz energy
Periodicity (modulation)
Dominant frequency
RMS amplitude (%)
How to read it. The waveform shows raw amplitude over time — same as arecord would write. The frequency snapshot is a single FFT slice (what a Mel spectrogram column would summarise). The spectrogram is what your trained CNN actually consumes. The classifier panel runs a tiny ratio rule across the highlighted frequency bands; a real ML model would do the same job from the spectrogram image, just with millions of parameters and far better accuracy.
Try this: pick Varroa-affected, then watch the classifier. Even though there's a faint 225 Hz wing-beat tone in the audio (the historically claimed mite signature), the heuristic will almost always settle on "Healthy buzzing". That's the point — the source guide's v1.1 caveat explains that the 200–300 Hz Varroa band overlaps normal bee activity, and confident mite detection from sound alone is not field-validated. Direct counting (alcohol wash, sugar roll) remains the diagnostic. Acoustic monitoring earns its keep on queenlessness, swarming, and fanning, not on mites.

4 · The math behind it

Six equations that map directly to what you're seeing above.

f_s ≥ 2·f_max
Nyquist–Shannon sampling

To capture content up to 4 kHz (where bee acoustics top out), you need a sample rate of at least 8 kHz. The Pi production preset uses 16 kHz for headroom.
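
A minimal Python sketch of this rule, using only the standard library. The function names are illustrative and do not come from the simulator; the second helper shows what happens when the rule is broken, assuming a single pure tone.

```python
def min_sample_rate(f_max_hz):
    """Lowest sample rate that satisfies f_s >= 2 * f_max."""
    return 2.0 * f_max_hz


def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz."""
    # Fold the tone back into the representable range 0 .. fs/2.
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


print(min_sample_rate(4000))         # 8000.0
print(alias_frequency(3000, 16000))  # 3000 (below fs/2, so unchanged)
print(alias_frequency(5000, 8000))   # 3000 (a 5 kHz tone shows up at 3 kHz)
```

Sampling at 16 kHz leaves everything below 8 kHz unambiguous, which is the headroom the preset buys.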

X[k] = ∑_{n=0}^{N−1} x[n]·e^{−j·2πkn/N}
Discrete Fourier transform

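Converts an N-sample window of the time-domain waveform into N complex frequency bins; for real-valued audio only the first N/2 + 1 are distinct. The frequency snapshot above is |X[k]| for one short window; the spectrogram is many of those windows placed side by side along the time axis.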
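
The sum can be evaluated directly. The sketch below is a naive O(N²) Python version for illustration only; a real pipeline would use an FFT library. The sample rate and window length are assumptions chosen so that the 225 Hz tone lands exactly on one bin.

```python
import cmath
import math


def dft(x):
    """Naive DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


fs = 1600        # Hz, kept low so the naive DFT stays fast (assumption)
n_samples = 64   # bin spacing = fs / N = 25 Hz
tone_hz = 225    # 225 / 25 = bin 9

signal = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(n_samples)]
magnitudes = [abs(v) for v in dft(signal)]

# Real input gives a mirrored spectrum, so only bins 0..N/2 are distinct.
half = magnitudes[: n_samples // 2 + 1]
peak_bin = max(range(len(half)), key=half.__getitem__)
print(peak_bin, peak_bin * fs / n_samples)  # 9 225.0
```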

m(f) = 2595·log10(1 + f/700)
Mel scale

Warps linear frequency into a perceptual scale that compresses high frequencies (where bees carry little information) and expands low ones. This is why CNN classifiers consume Mel spectrograms instead of raw FFTs.
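
A short Python sketch of the formula and its inverse, to make the compression concrete. The helper names are illustrative, not part of the simulator.

```python
import math


def hz_to_mel(f_hz):
    """m(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


for f in (250, 500, 1000, 2000, 4000):
    print(f, round(hz_to_mel(f), 1))

# The same 1 kHz of bandwidth covers far fewer mels at the top of the range.
low_span = hz_to_mel(1000) - hz_to_mel(0)
high_span = hz_to_mel(4000) - hz_to_mel(3000)
print(round(low_span), round(high_span))
```

The first kilohertz spans roughly 1000 mel, while 3–4 kHz spans only about 270, so evenly spaced Mel bands devote most of their resolution to the low end.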

RMS = √( (1/N)·∑ x[n]² )
Root-mean-square amplitude

RMS is the number behind the waveform's "loudness". The classifier uses it to separate very quiet states (cold cluster, dead colony) from agitated ones (robbing).
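
A minimal Python sketch of the RMS calculation. The two test tones and their amplitudes are made up for illustration and assume samples normalised to the range −1..1; they are not taken from the simulator's presets.

```python
import math


def rms(samples):
    """Root-mean-square amplitude: sqrt of the mean of the squared samples."""
    if not samples:
        raise ValueError("rms() needs at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


fs = 16000  # Hz, one second of audio below


def tone(amplitude, freq_hz):
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs)
            for n in range(fs)]


quiet = tone(0.01, 250)  # hypothetical stand-in for a very quiet state
loud = tone(0.50, 250)   # hypothetical stand-in for an agitated state

# For a sine of amplitude A, RMS = A / sqrt(2).
print(round(rms(quiet) * 100, 2), round(rms(loud) * 100, 2))  # as a percentage
```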

E_band = (1/K)·∑_{k=lo}^{hi} |X[k]|
Band energy

Average FFT magnitude in a frequency range, where lo and hi are the first and last bins in the band and K = hi − lo + 1 is the number of bins. The three values in the feature panel (200–600, 400–800, 1–4k Hz) are computed exactly this way.
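
A Python sketch of the band average for the three panel ranges. It uses a naive DFT so that it runs with the standard library alone; the frame length, sample rate, test tone, and the way band edges are mapped to bins are all assumptions rather than the simulator's actual settings.

```python
import cmath
import math


def magnitude_spectrum(x):
    """|X[k]| for k = 0..N/2 via a naive DFT (fine for short demo frames)."""
    n = len(x)
    return [
        abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
        for k in range(n // 2 + 1)
    ]


def band_energy(mags, fs, n_fft, f_lo, f_hi):
    """Mean magnitude over bins whose centre frequency lies in [f_lo, f_hi]."""
    lo = math.ceil(f_lo * n_fft / fs)
    hi = min(math.floor(f_hi * n_fft / fs), len(mags) - 1)
    band = mags[lo:hi + 1]
    if not band:
        raise ValueError("band contains no FFT bins")
    return sum(band) / len(band)


fs, n_fft = 8000, 200  # bin spacing = 40 Hz (assumed values)
frame = [math.sin(2 * math.pi * 320 * n / fs) for n in range(n_fft)]
mags = magnitude_spectrum(frame)

e_low = band_energy(mags, fs, n_fft, 200, 600)
e_mid = band_energy(mags, fs, n_fft, 400, 800)
e_high = band_energy(mags, fs, n_fft, 1000, 4000)
print(round(e_low, 2), round(e_mid, 2), round(e_high, 2))
```

With a single 320 Hz tone, only the 200–600 Hz band carries energy. Ratios between values like these are the kind of input a rule-based classifier can threshold.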

SNR_dB = 10·log10(P_signal / P_noise)
Signal-to-noise ratio

The preprocessing pipeline discards clips with SNR < 3 dB. Healthy hive interior typically sits 15–40 dB; below 3 dB you're capturing wind, electronics noise, or a dead mic.
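
A Python sketch of the 3 dB gate. It computes SNR from RMS amplitudes, for which the factor is 20; that is equivalent to 10·log10 of the power ratio. How the real pipeline estimates the noise floor is not stated here, so the separate noise-only clip and the helper names below are assumptions.

```python
import math

SNR_THRESHOLD_DB = 3.0  # discard threshold quoted above


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """SNR in dB from the RMS amplitude ratio (20*log10 for amplitudes)."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


def keep_clip(signal, noise, threshold_db=SNR_THRESHOLD_DB):
    """True if the clip clears the SNR threshold and should be kept."""
    return snr_db(signal, noise) >= threshold_db


fs = 16000
# Hypothetical clips: a 250 Hz hive tone and a 50 Hz hum as the noise floor.
hive = [0.10 * math.sin(2 * math.pi * 250 * n / fs) for n in range(fs)]
hum = [0.01 * math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]

print(round(snr_db(hive, hum), 1), keep_clip(hive, hum))  # 20.0 True
print(round(snr_db(hum, hum), 1), keep_clip(hum, hum))    # 0.0 False
```

In practice the "signal" clip also contains noise, so this simple ratio slightly overstates the true SNR at low levels.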