Beehive Acoustic Simulator
Pick one of 10 colony states, hit play, and watch what the Pi's microphone would capture and what a simple classifier would conclude. Then see the equations driving each piece of the pipeline.
What this is: a conceptual demo. The audio is synthesized to match the frequency profile of each state from the source guide — it is not a recording of real bees. The classifier is a small rule-based heuristic on the live FFT, not a trained ML model. It exists to make the pipeline tangible: real audio in → real preprocessing → real (visual) features → a decision.
2 · What the mic captures
Frequency snapshot (0–4 kHz)
Spectrogram (scrolling, 0–4 kHz on the vertical axis)
Spectrogram intensity legend: silent, low, mid, high
3 · What the classifier sees
200–600 Hz energy
400–800 Hz energy
1–4 kHz energy
Periodicity (modulation)
Dominant frequency
RMS amplitude (%)
How to read it. The waveform shows raw amplitude over time — same as arecord would write. The frequency snapshot is a single FFT slice (what a Mel spectrogram column would summarise). The spectrogram is what your trained CNN actually consumes. The classifier panel runs a tiny ratio rule across the highlighted frequency bands; a real ML model would do the same job from the spectrogram image, just with millions of parameters and far better accuracy.
Try this: pick Varroa-affected, then watch the classifier. Even though there's a faint 225 Hz wing-beat tone in the audio (the historically claimed mite signature), the heuristic will almost always settle on "Healthy buzzing". That's the point — the source guide's v1.1 caveat explains that the 200–300 Hz Varroa band overlaps normal bee activity, and confident mite detection from sound alone is not field-validated. Direct counting (alcohol wash, sugar roll) remains the diagnostic. Acoustic monitoring earns its keep on queenlessness, swarming, and fanning, not on mites.
4 · The math behind it
Six equations that map directly to what you're seeing above.
$f_s \ge 2 f_{\max}$
Nyquist–Shannon sampling
To capture a 4 kHz upper bound (bee acoustics top out here), you need ≥ 8 kHz sample rate. The Pi production preset uses 16 kHz for headroom.
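As a rough illustration of the sampling bound, the sketch below computes the minimum sample rate for a given upper frequency and shows where a tone above the limit would fold back to. The function names `min_sample_rate` and `alias_frequency` are hypothetical helpers invented for this example, not part of the simulator.

```python
def min_sample_rate(f_max_hz: float) -> float:
    """Smallest sample rate (Hz) that satisfies fs >= 2 * f_max."""
    return 2.0 * float(f_max_hz)


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency (Hz) of a pure tone at f_hz after sampling at fs_hz."""
    folded = float(f_hz) % float(fs_hz)
    # Anything above fs/2 mirrors back into the 0..fs/2 range.
    return min(folded, float(fs_hz) - folded)


print(min_sample_rate(4000))         # 8000.0
print(alias_frequency(5000, 8000))   # 3000.0 (folds back into the band of interest)
print(alias_frequency(5000, 16000))  # 5000.0 (represented correctly)
```

At 8 kHz a stray 5 kHz component would masquerade as 3 kHz inside the analysed band; at 16 kHz it stays where it belongs, which is one way to read the "headroom" remark.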
$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$
Discrete Fourier transform
Converts the time-domain waveform into N frequency bins. The frequency snapshot above is |X[k]| for one short window; the spectrogram is many of those slices placed side by side along the time axis.
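The transform can be sketched directly from its definition. The version below is a deliberately naive O(N²) loop in pure Python, meant only to make the formula concrete; a real pipeline would use an FFT library. The window length of 256 samples and the 250 Hz test tone are illustrative assumptions, chosen so the tone lands exactly on a bin.

```python
import cmath
import math


def dft_magnitudes(x):
    """Return |X[k]| for k = 0..N-1 using the direct O(N^2) definition."""
    n_samples = len(x)
    mags = []
    for k in range(n_samples):
        acc = 0j
        for n, sample in enumerate(x):
            acc += sample * cmath.exp(-2j * math.pi * k * n / n_samples)
        mags.append(abs(acc))
    return mags


FS = 16000       # sample rate in Hz (the production preset mentioned above)
N = 256          # window length (assumed for illustration)
TONE_HZ = 250.0  # sits exactly on bin 4 because FS / N = 62.5 Hz

window = [math.sin(2 * math.pi * TONE_HZ * n / FS) for n in range(N)]
mags = dft_magnitudes(window)

# For a real-valued signal only bins 0..N/2 carry unique information.
peak_bin = max(range(N // 2 + 1), key=lambda k: mags[k])
peak_hz = peak_bin * FS / N
print(peak_bin, peak_hz)  # 4 250.0
```

Bin k corresponds to the frequency k·fs/N, which is how a "dominant frequency" readout can be derived from the largest magnitude.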
$m(f) = 2595 \log_{10}(1 + f/700)$
Mel scale
Warps linear frequency into a perceptual scale that compresses high frequencies (where bees carry little information) and expands low ones. This is why CNN classifiers consume Mel spectrograms instead of raw FFTs.
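A minimal sketch of the warp, using the formula above and its algebraic inverse. The helper names `hz_to_mel` and `mel_to_hz` are chosen for this example. The comparison at the end shows that a 250 Hz step low in the band spans more of the Mel axis than the same step near 4 kHz.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map frequency in Hz to the Mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel, obtained by solving the formula for f."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# The same 250 Hz step, measured in Mel, at the bottom and top of the band.
low_step = hz_to_mel(500) - hz_to_mel(250)
high_step = hz_to_mel(4000) - hz_to_mel(3750)
print(round(low_step, 1), round(high_step, 1))

for f in (250, 500, 1000, 2000, 4000):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```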
$\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x[n]^2}$
Root-mean-square amplitude
This is what the waveform's "loudness" actually measures. The classifier uses RMS to separate very quiet states (cold cluster, dead colony) from agitated ones (robbing).
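The RMS formula can be sketched in a few lines. This example assumes samples are normalised to the range -1..1, so that multiplying by 100 gives a percentage of full scale; whether the feature panel scales its percentage the same way is an assumption.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    if not x:
        raise ValueError("rms() needs at least one sample")
    return math.sqrt(sum(s * s for s in x) / len(x))


FS = 16000
# One second of a full-scale 250 Hz sine (an illustrative test signal).
tone = [math.sin(2 * math.pi * 250 * n / FS) for n in range(FS)]
# The same tone at 5 % of full scale, standing in for a very quiet state.
quiet = [0.05 * s for s in tone]

print(round(rms(tone), 4))          # 0.7071, i.e. 1/sqrt(2) for a sine
print(round(100 * rms(quiet), 1))   # 3.5 (percent of full scale)
```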
$E_{\text{band}} = \frac{1}{K} \sum_{k=\text{lo}}^{\text{hi}} |X[k]|$
Band energy
Average FFT magnitude across the K bins of a frequency range, from bin lo to bin hi. The three values in the feature panel (200–600 Hz, 400–800 Hz, 1–4 kHz) are computed exactly this way.
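A small sketch of the band average, assuming the magnitudes come from an N-point transform at sample rate fs, so that bin k sits at k·fs/N Hz. Rounding the band edges inward to whole bins is a choice made for this example; the simulator may round differently. The magnitude list here is synthetic, standing in for one FFT slice containing a single 250 Hz tone.

```python
import math


def band_energy(mags, fs_hz, n_fft, lo_hz, hi_hz):
    """Average magnitude over the bins whose centre lies in [lo_hz, hi_hz]."""
    lo = math.ceil(lo_hz * n_fft / fs_hz)    # first bin inside the band
    hi = math.floor(hi_hz * n_fft / fs_hz)   # last bin inside the band
    k_count = hi - lo + 1                    # K in the formula above
    if k_count <= 0:
        raise ValueError("band is narrower than one bin")
    return sum(mags[lo:hi + 1]) / k_count


FS = 16000
N_FFT = 256                      # bin spacing is FS / N_FFT = 62.5 Hz
mags = [0.0] * (N_FFT // 2 + 1)  # bins 0..N/2 of a synthetic slice
mags[4] = 128.0                  # a lone 250 Hz component (bin 4)

e_low = band_energy(mags, FS, N_FFT, 200, 600)
e_mid = band_energy(mags, FS, N_FFT, 400, 800)
e_high = band_energy(mags, FS, N_FFT, 1000, 4000)
print(round(e_low, 2), e_mid, e_high)  # 21.33 0.0 0.0
```

Because the 200–600 Hz and 400–800 Hz bands overlap, a component between 400 and 600 Hz would raise both values at once.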
$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}(P_{\text{signal}} / P_{\text{noise}})$
Signal-to-noise ratio
The preprocessing pipeline discards clips with SNR < 3 dB. A healthy hive interior typically sits at 15–40 dB; below 3 dB you're capturing wind, electronics noise, or a dead mic.
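The discard rule can be sketched as follows. Taking P as mean power (the mean of squared samples), the decibel ratio is 10·log10 of the power ratio, which is the same number as 20·log10 of the RMS amplitude ratio. How the real pipeline estimates the noise floor is not described here, so this example assumes a separate noise-only reference segment is available; `keep_clip` is a hypothetical helper.

```python
import math

SNR_THRESHOLD_DB = 3.0  # clips below this are discarded, per the text above


def mean_power(x):
    """Mean of squared samples, i.e. RMS squared."""
    return sum(s * s for s in x) / len(x)


def snr_db(signal, noise):
    """SNR in dB from a signal segment and a noise-only reference segment."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def keep_clip(signal, noise):
    """True if the clip clears the 3 dB threshold."""
    return snr_db(signal, noise) >= SNR_THRESHOLD_DB


FS = 16000
# Synthetic stand-ins: a 250 Hz hive tone and a weaker 1 kHz "noise" tone.
hive = [math.sin(2 * math.pi * 250 * n / FS) for n in range(FS)]
noise = [0.1 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS)]
faint = [0.1 * s for s in hive]  # hive tone no louder than the noise

print(round(snr_db(hive, noise), 1), keep_clip(hive, noise))    # 20.0 True
print(round(snr_db(faint, noise), 1), keep_clip(faint, noise))
```

A tenfold amplitude ratio gives 20 dB, inside the 15–40 dB range quoted for a healthy hive interior; equal amplitudes give roughly 0 dB and the clip is dropped.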