Beehive Acoustic Monitor

Hard · weekend build + ongoing data work · Pi Zero 2 W or Pi 4 + I²S MEMS mic

Honey bees produce sounds that systematically change with colony state — queen presence, swarming readiness, fanning, hissing, disease. By pairing a small Raspberry Pi with a MEMS microphone inside the hive, you can record continuously, non-invasively, and feed the audio into a classifier that flags interesting events for the beekeeper. This tutorial walks the full path: wire the mic, sample correctly, preprocess the audio, and train a small model.

🔊 Try the interactive simulator → Listen to synthesized hive states and watch a live classifier respond. Useful for getting an intuitive feel for what each colony state sounds like before you build the rig.

🧠 Want the ML side? → Step through dataset generation, forward pass, backprop, and live training on a small neural network. Pairs with the playback simulator to show the full audio → features → classifier pipeline end-to-end.

Credit and source material. This tutorial adapts the Beehive Acoustic Sensor Design guide (v1.1, May 2026) from the Melbourne AI Hub / AusHive Audio Lab. The original is a 21-page technical report covering the same pipeline with ESP32-S3 as the production platform — refer to it for the deeper biology, ML, and field protocols. The version here is Pi-first, opinionated for a single hive prototype.

You'll need

Raspberry Pi Zero 2 W (~$15, low-power, well suited to in-hive deployment) or Pi 4 (more headroom if you want to train or run inference on-device)
I²S MEMS microphone — Adafruit SPH0645LM4H breakout ($7-8) for prototyping, or the bare ICS-43434 (~$3) if you're comfortable soldering. Both are flat from 50 Hz to 20 kHz with ~65 dB SNR.
microSD card (32 GB+), short jumper wires, 3 mm GORE-TEX patch for an acoustic vent
IP65 ABS project box, PG7 cable glands, silica desiccant packs
Optional but recommended: solar panel + Li-Ion cell + TP4056 charger for autonomous operation
Raspberry Pi OS Lite (64-bit)

1. Wire the mic to the Pi (I²S)

I²S sends digital audio over three wires plus power, so it's immune to the noise that plagues long analog mic runs inside a hive.

SPH0645 / ICS-43434	Pi GPIO	Pin #
3V	3.3 V	1
GND	GND	6
BCLK	GPIO 18 (PCM_CLK)	12
LRCL	GPIO 19 (PCM_FS)	35
DOUT	GPIO 20 (PCM_DIN)	38
SEL	GND (left channel)	any GND

Enable the I²S kernel module by appending to /boot/firmware/config.txt (or /boot/config.txt on older OS images):

dtparam=i2s=on
dtoverlay=googlevoicehat-soundcard

Reboot, then confirm the card appears:

arecord -l

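You should see a card named snd_rpi_googlevoicehat_soundcar (ALSA truncates the name). That's your mic. Address it as plughw:N, where N is the card number shown in the arecord output; the examples below assume card 0, so adjust them if HDMI or USB audio has taken that index.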
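If you would rather not hard-code the card number, the lookup can be scripted. The following is a minimal sketch, not part of the source guide: the helper names find_capture_card and detect_device are hypothetical, and the match string "googlevoi" assumes the card name that the googlevoicehat overlay registers.

```python
import re
import subprocess
from typing import Optional

def find_capture_card(listing: str, name_hint: str = "googlevoi") -> Optional[int]:
    """Return the ALSA card index whose `arecord -l` line contains name_hint."""
    for line in listing.splitlines():
        m = re.match(r"card (\d+):", line)
        if m and name_hint in line.lower():
            return int(m.group(1))
    return None

def detect_device() -> str:
    """Run `arecord -l` and build a plughw device string for the mic."""
    listing = subprocess.run(["arecord", "-l"], capture_output=True,
                             text=True, check=True).stdout
    card = find_capture_card(listing)
    if card is None:
        raise RuntimeError("I2S mic not found; check config.txt and wiring")
    return f"plughw:{card},0"
```

Calling detect_device() on the Pi returns a string such as plughw:1,0 that can be passed to arecord -D in place of a fixed plughw:0.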

Why the googlevoicehat overlay? It is the driver for Google's AIY Voice HAT, shipped with the Raspberry Pi kernel, and it happens to work for a single-mic I²S input. You're not using a real Voice HAT, just borrowing its driver. Older guides mention the i2s-mmap overlay, but it is deprecated on current kernels and does not register a sound card on its own.

2. Set the right sampling parameters

Most beehive sound energy sits below 1 kHz, with useful information up to ~4 kHz. Anything beyond that wastes storage and power.

Parameter	Edge / production	Research-grade
Sample rate	16 kHz	48 kHz
Bit depth	16-bit	24-bit
Clip length	5 sec	10 sec
Interval	every 15 min	every 5 min
Daily volume	~15 MB (WAV)	~415 MB (WAV)

For a single hive on a Pi Zero 2 W with Wi-Fi upload, 16 kHz / 16-bit / 5 s every 15 min is the sweet spot. FLAC compression knocks ~50% off the daily volume if storage matters.
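The daily volume follows directly from sample rate, bit depth, clip length, and interval. The sketch below works through the arithmetic for both columns of the table; the function name daily_wav_bytes is illustrative, and it assumes mono, packed PCM and ignores the 44-byte WAV header on each file.

```python
def daily_wav_bytes(sample_rate_hz, bit_depth, clip_s, interval_min, channels=1):
    """Uncompressed PCM bytes recorded per day (WAV headers ignored)."""
    bytes_per_clip = sample_rate_hz * (bit_depth // 8) * channels * clip_s
    clips_per_day = (24 * 60) // interval_min
    return bytes_per_clip * clips_per_day

edge = daily_wav_bytes(16_000, 16, 5, 15)       # 96 clips of 160 kB each
research = daily_wav_bytes(48_000, 24, 10, 5)   # 288 clips of 1.44 MB each
print(f"edge: {edge / 1e6:.1f} MB/day, research: {research / 1e6:.1f} MB/day")
```

This prints roughly 15.4 MB/day for the edge profile and 414.7 MB/day for the research profile, before any FLAC compression.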

3. Capture clips on a schedule

A tiny shell script is enough; cron supplies the loop. Save it as /home/pi/record_hive.sh and make it executable with chmod +x:

#!/usr/bin/env bash
set -euo pipefail
OUT=/home/pi/recordings
mkdir -p "$OUT"
TS=$(date +%Y-%m-%d_%H-%M-%S)
arecord -D plughw:0 -f S16_LE -r 16000 -c 1 -d 5 \
  "$OUT/$TS.wav"
# Optional: compress to FLAC and remove the WAV (requires the flac package;
# delete the next line to keep plain WAV files)
flac --silent --delete-input-file "$OUT/$TS.wav"

Drive it from cron — every 15 minutes:

*/15 * * * * /home/pi/record_hive.sh >> /home/pi/recordings/hive.log 2>&1

For continuous (research-grade) recording, see the Acoustic Sound Recorder tutorial for the systemd service pattern with --max-file-time.

4. Preprocess before features

Raw clips go through bandpass filtering, a noise-floor gate (important — see callout), normalisation, and windowing. This is the v1.1-fixed pipeline from the source guide:

import librosa, numpy as np
from scipy import signal

def preprocess(path, sr=16000):
    y, _ = librosa.load(path, sr=sr, mono=True)
    y -= np.mean(y)                                    # DC removal
    sos = signal.butter(4, [20, 4000], btype='bandpass',
                        fs=sr, output='sos')
    y = signal.sosfilt(sos, y)                         # bandpass 20-4000 Hz

    # Critical: gate near-silence BEFORE normalisation.
    # Otherwise a dead mic or winter cluster silence gets amplified
    # to full scale and pollutes training data with pure noise.
    if np.max(np.abs(y)) < 0.01:
        return None
    y = y / np.max(np.abs(y))                          # peak normalise

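    # 5-sec windows, 50% overlap. A 5 s clip yields exactly one window and a
    # 10 s clip yields three; clips shorter than 5 s yield none.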
    win = sr * 5
    hop = win // 2
    windows = [y[s:s+win] for s in range(0, len(y) - win + 1, hop)]
    return {'audio': y, 'windows': windows, 'sr': sr}

Heads up — this gate is the v1.1 bug fix. The earlier version of the published pipeline applied peak normalisation unconditionally, which silently destroyed SNR on silent clips. If you're following older references online, add the amplitude check before y / max(abs(y)).
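To see why the order matters, here is a small numpy-only sketch of the gate in isolation. It is a simplified stand-in for the preprocess function above, not the source guide's code: the signals are synthetic, and the 0.01 floor is simply copied from the pipeline.

```python
import numpy as np

def gate_then_normalise(y, floor=0.01):
    """Return peak-normalised audio, or None if the clip never exceeds floor."""
    peak = np.max(np.abs(y)) if len(y) else 0.0
    if peak < floor:
        return None            # near-silence: drop it instead of amplifying it
    return y / peak

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
hum = 0.2 * np.sin(2 * np.pi * 250 * t)        # synthetic stand-in for colony hum
dead_mic = 0.001 * rng.standard_normal(sr)     # synthetic self-noise only

print(gate_then_normalise(dead_mic))                   # rejected clip
print(np.max(np.abs(gate_then_normalise(hum))))        # hum scaled to full scale
print(np.max(np.abs(dead_mic / np.max(np.abs(dead_mic)))))  # ungated noise, also full scale
```

Without the gate, the dead-mic clip reaches the same peak level as the real hum, so a classifier downstream cannot tell from amplitude alone that it is looking at noise.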

5. Extract features

Two standard choices for audio classification — pick based on where the model runs:

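Mel spectrogram (~64 × 313 for a 5 s clip at 16 kHz with a 256-sample hop; librosa's default 512-sample hop gives 64 × 157): primary feature for CNN classifiers. Heavier but expressive.
13 MFCCs + delta + delta-delta (39 coefficients per frame, summarised over time with four statistics each into a 39 × 4 = 156-d vector): compact, great for edge inference and classical ML.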

With librosa:

# audio is the 'audio' array returned by preprocess() in step 4
mel = librosa.feature.melspectrogram(y=audio, sr=16000, n_mels=64, fmax=4000)
log_mel = librosa.power_to_db(mel)

mfcc  = librosa.feature.mfcc(y=audio, sr=16000, n_mfcc=13)
delta = librosa.feature.delta(mfcc)
ddelt = librosa.feature.delta(mfcc, order=2)
features = np.concatenate([mfcc, delta, ddelt], axis=0)  # (39, T)
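The (39, T) matrix still varies in length with the clip, so classical models need it collapsed to a fixed-size vector. A minimal sketch of that summarisation step follows. The choice of mean, standard deviation, minimum, and maximum as the four statistics is an assumption made here to reach 156 dimensions; the source does not say which statistics it uses.

```python
import numpy as np

def summarise(features):
    """Collapse a (39, T) feature matrix into one 156-d vector.

    Assumption: the four per-coefficient statistics are mean, std, min, max.
    """
    stats = [features.mean(axis=1), features.std(axis=1),
             features.min(axis=1), features.max(axis=1)]
    return np.concatenate(stats)

# Random stand-in for real MFCC features; 157 frames matches a 5 s clip at
# 16 kHz with librosa's default hop of 512 samples.
demo = np.random.default_rng(0).standard_normal((39, 157))
vec = summarise(demo)
print(vec.shape)
```

The output length depends only on the number of coefficients, so clips of different durations produce vectors of the same size.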

6. Train a baseline classifier

Start with the simplest formulation: binary healthy vs stressed. A CNN on log-Mel spectrograms gets you to 90%+ accuracy on the Kaggle Beehive Audio Dataset with a few thousand clips per class.

Optimiser: AdamW, weight decay 1e-4, cosine annealing
Label smoothing: 0.1 (bee labels are inherently noisy)
Augmentation: gain jitter ±20%, time shift ±1 s, SpecAugment (time mask ≤20 frames, freq mask ≤8 bins). Aim for 500+ clips per class after augmentation.
Loss: weighted cross-entropy if classes are imbalanced (queenless and dead are always rare)
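The augmentation settings above can be sketched in plain numpy, independent of the training framework. Treat this as an illustration under stated assumptions rather than the source guide's implementation: the time shift is circular (np.roll), each spectrogram gets one time mask and one frequency mask, and masked cells are filled with the spectrogram's minimum value.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_waveform(y, sr=16000, gain_jitter=0.2, max_shift_s=1.0):
    """Apply a random gain within +/-20% and a circular shift within +/-1 s."""
    gain = 1.0 + rng.uniform(-gain_jitter, gain_jitter)
    max_shift = int(max_shift_s * sr)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y * gain, shift)

def spec_augment(log_mel, max_time=20, max_freq=8):
    """Mask one random time band and one random frequency band."""
    out = log_mel.copy()
    n_mels, n_frames = out.shape
    fill = out.min()                                   # quietest value in the clip
    t = int(rng.integers(0, min(max_time, n_frames) + 1))   # mask width in frames
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = fill
    f = int(rng.integers(0, min(max_freq, n_mels) + 1))     # mask height in mel bins
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = fill
    return out
```

Apply augment_waveform before feature extraction and spec_augment to the log-Mel array afterwards, and only to training clips so that validation scores stay comparable.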

The single most important rule: never randomly split clips into train/val/test. Clips from the same hive in the same hour are nearly identical and cause severe leakage — your accuracy will look amazing and collapse in the field. Always use leave-hive-out cross-validation: hold out an entire hive's recordings for validation, rotate across at least 5 hives.
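A leave-hive-out split needs nothing more than a hive ID per clip. The sketch below uses only the standard library; the function name leave_hive_out and the example IDs are illustrative. If you already use scikit-learn, its LeaveOneGroupOut splitter does the same job when given hive IDs as groups.

```python
from collections import defaultdict

def leave_hive_out(hive_ids):
    """Yield (held_out_hive, train_idx, val_idx), one fold per hive."""
    by_hive = defaultdict(list)
    for i, hive in enumerate(hive_ids):
        by_hive[hive].append(i)
    for held_out in sorted(by_hive):
        val_idx = by_hive[held_out]
        train_idx = [i for h, idx in by_hive.items() if h != held_out for i in idx]
        yield held_out, train_idx, val_idx

# One hive ID per clip, in the same order as the feature array.
clip_hives = ["A", "A", "B", "C", "B", "C", "A"]
for hive, train, val in leave_hive_out(clip_hives):
    print(hive, "train:", train, "val:", val)
```

Every clip from the held-out hive lands in validation and none in training, which is exactly the property a random split cannot guarantee.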

7. Deploy back to the Pi (optional)

Once trained, convert to TFLite and run on-device so only labels (not audio) leave the hive:

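import tensorflow as tf

def rep_data():  # calibration_features: placeholder for ~100 real model inputs (numpy arrays)
    for x in calibration_features[:100]:
        yield [x[None, ...].astype('float32')]

conv = tf.lite.TFLiteConverter.from_saved_model('beehive_model')
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = rep_data   # calibration data, needed for full INT8
conv.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
open('beehive_model_int8.tflite', 'wb').write(conv.convert())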

A quantised 1D-CNN on MFCC summaries fits in ~50-100 KB and infers in milliseconds on a Pi Zero 2 W. Run it from the cron script after each capture and POST the label to a small endpoint instead of uploading audio.
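The capture, classify, and report loop could look roughly like the sketch below. Several things here are assumptions rather than details from the source: the tflite-runtime package as the interpreter, the two-class label order, the JSON field names, and whatever endpoint URL you pass to post_label. The input handling covers both float and int8 model inputs.

```python
import json
import urllib.request
import numpy as np

LABELS = ["healthy", "stressed"]      # assumed class order of the trained model

def classify(features, model_path="beehive_model_int8.tflite"):
    """Run one feature array through a TFLite model and return class scores."""
    from tflite_runtime.interpreter import Interpreter  # assumed: pip install tflite-runtime
    interp = Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    x = np.asarray(features, dtype=np.float32).reshape(inp["shape"])
    if inp["dtype"] == np.int8:                          # model expects quantised input
        scale, zero = inp["quantization"]
        x = np.clip(np.round(x / scale + zero), -128, 127).astype(np.int8)
    interp.set_tensor(inp["index"], x)
    interp.invoke()
    return interp.get_tensor(out["index"])[0]

def build_payload(scores, hive_id, timestamp):
    """Package the winning label as a small JSON-ready dict."""
    best = int(np.argmax(scores))
    return {"hive": hive_id, "time": timestamp, "label": LABELS[best]}

def post_label(url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A payload like this is a few dozen bytes per clip, compared with 160 kB for the raw 5 s WAV, which is the main argument for classifying on the Pi.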

Hive placement

Centre of the brood box, between frames 4 and 5. Avoid top bars (vibration) and frame edges (dead acoustics).
Mic membrane facing downward so wax and propolis don't accumulate.
~10 cm above the bottom board and ~20 cm from the entrance (cuts wind and outside insect noise).
Seal the enclosure everywhere except the membrane-covered acoustic port, and put silica gel packs inside: the gradient between the 35°C hive and ambient air will cause condensation in any box with an open vent. Replace desiccant every 6 months.

Bill of materials (per hive)

Item	Notes	~AUD
Pi Zero 2 W	or ESP32-S3 for lower power	$25
ICS-43434 / SPH0645	I²S MEMS mic	$3–8
microSD 32 GB	Class 10	$5
IP65 enclosure + cable glands	ABS, PG7	$7
GORE-TEX acoustic vent	3 mm patch	$1
Solar panel + Li-Ion + TP4056	autonomous power	$22
Desiccant packs	silica gel	$1
Total		~$65

Where to go next

Public datasets to bootstrap with: Kaggle Beehive Audio Dataset (8k clips, 5 classes), OSBH Sound Dataset (multi-year), MLBeeHive (University of Bologna, ground-truth inspection linked)
Open-source platforms: Open Source Beehives, Edge Impulse for end-to-end TinyML
Important caveat: Varroa-by-acoustics is not field-validated. The 225 Hz signature that older papers attribute to Varroa infestation overlaps normal bee wing-beat noise (the mites themselves are wingless) and cannot be reliably separated outside controlled lab settings. Use alcohol wash or sugar roll counting as the diagnostic, and treat acoustic Varroa signals as a research pointer only.
Remote access to recordings: pair with the Cloudflare Tunnel tutorial for a private endpoint without opening router ports.