
Beehive Acoustic Monitor

Hard · weekend build + ongoing data work · Pi Zero 2 W or Pi 4 + I²S MEMS mic

Honey bees produce sounds that change systematically with colony state: queen presence, swarming readiness, fanning, hissing, disease. By pairing a small Raspberry Pi with a MEMS microphone inside the hive, you can record continuously and non-invasively, then feed the audio into a classifier that flags interesting events for the beekeeper. This tutorial walks through the full path: wire the mic, sample correctly, preprocess the audio, extract features, and train a small model.

Credit and source material. This tutorial adapts the Beehive Acoustic Sensor Design guide (v1.1, May 2026) from the Melbourne AI Hub / AusHive Audio Lab. The original is a 21-page technical report covering the same pipeline with the ESP32-S3 as its production platform; refer to it for the deeper biology, ML, and field protocols. The version here is Pi-first and opinionated toward a single-hive prototype.

You'll need

1. Wire the mic to the Pi (I²S)

I²S carries digital audio over three signal wires (bit clock, word select, data) plus power and ground, so it avoids the noise pickup that plagues long analog mic runs inside a hive.

SPH0645 / ICS-43434 pin | Pi GPIO | Pi header pin
3V | 3.3 V | 1
GND | GND | 6
BCLK | GPIO 18 (PCM_CLK) | 12
LRCL | GPIO 19 (PCM_FS) | 35
DOUT | GPIO 20 (PCM_DIN) | 38
SEL | GND (left channel) | any GND

Enable the I²S interface and the mic driver by appending these two lines to /boot/firmware/config.txt (or /boot/config.txt on older OS images):

dtparam=i2s=on
dtoverlay=googlevoicehat-soundcard

Reboot, then confirm the card appears:

arecord -l

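You should see a device named snd_rpi_googlevoicehat_soundcar (ALSA truncates the full name). That's your mic: address it as plughw:N, where N is the card number arecord -l reports (plughw:0 if it is card 0).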
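The card number isn't guaranteed to be 0: other sound cards (HDMI audio, for example) can be registered first. A minimal sketch for picking the right device string out of the arecord -l listing, assuming the standard `card N: ID [NAME], device M: ...` line format; find_capture_card and arecord_listing are hypothetical helper names.

```python
import re
import subprocess

# Matches listing lines such as:
# card 1: sndrpigooglevoi [snd_rpi_googlevoicehat_soundcar], device 0: ...
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[([^\]]*)\], device (\d+):")


def arecord_listing():
    """Run `arecord -l` and return its text output (only works on the Pi)."""
    return subprocess.run(["arecord", "-l"], capture_output=True,
                          text=True, check=True).stdout


def find_capture_card(listing, name_fragment="googlevoicehat"):
    """Return an ALSA device string like 'plughw:1,0' for the first card
    whose bracketed name contains name_fragment, or None if none match."""
    for line in listing.splitlines():
        m = CARD_LINE.match(line.strip())
        if m and name_fragment in m.group(3):
            card, device = m.group(1), m.group(4)
            return f"plughw:{card},{device}"
    return None
```

On the Pi, `find_capture_card(arecord_listing())` gives the string to pass to `arecord -D` in place of plughw:0.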

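Why the googlevoicehat overlay? It was written for Google's AIY Voice HAT and ships with Raspberry Pi OS, but it works as a generic I²S capture driver for a single MEMS mic. You're not using a real Voice HAT, just borrowing its driver. Other I²S mic drivers exist if you prefer; note that i2s-mmap on its own is not one of them, since it only enables mmap support and does not create a capture device.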

2. Set the right sampling parameters

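Most beehive sound energy sits below 1 kHz, with useful information up to ~4 kHz. A 16 kHz sample rate already covers everything up to 8 kHz (the Nyquist limit), so for colony-state classification, going higher mostly costs storage and power.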

Parameter | Edge / production | Research-grade
Sample rate | 16 kHz | 48 kHz
Bit depth | 16-bit | 24-bit
Clip length | 5 s | 10 s
Interval | every 15 min | every 5 min
Daily volume | ~15 MB (WAV) | ~415 MB (WAV)

For a single hive on a Pi Zero 2 W with Wi-Fi upload, 16 kHz / 16-bit / 5 s every 15 min is the sweet spot. FLAC compression knocks ~50% off the daily volume if storage matters.
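The daily volume is plain arithmetic: sample rate × bytes per sample × channels × clip length, plus the file header, times clips per day. A small sketch, assuming mono PCM WAV with a canonical 44-byte header (it barely matters at these sizes) and packed 3-byte samples for 24-bit, as arecord's S24_3LE format writes; a 4-byte S24_LE container would add another third. daily_wav_bytes is an illustrative name.

```python
def daily_wav_bytes(sample_rate_hz, bits, clip_s, interval_min,
                    channels=1, header_bytes=44):
    """Bytes of WAV written per day by a fixed capture schedule."""
    clips_per_day = (24 * 60) // interval_min
    bytes_per_clip = sample_rate_hz * (bits // 8) * channels * clip_s + header_bytes
    return clips_per_day * bytes_per_clip


edge = daily_wav_bytes(16_000, 16, 5, 15)      # 96 clips/day
research = daily_wav_bytes(48_000, 24, 10, 5)  # 288 clips/day
print(f"edge: {edge / 1e6:.1f} MB/day, research: {research / 1e6:.1f} MB/day")
# → edge: 15.4 MB/day, research: 414.7 MB/day
```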

3. Capture clips on a schedule

A tiny shell loop is enough. Save as /home/pi/record_hive.sh:

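#!/usr/bin/env bash
# Record one 5-s, 16 kHz, 16-bit mono clip (the step 2 "edge" settings).
set -euo pipefail
OUT=/home/pi/recordings
mkdir -p "$OUT"
TS=$(date +%Y-%m-%d_%H-%M-%S)
# plughw:0 = card 0; change it if arecord -l lists the mic under another card.
arecord -D plughw:0 -f S16_LE -r 16000 -c 1 -d 5 \
  "$OUT/$TS.wav"
# Optional: compress to FLAC and remove the WAV (needs: sudo apt install flac).
# Comment this line out to keep WAV files instead.
flac --silent --delete-input-file "$OUT/$TS.wav"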

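Make the script executable (chmod +x /home/pi/record_hive.sh) and create /home/pi/recordings up front, because cron opens the log file before the script's own mkdir runs. Then drive it from cron every 15 minutes (add this line with crontab -e):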

*/15 * * * * /home/pi/record_hive.sh >> /home/pi/recordings/hive.log 2>&1

For continuous (research-grade) recording, see the Acoustic Sound Recorder tutorial for the systemd service pattern with --max-file-time.
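Before leaving the schedule running, it's worth confirming that a test capture really has the step 2 parameters. A minimal sketch using Python's standard wave module, assuming you record one WAV by hand (just the arecord line, without the flac step); check_clip is a hypothetical helper name.

```python
import wave


def check_clip(path, rate=16_000, channels=1, sample_width=2,
               seconds=5.0, tol_s=0.1):
    """Return a list of mismatches for a WAV capture (empty list = OK).
    `path` may be a filename or an open binary file object."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"sample rate {w.getframerate()} Hz, expected {rate}")
        if w.getnchannels() != channels:
            problems.append(f"{w.getnchannels()} channels, expected {channels}")
        if w.getsampwidth() != sample_width:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, "
                            f"expected {8 * sample_width}-bit")
        duration = w.getnframes() / w.getframerate()
        if abs(duration - seconds) > tol_s:
            problems.append(f"duration {duration:.2f} s, expected {seconds} s")
    return problems
```

For example, `print(check_clip("/home/pi/recordings/test.wav"))` (placeholder path) should print an empty list.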

4. Preprocess before features

Raw clips go through DC removal, bandpass filtering, a noise-floor gate (important; see the callout below), peak normalisation, and windowing. This is the v1.1-fixed pipeline from the source guide:

import librosa, numpy as np
from scipy import signal

def preprocess(path, sr=16000):
    y, _ = librosa.load(path, sr=sr, mono=True)
    y -= np.mean(y)                                    # DC removal
    sos = signal.butter(4, [20, 4000], btype='bandpass',
                        fs=sr, output='sos')
    y = signal.sosfilt(sos, y)                         # bandpass 20-4000 Hz

    # Critical: gate near-silence BEFORE normalisation.
    # Otherwise a dead mic or winter cluster silence gets amplified
    # to full scale and pollutes training data with pure noise.
    if np.max(np.abs(y)) < 0.01:
        return None
    y = y / np.max(np.abs(y))                          # peak normalise

    # 5-sec windows, 50% overlap
    win = sr * 5
    hop = win // 2
    windows = [y[s:s+win] for s in range(0, len(y) - win + 1, hop)]
    return {'audio': y, 'windows': windows, 'sr': sr}
Heads up: this gate is the v1.1 bug fix. The earlier published pipeline applied peak normalisation unconditionally, so near-silent clips (a dead mic, a quiet winter cluster) were scaled up to full scale and pure noise entered the training data looking like real signal. If you're following older references online, add the amplitude check before y / max(abs(y)).
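To see why the order matters, and how many windows a clip actually yields, the gate and the windowing can be exercised on synthetic signals with NumPy alone. A sketch assuming the same 0.01 peak threshold and 5-s / 50% windows as preprocess(); gate_and_normalise and frame_windows are illustrative names, not part of the source pipeline. Note that with 5-s clips and 5-s windows you get exactly one window, and none at all if a clip comes back even one sample short.

```python
import numpy as np


def gate_and_normalise(y, threshold=0.01):
    """Drop near-silent audio; otherwise peak-normalise to +/-1."""
    peak = np.max(np.abs(y))
    if peak < threshold:
        return None  # gate first, so mic noise is never blown up to full scale
    return y / peak


def frame_windows(y, win, hop):
    """Full-length windows of `win` samples, starting every `hop` samples."""
    return [y[s:s + win] for s in range(0, len(y) - win + 1, hop)]


sr = 16_000
win, hop = 5 * sr, 5 * sr // 2
rng = np.random.default_rng(0)
hum = 0.2 * np.sin(2 * np.pi * 250 * np.arange(10 * sr) / sr)  # 10 s, 250 Hz tone
hiss = 0.001 * rng.standard_normal(10 * sr)                     # near-silent mic noise

print(gate_and_normalise(hiss))                           # None: gated out
print(np.max(np.abs(gate_and_normalise(hum))))            # 1.0
print(len(frame_windows(hum, win, hop)))                  # 3 windows from 10 s
print(len(frame_windows(hum[:5 * sr], win, hop)))         # 1 window from 5 s
print(len(frame_windows(hum[:5 * sr - 1], win, hop)))     # 0: one sample short
```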

5. Extract features

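Two standard choices for audio classification; pick based on where the model runs. Log-Mel spectrograms keep a detailed time-frequency picture and feed a 2D CNN like the baseline in step 6. MFCCs plus their first- and second-order deltas are far more compact, which suits the small quantised model in step 7.

Both take a few lines with librosa: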

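import librosa, numpy as np

# One 5-s window from step 4 ('clip.flac' is a placeholder; preprocess() returns None for gated clips)
audio = preprocess('clip.flac')['windows'][0]

# Log-Mel spectrogram: 64 bands up to 4 kHz -> shape (64, T)
mel = librosa.feature.melspectrogram(y=audio, sr=16000, n_mels=64, fmax=4000)
log_mel = librosa.power_to_db(mel)

# 13 MFCCs plus first- and second-order deltas, stacked -> shape (39, T)
mfcc  = librosa.feature.mfcc(y=audio, sr=16000, n_mfcc=13)
delta = librosa.feature.delta(mfcc)
ddelt = librosa.feature.delta(mfcc, order=2)
features = np.concatenate([mfcc, delta, ddelt], axis=0)  # (39, T)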
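The on-device model in step 7 works on MFCC summaries rather than the full (39, T) matrix. The exact summary isn't spelled out here; one common choice, assumed in this sketch, is the mean and standard deviation of each row over time, which gives a fixed 78-value vector regardless of clip length. summarise_features is a hypothetical name.

```python
import numpy as np


def summarise_features(features):
    """Collapse an (n_coeffs, T) feature matrix into a fixed-length vector:
    per-coefficient mean over time, followed by per-coefficient std."""
    features = np.asarray(features, dtype=np.float32)
    return np.concatenate([features.mean(axis=1), features.std(axis=1)])


# Random stand-in for the (39, T) MFCC + delta + delta-delta matrix.
demo = np.random.default_rng(1).standard_normal((39, 157))
vec = summarise_features(demo)
print(vec.shape)  # (78,)
```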

6. Train a baseline classifier

Start with the simplest formulation: binary healthy vs stressed. A CNN on log-Mel spectrograms gets you to 90%+ accuracy on the Kaggle Beehive Audio Dataset with a few thousand clips per class.

The single most important rule: never randomly split clips into train/val/test. Clips from the same hive in the same hour are nearly identical and cause severe leakage — your accuracy will look amazing and collapse in the field. Always use leave-hive-out cross-validation: hold out an entire hive's recordings for validation, rotate across at least 5 hives.
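A leave-hive-out split simply groups clips by the hive they came from and holds out one whole hive per fold. A stdlib-only sketch, assuming each clip carries a hive identifier; scikit-learn's LeaveOneGroupOut does the same job if you already use it. The hive labels below are made up for illustration.

```python
from collections import defaultdict


def leave_hive_out_folds(hive_ids):
    """Yield (held_out_hive, train_idx, val_idx), one fold per hive.
    hive_ids[i] is the hive that clip i was recorded in."""
    by_hive = defaultdict(list)
    for i, hive in enumerate(hive_ids):
        by_hive[hive].append(i)
    for held_out in sorted(by_hive):
        val_idx = by_hive[held_out]
        train_idx = [i for i, h in enumerate(hive_ids) if h != held_out]
        yield held_out, train_idx, val_idx


hives = ["h1", "h1", "h2", "h3", "h3", "h4", "h5", "h5"]  # illustrative labels
for held_out, train_idx, val_idx in leave_hive_out_folds(hives):
    # No clip from the held-out hive ever appears in training.
    assert not set(train_idx) & set(val_idx)
    print(held_out, "val:", val_idx, "train size:", len(train_idx))
```

Averaging the validation metric over these folds estimates how the model does on a hive it has never heard, which is the field situation.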

7. Deploy back to the Pi (optional)

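Once trained, convert to TFLite and run on-device so only labels (not audio) leave the hive. Full INT8 quantisation needs a representative dataset, typically a few hundred real inputs the converter uses to calibrate activation ranges:

import numpy as np
import tensorflow as tf

calib = np.load('calibration_features.npy').astype(np.float32)  # hypothetical file of real training inputs
conv = tf.lite.TFLiteConverter.from_saved_model('beehive_model')
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = lambda: ([x[np.newaxis]] for x in calib[:300])  # calibration samples, batch of 1
conv.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # full INT8 quantisation
with open('beehive_model_int8.tflite', 'wb') as f:
    f.write(conv.convert())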

A quantised 1D-CNN on MFCC summaries fits in ~50-100 KB and infers in milliseconds on a Pi Zero 2 W. Run it from the cron script after each capture and POST the label to a small endpoint instead of uploading audio.
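A sketch of that on-device step: load the .tflite file with the slim tflite_runtime interpreter (tf.lite.Interpreter works the same if full TensorFlow is installed), take the top-scoring class, and POST it as JSON with the standard library. Assumptions, all hypothetical: a two-class model whose outputs are ordered healthy, stressed; float32 model input and output (what the converter produces unless inference_input_type / inference_output_type are set); and an endpoint URL, hive ID, and payload shape of your own choosing.

```python
import json
import urllib.request

import numpy as np

LABELS = ["healthy", "stressed"]  # assumed order of the model's output units


def classify(features, model_path="beehive_model_int8.tflite"):
    """Run one feature array (model input shape minus the batch dimension)
    through the TFLite model and return (label, score)."""
    # Imported lazily so the helpers below work on machines without tflite_runtime.
    from tflite_runtime.interpreter import Interpreter
    interp = Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    # Assumes float32 model I/O; add a batch dimension of 1.
    interp.set_tensor(inp["index"], np.asarray(features, dtype=np.float32)[np.newaxis, ...])
    interp.invoke()
    return pick_label(interp.get_tensor(out["index"])[0])


def pick_label(scores):
    """Map the model's score vector to (label, score) for the top class."""
    i = int(np.argmax(scores))
    return LABELS[i], float(scores[i])


def label_request(label, score, hive_id, url):
    """Build (but don't send) a JSON POST carrying only the label, not audio."""
    body = json.dumps({"hive": hive_id, "label": label, "score": score}).encode()
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})

# On the Pi, after each capture (URL and hive ID are placeholders):
# req = label_request(*classify(feats), hive_id="hive-01", url="http://example.local/hive")
# urllib.request.urlopen(req, timeout=10)
```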

Hive placement

Bill of materials (per hive)

Item | Notes | ~AUD
Pi Zero 2 W | or ESP32-S3 for lower power | $25
ICS-43434 / SPH0645 | I²S MEMS mic | $3–8
microSD 32 GB | Class 10 | $5
IP65 enclosure + cable glands | ABS, PG7 | $7
GORE-TEX acoustic vent | 3 mm patch | $1
Solar panel + Li-Ion + TP4056 | autonomous power | $22
Desiccant packs | silica gel | $1
Total | | ~$65

Where to go next