
Beehive ML Training Simulator

A small neural network trained from scratch on synthesized beehive features. Step through dataset generation, forward pass, loss, backprop, and inference — with the math and parameters displayed live at every step.

What this is: a tiny 6-dim → 8 hidden → 4-class multilayer perceptron implemented from scratch in JavaScript. The training data isn't real audio — it's feature vectors sampled from class-specific Gaussian distributions matching the band-energy profiles of healthy / queenless / swarming / hissing hives. The goal is to make every part of the pipeline (data, weights, forward pass, loss, gradients, updates) visible and pokable. For real audio behaviour see the playback simulator.

1. Build a training dataset

For each class, sample feature vectors from a Gaussian centred on a hand-tuned centroid that reflects that hive state's acoustic profile.

x = μ_c + ε, ε ~ 𝒩(0, σ²I)
μ_c is the centroid of class c (6 features); σ controls how noisy each sample is.
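
The sampling rule above can be sketched in a few lines of JavaScript. This is a minimal sketch, not the simulator's actual code: the centroid values, the helper names `randn`, `sampleFeature`, and `buildDataset`, and the settings of 60 samples per class with σ = 0.08 are all assumptions chosen for illustration.

```javascript
// Standard normal draw via the Box-Muller transform.
function randn(rng = Math.random) {
  const u = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// One sample x = mu_c + eps, with eps ~ N(0, sigma^2 I).
function sampleFeature(centroid, sigma, rng = Math.random) {
  return centroid.map((m) => m + sigma * randn(rng));
}

// Hypothetical centroids (6 band energies per class); illustrative values only.
const CENTROIDS = {
  healthy:   [0.20, 0.70, 0.50, 0.20, 0.10, 0.10],
  queenless: [0.30, 0.40, 0.70, 0.50, 0.20, 0.10],
  swarming:  [0.10, 0.30, 0.50, 0.80, 0.50, 0.20],
  hissing:   [0.10, 0.10, 0.20, 0.40, 0.70, 0.80],
};

// Draw perClass samples around every centroid; label is the class index.
function buildDataset(centroids, perClass, sigma, rng = Math.random) {
  const data = [];
  Object.keys(centroids).forEach((name, label) => {
    for (let n = 0; n < perClass; n++) {
      data.push({ x: sampleFeature(centroids[name], sigma, rng), label, name });
    }
  });
  return data;
}

const dataset = buildDataset(CENTROIDS, 60, 0.08); // assumed settings
console.log(dataset.length); // 240 (4 classes x 60 samples)
```

Passing `rng` as a parameter is a design choice that lets a seeded generator be swapped in when reproducible datasets are wanted.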

[Interactive panel: class centroids.]

2. Initialize the model

x ∈ ℝ⁶ → W₁ ∈ ℝ⁸ˣ⁶ + b₁ → z₁ ∈ ℝ⁸ → ReLU → h ∈ ℝ⁸
→ W₂ ∈ ℝ⁴ˣ⁸ + b₂ → z₂ ∈ ℝ⁴ → softmax → ŷ ∈ ℝ⁴
Wᵢⱼ ~ 𝒰(−s, s), b = 0
Weights drawn uniformly in [−s, s]; biases start at zero.
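
As a rough sketch of this initialization (the names `uniformMatrix` and `initModel` and the scale s = 0.3 are assumptions, not taken from the simulator's source), the two layers could be set up like this:

```javascript
// rows x cols matrix with entries drawn uniformly from [-s, s).
function uniformMatrix(rows, cols, s, rng = Math.random) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => (rng() * 2 - 1) * s)
  );
}

// 6 inputs -> 8 hidden units -> 4 classes; biases start at zero.
function initModel(s, rng = Math.random) {
  return {
    W1: uniformMatrix(8, 6, s, rng),
    b1: new Array(8).fill(0),
    W2: uniformMatrix(4, 8, s, rng),
    b2: new Array(4).fill(0),
  };
}

const model = initModel(0.3); // assumed scale
const paramCount = 8 * 6 + 8 + 4 * 8 + 4;
console.log(paramCount); // 92 trainable parameters
```

Random weights break the symmetry between hidden units; if every weight started at the same value, all eight units would receive identical gradients and never differentiate.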

3. Forward pass: one sample, with every intermediate value

Generate the dataset and initialize the model first.

4. Train: gradient descent with cross-entropy loss

L = −∑ᵢ yᵢ · log(ŷᵢ)
Cross-entropy: smaller when the predicted probability of the true class is closer to 1.
W ← W − η · ∂L/∂W
SGD update. For each sample, walk every weight a small step downhill on the loss surface.
[Interactive panel: epoch counter, train loss, train accuracy, loss curve, and current weight matrices (live).]

5. Confusion matrix: where the model gets confused

Rows: true class · Columns: predicted class · Cells: sample count. Off-diagonal entries are misclassifications.
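
A confusion matrix with this layout can be tallied as follows. The function names and the toy label arrays are made up for illustration; class indices 0 to 3 are assumed to follow the order healthy, queenless, swarming, hissing.

```javascript
// cm[t][p] counts samples whose true class is t and predicted class is p.
function confusionMatrix(trueLabels, predLabels, numClasses) {
  const cm = Array.from({ length: numClasses }, () => new Array(numClasses).fill(0));
  trueLabels.forEach((t, n) => {
    cm[t][predLabels[n]] += 1;
  });
  return cm;
}

// Accuracy is the diagonal (correct predictions) divided by the total count.
function accuracy(cm) {
  let correct = 0;
  let total = 0;
  cm.forEach((row, t) =>
    row.forEach((count, p) => {
      total += count;
      if (t === p) correct += count;
    })
  );
  return correct / total;
}

// Hypothetical labels: two samples per class, two of them misclassified.
const yTrue = [0, 0, 1, 1, 2, 2, 3, 3];
const yPred = [0, 0, 1, 2, 2, 2, 3, 0];
const cm = confusionMatrix(yTrue, yPred, 4);
console.log(cm);           // [[2,0,0,0],[0,1,1,0],[0,0,2,0],[1,0,0,1]]
console.log(accuracy(cm)); // 0.75
```

In this toy example the off-diagonal entries `cm[1][2]` and `cm[3][0]` show one queenless sample predicted as swarming and one hissing sample predicted as healthy.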

6. Inference: classify a brand-new sample

Generate a fresh feature vector from a chosen true class, push it through the trained model, and watch the output probabilities.
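
Turning the output probabilities into a decision is an argmax over the four classes. The sketch below assumes the class order healthy, queenless, swarming, hissing and uses a made-up probability vector; `predict` is a hypothetical helper name.

```javascript
// Assumed class order; the simulator's internal ordering may differ.
const CLASS_NAMES = ["healthy", "queenless", "swarming", "hissing"];

// Pick the class with the highest predicted probability.
function predict(probs) {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return { index: best, name: CLASS_NAMES[best], confidence: probs[best] };
}

const result = predict([0.05, 0.8, 0.1, 0.05]); // hypothetical softmax output
console.log(result.name);       // queenless
console.log(result.confidence); // 0.8
```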

M. The math, in one place

All equations the simulator executes, in order:

z₁ = W₁x + b₁, h = max(0, z₁)
Hidden layer: linear projection followed by ReLU nonlinearity.
z₂ = W₂h + b₂, ŷᵢ = exp(z₂,ᵢ) / ∑ⱼ exp(z₂,ⱼ)
Output logits followed by softmax — turns scores into a probability distribution.
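
The two forward-pass equations above translate almost line for line into code. The sketch below uses a tiny 2 → 2 → 2 toy model with hypothetical weights so the intermediate values can be checked by hand; the same functions work unchanged for the 6 → 8 → 4 shapes. Subtracting the largest logit inside `softmax` is a standard numerical-stability step and is an addition here, not something the page states.

```javascript
// Matrix-vector product W x, with W stored as an array of rows.
function matVec(W, x) {
  return W.map((row) => row.reduce((acc, w, j) => acc + w * x[j], 0));
}

// Softmax with the max logit subtracted first; the result is mathematically
// unchanged, but Math.exp can no longer overflow.
function softmax(z) {
  const m = Math.max(...z);
  const e = z.map((v) => Math.exp(v - m));
  const total = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / total);
}

// Returns every intermediate value so each step can be inspected.
function forward(model, x) {
  const z1 = matVec(model.W1, x).map((v, i) => v + model.b1[i]);
  const h = z1.map((v) => Math.max(0, v)); // ReLU
  const z2 = matVec(model.W2, h).map((v, i) => v + model.b2[i]);
  const yhat = softmax(z2);
  return { z1, h, z2, yhat };
}

// Toy model with hypothetical numbers.
const toy = { W1: [[1, -1], [0.5, 0.5]], b1: [0, 0], W2: [[1, 0], [0, 1]], b2: [0, 0] };
const out = forward(toy, [1, 2]);
console.log(out.z1); // [ -1, 1.5 ]
console.log(out.h);  // [ 0, 1.5 ]  (the negative pre-activation is clipped)
```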
L = −∑ᵢ yᵢ log ŷᵢ
Cross-entropy loss with one-hot label y.
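
Because y is one-hot, the sum keeps only the term for the true class, so the loss is simply −log of the probability assigned to that class. A minimal sketch (the clamp `eps` guarding against log(0) is an assumption added for safety, and the probability vectors are made up):

```javascript
// Cross-entropy for a one-hot label: only the true-class term survives.
function crossEntropy(yhat, label, eps = 1e-12) {
  return -Math.log(Math.max(yhat[label], eps));
}

const confident = crossEntropy([0.7, 0.1, 0.1, 0.1], 0);
const uniform = crossEntropy([0.25, 0.25, 0.25, 0.25], 0);
console.log(confident.toFixed(3)); // 0.357
console.log(uniform.toFixed(3));   // 1.386  (= ln 4, the loss of a 4-way guess)
```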
∂L/∂z₂ = ŷ − y
The softmax+cross-entropy gradient simplifies beautifully — just the prediction minus the label.
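
One way to convince yourself of this identity is a finite-difference check: nudge each logit up and down, measure the change in loss, and compare with ŷ − y. The logits and function names below are hypothetical.

```javascript
function softmax(z) {
  const m = Math.max(...z);
  const e = z.map((v) => Math.exp(v - m));
  const total = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / total);
}

// Cross-entropy of softmax(z) against a one-hot label.
function loss(z, label) {
  return -Math.log(softmax(z)[label]);
}

// Closed form: prediction minus one-hot label.
function analyticGrad(z, label) {
  return softmax(z).map((p, i) => p - (i === label ? 1 : 0));
}

// Central differences: (L(z + step) - L(z - step)) / (2 * step) per logit.
function numericGrad(z, label, step = 1e-5) {
  return z.map((_, i) => {
    const up = z.slice();
    up[i] += step;
    const down = z.slice();
    down[i] -= step;
    return (loss(up, label) - loss(down, label)) / (2 * step);
  });
}

const logits = [1.0, -0.5, 0.3, 2.0]; // hypothetical logits
const trueClass = 2;
const analytic = analyticGrad(logits, trueClass);
const numeric = numericGrad(logits, trueClass);
const maxDiff = Math.max(...analytic.map((v, i) => Math.abs(v - numeric[i])));
console.log(maxDiff < 1e-6); // true
```

The analytic gradient is negative only at the true class, so a descent step raises that logit and lowers the others.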
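∂L/∂W₂ = (ŷ − y) hᵀ, ∂L/∂b₂ = ŷ − y, ∂L/∂h = W₂ᵀ(ŷ − y)
Backprop through the output layer. The transposes make the shapes agree: (ŷ − y) hᵀ is 4×8 like W₂, and W₂ᵀ(ŷ − y) is 8-dimensional like h.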
∂L/∂z₁ = (∂L/∂h) ⊙ 𝟙[z₁ > 0], ∂L/∂W₁ = (∂L/∂z₁) xᵀ, ∂L/∂b₁ = ∂L/∂z₁
ReLU's derivative is 1 where z₁ > 0, else 0, so the gradient flows only through active units.
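
Putting the backward equations together for a single sample might look like the sketch below. It reuses a tiny 2 → 2 → 2 toy model with hypothetical weights; `forward` and `backward` are illustrative names, and the weight gradients are written as outer products (a column of deltas times a row of inputs).

```javascript
function matVec(W, x) {
  return W.map((row) => row.reduce((acc, w, j) => acc + w * x[j], 0));
}

function softmax(z) {
  const m = Math.max(...z);
  const e = z.map((v) => Math.exp(v - m));
  const total = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / total);
}

function forward(model, x) {
  const z1 = matVec(model.W1, x).map((v, i) => v + model.b1[i]);
  const h = z1.map((v) => Math.max(0, v));
  const z2 = matVec(model.W2, h).map((v, i) => v + model.b2[i]);
  return { z1, h, z2, yhat: softmax(z2) };
}

// Gradients of the cross-entropy loss for one sample with class index `label`.
function backward(model, x, label, cache) {
  const { z1, h, yhat } = cache;
  const dz2 = yhat.map((p, i) => p - (i === label ? 1 : 0)); // yhat - y
  const dW2 = dz2.map((d) => h.map((hj) => d * hj)); // outer product dz2 h^T
  // dh = W2^T dz2: column j of W2 dotted with dz2.
  const dh = h.map((_, j) => dz2.reduce((acc, d, i) => acc + model.W2[i][j] * d, 0));
  const dz1 = dh.map((g, j) => (z1[j] > 0 ? g : 0)); // ReLU gate
  const dW1 = dz1.map((d) => x.map((xk) => d * xk)); // outer product dz1 x^T
  return { dW1, db1: dz1.slice(), dW2, db2: dz2.slice() };
}

// Toy model with hypothetical numbers; hidden unit 0 has z1 = -1 (inactive).
const toy = { W1: [[1, -1], [0.5, 0.5]], b1: [0, 0], W2: [[1, 0], [0, 1]], b2: [0, 0] };
const x = [1, 2];
const grads = backward(toy, x, 0, forward(toy, x));
console.log(grads.dW1[0]); // [ 0, 0 ]: no gradient reaches the inactive unit
```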
W ← W − η · ∂L/∂W, b ← b − η · ∂L/∂b
SGD weight update with learning rate η.
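
The update rule itself is a one-liner per parameter. A minimal in-place sketch, assuming parameters and gradients are stored as plain arrays (vectors) or arrays of rows (matrices); `sgdUpdate`, the toy values, and η = 0.08 are illustrative assumptions:

```javascript
// In-place update p <- p - lr * g for a vector or a matrix (array of rows).
function sgdUpdate(param, grad, lr) {
  for (let i = 0; i < param.length; i++) {
    if (Array.isArray(param[i])) {
      sgdUpdate(param[i], grad[i], lr); // recurse into a matrix row
    } else {
      param[i] -= lr * grad[i];
    }
  }
}

const W = [[0.5, -0.2]]; // hypothetical 1x2 weight matrix
const b = [0.0];
sgdUpdate(W, [[1, -1]], 0.08);
sgdUpdate(b, [0.5], 0.08);
console.log(W, b); // approximately [[0.42, -0.12]] and [-0.04]
```

Each weight moves opposite to the sign of its gradient: the first weight (gradient +1) decreases, the second (gradient −1) increases.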