Beehive ML Training Simulator

A small neural network trained from scratch on synthesized beehive features. Step through dataset generation, forward pass, loss, backprop, and inference — with the math and parameters displayed live at every step.

What this is: a tiny 6-dim → 8 hidden → 4-class multilayer perceptron implemented from scratch in JavaScript. The training data isn't real audio — it's feature vectors sampled from class-specific Gaussian distributions matching the band-energy profiles of healthy / queenless / swarming / hissing hives. The goal is to make every part of the pipeline (data, weights, forward pass, loss, gradients, updates) visible and pokable. For real audio behaviour see the playback simulator.

1. Build a training dataset

For each class, sample feature vectors from a Gaussian centred on a hand-tuned centroid that reflects that hive state's acoustic profile.

x = μ_c + ε, ε ~ 𝒩(0, σ²I)

μ_c is the class centroid (6 features), σ controls how noisy each sample is.

Class centroids:

Samples / class: 60

Noise σ: 0.08
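The sampling rule above can be sketched in JavaScript roughly as follows. The centroid values, the class order, and the names `gaussian`, `CENTROIDS`, and `makeDataset` are illustrative assumptions, not the simulator's actual values or identifiers.

```javascript
// Standard normal sample via the Box-Muller transform.
function gaussian(rng = Math.random) {
  const u = 1 - rng(); // shift [0, 1) to (0, 1] so log(u) stays finite
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Hypothetical centroids (6 band-energy features per class).
// The real simulator uses its own hand-tuned values.
const CENTROIDS = {
  healthy:   [0.2, 0.7, 0.5, 0.3, 0.2, 0.1],
  queenless: [0.3, 0.4, 0.8, 0.5, 0.2, 0.1],
  swarming:  [0.1, 0.3, 0.6, 0.8, 0.5, 0.2],
  hissing:   [0.1, 0.2, 0.3, 0.4, 0.7, 0.8],
};

// Draw samplesPerClass vectors x = mu_c + eps, with eps ~ N(0, sigma^2 I).
function makeDataset(samplesPerClass = 60, sigma = 0.08, rng = Math.random) {
  const data = [];
  Object.values(CENTROIDS).forEach((mu, label) => {
    for (let n = 0; n < samplesPerClass; n++) {
      const x = mu.map((m) => m + sigma * gaussian(rng));
      data.push({ x, label }); // label is the class index 0..3
    }
  });
  return data;
}

const dataset = makeDataset();
console.log(dataset.length); // 4 classes x 60 samples = 240
```

Passing `rng` as a parameter is a design choice that lets a seeded generator be swapped in for reproducible runs.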

2. Initialize the model

x ∈ ℝ⁶ → W₁ ∈ ℝ⁸ˣ⁶ + b₁ → z₁ ∈ ℝ⁸ → ReLU → h ∈ ℝ⁸
→ W₂ ∈ ℝ⁴ˣ⁸ + b₂ → z₂ ∈ ℝ⁴ → softmax → ŷ ∈ ℝ⁴

W_ij ~ 𝒰(−s, s), b = 0

Weights drawn uniformly in [−s, s]; biases start at zero.

Init scale s: 0.30
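A minimal sketch of this initialization, assuming the model is stored as plain nested arrays. The names `randomMatrix` and `initModel` and the object layout are hypothetical.

```javascript
// Build a rows x cols matrix with entries drawn uniformly from [-s, s).
function randomMatrix(rows, cols, s, rng = Math.random) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => (2 * rng() - 1) * s)
  );
}

// 6 inputs -> 8 hidden units -> 4 classes, matching the shapes above.
function initModel(s = 0.3, rng = Math.random) {
  return {
    W1: randomMatrix(8, 6, s, rng),
    b1: new Array(8).fill(0), // biases start at zero
    W2: randomMatrix(4, 8, s, rng),
    b2: new Array(4).fill(0),
  };
}

const model = initModel(0.3);
console.log(model.W1.length, model.W1[0].length); // 8 6
console.log(model.W2.length, model.W2[0].length); // 4 8
```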

3. Forward pass: one sample, with every intermediate value

Generate the dataset and initialize the model first.

4. Train: gradient descent with cross-entropy loss

L = −∑ y_i · log(ŷ_i)

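Cross-entropy: with a one-hot label y only the true-class term survives, so L = −log ŷ_c for true class c. The loss approaches 0 as ŷ_c approaches 1 and grows without bound as ŷ_c approaches 0.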

W ← W − η · ∂L/∂W

SGD update. For each sample, walk every weight a small step downhill on the loss surface.

Learning rate η: 0.080

Live readouts: epoch, train loss, and train accuracy (blank until training starts).

Loss curve:

Current weight matrices (live):

5. Confusion matrix: where the model gets confused

Rows: true class · Columns: predicted class · Cells: sample count. Off-diagonal entries are misclassifications.
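The layout described above could be computed along these lines. The function names and the small label arrays are made up for illustration and are not output from the simulator.

```javascript
// counts[t][p] = number of samples with true class t predicted as class p.
function confusionMatrix(trueLabels, predLabels, numClasses = 4) {
  const counts = Array.from({ length: numClasses }, () =>
    new Array(numClasses).fill(0)
  );
  trueLabels.forEach((t, n) => {
    counts[t][predLabels[n]] += 1;
  });
  return counts;
}

// Accuracy is the diagonal sum divided by the total sample count.
function accuracyFrom(counts) {
  let correct = 0;
  let total = 0;
  counts.forEach((row, t) =>
    row.forEach((c, p) => {
      total += c;
      if (t === p) correct += c;
    })
  );
  return total === 0 ? 0 : correct / total;
}

// Illustrative labels only.
const cm = confusionMatrix([0, 0, 1, 1, 2, 3], [0, 1, 1, 1, 2, 2]);
console.log(cm);
// [[1, 1, 0, 0],
//  [0, 2, 0, 0],
//  [0, 0, 1, 0],
//  [0, 0, 1, 0]]
console.log(accuracyFrom(cm)); // 4 correct out of 6
```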

6. Inference: classify a brand-new sample

Generate a fresh feature vector from a chosen true class, push it through the trained model, watch the output probabilities.
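Turning the output probabilities into a decision is an argmax over ŷ. A small sketch, assuming the class order below (the order and the name `classify` are assumptions):

```javascript
// Assumed class order; the simulator's actual index mapping may differ.
const CLASS_NAMES = ["healthy", "queenless", "swarming", "hissing"];

// Pick the class with the highest predicted probability.
function classify(yHat) {
  let best = 0;
  for (let i = 1; i < yHat.length; i++) {
    if (yHat[i] > yHat[best]) best = i;
  }
  return { index: best, label: CLASS_NAMES[best], confidence: yHat[best] };
}

console.log(classify([0.1, 0.6, 0.2, 0.1]));
// { index: 1, label: 'queenless', confidence: 0.6 }
```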

M. The math, in one place

All equations the simulator executes, in order:

z₁ = W₁x + b₁, h = max(0, z₁)

Hidden layer: linear projection followed by ReLU nonlinearity.

z₂ = W₂h + b₂, ŷ_i = exp(z₂,ᵢ) / ∑_j exp(z₂,ⱼ)

Output logits followed by softmax — turns scores into a probability distribution.
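The two forward-pass equations above can be sketched as one function. This assumes weights stored as nested arrays (rows = output units); the names `affine`, `softmax`, and `forward` are hypothetical. Subtracting the maximum logit inside the softmax is a standard numerical-stability step and does not change the result.

```javascript
// z = W v + b for a nested-array matrix W.
function affine(W, b, v) {
  return W.map((row, i) => row.reduce((sum, w, j) => sum + w * v[j], b[i]));
}

// Softmax with the max logit subtracted to avoid overflow in exp().
function softmax(z) {
  const m = Math.max(...z);
  const e = z.map((v) => Math.exp(v - m));
  const total = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / total);
}

// Returns every intermediate value so each step can be inspected.
function forward(model, x) {
  const z1 = affine(model.W1, model.b1, x);
  const h = z1.map((v) => Math.max(0, v)); // ReLU
  const z2 = affine(model.W2, model.b2, h);
  const yHat = softmax(z2);
  return { z1, h, z2, yHat };
}

// Tiny 2 -> 1 -> 2 example so the numbers are easy to follow by hand.
const tiny = { W1: [[1, -1]], b1: [0], W2: [[1], [-1]], b2: [0, 0] };
console.log(forward(tiny, [2, 1]).yHat); // ≈ [0.881, 0.119]
```

The same `forward` works unchanged for the 6 → 8 → 4 shapes, since it only relies on the row and column counts of the arrays it is given.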

L = −∑ y_i log ŷ_i

Cross-entropy loss with one-hot label y.
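Because y is one-hot, the sum reduces to a single term, which a short sketch makes concrete. The function name and the `eps` guard against log(0) are assumptions added for illustration.

```javascript
// Cross-entropy for a one-hot label: only the true-class term is non-zero.
function crossEntropy(yHat, label, eps = 1e-12) {
  return -Math.log(Math.max(yHat[label], eps)); // eps keeps the log finite
}

console.log(crossEntropy([0.7, 0.1, 0.1, 0.1], 0)); // ≈ 0.357
console.log(crossEntropy([0.25, 0.25, 0.25, 0.25], 0)); // ≈ 1.386, i.e. ln 4
```

The second value, ln 4, is what a 4-class model that guesses uniformly would score, so it is a useful reference point for an untrained network.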

∂L/∂z₂ = ŷ − y

The softmax+cross-entropy gradient simplifies beautifully — just the prediction minus the label.
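One way to convince yourself of this identity is a finite-difference check: nudge each logit, measure the change in loss, and compare against ŷ − y. The sketch below is self-contained; the logit values and the helper names are arbitrary choices for illustration.

```javascript
function softmax(z) {
  const m = Math.max(...z);
  const e = z.map((v) => Math.exp(v - m));
  const total = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / total);
}

// Loss as a function of the logits, for a one-hot label.
function lossFromLogits(z, label) {
  return -Math.log(softmax(z)[label]);
}

// Central finite difference of the loss with respect to each logit.
function numericGrad(z, label, step = 1e-5) {
  return z.map((_, i) => {
    const zPlus = z.slice();
    const zMinus = z.slice();
    zPlus[i] += step;
    zMinus[i] -= step;
    return (lossFromLogits(zPlus, label) - lossFromLogits(zMinus, label)) / (2 * step);
  });
}

const logits = [1.0, -0.5, 0.3, 2.0]; // arbitrary example values
const trueClass = 2;
const analytic = softmax(logits).map((p, i) => p - (i === trueClass ? 1 : 0));
const numeric = numericGrad(logits, trueClass);
const maxDiff = Math.max(...analytic.map((a, i) => Math.abs(a - numeric[i])));
console.log(maxDiff < 1e-6); // true
```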

∂L/∂W₂ = (ŷ − y) h^⊤, ∂L/∂h = W₂^⊤(ŷ − y)

Backprop through the output layer. Since z₂ = W₂h + b₂, the bias gradient is ∂L/∂b₂ = ∂L/∂z₂ = ŷ − y.

∂L/∂z₁ = (∂L/∂h) ⊙ 𝟙[z₁ > 0], ∂L/∂W₁ = (∂L/∂z₁) x^⊤

ReLU's derivative is 1 where z₁ > 0, else 0, so gradient flows only through active units. Since z₁ = W₁x + b₁, the bias gradient is ∂L/∂b₁ = ∂L/∂z₁.
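The backprop equations above can be sketched as one function that takes the forward-pass intermediates and returns all gradients for a single sample. The names `outer` and `backward` and the argument layout are hypothetical. The bias gradients are not among the displayed equations; they follow from the chain rule (each bias gradient equals the gradient of its pre-activation).

```javascript
// Outer product a b^T as a nested array.
function outer(a, b) {
  return a.map((ai) => b.map((bj) => ai * bj));
}

// Gradients for one sample, given x, z1, h, yHat from the forward pass.
function backward(model, x, z1, h, yHat, label) {
  const dz2 = yHat.map((p, i) => p - (i === label ? 1 : 0)); // yHat - y
  const dW2 = outer(dz2, h);
  // dh = W2^T dz2
  const dh = h.map((_, j) =>
    dz2.reduce((sum, d, i) => sum + model.W2[i][j] * d, 0)
  );
  const dz1 = dh.map((g, j) => (z1[j] > 0 ? g : 0)); // ReLU mask
  const dW1 = outer(dz1, x);
  // Bias gradients equal the pre-activation gradients (chain rule).
  return { dW1, db1: dz1, dW2, db2: dz2 };
}

// Tiny 2 -> 1 -> 2 example with hand-picked forward values.
const grads = backward({ W2: [[1], [-1]] }, [2, 1], [1], [1], [0.75, 0.25], 0);
console.log(grads.dW2); // [[-0.25], [0.25]]
console.log(grads.dW1); // [[-1, -0.5]]
```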

W ← W − η · ∂L/∂W, b ← b − η · ∂L/∂b

SGD weight update with learning rate η.
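A minimal sketch of the update rule, assuming the model and its gradients are stored under the hypothetical keys `W1`, `b1`, `W2`, `b2` and `dW1`, `db1`, `dW2`, `db2`. This version returns a new model rather than mutating in place, which is a design choice made here for clarity and not necessarily what the simulator does.

```javascript
// One SGD step: every parameter moves against its gradient, scaled by eta.
function sgdStep(model, grads, eta = 0.08) {
  const step = (p, g) => p - eta * g;
  const stepMatrix = (W, dW) =>
    W.map((row, i) => row.map((w, j) => step(w, dW[i][j])));
  const stepVector = (b, db) => b.map((v, i) => step(v, db[i]));
  return {
    W1: stepMatrix(model.W1, grads.dW1),
    b1: stepVector(model.b1, grads.db1),
    W2: stepMatrix(model.W2, grads.dW2),
    b2: stepVector(model.b2, grads.db2),
  };
}

// Tiny example: a positive gradient lowers the weight, a negative one raises it.
const before = { W1: [[1, 2]], b1: [0], W2: [[1]], b2: [0] };
const g = { dW1: [[10, -10]], db1: [1], dW2: [[0]], db2: [-1] };
const after = sgdStep(before, g, 0.1);
console.log(after.W1); // ≈ [[0, 3]]
```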