Beehive ML Training Simulator
What this is: a tiny multilayer perceptron (6 input features → 8 hidden units → 4 classes) implemented from scratch in JavaScript. The training data isn't real audio; it consists of feature vectors sampled from class-specific Gaussian distributions that match the band-energy profiles of healthy, queenless, swarming, and hissing hives. The goal is to make every part of the pipeline (data, weights, forward pass, loss, gradients, updates) visible and pokable. For real audio behaviour, see the playback simulator.
1. Build a training dataset
x = μ_c + ε, ε ~ 𝒩(0, σ²I)
μ_c is the centroid of class c (6 features); σ controls how noisy each sample is.
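The sampling rule above can be sketched in a few lines of JavaScript. This is only an illustration: the centroid values, the class order, the seeded generator `mulberry32`, and the helper names `gaussian` and `makeDataset` are assumptions made for this example, not the simulator's actual values or code.

```javascript
// Small seeded PRNG (mulberry32) so runs are repeatable; Math.random would also work.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One standard-normal draw via the Box-Muller transform.
function gaussian(rand) {
  const u = 1 - rand(); // in (0, 1], avoids log(0)
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const CLASSES = ["healthy", "queenless", "swarming", "hissing"];

// Hypothetical centroids (6 band energies per class); placeholder values only.
const CENTROIDS = [
  [0.8, 0.6, 0.3, 0.2, 0.1, 0.1],
  [0.3, 0.7, 0.6, 0.3, 0.2, 0.1],
  [0.2, 0.4, 0.8, 0.6, 0.3, 0.2],
  [0.1, 0.2, 0.3, 0.5, 0.8, 0.7],
];

// x = mu_c + sigma * standard normal noise, independently per feature.
function makeDataset(perClass, sigma, rand) {
  const samples = [];
  CENTROIDS.forEach((mu, label) => {
    for (let n = 0; n < perClass; n++) {
      const x = mu.map((m) => m + sigma * gaussian(rand));
      samples.push({ x, label });
    }
  });
  return samples;
}

const data = makeDataset(50, 0.1, mulberry32(42));
console.log(data.length, CLASSES[data[0].label]);
```

Raising `sigma` makes the class clouds overlap more, which is what makes the classification task harder.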
2. Initialize the model
x ∈ ℝ⁶ → W₁ ∈ ℝ⁸ˣ⁶ + b₁ → z₁ ∈ ℝ⁸ → ReLU → h ∈ ℝ⁸ → W₂ ∈ ℝ⁴ˣ⁸ + b₂ → z₂ ∈ ℝ⁴ → softmax → ŷ ∈ ℝ⁴
Wᵢⱼ ~ 𝒰(−s, s), b = 0
Weights drawn uniformly in [−s, s]; biases start at zero.
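A minimal sketch of this initialization, assuming the 8×6 and 4×8 shapes from the diagram. The helper names `uniformMatrix` and `initModel` and the default scale `s = 0.5` are hypothetical choices for the example; the simulator's own scale is not stated here.

```javascript
// Build a rows x cols matrix with entries drawn uniformly from [-s, s).
function uniformMatrix(rows, cols, s, rand = Math.random) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => (2 * rand() - 1) * s)
  );
}

// 6 inputs -> 8 hidden units -> 4 classes; biases start at zero.
function initModel(s = 0.5, rand = Math.random) {
  return {
    W1: uniformMatrix(8, 6, s, rand),
    b1: new Array(8).fill(0),
    W2: uniformMatrix(4, 8, s, rand),
    b2: new Array(4).fill(0),
  };
}

const initial = initModel(0.5);
console.log(initial.W1.length, initial.W1[0].length, initial.W2.length, initial.W2[0].length);
```

Random weights break the symmetry between hidden units; if every weight started at the same value, all eight units would receive identical gradients and stay identical.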
3. Forward pass: one sample, with every intermediate value
4. Train: gradient descent with cross-entropy loss
L = −∑ᵢ yᵢ · log(ŷᵢ)
Cross-entropy: smaller when the predicted probability of the true class is closer to 1.
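To make that behaviour concrete, here is a small sketch that evaluates the loss for three made-up probability vectors. The function name `crossEntropy` and the `eps` guard against log(0) are assumptions added for the example.

```javascript
// Cross-entropy for a one-hot label: only the true-class term survives the sum.
function crossEntropy(probs, label, eps = 1e-12) {
  return -Math.log(Math.max(probs[label], eps));
}

// True class is index 0 in all three cases.
const confident = crossEntropy([0.9, 0.05, 0.03, 0.02], 0);
const unsure = crossEntropy([0.25, 0.25, 0.25, 0.25], 0);
const wrong = crossEntropy([0.02, 0.9, 0.05, 0.03], 0);

console.log(confident.toFixed(3), unsure.toFixed(3), wrong.toFixed(3));
// prints: 0.105 1.386 3.912
```

A uniform guess over four classes costs log 4 ≈ 1.386, so a freshly initialized model should start near that value.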
W ← W − η · ∂L/∂W
SGD update. For each sample, walk every weight a small step downhill on the loss surface.
Readout before training: Epoch 0 · Train loss n/a · Train accuracy n/a
5. Confusion matrix: where the model gets confused
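A confusion matrix can be tallied from true and predicted labels as sketched below. The orientation (rows are true classes, columns are predicted classes) and the label arrays are assumptions chosen for illustration, not output from the simulator.

```javascript
// Count how often true class t was predicted as class p: m[t][p].
function confusionMatrix(trueLabels, predLabels, numClasses) {
  const m = Array.from({ length: numClasses }, () => new Array(numClasses).fill(0));
  trueLabels.forEach((t, i) => {
    m[t][predLabels[i]] += 1;
  });
  return m;
}

// Accuracy is the diagonal mass divided by the total count.
function accuracy(m) {
  let correct = 0;
  let total = 0;
  m.forEach((row, i) =>
    row.forEach((count, j) => {
      total += count;
      if (i === j) correct += count;
    })
  );
  return total === 0 ? 0 : correct / total;
}

// Made-up labels: 0 healthy, 1 queenless, 2 swarming, 3 hissing.
const yTrue = [0, 0, 1, 1, 2, 2, 3, 3];
const yPred = [0, 0, 1, 2, 2, 2, 3, 0];
const cm = confusionMatrix(yTrue, yPred, 4);

console.log(cm);
console.log(accuracy(cm)); // 0.75
```

Off-diagonal cells show which pairs of classes the model mixes up; here one queenless sample was read as swarming and one hissing sample as healthy.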
6. Inference: classify a brand-new sample
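Once the network has produced class probabilities for a new sample, inference reduces to picking the largest one. A minimal sketch, assuming the class order shown in `CLASSES` and a made-up probability vector; the helper name `classify` is hypothetical.

```javascript
// Assumed class order; the simulator's actual index mapping may differ.
const CLASSES = ["healthy", "queenless", "swarming", "hissing"];

// Return the most probable class and its probability (argmax over probs).
function classify(probs) {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return { label: CLASSES[best], confidence: probs[best] };
}

const result = classify([0.07, 0.12, 0.76, 0.05]);
console.log(result.label, result.confidence); // swarming 0.76
```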
M. The math, in one place
z₁ = W₁x + b₁, h = max(0, z₁)
Hidden layer: linear projection followed by ReLU nonlinearity.
z₂ = W₂h + b₂, ŷᵢ = exp((z₂)ᵢ) / ∑ⱼ exp((z₂)ⱼ)
Output logits followed by softmax — turns scores into a probability distribution.
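The two forward-pass equations translate almost line for line into code. The sketch below is illustrative only: the helper names, the input vector, and the sine/cosine weight fill are placeholder assumptions so the example is deterministic. Subtracting the maximum logit inside `softmax` is a standard numerical-stability step that leaves the result mathematically unchanged.

```javascript
// y = W x + b for a row-major matrix W.
function matVec(W, x, b) {
  return W.map((row, i) => row.reduce((acc, w, j) => acc + w * x[j], b[i]));
}

function softmax(z) {
  const m = Math.max(...z); // shift by the max logit to avoid overflow
  const e = z.map((v) => Math.exp(v - m));
  const sum = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / sum);
}

// Returns every intermediate value so each stage can be inspected.
function forward(model, x) {
  const z1 = matVec(model.W1, x, model.b1);
  const h = z1.map((v) => Math.max(0, v)); // ReLU
  const z2 = matVec(model.W2, h, model.b2);
  const yHat = softmax(z2);
  return { z1, h, z2, yHat };
}

// Placeholder weights (not trained values), shapes 8x6 and 4x8.
const model = {
  W1: Array.from({ length: 8 }, (_, i) =>
    Array.from({ length: 6 }, (_, j) => 0.3 * Math.sin(6 * i + j + 1))
  ),
  b1: new Array(8).fill(0),
  W2: Array.from({ length: 4 }, (_, i) =>
    Array.from({ length: 8 }, (_, j) => 0.3 * Math.cos(8 * i + j + 1))
  ),
  b2: new Array(4).fill(0),
};

const out = forward(model, [0.8, 0.6, 0.3, 0.2, 0.1, 0.1]);
console.log(out.yHat);
```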
L = −∑ᵢ yᵢ log ŷᵢ
Cross-entropy loss with one-hot label y.
∂L/∂z₂ = ŷ − y
The softmax+cross-entropy gradient simplifies beautifully — just the prediction minus the label.
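The identity can be checked numerically by comparing ŷ − y against a central finite-difference estimate of the loss gradient. The logits, the label, and the step size below are arbitrary values picked for this sketch.

```javascript
function softmax(z) {
  const m = Math.max(...z);
  const e = z.map((v) => Math.exp(v - m));
  const sum = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / sum);
}

// Cross-entropy of softmax(z) against a one-hot label.
function loss(z, label) {
  return -Math.log(softmax(z)[label]);
}

const z = [1.0, -0.5, 0.3, 2.0]; // arbitrary logits
const label = 2;

// Analytic gradient: prediction minus one-hot label.
const analytic = softmax(z).map((p, i) => p - (i === label ? 1 : 0));

// Numeric gradient: (L(z + step) - L(z - step)) / (2 * step), one coordinate at a time.
const step = 1e-6;
const numeric = z.map((_, i) => {
  const zPlus = z.slice();
  const zMinus = z.slice();
  zPlus[i] += step;
  zMinus[i] -= step;
  return (loss(zPlus, label) - loss(zMinus, label)) / (2 * step);
});

const maxDiff = Math.max(...analytic.map((a, i) => Math.abs(a - numeric[i])));
console.log(maxDiff < 1e-6); // true
```

The components of ŷ − y always sum to zero, because both ŷ and y sum to one.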
∂L/∂W₂ = (ŷ − y) h⊤, ∂L/∂h = W₂⊤(ŷ − y)
Backprop through the output layer. The bias gradient equals the logit gradient: ∂L/∂b₂ = ŷ − y.
∂L/∂z₁ = (∂L/∂h) ⊙ 𝟙[z₁ > 0], ∂L/∂W₁ = (∂L/∂z₁) x⊤
ReLU's derivative is 1 where z₁ > 0 and 0 elsewhere, so gradient flows only through active units. The bias gradient is ∂L/∂b₁ = ∂L/∂z₁.
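Putting the backprop equations together for a single sample gives roughly the following. This is a sketch under stated assumptions: the helper names (`forward`, `backward`, `outer`), the placeholder weights, and the sample are invented for the example and are not the simulator's code.

```javascript
const matVec = (W, x, b) =>
  W.map((row, i) => row.reduce((acc, w, j) => acc + w * x[j], b[i]));

function softmax(z) {
  const m = Math.max(...z);
  const e = z.map((v) => Math.exp(v - m));
  const sum = e.reduce((a, b) => a + b, 0);
  return e.map((v) => v / sum);
}

function forward(model, x) {
  const z1 = matVec(model.W1, x, model.b1);
  const h = z1.map((v) => Math.max(0, v));
  const z2 = matVec(model.W2, h, model.b2);
  return { z1, h, z2, yHat: softmax(z2) };
}

// Outer product u v^T as a nested array.
const outer = (u, v) => u.map((ui) => v.map((vj) => ui * vj));

function backward(model, x, label, cache) {
  // dL/dz2 = yHat - y, with y one-hot at `label`.
  const dz2 = cache.yHat.map((p, i) => p - (i === label ? 1 : 0));
  const dW2 = outer(dz2, cache.h);
  // dL/dh = W2^T dz2
  const dh = cache.h.map((_, j) =>
    dz2.reduce((acc, d, i) => acc + model.W2[i][j] * d, 0)
  );
  // ReLU gate: pass gradient only where the unit was active.
  const dz1 = dh.map((g, j) => (cache.z1[j] > 0 ? g : 0));
  const dW1 = outer(dz1, x);
  return { dW1, db1: dz1, dW2, db2: dz2 };
}

// Placeholder weights and sample (not trained or real values).
const model = {
  W1: Array.from({ length: 8 }, (_, i) =>
    Array.from({ length: 6 }, (_, j) => 0.3 * Math.sin(6 * i + j + 1))
  ),
  b1: new Array(8).fill(0),
  W2: Array.from({ length: 4 }, (_, i) =>
    Array.from({ length: 8 }, (_, j) => 0.3 * Math.cos(8 * i + j + 1))
  ),
  b2: new Array(4).fill(0),
};
const x = [0.8, 0.6, 0.3, 0.2, 0.1, 0.1];
const label = 0;

const grads = backward(model, x, label, forward(model, x));
console.log(grads.dW1.length, grads.dW1[0].length, grads.dW2.length, grads.dW2[0].length);
```

Each gradient has the same shape as the parameter it belongs to, which is what lets the update rule subtract them element by element.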
W ← W − η · ∂L/∂W, b ← b − η · ∂L/∂b
SGD weight update with learning rate η.
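The update itself is a single subtraction per parameter. A minimal sketch, assuming parameters and gradients are stored as plain nested arrays of matching shape; `sgdStep`, the toy values, and the learning rate 0.1 are illustrative choices.

```javascript
// In-place SGD step: param <- param - eta * grad, for vectors or matrices.
function sgdStep(param, grad, eta) {
  for (let i = 0; i < param.length; i++) {
    if (Array.isArray(param[i])) {
      sgdStep(param[i], grad[i], eta); // recurse into matrix rows
    } else {
      param[i] -= eta * grad[i];
    }
  }
}

// Toy parameters and gradients, not taken from a real training run.
const W = [[0.5, -0.2], [0.1, 0.3]];
const dW = [[0.1, -0.4], [0.0, 0.2]];
const b = [0, 0];
const db = [0.05, -0.05];

sgdStep(W, dW, 0.1);
sgdStep(b, db, 0.1);
console.log(W, b);
```

A positive gradient moves the weight down and a negative gradient moves it up; η only scales how far each step goes.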