Talk to Your Mesh: Hands‑Free Speech Commands for Meshtastic
Why Speech Command?
Voice control is everywhere at home—lights, fans, assistants—but it’s still rare in offline, low‑bandwidth outdoor mesh networks.
Adding a local, on‑device Speech Command layer to Meshtastic means you can trigger actions by speaking—even with gloves on, hands busy, or eyes up in the field. Spoken intents become compact, low‑bandwidth commands, speeding up response and laying the groundwork for richer, domain‑specific voice interactions.
Challenges of Speech Command in a Mesh Network
Unlike typical smart home devices—which rely on cloud-based speech recognition and constant power—mesh devices are designed for outdoor, low-power scenarios, often with limited or no network connectivity. This brings several unique challenges:
- Offline operation: Speech recognition must run locally on the device, so it works even without a network.
- Low power: The system needs to operate with minimal energy to maximize battery life.
- Environmental noise: Outdoor noise can impact recognition accuracy.
- Limited compute resources: Mesh devices have constrained processing power and can't run large, complex models.
We needed a lightweight speech recognition module that runs efficiently under these constraints, while still delivering reliable accuracy and fast response. Based on these requirements, we built a demo version of Speech Command integration and tested it on Meshtastic devices.
How to Build It
The following sections detail how the Speech Command module was implemented and integrated into Meshtastic devices.
Hardware Selection
We chose the FoBE Mesh Tracker C1 as the test platform because it’s based on the nRF52840, giving enough compute headroom for a lightweight on‑device speech model while keeping excellent energy characteristics for outdoor use. Its flexible expansion interface lets us snap in a microphone module quickly for audio capture.

Installation is trivial: insert the flex microphone board into the expansion connector and press to lock—no soldering, no wiring.

With that, the hardware platform is ready.
Software Integration
To validate feasibility, we started with a small, universal speech command set rather than continuous speech: six directional intents ("go", "backward", "left", "right", "up", "down") mapped to device actions.
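As a rough illustration of that mapping, here is a minimal C++ sketch of turning a recognized label into a compact intent code and dispatching it; all identifiers are hypothetical and do not come from the Meshtastic code base.

```cpp
// Minimal illustration of "spoken word -> compact command code". The enum,
// function names, and actions are hypothetical and not part of Meshtastic.
#include <cstdint>
#include <cstdio>
#include <cstring>

enum class VoiceIntent : uint8_t { Go, Backward, Left, Right, Up, Down, None };

// Map a recognized label string (as produced by the classifier) to an intent.
VoiceIntent intentFromLabel(const char *label) {
  if (strcmp(label, "go") == 0)       return VoiceIntent::Go;
  if (strcmp(label, "backward") == 0) return VoiceIntent::Backward;
  if (strcmp(label, "left") == 0)     return VoiceIntent::Left;
  if (strcmp(label, "right") == 0)    return VoiceIntent::Right;
  if (strcmp(label, "up") == 0)       return VoiceIntent::Up;
  if (strcmp(label, "down") == 0)     return VoiceIntent::Down;
  return VoiceIntent::None;
}

// Dispatch an intent to a device action; a real build would trigger a
// Meshtastic message or a local function instead of printing.
void dispatchIntent(VoiceIntent intent) {
  if (intent == VoiceIntent::None) return;  // ignore unknown labels
  printf("dispatching intent code %u\n", static_cast<unsigned>(intent));
}
```

Keeping each intent to a single byte matters here: that code is what would eventually travel over the mesh, which is how spoken input stays a low-bandwidth command.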
We first needed a training dataset. We used Google’s Speech Commands v0.02 corpus—short, single‑word audio clips ideal for tiny, resource‑constrained models.
Download: Speech Commands v0.02
After extraction, each folder represents one label. We selected six folders (go, backward, up, down, left, right) as our initial intent set.
We needed a lightweight inference stack that runs on nRF52840 and supports fast iteration of command sets. Edge Impulse was the obvious choice: a full TinyML pipeline (ingest → preprocess → feature extraction → train → deploy) with export options for embedded targets.
Public project: arduino-voice-command
In the Edge Impulse project we uploaded the selected clips from Speech Commands and used the built‑in preprocessing and feature extraction blocks.
Uploads can be done via the Edge Impulse CLI or the web UI. After ingestion we verified the labels to ensure each sample maps to the correct intent class.

Next we performed MFCC feature extraction. Peak RAM usage was the primary constraint on the nRF52840, so we used 13 MFCC coefficients with tuned frame-length and stride parameters to balance time resolution against memory footprint: enough discriminative power without exhausting the available RAM.
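The arithmetic behind that footprint is easy to check. The sketch below only restates numbers already visible in this project (13 coefficients and the 637-element network input shown later, which implies 49 frames); the buffer-size figure assumes the features are held as 32-bit floats before quantization.

```cpp
// Back-of-the-envelope check of the MFCC feature size. The coefficient and
// frame counts come from this project's settings; the float assumption is
// only for illustrating the pre-quantization buffer size.
#include <cstdio>

int main() {
  const int kNumCoefficients = 13;  // MFCC coefficients per frame
  const int kNumFrames       = 49;  // implied by the 637-feature input layer
  const int features         = kNumCoefficients * kNumFrames;               // 637
  const int bufferBytes      = features * static_cast<int>(sizeof(float));  // ~2.5 KB

  printf("features: %d, float feature buffer: %d bytes\n", features, bufferBytes);
  return 0;
}
```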

We then trained a lightweight 1D CNN on the extracted features. Hyperparameter tuning focused on the trade‑off between model expressiveness and embedded latency so the model runs in real time on the nRF52840 while maintaining strong accuracy. The training settings and the network architecture are summarized in the two tables below.

| Parameter | Description | Value | Purpose |
|---|---|---|---|
| Number of training cycles | Epoch count | 200 | More passes improve learning; too many risk overfitting. |
| Learning rate | Step size for updates | 0.0008 | Lower = stable convergence; higher = faster but risk oscillation. |
| Training processor | Accelerator | GPU | Speeds up training substantially. |
| Validation set size | Hold‑out proportion | 20% | Monitors generalization on unseen data. |
| Batch size | Samples per update | 64 | Balances stability vs. memory usage. |
| Auto-weight classes | Class balancing | Yes | Mitigates imbalance across labels. |
| Profile int8 model | INT8 quantization | Yes | Produces a compact, faster inference model. |

| Layer | Type | Configuration | Notes |
|---|---|---|---|
| 1 | Input layer | 637 features | Raw feature vector |
| 2 | Reshape layer | 13 columns | Shape for temporal conv |
| 3 | 1D conv / pool layer | 16 filters, kernel=3 | Local feature extraction |
| 4 | Dropout | rate=0.1 | Regularization |
| 5 | 1D conv / pool layer | 32 filters, kernel=5 | Deeper temporal patterns |
| 6 | Dropout | rate=0.25 | Regularization |
| 7 | 1D conv / pool layer | 64 filters, kernel=7 | Higher‑level semantic features |
| 8 | Dropout | rate=0.25 | Regularization |
| 9 | Flatten layer | — | Collapse to 1D |
| 10 | Dense layer | 48 neurons | Integrate features |
| 11 | Dropout | rate=0.25 | Regularization |
| 12 | Dense layer | 24 neurons | Compress representation |
| 13 | Dropout | rate=0.25 | Regularization |
| 14 | Dense layer | 12 neurons | Pre‑output layer |
| 15 | Output layer | 6 classes | Final classification |
Clicking Save & Train starts the job on Edge Impulse. The resulting metrics on both train and validation sets are strong, indicating the model reliably separates the six intents.

Overall performance
- Accuracy: 95.0%
- Loss: 0.20
This reflects high validation accuracy with a low loss—accurate and well‑converged.
| Class | Accuracy | Notes |
|---|---|---|
| BACKWARD | 93.1% | Minor 2.7% confusion with "GO"; overall solid. |
| DOWN | 94.9% | Very good; minimal confusion. |
| GO | 93.5% | Small confusion with "DOWN" / "UP" suggests similar patterns. |
| LEFT | 95.3% | Clean and stable separation. |
| RIGHT | 96.3% | Among the best performing classes. |
| UP | 95.8% | Also very consistent. |

The built model uses about 71.6 KB flash and 7.4 KB RAM, which fits comfortably on the nRF52840.

Finally, we exported the model as an Arduino library and dropped it into the Meshtastic firmware tree for integration.
For a working reference inside Meshtastic firmware, see our example variant: fobe_idea_mesh_tracker_c1_voice

In this variant we add microphone support and integrate the exported Edge Impulse model. By extending the main loop, the firmware listens for speech commands and, when a target intent is detected, dispatches an action (for example, sending a message or controlling a local function).
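A stripped-down version of that loop could look roughly like the sketch below. It assumes the standard API of an Edge Impulse Arduino export (`run_classifier`, `signal_t`, `ei_impulse_result_t` from the generated inferencing header); `mic_fill_buffer()` and `send_mesh_text()` are placeholders rather than real Meshtastic functions, so refer to the linked variant for the actual integration.

```cpp
// Rough sketch of the recognition step, assuming the standard Edge Impulse
// Arduino export API. mic_fill_buffer() and send_mesh_text() are placeholders
// for the microphone driver and the dispatch path, not real Meshtastic APIs.
#include <cstring>
#include <voice_command_inferencing.h>  // actual header name depends on the exported library

static float g_audio[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

// Placeholder: fill the buffer with one window of PCM samples from the mic.
extern bool mic_fill_buffer(float *buf, size_t len);
// Placeholder: hand a short text command to the mesh or a local action handler.
extern void send_mesh_text(const char *text);

// Callback the classifier uses to read raw audio out of our buffer.
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
  memcpy(out_ptr, g_audio + offset, length * sizeof(float));
  return 0;
}

// Called periodically from the firmware's main loop.
void voice_command_tick() {
  if (!mic_fill_buffer(g_audio, EI_CLASSIFIER_RAW_SAMPLE_COUNT)) {
    return;  // no fresh audio window yet
  }

  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &get_audio_data;

  ei_impulse_result_t result = {0};
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
    return;  // inference failed; skip this window
  }

  // Pick the highest-scoring label and act only if it is confident enough.
  size_t best = 0;
  for (size_t i = 1; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    if (result.classification[i].value > result.classification[best].value) {
      best = i;
    }
  }
  if (result.classification[best].value > 0.80f) {  // illustrative threshold
    send_mesh_text(result.classification[best].label);
  }
}
```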
What’s Next?
This demo validates on‑device Speech Command for Meshtastic under tight resource and power constraints. Next directions:
- Richer command space: expand from 6 intents toward letters (A–Z), digits (0–9), or semantic tags (status, location, SOS).
- Adaptive models: dynamically load / prune command subsets by role or mission profile to keep RAM/flash overhead minimal.
- Noise robustness: dual‑mic (beamforming), spectral denoise, multi‑window voting to resist wind and impact noise.
- Multilingual & customization: user‑specific fine‑tuning or few‑shot on‑device adaptation for accent / language variants.
- Reliability safeguards: confidence thresholds + optional fallback (e.g., button confirm when below margin); see the sketch after this list.
- Lower power: refined duty‑cycling, wake‑word gating before feature extraction, and peripheral clock scaling to extend battery life.
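As a sketch of the reliability safeguard only (the helper names and threshold values are illustrative assumptions, not existing firmware code), the top score could be accepted outright only when it clearly beats the runner-up, with a button press requested otherwise:

```cpp
// Illustrative margin-based safeguard: accept the top intent only when it
// clearly beats the runner-up; otherwise ask for a button confirmation.
// All names and thresholds here are hypothetical.
#include <algorithm>
#include <cstddef>

struct ScoredIntent { const char *label; float score; };

enum class Decision { Accept, ConfirmWithButton, Reject };

Decision decide(const ScoredIntent *scores, size_t count,
                float acceptThreshold = 0.80f, float minMargin = 0.20f) {
  // Find the top-scoring intent.
  size_t best = 0;
  for (size_t i = 1; i < count; i++)
    if (scores[i].score > scores[best].score) best = i;

  // Find the runner-up score.
  float runnerUp = 0.0f;
  for (size_t i = 0; i < count; i++)
    if (i != best) runnerUp = std::max(runnerUp, scores[i].score);

  const float margin = scores[best].score - runnerUp;
  if (scores[best].score >= acceptThreshold && margin >= minMargin)
    return Decision::Accept;            // confident and unambiguous
  if (scores[best].score >= 0.5f)
    return Decision::ConfirmWithButton; // plausible but close: ask the user
  return Decision::Reject;              // too uncertain: ignore the window
}
```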
