
Talk to Your Mesh: Hands‑Free Speech Commands for Meshtastic

Chiho Sin
Co-creator of FoBE Studio
7 min read

Why Speech Command?

Voice control is everywhere at home—lights, fans, assistants—but it’s still rare in offline, low‑bandwidth outdoor mesh.

Adding a local, on‑device Speech Command layer to Meshtastic means you can trigger actions by speaking—even with gloves on, hands busy, or eyes up in the field. Spoken intent becomes compact, low‑bandwidth commands, speeding up response and laying the groundwork for richer, domain‑specific voice interactions.

Challenges of Speech Command in Mesh

Unlike typical smart home devices—which rely on cloud-based speech recognition and constant power—Mesh devices are designed for outdoor, low-power scenarios, often with limited or no network connectivity. This brings several unique challenges:

  • Offline operation: Speech recognition must run locally on the device, so it works even without a network.
  • Low power: The system needs to operate with minimal energy to maximize battery life.
  • Environmental noise: Outdoor noise can impact recognition accuracy.
  • Limited compute resources: Mesh devices have constrained processing power and can't run large, complex models.

We needed a lightweight speech recognition module that runs efficiently under these constraints, while still delivering reliable accuracy and fast response. Based on these requirements, we built a demo version of Speech Command integration and tested it on Meshtastic devices.

How to Build It

The following sections detail how the Speech Command module was implemented and integrated into Meshtastic devices.

Hardware Selection

We chose the FoBE Mesh Tracker C1 as the test platform because it’s based on the nRF52840, giving enough compute headroom for a lightweight on‑device speech model while keeping excellent energy characteristics for outdoor use. Its flexible expansion interface lets us snap in a microphone module quickly for audio capture.

FoBE Mesh Tracker C1 and Flexible Microphone Module

Installation is trivial: insert the flex microphone board into the expansion connector and press to lock—no soldering, no wiring.

FoBE Mesh Tracker C1 Flexible Microphone Module Install

With that, the hardware platform is ready.

Software Integration

To validate feasibility we started with a small, universal speech command set rather than continuous speech: simple directional intents ("go", "backward", "left", "right", "up", "down") mapped to device actions.
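
To make that mapping concrete, here is a minimal C++ sketch that turns a recognized label into a compact one-byte action code suitable for a low-bandwidth mesh packet. The VoiceAction enum and intentToAction() helper are illustrative names for this post, not part of the Meshtastic firmware.

```cpp
// Minimal sketch: map a recognized intent label to a compact one-byte
// action code that can travel over the mesh. Names are illustrative only.
#include <stdint.h>
#include <string.h>

enum class VoiceAction : uint8_t { None = 0, Go, Backward, Left, Right, Up, Down };

VoiceAction intentToAction(const char *label) {
    static const struct { const char *name; VoiceAction action; } kTable[] = {
        {"go", VoiceAction::Go},       {"backward", VoiceAction::Backward},
        {"left", VoiceAction::Left},   {"right", VoiceAction::Right},
        {"up", VoiceAction::Up},       {"down", VoiceAction::Down},
    };
    for (const auto &entry : kTable) {
        if (strcmp(label, entry.name) == 0) {
            return entry.action;
        }
    }
    return VoiceAction::None;  // unknown label: ignore rather than act
}
```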

We first needed a training dataset. We used Google’s Speech Commands v0.02 corpus—short, single‑word audio clips ideal for tiny, resource‑constrained models.

Download: Speech Commands v0.02

After extraction, each folder represents a label. We selected six folders (go, backward, up, down, left, right) as our initial intent set.

We needed a lightweight inference stack that runs on nRF52840 and supports fast iteration of command sets. Edge Impulse was the obvious choice: a full TinyML pipeline (ingest → preprocess → feature extraction → train → deploy) with export options for embedded targets.

Public project: arduino-voice-command

In the Edge Impulse project, we uploaded the selected clips from Speech Commands and used the built-in preprocessing and feature-extraction blocks.

Uploads can be done via the Edge Impulse CLI or the web UI. After ingestion we verified labels to ensure each sample maps to the correct intent class.

Edge Impulse Resources Upload

Next we performed MFCC feature extraction. Peak RAM usage was the primary constraint on the nRF52840, so we used 13 MFCC coefficients with tuned frame length and stride parameters to balance time resolution against memory footprint: enough discriminative power without exceeding the available buffer.

Edge Impulse MFCC Feature Extraction

We then trained a lightweight 1D CNN on the extracted features. Hyperparameter tuning focused on the trade‑off between model expressiveness and embedded latency, so the model runs in real time on the nRF52840 while maintaining strong accuracy.

Edge Impulse Neural Network settings

| Parameter | Description | Value | Purpose |
| --- | --- | --- | --- |
| Number of training cycles | Epoch count | 200 | More passes improve learning; too many risk overfitting. |
| Learning rate | Step size for updates | 0.0008 | Lower = stable convergence; higher = faster but risk oscillation. |
| Training processor | Accelerator | GPU | Speeds up training substantially. |
| Validation set size | Hold‑out proportion | 20% | Monitors generalization on unseen data. |
| Batch size | Samples per update | 64 | Balances stability vs. memory usage. |
| Auto-weight classes | Class balancing | Yes | Mitigates imbalance across labels. |
| Profile int8 model | INT8 quantization | Yes | Produces a compact, faster inference model. |

Edge Impulse Neural network architecture

| Layer | Type | Params | Notes |
| --- | --- | --- | --- |
| 1 | Input layer | 637 features | Raw feature vector |
| 2 | Reshape layer | 13 columns | Shape for temporal conv |
| 3 | 1D conv / pool layer | 16 filters, kernel=3 | Local feature extraction |
| 4 | Dropout | rate=0.1 | Regularization |
| 5 | 1D conv / pool layer | 32 filters, kernel=5 | Deeper temporal patterns |
| 6 | Dropout | rate=0.25 | Regularization |
| 7 | 1D conv / pool layer | 64 filters, kernel=7 | Higher‑level semantic features |
| 8 | Dropout | rate=0.25 | Regularization |
| 9 | Flatten layer | | Collapse to 1D |
| 10 | Dense layer | 48 neurons | Integrate features |
| 11 | Dropout | rate=0.25 | Regularization |
| 12 | Dense layer | 24 neurons | Compress representation |
| 13 | Dropout | rate=0.25 | Regularization |
| 14 | Dense layer | 12 neurons | Pre‑output layer |
| 15 | Output layer | 6 classes | Final classification |

Clicking Save & Train starts the training job on Edge Impulse. The resulting metrics on both the training and validation sets are strong, indicating the model reliably separates the six intents.

Edge Impulse Train Result

Overall performance

  • Accuracy: 95.0%
  • Loss: 0.20

This reflects high validation accuracy with a low loss—accurate and well‑converged.

| Class | Accuracy | Notes |
| --- | --- | --- |
| BACKWARD | 93.1% | Minor 2.7% confusion with "GO"; overall solid. |
| DOWN | 94.9% | Very good; minimal confusion. |
| GO | 93.5% | Small confusion with "DOWN" / "UP" suggests similar patterns. |
| LEFT | 95.3% | Clean and stable separation. |
| RIGHT | 96.3% | Among the best performing classes. |
| UP | 95.8% | Also very consistent. |

Edge Impulse Build Resources

The built model uses about 71.6 KB flash and 7.4 KB RAM, which fits comfortably on the nRF52840.

Edge Impulse Build Library

Finally, we exported the model as an Arduino library and dropped it into the Meshtastic firmware tree for integration.

For a working reference inside Meshtastic firmware, see our example variant: fobe_idea_mesh_tracker_c1_voice

Meshtastic Firmware Repository

In this variant we add microphone support and integrate the exported Edge Impulse model. By adjusting the main loop, the firmware listens for speech commands and dispatches actions—e.g., sending a message or controlling local functions—when a target intent is detected.
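
To give a rough picture of what that loop looks like, here is a minimal sketch built around the Edge Impulse Arduino export. Only signal_t, ei_impulse_result_t, run_classifier(), and the EI_CLASSIFIER_* constants come from the generated library; the header name depends on your project, and readMicSamples() and sendMeshText() are hypothetical placeholders for the C1 microphone driver and the Meshtastic send path, not real APIs.

```cpp
// Minimal sketch of the recognition loop, assuming the Edge Impulse Arduino
// export. readMicSamples() and sendMeshText() are hypothetical helpers that
// stand in for the C1 microphone driver and the Meshtastic messaging layer.
#include <string.h>
#include <arduino-voice-command_inferencing.h>  // header name depends on your export

// One analysis window of audio, as floats (size comes from the exported model).
static float sample_buffer[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

extern bool readMicSamples(float *buf, size_t len);  // hypothetical mic capture
extern void sendMeshText(const char *text);          // hypothetical mesh dispatch

// Callback the SDK uses to pull audio out of our buffer.
static int getAudioData(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, sample_buffer + offset, length * sizeof(float));
    return 0;
}

void voiceLoop() {
    if (!readMicSamples(sample_buffer, EI_CLASSIFIER_RAW_SAMPLE_COUNT)) {
        return;  // no fresh audio window yet
    }

    // Describe the captured window to the classifier.
    signal_t signal;
    signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
    signal.get_data = &getAudioData;

    ei_impulse_result_t result = {0};
    if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
        return;
    }

    // Pick the highest-scoring intent.
    size_t best = 0;
    for (size_t i = 1; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
        if (result.classification[i].value > result.classification[best].value) {
            best = i;
        }
    }

    // Only act on confident detections (the 0.80 threshold is illustrative).
    if (result.classification[best].value > 0.80f) {
        sendMeshText(result.classification[best].label);
    }
}
```

In the real firmware this runs inside the existing main loop and would be gated by duty-cycling so the CPU and radio can sleep between windows.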

What Next?

This demo validates on‑device Speech Command for Meshtastic under tight resource and power constraints. Next directions:

  • Richer command space: expand from 6 intents toward letters (A–Z), numerics (0–9), or semantic tags (status, location, SOS).
  • Adaptive models: dynamically load / prune command subsets by role or mission profile to keep RAM/flash overhead minimal.
  • Noise robustness: dual‑mic (beamforming), spectral denoise, multi‑window voting to resist wind and impact noise.
  • Multilingual & customization: user‑specific fine‑tuning or few‑shot on‑device adaptation for accent / language variants.
  • Reliability safeguards: confidence thresholds + optional fallback (e.g., button confirm when below margin); see the sketch after this list.
  • Lower power: refined duty‑cycling, wake‑word gating before feature extraction, and peripheral clock scaling to extend battery life.
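
On the reliability point, the gate can be as simple as checking the top score and its margin over the runner-up before acting. The sketch below is illustrative only: the thresholds are made up and the Decision type is a hypothetical name, not part of the current firmware.

```cpp
// Sketch of a confidence gate, assuming scores[] holds the per-class
// probabilities from the classifier. Thresholds are illustrative only.
#include <stddef.h>

enum class Decision { Execute, AskConfirm, Ignore };

Decision gateIntent(const float *scores, size_t count) {
    if (count < 2) {
        return Decision::Ignore;
    }
    // Track the best and second-best scores.
    size_t best = scores[1] > scores[0] ? 1 : 0;
    size_t second = (best == 0) ? 1 : 0;
    for (size_t i = 2; i < count; i++) {
        if (scores[i] > scores[best]) {
            second = best;
            best = i;
        } else if (scores[i] > scores[second]) {
            second = i;
        }
    }
    float top = scores[best];
    float margin = top - scores[second];
    if (top >= 0.85f && margin >= 0.30f) {
        return Decision::Execute;     // clear winner: act immediately
    }
    if (top >= 0.60f) {
        return Decision::AskConfirm;  // plausible but ambiguous: ask for a button confirm
    }
    return Decision::Ignore;          // too uncertain: drop the detection
}
```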

If you have better ideas, let us know on Discord or GitHub!