Connected Audio-based Threat Detection on Raspberry Pi


2025-03-07 | By Maker.io Staff

License: See Original Project


Many smart devices can continuously monitor background noise and listen for threats like breaking glass or barking dogs. While this feature can be convenient, many users have privacy concerns and would rather not have corporations record and process every sound they make. This article investigates an alternative for those who want to pair the convenience of automatic audio classification with more control over how their data is used and where it is processed. The project uses a Raspberry Pi to detect potentially dangerous sounds and communicate with other smart-home services to alert users.

Bill of Materials

This project utilizes the following components:

Qty - Part

1 - Raspberry Pi 5

1 - Official Raspberry Pi 5 active cooling fan

1 - Official Raspberry Pi 5 power supply

1 - Compatible USB microphone

Defining the Problem

Before continuing, you should understand the basic machine learning process and standard ML terminology. After covering the basics, the next step is to specify the problem the model should solve and the data required to train it. This project targets specific audible threats with distinctive sounds:

Gunshots
Glass shatter sounds
Dog barks
Sirens

In theory, the system could detect many more types of sounds. However, each additional category dramatically increases the amount of data and training time required. The model may also become too complex, causing performance issues on embedded systems. Complicated models with too many labels are often more error-prone and tend to have blurrier boundaries between predicted classes. Therefore, focusing on a few common scenarios is a practical approach.

Finally, define evaluation metrics to assess the trained model's performance, along with the target accuracy you want to achieve. This project uses a confusion matrix with F1 scores. An accuracy of around 90% is usually desirable. Lower accuracy risks missing or misclassifying too many critical events, while a model that scores near-perfectly on its training data may be overfitting, meaning it only recognizes samples very similar to the ones it was trained on.
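To make the metric concrete, here is a minimal Python sketch that derives per-class precision, recall, and F1 score from a confusion matrix. It assumes rows hold the true labels and columns the predicted labels; the label subset and the counts are made-up illustration values, not this project's results.

```python
def per_class_f1(matrix, labels):
    """Return {label: (precision, recall, f1)} for a square confusion matrix.

    Assumes matrix[i][j] counts samples whose true label is labels[i]
    and whose predicted label is labels[j].
    """
    n = len(labels)
    scores = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly another label
        fn = sum(matrix[k][j] for j in range(n)) - tp  # truly k, predicted another label
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


labels = ["bark", "gunshot", "harmless"]
# Hypothetical counts for illustration only
matrix = [
    [8, 2, 0],
    [3, 7, 0],
    [0, 1, 9],
]
for label, (p, r, f1) in per_class_f1(matrix, labels).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The F1 score is the harmonic mean of precision and recall, so a class only scores well if the model both finds most of its samples and rarely assigns that label to other sounds.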

Acquiring and Labeling Training Data

Machine learning requires a significant number of samples for training. This project uses supervised learning, so the data must also be labeled accordingly. You also need to provide negative samples to teach the system what it should not detect. In this instance, it should ignore regular background noise and non-threatening household sounds like clinking cutlery.

Pre-labeled datasets can be sourced from many platforms. For example, Google's AudioSet provides annotated audio events with links to YouTube videos containing the samples, along with the appropriate timestamps. Kaggle is another popular source for ML datasets, and this project uses samples from an audio classifier dataset hosted there. For better specificity, record the background noise samples in the home where the system will run.

Connecting and Configuring a USB Microphone

This project uses a Raspberry Pi with a USB microphone to acquire accurate background noise samples for training and inference. Start by connecting the microphone, and then verify that the system detects the device using either of the following commands:

```shell
lsusb
arecord -l
```

The system should detect the connected device without any additional configuration, as shown in the following screenshot:

Figure: Make sure that the Pi detects the USB microphone.

Building the Training and Test Data Sets on Edge Impulse

Start by creating a free Edge Impulse account. Then, create a new project and navigate to its data page using either the button on the dashboard or the side navigation bar:

Figure: Click one of the highlighted buttons to add data to an Edge Impulse project.

On the data page, click the plus button and then use the upload option to add existing clips:

Figure: Use the highlighted buttons to upload existing audio clips for model training and testing.

Following these steps opens the upload dialog, where you can specify how the service processes and labels the files. For this project, upload 100 samples from each category (e.g., barking, sirens). Select the option to automatically split the samples into training and test data so that both sets contain an even mix of samples. Finally, make sure to enter the target label for each category:

Figure: Configure the data uploader as shown in this screenshot. Don't forget to set the correct label.

After repeating these steps for each label, the data panel should look similar to this:

Figure: Verify that the samples look correct and are labeled as expected.

Click through some of the samples and verify that they are labeled correctly. Also, double-check that the data were split into the training and test sets at approximately an 80/20 ratio.
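Edge Impulse performs this split for you. For readers who want to pre-split clips locally, or simply reason about what an even split means, here is a minimal Python sketch of a per-label 80/20 split. The function name and the fixed random seed are illustrative assumptions, and Edge Impulse's own splitting logic may differ.

```python
import random
from collections import defaultdict


def split_per_label(samples, test_fraction=0.2, seed=42):
    """Split (filename, label) pairs into train/test lists, one label at a time.

    Splitting each label separately keeps every class at roughly the same
    train/test ratio instead of leaving some labels under-represented in testing.
    """
    by_label = defaultdict(list)
    for filename, label in samples:
        by_label[label].append(filename)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label, files in sorted(by_label.items()):
        files = sorted(files)
        rng.shuffle(files)
        n_test = round(len(files) * test_fraction)
        test += [(f, label) for f in files[:n_test]]
        train += [(f, label) for f in files[n_test:]]
    return train, test
```

With 100 clips per label, as uploaded in this project, each label contributes 80 clips to training and 20 to testing.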

Recording Samples Using the Raspberry Pi

Edge Impulse offers a companion application that lets the Raspberry Pi connect to the cloud platform to collect data and interact with the trained model. To get started, install Node.js (a JavaScript runtime environment), version 20 or newer, along with the required dependencies by typing:

```shell
sudo apt update
curl -sL https://deb.nodesource.com/setup_20.x | sudo bash -
sudo apt install -y gcc g++ make build-essential nodejs sox gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-base gstreamer1.0-plugins-base-apps
```

Once the process finishes, install the Edge Impulse app with:

```shell
sudo npm install edge-impulse-linux -g --unsafe-perm
```

Finally, launch the application with the following command:

```shell
edge-impulse-linux --disable-camera
```

The --disable-camera flag turns off the camera, which this project doesn't need, and prevents the program from throwing an error. When first launched, the app asks for your email address and password, and you'll also have to select the microphone it should use for sampling:

Figure: This screenshot shows the initial configuration steps when first running the companion app.

The Raspberry Pi should now appear in the connected devices section of the website:

Figure: Verify that the Pi is connected to Edge Impulse.

You can now also use the Pi to collect audio samples from the data acquisition tab:

Figure: Record background noise samples using the Pi and the USB microphone.

After setting the target label to “harmless,” you can use the sampling button to record five-second background noise samples. Make sure to collect around three minutes of sounds that usually occur naturally in your home. Keep the sounds as natural as possible and don't exaggerate any of them, as the ML model will use these samples to learn what normal background noise sounds like in your home. Don't forget to move some of the harmless audio clips to the test set.

Preprocessing the Audio Samples

Building an ML model requires transforming the audio samples into a format the computer can understand. The samples all have slightly different lengths, which makes them unsuitable for most neural networks, the type of ML model this project builds, because these usually require fixed-length inputs. There are numerous ways to fulfill this requirement when dealing with data of different lengths. A typical approach involves padding shorter samples and splitting longer ones.
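A minimal sketch of the pad-and-split idea in plain Python, assuming the clip is already decoded into a list of samples. to_fixed_length_chunks is a hypothetical helper, and the chunks it produces do not overlap.

```python
def to_fixed_length_chunks(samples, chunk_len, pad_value=0.0):
    """Split a clip into consecutive chunk_len pieces, zero-padding the last one.

    A clip shorter than chunk_len becomes a single padded chunk;
    an empty clip produces no chunks at all.
    """
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        chunk += [pad_value] * (chunk_len - len(chunk))  # pad only the final, short piece
        chunks.append(chunk)
    return chunks
```

For example, to_fixed_length_chunks([1, 2, 3, 4, 5], 2) returns [[1, 2], [3, 4], [5, 0.0]]: every piece has the same length, which is what the network's fixed input size requires.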

Since we don't have many samples, we can further increase the number of training samples by splitting each audio clip into shorter pieces. Letting the pieces overlap can improve accuracy, since a sound that straddles a cut point is more likely to appear intact in at least one piece, which preserves some of the clip's coherence. However, it's vital not to overdo the splitting to the point where the pieces become too short to represent the target label.

Lastly, the data must be transformed into numeric information that the computer can understand. This project's preprocessing step converts each snippet's audio to the frequency domain using the fast Fourier transform (FFT). Computing the frequency content of successive short frames and plotting it across the entire length of a snippet results in a spectrogram, a visual representation of the characteristic frequencies over time. The neural network uses a matrix representation of this information to learn characteristic patterns in the samples and associate them with the target labels.
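To illustrate how repeated FFTs build a spectrogram, the sketch below implements a textbook radix-2 FFT in pure Python and applies it to Hann-windowed frames. The frame length, hop size, and window choice are assumptions for illustration; Edge Impulse's processing block uses its own optimized implementation and parameters.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(signal, frame_len=256, hop=128):
    """Return one magnitude spectrum per Hann-windowed frame.

    Each row holds frame_len // 2 + 1 magnitudes, from 0 Hz up to the
    Nyquist frequency; the rows advance through the signal in steps of hop.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spectrum = fft(frame)
        rows.append([abs(c) for c in spectrum[: frame_len // 2 + 1]])
    return rows
```

Feeding a pure 1 kHz tone sampled at 16 kHz into spectrogram() puts the peak of every row in bin 16, because each bin is 16000 / 256 = 62.5 Hz wide. Stacking the rows side by side gives the time-frequency matrix the network learns from.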

Creating the Training Pipeline in Edge Impulse

Edge Impulse hides most of the complexity from its users, and all that's left to do is create an impulse with the preprocessing tasks and model description. Click the "create impulse" option in the side toolbar to get started.

The first block is already there, and it can't be changed. It represents the time-series audio data. To create overlapping windows, set the window size to 1000 ms and the stride to 500 ms. Check the zero-pad data checkbox to ensure shorter samples are padded to the target length.

Figure: Follow the highlighted steps to build the impulse.
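The effect of the window size, stride, and zero-pad settings can be sketched in Python. This helper is hypothetical and the example assumes a 16 kHz sample rate; it also drops a trailing partial window on longer clips, which is not necessarily how Edge Impulse handles the end of a clip.

```python
def sliding_windows(samples, sample_rate, window_ms=1000, stride_ms=500, zero_pad=True):
    """Cut a clip into overlapping fixed-length windows.

    With window_ms=1000 and stride_ms=500, consecutive windows share half
    their samples. If zero_pad is True, a clip shorter than one window is
    padded with zeros instead of being dropped.
    """
    window = sample_rate * window_ms // 1000
    stride = sample_rate * stride_ms // 1000
    if len(samples) < window:
        if not zero_pad:
            return []
        return [list(samples) + [0] * (window - len(samples))]
    return [list(samples[start:start + window])
            for start in range(0, len(samples) - window + 1, stride)]
```

With these settings, a five-second clip yields nine one-second windows instead of five non-overlapping ones, which nearly doubles the number of training examples without recording anything new.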

Then, add an MFE (Mel-filterbank energy) processing block. This unit takes the uniform snippets, performs an FFT, and outputs their characteristic spectrograms on the Mel frequency scale. Lastly, add a neural network classifier block and save the impulse.
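Mel filterbanks space their filters evenly on the Mel scale, which roughly follows human pitch perception. The sketch below uses the common HTK-style conversion formula and computes the edge frequencies of triangular filters; the function names are illustrative, and Edge Impulse's MFE block may use a different Mel variant or filter layout.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Mel scale: close to linear below ~1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(num_filters, low_hz, high_hz):
    """Return the num_filters + 2 edge frequencies (Hz) of triangular Mel filters.

    Filter i spans edges[i] to edges[i + 2] and peaks at edges[i + 1]. The
    edges are evenly spaced in Mel, so in Hz they bunch up at low frequencies.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_filters + 2)]
```

Summing the FFT magnitudes (or energies) under each triangle gives one value per filter and frame; a larger filter number means narrower triangles and a taller spectrogram.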

Generating the Training Features

The next step is to extract features from the samples for model training using the MFE block. Navigate to the MFE settings using the side toolbar and make sure that the “Parameters” tab is selected at the top of the MFE page:

Figure: This image illustrates how to extract training features from the audio samples.

This page lets you adjust multiple settings of the preprocessing step. You may need to change the filter number and FFT length parameters if no spectrogram is visible on the right. These spectra show the characteristic patterns of certain sounds, and you can use the controls in the top-right corner of the raw data panel to inspect different labels. When you're done, use the button to save the parameters and navigate to the feature-generation tab at the top of the page:

Figure: This screenshot shows the feature extraction results.

Click the generate button to extract the features. The resulting 2D plot on the right-hand side of the page shows how well the features group the samples of each category into distinct clusters. In this instance, the separation is imperfect, and there is some overlap between the gunshot and barking samples. However, the samples cluster nicely for the most part, which should result in acceptable overall model accuracy.
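The FFT length parameter on the MFE page sets the frequency resolution of each spectrogram column, while the frame length and stride set its time resolution. This Python sketch estimates the resulting grid for one window; the function name and the example parameter values are assumptions for illustration, not Edge Impulse's defaults.

```python
def mfe_grid(sample_rate_hz, fft_length, window_s=1.0, frame_length_s=0.02, frame_stride_s=0.01):
    """Estimate the spectrogram grid produced by framing one window and taking FFTs.

    Returns (num_frames, num_fft_bins, bin_width_hz). Assumes frames that
    would run past the end of the window are dropped.
    """
    frame_len = int(round(frame_length_s * sample_rate_hz))
    hop = int(round(frame_stride_s * sample_rate_hz))
    total = int(round(window_s * sample_rate_hz))
    num_frames = 1 + (total - frame_len) // hop if total >= frame_len else 0
    return num_frames, fft_length // 2 + 1, sample_rate_hz / fft_length
```

For a 1-second window at 16 kHz with a 512-point FFT, mfe_grid(16000, 512) estimates 99 frames of 257 FFT bins, each 31.25 Hz wide. Doubling the FFT length halves the bin width, which gives the Mel filters more frequency detail to work with.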

Training the Machine Learning Model

Use the left sidebar again to navigate to the classifier settings and training page. The default settings work fine for this project, and since the sample size is relatively small, even training for 100 epochs should only take a short time. Use the button at the bottom of the page to save the settings and start the training process:

Figure: Follow the steps shown in this image to start training the neural network.

After training, the page shows the confusion matrix, which lists how often the samples of each target label were classified as each possible label, together with the F1 score of each target label:

Figure: Verify that the trained model performs sufficiently well for your use case.

As expected, there is significant overlap between the barking and gunshot sounds, and the model performs relatively poorly at differentiating the two classes. However, what matters most is that the model separates harmless sounds from all harmful ones with an accuracy of almost 90%. That accuracy is sufficient for this application, especially given the small number of training samples. Adding more (or better) samples, primarily gunshot sounds and clips of barking dogs, could increase the accuracy.
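The harmless-versus-threat accuracy can be computed by collapsing the multi-class confusion matrix into two classes. This sketch assumes rows are true labels and columns are predicted labels; the function name is hypothetical and the counts in the usage note are made up.

```python
def harmless_vs_threat_accuracy(matrix, labels, harmless="harmless"):
    """Collapse a multi-class confusion matrix into 'harmless' vs. 'any threat'.

    matrix[i][j] counts samples with true label labels[i] predicted as labels[j].
    Confusing one threat with another (e.g. bark vs. gunshot) still counts as
    correct here, because either prediction would raise the alarm.
    """
    h = labels.index(harmless)
    total = sum(sum(row) for row in matrix)
    correct = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if (i == h) == (j == h):  # both harmless, or both some threat
                correct += count
    return correct / total
```

With the hypothetical matrix [[8, 2, 0], [3, 7, 0], [0, 1, 9]] and labels bark, gunshot, harmless, the three-class accuracy is 24/30 = 0.80, but the harmless-versus-threat accuracy is 29/30 (about 0.97), because the bark/gunshot mix-ups still trigger an alert.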

Testing, Tweaking, and Deploying the Model

After training, the model is ready to classify new audio samples recorded by the Raspberry Pi's microphone. However, an optional step can help tweak the model's type I and type II error rates, which determines whether the system is more prone to false alarms or missed alarms. In this case, we prefer the system to be cautious: it should rather raise a false alarm than miss a potentially dangerous situation. Select the performance calibration option in the sidebar and then set the background noise label option to use the “harmless” label. Finally, click the green start button to run the automated test:

Figure: Configure the automated test as shown in this image. Click the green button to start the testing process.

During this test, Edge Impulse generates new audio samples of realistic background noise and overlays them with various samples from the validation set. The framework then records how often the model's predictions are accurate. After each run, it tweaks some of the model's settings, and once testing concludes, it presents the resulting configurations for you to choose from:

Figure: Select the tweaked settings that best match your expectations.

Note that this model performs rather poorly on unseen data on average. This is likely due to the small number of training samples and the fact that the training background noise was not very varied. As a result, the system often mistakenly detects threats when tested against different background ambiance.

To tune the model to be overly cautious rather than too relaxed, select a point with a low false rejection rate (the y-axis), accepting a higher false activation rate (the x-axis) in return. Then, click the save button and navigate to the deployment tab using the sidebar.
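The trade-off between false activation rate (FAR) and false rejection rate (FRR) can be illustrated with a simplified per-window calculation. Edge Impulse evaluates these rates on a continuous synthetic audio stream with its own evaluation logic, so treat this Python sketch, including the helper names and the max_far budget, as a hypothetical illustration rather than the platform's algorithm.

```python
def far_frr(scores, is_threat, threshold):
    """Per-window false activation rate and false rejection rate at a threshold.

    scores[i] is the highest threat score for window i, and is_threat[i]
    says whether window i truly contains a threat.
    """
    false_alarms = sum(1 for s, t in zip(scores, is_threat) if not t and s >= threshold)
    missed = sum(1 for s, t in zip(scores, is_threat) if t and s < threshold)
    n_harmless = sum(1 for t in is_threat if not t)
    n_threat = len(is_threat) - n_harmless
    far = false_alarms / n_harmless if n_harmless else 0.0
    frr = missed / n_threat if n_threat else 0.0
    return far, frr


def cautious_threshold(scores, is_threat, candidates, max_far):
    """Return (threshold, far, frr) with the lowest FRR whose FAR stays <= max_far.

    Returns None if no candidate threshold meets the FAR budget.
    """
    best = None
    for th in candidates:
        far, frr = far_frr(scores, is_threat, th)
        if far <= max_far and (best is None or frr < best[2]):
            best = (th, far, frr)
    return best
```

Lowering the threshold catches more real threats (lower FRR) at the cost of more false alarms (higher FAR); a generous max_far budget therefore pushes the choice toward the cautious end, which matches the calibration point recommended above.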

On the model deployment page, select Linux (ARMv7) as the target platform for deployment on the Raspberry Pi; if your Pi runs the 64-bit version of Raspberry Pi OS, Linux (AARCH64) is the matching target instead. Then, select the quantized model to reduce the complexity, model size, and resource requirements for inference on embedded devices. Finally, click the build button. The resulting model is automatically downloaded to your computer so you can transfer the file to the Raspberry Pi.

Figure: Follow the steps outlined in this screenshot to build and deploy the model.

Alternatively, you can download the model directly to the Pi using the companion app from before:

```shell
edge-impulse-linux-runner --download audio_detector_model.eim
```

Performing Inference Offline

After deploying and downloading the model, the Raspberry Pi can perform inference without an active Internet connection. Local inference reduces lag and ensures that audio samples are not uploaded to remote servers, giving users complete control over their data and privacy.

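Edge Impulse offers multiple SDKs for high-level programming languages like Python and C++. The Python SDK requires Python 3.7 or newer, and it also needs some external libraries, which can be installed using the following command:

```shell
sudo apt install libatlas-base-dev python3-pyaudio portaudio19-dev
```

Then, use the Python package manager to install an additional library. Raspberry Pi OS Bookworm and later refuse system-wide pip installs, so run this and the following pip commands inside a virtual environment, for example one created with python3 -m venv --system-site-packages ~/threat-env and activated with source ~/threat-env/bin/activate. The --system-site-packages flag keeps the PyAudio package installed via apt visible inside the environment:

```shell
pip3 install opencv-python
```

Once these processes finish, the SDK can be installed using pip:

```shell
pip3 install edge_impulse_linux -i https://pypi.python.org/simple
```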

The following simple Python program uses the trained ML model to continuously record short samples from the selected audio input device and perform inference on each snippet. It then prints the inference result to the console whenever it detects a potentially threatening sound:

```python
import os

from edge_impulse_linux.audio import AudioImpulseRunner

runner = None
device = 1  # ID of the audio input device to record from
model = 'audio_detector_model.eim'
harmless_label = 'harmless'  # background-noise label that should never raise an alert


def label_detected(label_name, certainty):
    print('Detected %s with certainty %.2f' % (label_name, certainty))
    # TODO: Perform other actions if required


def setup():
    # Build the full model path from this script's folder
    dir_path = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(dir_path, model)


def teardown():
    if runner:
        runner.stop()


if __name__ == '__main__':
    model_file = setup()
    with AudioImpulseRunner(model_file) as runner:
        try:
            model_info = runner.init()
            labels = model_info['model_parameters']['labels']
            # classifier() keeps recording and yields one result per audio window
            for res, audio in runner.classifier(device_id=device):
                for label in labels:
                    if label == harmless_label:
                        continue  # normal background noise, not a threat
                    score = res['result']['classification'][label]
                    if score > 0.5:
                        label_detected(label, score)
        except RuntimeError as err:
            print('Error: %s' % err)
        finally:
            teardown()
```

The code starts by importing the Edge Impulse SDK. It then defines a few variables that hold the inference runner object, the microphone ID, the model file name, and the label that represents harmless background noise.

The code defines some custom helper functions. The label_detected function outputs the predicted label and its certainty. However, it could perform additional actions, such as sending messages to an existing smart home setup. The setup function finds the Python script's parent folder and appends the model file name to build a full path. Lastly, the teardown function stops the model runner to release system resources when the app quits.

The main block calls the setup function on startup. It then creates a new AudioImpulseRunner object for performing inference locally. Within the try block, the application initializes the runner and loads all available labels. For every classified audio window, it checks the score of each label except harmless, since that label represents normal background noise and should never raise an alert. If a threat label's certainty exceeds 50%, the script calls the label_detected helper with the label name and certainty.
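Pulling the threshold check into a pure function makes it possible to test it without a microphone or model file. This is a hypothetical refactoring rather than part of the original script; detected_threats is an illustrative name, and the fake result below mimics only the keys the script above reads.

```python
def detected_threats(result, threshold=0.5, ignore=("harmless",)):
    """Return (label, score) pairs whose score exceeds threshold, highest first.

    Assumes the result layout the script reads:
    result['result']['classification'] maps each label to a score.
    """
    scores = result["result"]["classification"]
    hits = [(label, score) for label, score in scores.items()
            if label not in ignore and score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical classifier output containing only the keys used above
fake_result = {"result": {"classification": {"bark": 0.62, "gunshot": 0.30, "harmless": 0.08}}}
print(detected_threats(fake_result))
```

Inside the main loop, the nested label checks could then shrink to a loop over detected_threats(res) that calls label_detected for each returned pair.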

Interfacing With Smart Home Platforms Using MQTT

This final section of the project discusses how to use MQTT, a popular protocol for exchanging data between IoT and smart-home devices, to publish the ML model's predictions to an MQTT broker. Other smart-home platforms, such as Apple HomeKit or Amazon Alexa, can then receive these messages from the broker through a suitable bridge. If all of this is new to you, it's a good idea to familiarize yourself with the basics of MQTT on the Raspberry Pi first.

You can either publish messages directly to an existing MQTT broker or set up a new one on the Raspberry Pi, for example, using Mosquitto. To set up a new broker, start by installing the following packages on the Pi:

```shell
sudo apt install mosquitto mosquitto-clients
```

The MQTT broker should start automatically and be ready to accept clients and requests. Next, install the paho-mqtt library for Python by typing:

```shell
pip3 install paho-mqtt
```

This library lets the Python program publish messages to the MQTT broker whenever the ML model detects a dangerous situation. Start by adding the following import statement at the top of the Python script:

```python
import paho.mqtt.client as mqtt
```

Next, define the following variables for the MQTT client, the broker address and port, the topic name, and the current alarm state:

```python
client = None
broker_address = 'localhost'
broker_port = 1883
out_topic = "pi/threat_detected"
threat_detected = "false"
```

Adjust the broker address if your project communicates with an external MQTT broker that is not running on the Pi. Afterward, expand the setup function by adding the MQTT client setup code so that it looks as follows:

```python
def setup():
    global client
    # paho-mqtt 2.x requires the callback API version as the first argument;
    # VERSION2 determines the on_connect signature used below.
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.on_connect = mqtt_connected
    client.on_message = mqtt_message_received
    client.connect(broker_address, broker_port, 60)
    client.loop_start()  # handle network traffic in a background thread
    dir_path = os.path.dirname(os.path.realpath(__file__))
    return os.path.join(dir_path, model)
```

The updated code now additionally creates a client object and stores it in the global client variable. It also registers two callback functions: one that runs when the connection is established and one that handles incoming MQTT messages. The event handlers look as follows:

```python
def mqtt_connected(client, userdata, flags, reason_code, properties):
    # Listen for reset requests and announce the current alarm state
    client.subscribe("pi/reset")
    client.publish(out_topic, threat_detected)


def mqtt_message_received(client, userdata, msg):
    global threat_detected
    if msg.topic == "pi/reset":
        threat_detected = "false"
        client.publish(out_topic, threat_detected)
```

The mqtt_connected callback is called whenever a connection is established. It subscribes to the pi/reset topic to let users or other devices reset a previously triggered alarm, and then publishes the current alarm state. The mqtt_message_received callback handles incoming messages. In this instance, it only resets the threat_detected flag and publishes the new state.

The teardown function also needs to be adjusted to disconnect the client:

```python
def teardown():
    if client:
        client.disconnect()  # disconnect cleanly from the broker
        client.loop_stop()   # then stop the background network thread
    if runner:
        runner.stop()
```

Finally, you can publish messages to the MQTT broker by expanding the label_detected function like so:

```python
def label_detected(label_name, certainty):
    global threat_detected
    print('Detected %s with certainty %.2f' % (label_name, certainty))
    threat_detected = "true"
    client.publish(out_topic, threat_detected)
```

The updated function sets the threat_detected flag and sends it to the MQTT broker using the publish method.
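As written, label_detected publishes "true" for every classified window that crosses the threshold, which can produce a steady stream of identical messages. One possible refinement, sketched here as a hypothetical ThreatFlag class that is not part of the original project, publishes only when the state actually changes. The publish argument can be any callable that accepts a topic and a payload, such as client.publish, which also makes the logic easy to test without a broker.

```python
class ThreatFlag:
    """Track the alarm state and publish only when it changes.

    publish is any callable taking (topic, payload); with paho-mqtt this
    could be client.publish. Topic and payloads mirror the script above.
    """

    def __init__(self, publish, topic="pi/threat_detected"):
        self.publish = publish
        self.topic = topic
        self.state = "false"

    def announce(self):
        """Publish the current state unconditionally, e.g. right after connecting."""
        self.publish(self.topic, self.state)

    def detected(self):
        self._set("true")

    def reset(self):
        self._set("false")

    def _set(self, new_state):
        # Only send a message when the state actually changes
        if new_state != self.state:
            self.state = new_state
            self.publish(self.topic, self.state)
```

In the main script, label_detected would call flag.detected(), the pi/reset handler would call flag.reset(), and mqtt_connected would call flag.announce().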

Relaying MQTT Messages to Apple HomeKit

I finished the project by relaying the MQTT messages sent to the broker on the Pi to my HomeKit environment. Doing so requires a mapping layer that lets the Pi interface with Apple's infrastructure. To get started, add the Homebridge repository and its GPG key to the package manager:

```shell
curl -sSfL https://repo.homebridge.io/KEY.gpg | sudo gpg --dearmor | sudo tee /usr/share/keyrings/homebridge.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/homebridge.gpg] https://repo.homebridge.io stable main" | sudo tee /etc/apt/sources.list.d/homebridge.list > /dev/null
```

Then, install Homebridge on the Raspberry Pi by typing the following commands:

```shell
sudo apt-get update
sudo apt-get install homebridge
```

Once Homebridge is installed, finish the setup process by opening the Homebridge UI at http://raspberrypi.local:8581/ in a web browser (replace raspberrypi.local with your Pi's hostname or IP address if it differs). Homebridge needs a plugin to translate MQTT messages into control elements HomeKit can understand. Navigate to the plugins section in the web UI and install the homebridge-mqttthing plugin:

Figure: Install the plugin shown in this screenshot.

Once the process finishes, add the following accessories to the global JSON configuration so that the plugin listens to the topic published by the Python program:

```json
"accessories": [
    {
        "type": "leakSensor",
        "name": "Potential Threat",
        "url": "mqtt://localhost:1883",
        "topics": {
            "getLeakDetected": "pi/threat_detected"
        },
        "accessory": "mqttthing"
    },
    {
        "type": "switch",
        "name": "Reset Threat Detection",
        "url": "mqtt://localhost:1883",
        "topics": {
            "getOn": "false",
            "setOn": "pi/reset"
        },
        "accessory": "mqttthing"
    }
],
```

This snippet configures the mqttthing plugin installed earlier. It defines the MQTT broker address and gives the accessories recognizable names to display in the Home app. Next, it states which topics to use for obtaining the threat detection flag and for resetting the alarm. After you restart the Homebridge service, the accessories page should show two new widgets, and, when triggered, the Potential Threat accessory should indicate a detection:

Figure: After I added the device to HomeKit and played a dog barking sound on my phone, the app showed that the model detected a potential threat.

Summary

Developing an ML model starts with acquiring suitable training data and, if necessary, labels. Training datasets are available from many sources, but augmenting the data with self-collected samples helps build more specific models.

Using Edge Impulse, training a complex model is as easy as uploading the data, defining a new impulse, and adding processing and classification blocks. This project uses FFT to translate audio information into a format a neural network can understand before training the model. The resulting model is then fine-tuned by selecting a configuration that favors false alarms over missed alarms.

The Edge Impulse Python SDK enables offline inference to protect users' privacy. The predicted labels are then broadcast to other parts of a home automation system using MQTT and Homebridge.
