ROS Voice Assistant

Overview

ros_voice_assistant is a ROS 2 package that enables natural language interaction with a robot using advanced AI models. It integrates Google Text-to-Speech (TTS), OpenAI's Whisper for speech recognition (via Groq), and LLaMA 70B (via Groq) for natural language understanding. A Flask-based web interface lets users record and send voice commands to the robot in real time.

Features

✅ Natural language voice interaction using large language models

🗣️ Speech-to-text via Whisper

🔊 Text-to-speech via Google TTS

🌐 Flask web interface for command input

🧠 Inference powered by Groq LPU accelerators

🗺️ Location-aware commands using ROS maps

Installation

  1. Install ROS dependencies

Update your package list and install the required packages:

sudo apt update
sudo apt install ros-$ROS_DISTRO-tf-transformations ffmpeg

  2. Clone the Repository

Download this repository into your workspace and build it:

cd src
git clone https://gitea.locker98.com/locker98/ros-voice-assistant.git
cd ros-voice-assistant
pip install -r requirements.txt
cd ../../
colcon build
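
After the build finishes, source the workspace overlay so the newly built packages are visible to ROS 2 (assuming you are in the workspace root and using colcon's default install directory):

source install/setup.bash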

  3. Export Groq API Key

You’ll need an API key from Groq:

echo "export GROQ_API_KEY=gsk_*******************************" >> ~/.bashrc
source ~/.bashrc
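
To confirm the key is available in new shells, print the variable; it should echo the value you exported above:

echo $GROQ_API_KEY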

  4. Setting Up the Arm

If you want to run the voice assistant with a robotic arm from Universal Robots, you must install the locker98_tools package from https://gitea.locker98.com/locker98/locker98_tools_ws.git.

Map File Structure

The maps directory should contain the following files, all sharing the same base name:

maps/
β”œβ”€β”€ rhodes_map.pgm     # Map image used by SLAM and navigation
β”œβ”€β”€ rhodes_map.yaml    # Metadata file with scale, origin, and image path
└── rhodes_map.json    # Location data (generated by map_node) used by voice assistant
  • The .pgm and .yaml files are standard output from SLAM Toolbox or map_server tools (a typical .yaml is sketched after this list).
  • The .json file is generated by the map_node tool and is required for the voice_assistant.launch.py to load predefined location data with descriptions.
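
For reference, the .yaml metadata file written by SLAM Toolbox or map_server typically looks like the sketch below. The values shown are illustrative only and are not taken from this package:

image: rhodes_map.pgm
resolution: 0.05
origin: [-12.3, -8.7, 0.0]
negate: 0
occupied_thresh: 0.65
free_thresh: 0.25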

Running the Package

1. Launch the Voice Assistant

To start the voice assistant, use the following command with your custom namespace and map file:

ros2 launch voice_assistant_bringup voice_assistant.launch.py namespace:=a200_0000 location_file:=$HOME/husky_configs/maps/rhodes2_full.json

Note:
Standard SLAM maps typically include a .pgm image file and a .yaml metadata file. The .yaml file contains information such as the origin and resolution (scale) of the map image.

To keep naming consistent, the voice assistant expects a .json file that contains known location data. This file should share the same base name as the .pgm and .yaml files (e.g., rhodes2_full.pgm, rhodes2_full.yaml, and rhodes2_full.json).
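
A quick way to verify that the three files share the same base name (reusing the example path from the launch command above):

ls $HOME/husky_configs/maps/rhodes2_full.*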

Note: Browsers will not let you use a microphone on a non-HTTPS website, so you might need to follow the instructions here to fix it.
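
One possible workaround (not necessarily what the linked instructions describe): browsers treat http://localhost as a secure context, so you can forward the web interface's port to your machine over SSH and open it via localhost. Port 5000 below is Flask's default and only an assumption; substitute whatever port this package's web interface actually uses:

ssh -L 5000:localhost:5000 <user>@<robot-hostname>

Then open http://localhost:5000 in your browser.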


2. Manage Locations via Flask Interface

To interactively add or remove locations on the map, run the map_node. This uses the .yaml file from SLAM Toolbox as input:

ros2 run voice_assistant map_node --ros-args -p location_file:=$HOME/husky_configs/maps/rhodes2_full.yaml

Note:
The .yaml file provides the necessary information about the map’s origin and resolution, and points to the corresponding .pgm file.

The map_node tool will generate a .json file that contains location data and is compatible with the voice assistant. This .json file will have the same base name as your existing .pgm and .yaml files.
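
If you want to sanity-check the generated file, you can pretty-print it with Python's built-in JSON tool (the path below reuses the example from step 1):

python3 -m json.tool $HOME/husky_configs/maps/rhodes2_full.json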

3. Launch the Voice Assistant Without Arm

To start the voice assistant without the robotic arm, use the following command with your custom namespace and map file:

ros2 launch voice_assistant_bringup voice_assistant.launch.py namespace:=a200_0000 location_file:=$HOME/husky_configs/maps/rhodes2_full.json enable_arm:=False

Note:
This allows you to run the Voice Assistant without installing the locker98_tools package.