What is EatAble?
EatAble is a voice-controlled robotic assistant designed to empower people with upper limb disabilities to eat independently, restoring dignity, freedom, and equality through accessible AI and robotics.
At its core, EatAble allows a person to simply say what they want to eat, and the robot will find that item on the table and gently feed them.
"We don't just build robots. We build freedom β one meal at a time."
The Mission
Millions of people live with physical disabilities that make daily tasks, even eating, a challenge. For many, this means:
- Relying on others for basic needs
- Losing a sense of independence
- Feeling isolated in everyday life
EatAble is designed to help restore independence and dignity using affordable, real-time technology that can work at home, in hospitals, or in care centers.
Voice-Controlled
Simply say what you want to eat. Natural language commands make it intuitive and accessible.
AI-Powered Vision
Multi-view camera system with SmolVLA model for robust object detection and manipulation.
Restore Independence
Empower people with upper limb disabilities to eat independently and regain dignity.
Versatile Deployment
Works at home, in hospitals, rehabilitation centers, and care homes.
How It Works
Example Interaction:
User: "I want to eat beef."
→ Robot detects the beef
→ Picks it up with the robotic arm
→ Brings it to the user's mouth
The Complete Pipeline
- Voice Input: User speaks natural language command (e.g., "I want to eat carrots")
- Speech Recognition: Google Speech Recognition API converts audio to text
- Intent Understanding: OpenAI GPT-4o-mini with structured output parsing determines action intent
- Voice Feedback: ElevenLabs voice model provides natural voice responses
- Task Execution: Robot receives the task instruction and:
  - Captures multi-view observations from the cameras
  - Selects an action with the SmolVLA model based on the visual observations and the task
  - Executes the manipulation with custom action threshold detection
  - Continues until the task is complete or times out (45 seconds)
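The control flow above can be sketched as a single loop. Everything below is a hypothetical stand-in: the real system uses the Google Speech Recognition API, GPT-4o-mini, ElevenLabs, and a SmolVLA policy, while these stub functions only mirror the shape of the pipeline (roughly what the project's dummy mode enables).

```python
import time

# Hypothetical stand-ins for the real components (speech recognition,
# LLM intent parsing, SmolVLA inference, robot driver).
def recognize_speech():
    return "I want to eat carrots"                    # audio -> text

def parse_intent(text):
    return {"action": "feed", "item": "carrots"}      # text -> structured intent

def capture_observations():
    return {"cam_top": None, "cam_side": None, "cam_wrist": None}

def select_action(observations, task):
    return [0.0] * 6                                  # policy output: joint targets

def execute(action):
    pass                                              # send action to the arm

def task_complete(action, threshold=1e-3):
    return max(abs(a) for a in action) < threshold    # action-threshold check

def feeding_loop(timeout_s=45.0):
    """Run one voice-to-feeding cycle, mirroring the pipeline above."""
    text = recognize_speech()
    intent = parse_intent(text)
    if intent["action"] != "feed":
        return "no feeding requested"
    task = f"pick up the {intent['item']} and bring it to the mouth"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        obs = capture_observations()
        action = select_action(obs, task)
        execute(action)
        if task_complete(action):
            return "done"
    return "timeout"
```

With the stubs above the loop terminates on the first step; on real hardware each iteration would stream fresh camera frames into the policy until the 45-second deadline.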
Technology Stack
AI & Machine Learning
- SmolVLA (Small Vision-Language-Action) Policy: Pre-trained model from LeRobot on the Hugging Face Hub
- Base Model: lerobot/smolvla_base, trained on teleoperated demonstrations
- Training Infrastructure: AMD Instinct™ MI300X GPU on AMD Developer Cloud
- Training: 40,000 steps with batch size 64 (approximately 8 hours)
Voice & Language
- Speech Recognition: Google Speech Recognition API
- Intent Understanding: OpenAI GPT-4o-mini with structured output parsing
- Voice Synthesis: ElevenLabs multilingual voice model
Robotics Framework
- LeRobot: Hugging Face framework for robot learning
- Multi-view Camera System: 3 cameras for robust visual perception
- Custom Action Threshold Detection: Automatic task completion detection
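One plausible reading of "action threshold detection" is to declare the task complete once the policy's commanded actions stop changing for several consecutive steps. The class below sketches that idea; the class name, threshold, and patience values are illustrative, not the project's actual implementation.

```python
class ActionThresholdDetector:
    """Flag task completion when successive actions barely change.

    A sketch of one way to implement automatic completion detection;
    the threshold and patience defaults are illustrative.
    """

    def __init__(self, threshold=0.01, patience=3):
        self.threshold = threshold  # max per-joint change considered "still"
        self.patience = patience    # consecutive still steps required
        self._last = None
        self._still_steps = 0

    def update(self, action):
        """Feed one action vector; return True once the arm has settled."""
        if self._last is not None:
            delta = max(abs(a - b) for a, b in zip(action, self._last))
            self._still_steps = self._still_steps + 1 if delta < self.threshold else 0
        self._last = list(action)
        return self._still_steps >= self.patience
```

Calling `update()` on each policy step keeps the check cheap and model-agnostic: it watches only the action stream, not the camera images.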
SmolVLA Model
Vision-language-action model combining visual understanding with robotic control.
Natural Language
Voice commands with flexible expression - no memorization needed.
Multi-View Vision
Three-camera system for robust object detection across different angles.
AMD GPU Powered
Trained on AMD Instinct™ MI300X GPU for high-performance inference.
Use Cases
- Supporting people with upper limb disabilities
- Assisting elderly individuals who struggle with mobility
- Deploying in hospitals, rehabilitation centers, and care homes
- Enabling independent living at home with affordable robotics
Voice Commands
EatAble understands natural language in various forms:
Feeding Requests:
- "I want to eat carrots"
- "Feed me"
- "I'm hungry"
- "Can you feed me?"
- "I want to eat beef"
- "I feel like eating vegetables"
General Conversation:
- Questions about menu
- Casual chat
- Inquiries about capabilities
Exit Requests:
- "exit", "quit", "goodbye"
- "I'm done", "I'm full", "that's enough"
Architecture
Training Pipeline
- Dataset Collection: Teleoperated demonstrations captured with multi-view cameras
- Model Training: Fine-tuned SmolVLA on AMD Instinct™ MI300X GPU
- Model Deployment: Trained model pushed to Hugging Face Hub
Inference Pipeline
- Modular Architecture: Separated voice assistant, robot control, and model inference
- Dummy Mode: Supports testing without physical robot hardware
- Configurable Parameters: Camera indices, robot ports, model parameters
- Real-time Processing: Continuous action monitoring with automatic completion detection
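The configurable parameters listed above might be grouped into a single config object like the following. All field names and defaults are assumptions inferred from this section (three cameras, a 45-second timeout, a dummy mode for hardware-free testing), not the project's actual configuration.

```python
from dataclasses import dataclass, field

@dataclass
class EatAbleConfig:
    """Hypothetical configuration object; names and defaults are illustrative."""
    camera_indices: list = field(default_factory=lambda: [0, 1, 2])  # 3 views
    robot_port: str = "/dev/ttyUSB0"       # serial port for the arm (example)
    model_id: str = "lerobot/smolvla_base" # base policy from the Hub
    task_timeout_s: float = 45.0           # per the pipeline above
    dummy_mode: bool = False               # run without physical hardware

def make_test_config() -> EatAbleConfig:
    """Dummy-mode config for testing without a robot attached."""
    return EatAbleConfig(dummy_mode=True)
```

Keeping the voice assistant, robot control, and model inference behind one config like this is what makes the dummy mode cheap: swapping `dummy_mode=True` replaces the hardware-facing modules without touching the rest of the pipeline.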
Generalization & Flexibility
Task Generalization:
- SmolVLA model can be extended to various manipulation tasks beyond feeding
- Multi-view camera system provides robust perception across different setups
- Natural language interface allows flexible expression
Hardware Portability:
- Standard LeRobot framework
- Adaptable to different robot platforms
- Configurable for various camera and robot setups
Explore EatAble
EatAble is open source. Explore our implementation and contribute to building accessible robotics technology.
Our Hackathon Journey
EatAble won 3rd Prize at the AMD Robotics Hackathon 2025! Read about our journey building this voice-controlled robotic assistant and how we're helping restore independence through accessible AI and robotics.
The Team
This project was built by the amazing Tihado team for the AMD Robotics Hackathon 2025, winning 3rd Prize.
From a meal... to a life of independence.
Winner: 3rd Prize at AMD Robotics Hackathon 2025