Robotic drawing
exploring interaction through gesture-based human–robot collaboration
with ABB-IRB120
This project explores gesture-based human–robot interaction through an “air drawing” system that translates hand movements into robotic drawings. Using an Intel RealSense camera, MediaPipe hand tracking, Rhino/Grasshopper, ROS, and a robotic arm, the system captures gestures performed in mid-air and transforms them into physical marks on paper.
The project developed through three stages: direct gesture replication, gesture interpretation, and finally robotic response generation. Alongside real-time robotic drawing, we experimented with AI-generated interpretations of gestures using a language model pipeline.
The physical setup was designed to create a direct relationship between human gesture and robotic output. An Intel RealSense camera was mounted overhead to capture hand movements within a defined interaction zone. In front of the robot, a drawing surface was positioned to receive the translated gestures as physical marks.
To support drawing, we designed and fabricated a custom 3D-printed end effector capable of holding water brushes. In later iterations, this evolved into a three-pen holder that allowed the robot to produce multiple simultaneous marks, introducing variation and visual differentiation into the output.
The system connects perception, geometric processing, and robotic actuation through a distributed ROS workflow. Hand gestures are captured using MediaPipe hand tracking and converted into 3D landmark coordinates through the RealSense camera feed. These coordinates are published as ROS topics from a Linux machine.
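As a rough illustration of the landmark-to-3D step, the sketch below deprojects one normalized image landmark into camera-frame coordinates with a plain pinhole model. It assumes that a depth value in metres has already been read from the depth frame at the landmark's pixel. The `Intrinsics` values, the `landmark_to_point` helper, and the 640×480 resolution are hypothetical and not taken from the project. Lens distortion is ignored here, and the ROS publishing step is left out so the example runs without a ROS installation.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Pinhole camera parameters in pixels (illustrative values, not the project's)."""
    fx: float
    fy: float
    cx: float
    cy: float


def landmark_to_point(
    u_norm: float,
    v_norm: float,
    depth_m: float,
    width: int,
    height: int,
    k: Intrinsics,
) -> Tuple[float, float, float]:
    """Convert a normalized landmark (0..1 in image space) plus depth to metres."""
    # Normalized coordinates -> pixel coordinates.
    u = u_norm * width
    v = v_norm * height
    # Pinhole deprojection: offset from the principal point, scaled by depth.
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 stream.
k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
point = landmark_to_point(0.75, 0.5, 0.6, 640, 480, k)
```

In a setup like the one described, a point of this kind would then be packed into a message and published on a topic for the Grasshopper machine to consume.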
A second machine running Rhino and Grasshopper subscribes to the incoming data, where the gesture paths are transformed, scaled, and smoothed into trajectories suitable for robotic execution. The processed paths are then transmitted to the robotic arm as motion commands.
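The project performs this processing in Grasshopper; the Python sketch below only illustrates the same kind of operations on a 2D path. It swaps in a centered moving average for smoothing and a uniform fit-to-sheet scaling, both of which are assumptions rather than the components actually used. The function names, the window size, and the sheet dimensions (297 × 210 with a margin of 20, read as millimetres) are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth(path: List[Point], window: int = 3) -> List[Point]:
    """Centered moving average; the window shrinks near both ends of the path."""
    half = window // 2
    out: List[Point] = []
    for i in range(len(path)):
        lo = max(0, i - half)
        hi = min(len(path), i + half + 1)
        n = hi - lo
        out.append((
            sum(p[0] for p in path[lo:hi]) / n,
            sum(p[1] for p in path[lo:hi]) / n,
        ))
    return out


def fit_to_sheet(path: List[Point], sheet_w: float, sheet_h: float,
                 margin: float) -> List[Point]:
    """Uniformly scale and translate a path so it fits inside the sheet margins."""
    xs = [x for x, _ in path]
    ys = [y for _, y in path]
    min_x, min_y = min(xs), min(ys)
    span_x = max(xs) - min_x
    span_y = max(ys) - min_y
    # Use the tighter of the two ratios so the aspect ratio is preserved.
    ratios = []
    if span_x > 0:
        ratios.append((sheet_w - 2 * margin) / span_x)
    if span_y > 0:
        ratios.append((sheet_h - 2 * margin) / span_y)
    scale = min(ratios) if ratios else 1.0
    return [(margin + (x - min_x) * scale, margin + (y - min_y) * scale)
            for x, y in path]


# A jittery zigzag standing in for a captured gesture (made-up data).
raw = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
smoothed = smooth(raw, window=3)
fitted = fit_to_sheet(smoothed, sheet_w=297.0, sheet_h=210.0, margin=20.0)
```

Smoothing before scaling keeps the window size independent of the sheet dimensions. A larger window gives a calmer line but removes more of the original gesture.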
In later iterations, we integrated the Claude API to experiment with AI-generated responses to the captured gestures, extending the workflow beyond direct replication into interpretation and response.
Test 01: RealSense + Python script
Test 02: RealSense + Python on Linux
Demo setup
Development progressed through a series of iterative prototypes. The initial phase focused on establishing a functional perception pipeline using MediaPipe hand tracking and real-time visualization of gesture paths.
We implemented a simple gesture-based state machine to structure the interaction:
“Victory” gesture to begin drawing
“Pinch” gesture to record points
“Thumbs Up” gesture to end the sequence
Once the perception system was stable, we expanded the workflow to include ROS communication, Grasshopper-based geometry processing, and robotic execution. Additional experiments explored smoothing strategies, path transformations, polar arrays, and AI-mediated responses.
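The polar array experiments were built in Grasshopper. As an approximation of what such an array does to a gesture path, the sketch below rotates copies of a 2D path around a center point at equal angular steps. The `polar_array` function and the sample values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polar_array(path: List[Point], center: Point, count: int) -> List[List[Point]]:
    """Return `count` copies of the path, rotated evenly around `center`."""
    cx, cy = center
    copies: List[List[Point]] = []
    for i in range(count):
        angle = 2.0 * math.pi * i / count
        c, s = math.cos(angle), math.sin(angle)
        # Standard 2D rotation of each point about the center.
        copies.append([
            (cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c)
            for x, y in path
        ])
    return copies


# One short stroke repeated four times around the origin (made-up data).
stroke = [(1.0, 0.0), (2.0, 0.0)]
array = polar_array(stroke, center=(0.0, 0.0), count=4)
```

The first copy is the unrotated stroke, so the original gesture remains part of the resulting pattern.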
Throughout the process, the project evolved from simple gesture replication toward a more collaborative interaction model between human input, computational interpretation, and robotic output.
The final system demonstrates a real-time pipeline that translates mid-air gestures into robotic drawings. Users can interact with the system intuitively, producing physical marks through movement alone.
The project produced both digital and physical artifacts, including recorded interaction sessions and robot-generated drawings. One of the key findings was that expressive information is progressively reduced throughout the workflow: dense human gestures become simplified geometric paths after processing and smoothing.
Experiments with AI-generated responses further highlighted how representation affects interpretation. While the system could technically generate responses to gestures, the simplified point-based data limited the semantic richness of the outputs. These findings reinforced the importance of representation and legibility in gesture-based human–robot collaboration.
Grasshopper array test
Mimicking
Generative response