Multimodal Zero-Shot Activity Recognition for Process Mining of Robotic Systems
Published in Business Process Management: Responsible BPM Forum, Process Technology Forum, Educators Forum. BPM 2025, 2025
Recommended citation: F. Corradini, S. Pettinari, B. Re, L. Rossi, and M. Sampaolo. Multimodal Zero-Shot Activity Recognition for Process Mining of Robotic Systems. In: Business Process Management: Responsible BPM Forum, Process Technology Forum, Educators Forum. BPM 2025. LNBIP, vol. 565, pp. 263–275. Springer, 2025.
Abstract
Understanding and analyzing the behavior of robotic systems is essential to ensure their reliability, efficiency, and continuous improvement, especially as robots are increasingly deployed in complex, dynamic environments. Process mining offers a powerful approach to uncover and analyze the execution of robotic operations. However, applying process mining to robotic systems requires bridging the gap between fine-grained multimodal data and high-level activity representations. Recent advances in foundation models provide a promising solution to this challenge, as the knowledge acquired during their extensive pretraining enables them to interpret multimodal data without the need for task-specific training. In this work, we propose a novel multimodal process mining pipeline that leverages the zero-shot capabilities of foundation models to perform activity recognition from visual and auditory inputs. By transforming fine-grained multimodal data into event logs, the pipeline enables the application of process mining techniques to robotic systems. We applied our approach to the Baxter UR5 95 Objects dataset, which offers synchronized video and audio recordings of a Baxter robot manipulating objects. The fusion of activity recognition results from these complementary modalities yields an event log that more accurately represents the robot’s operations, mitigating imprecision associated with using a single modality. Our results demonstrate that foundation models effectively enable the application of process mining to robotic systems, facilitating monitoring and analysis of their behavior.
@inproceedings{corradiniPRRS25a,
title={Multimodal Zero-Shot Activity Recognition for Process Mining of Robotic Systems},
author={Corradini, Flavio and Pettinari, Sara and Re, Barbara and Rossi, Lorenzo and Sampaolo, Massimiliano},
booktitle={Business Process Management: Responsible BPM Forum, Process Technology Forum, Educators Forum},
year={2025},
series={LNBIP},
volume={565},
pages={263--275},
publisher={Springer}
}
