AVRGR – Audio Visual Robot Gesture Recognition
The aim of this task is to recognize gestures from audio and video. In the Ravel data set [1, 2], the “Robot Gesture” scenario was conceived for this particular purpose. Several actors and actresses performed gestures such as “point”, “yes”, “not”, … The task is the isolated recognition of these robot gestures.
Evaluation metric
In order to have a common way to evaluate different methods, the “confusion matrix” should be used when targeting this task. A confusion matrix for N classes is an N × N matrix whose ij-th element (i-th row, j-th column) counts how many instances of class i the method recognized as class j. The confusion matrix for all the gestures should be provided. Obviously, the testing and training subsets should be disjoint. We recommend a leave-one-out strategy for evaluation.
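The metric above can be sketched as follows. This is a minimal illustration, not tied to the actual Ravel class indices or file layout; the labels and the leave-one-out split over actors are hypothetical placeholders.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Build an N x N matrix whose (i, j) entry counts how many
    instances of class i were recognized as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example with 3 gesture classes (indices are illustrative only).
y_true = [0, 0, 1, 2, 2]
y_pred = [0, 1, 1, 2, 0]
print(confusion_matrix(y_true, y_pred, 3))
# In a leave-one-out evaluation one would, for each actor, train on
# all other actors, predict on the held-out actor, and accumulate
# the per-fold predictions into a single confusion matrix.
```

The diagonal of the matrix holds the correctly recognized instances; row sums give the number of test instances per class, so per-class accuracy is the diagonal divided by the row sums.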
[1] The RAVEL data set. http://ravel.humavips.eu/.
[2] Xavier Alameda-Pineda, Jordi Sanchez-Riera, Vojtech Franc, Johannes Wienke, Jan Cech, Kaustubh Kulkarni, Antoine Deleforge, and Radu P. Horaud. The Ravel data set. In IEEE/ACM ICMI 2011 Workshop on Multimodal Corpora, Alicante, Spain, November 2011.