Benchmark Data for Evaluating Visualization and Analysis Techniques for Eye Tracking for Video Stimuli
Eye tracking is applied in various research fields to analyze human eye movement, and an increasing number of analysis methods have emerged to examine different aspects of the data. In particular, due to the complex spatio-temporal nature of gaze data for dynamic stimuli, there is a need for, and a recent trend toward, the development of visualization and visual analytics techniques for such data. With this paper, we provide benchmark data to test visualization and visual analytics methods, as well as other analysis techniques for gaze processing. For eye tracking data from video stimuli in particular, existing datasets often provide little information about the recorded eye movement patterns and, therefore, are not comprehensive enough to allow for a faithful assessment of the analysis methods. Our benchmark data consists of three components: the dynamic stimuli in the form of video, the eye tracking data, and annotated areas of interest (AOIs). We designed the video stimuli and the tasks for the participants of the eye tracking experiments to trigger typical viewing patterns, including attentional synchrony, smooth pursuit, and switching of focus. In total, we created 11 videos with eye tracking data acquired from 25 participants.
We provide three components of data: the video stimulus, the recorded eye tracking data, and AOI annotations. These components are given for each of the 11 scenarios summarized in the previous section.
Video stimulus: With the exception of stimulus 7 (Kite), all videos were captured with a Panasonic HDC-SD5 camcorder. Stimulus 7 was captured with an Apple iPhone 4S at 30 frames per second (fps) because the camcorder was not available at that time. The other videos were recorded at 25 fps with a tripod for stabilization. Except for stimulus 3 (Dialog), the audio track was removed from the videos because it was irrelevant to the tasks. Stimulus 3 has a stereo MP3-encoded audio track at 128 kbit/s. All videos were converted to Xvid-encoded AVI files with a frame rate of 25 fps and a maximum data rate of 12 Mbit/s. The videos have a resolution of 1920×1080 pixels and were displayed centered on the screen at their native resolution. These technical parameters were chosen to ensure compatibility with the eye tracking software.
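Because all videos share the converted frame rate of 25 fps, a gaze timestamp can be related to a video frame by simple arithmetic. The following is a minimal sketch, assuming that timestamps are given in milliseconds relative to the first frame of the stimulus; the function name `frame_index` is hypothetical and not part of the dataset.

```python
import math

VIDEO_FPS = 25  # frame rate of the converted AVI files


def frame_index(t_ms: float, fps: int = VIDEO_FPS) -> int:
    """Return the zero-based index of the frame shown at time t_ms.

    Assumption: t_ms is measured in milliseconds from the first video frame.
    """
    if t_ms < 0:
        raise ValueError("timestamp precedes stimulus onset")
    # Each frame lasts 1000 / fps milliseconds (40 ms at 25 fps).
    return math.floor(t_ms * fps / 1000)


print(frame_index(0), frame_index(40), frame_index(1000))
```

If the timestamps in a recording are not aligned with the stimulus onset, the offset has to be subtracted before calling the function.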
Eye tracking data: The data was recorded with a Tobii T60 XL eye tracker at a sampling rate of 60 Hz on a 24" screen with a resolution of 1920×1200 pixels. We provide the complete data from the recordings (except for the absolute timestamps, which were removed for privacy reasons) in separate TSV files, exported from the Tobii software. The data files include raw gaze data with timestamps and coordinates, as well as fixation indices extracted by the Tobii fixation filter with standard settings (velocity threshold = 35 pixels/sample; distance threshold = 35 pixels). We recommend using the raw gaze data for best accuracy and reliability; this is especially true for videos with smooth pursuit because this type of eye movement is not supported by the fixation filter. A complete description of the exported file format is available in the Tobii T60 XL manual.
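Since the 1920×1080 videos were shown centered on the 1920×1200 screen, gaze positions recorded in screen coordinates are shifted by 60 pixels vertically relative to the video. The sketch below reads raw gaze samples from a TSV file and applies this shift. It assumes that the gaze coordinates are given in screen pixels with the origin in the top-left corner; the column names `Timestamp`, `GazeX`, and `GazeY` are hypothetical placeholders and must be replaced by the names used in the actual export.

```python
import csv
import io

SCREEN_W, SCREEN_H = 1920, 1200  # screen resolution
VIDEO_W, VIDEO_H = 1920, 1080    # video resolution

# Offset of the centered video inside the screen: (0, 60) pixels.
OFFSET_X = (SCREEN_W - VIDEO_W) // 2
OFFSET_Y = (SCREEN_H - VIDEO_H) // 2


def load_gaze(tsv_file, t_col="Timestamp", x_col="GazeX", y_col="GazeY"):
    """Read raw gaze samples and convert them to video pixel coordinates.

    The column names are hypothetical defaults, not the Tobii export names.
    Samples with empty coordinates or outside the video area are skipped.
    """
    samples = []
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if not row[x_col] or not row[y_col]:
            continue  # no valid gaze position for this sample
        x = float(row[x_col]) - OFFSET_X
        y = float(row[y_col]) - OFFSET_Y
        if 0 <= x < VIDEO_W and 0 <= y < VIDEO_H:
            samples.append((float(row[t_col]), x, y))
    return samples


# Tiny in-memory example instead of a real recording.
demo = io.StringIO("Timestamp\tGazeX\tGazeY\n0\t960\t600\n17\t\t\n33\t100\t30\n")
print(load_gaze(demo))
```

Working on the raw samples in this way, rather than on the fixation indices, keeps smooth pursuit movements in the data.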
Dynamic AOIs: To support the application of advanced AOI-based analysis methods, we include sets of manually annotated, dynamic AOIs for every video in the dataset. With this additional information, various AOI-based eye tracking metrics can be applied to the data. The AOIs are annotated as dynamic, axis-aligned bounding boxes. We provide the data in an XML format that is compatible with the well-documented ViPER file format. Hence, the annotations can be imported into other visualization or analysis systems by simple XML parsing.
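As an illustration of such an import, the following sketch parses bounding boxes from a ViPER-style XML document and tests whether a gaze point falls into an AOI on a given frame. The element layout, the namespaces, and the embedded sample are assumptions based on the general ViPER format and should be checked against the provided files; the helper names `load_aois` and `aois_hit` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of ViPER XML files (assumed to apply to the AOI files as well).
NS = {
    "v": "http://lamp.cfar.umd.edu/viper#",
    "d": "http://lamp.cfar.umd.edu/viperdata#",
}

# Hand-written sample for demonstration only, not taken from the dataset.
SAMPLE = """<viper xmlns="http://lamp.cfar.umd.edu/viper#"
       xmlns:data="http://lamp.cfar.umd.edu/viperdata#">
  <data>
    <sourcefile filename="demo.avi">
      <object name="car" id="0" framespan="1:50">
        <attribute name="bbox">
          <data:bbox framespan="1:25" x="100" y="200" width="300" height="150"/>
          <data:bbox framespan="26:50" x="120" y="200" width="300" height="150"/>
        </attribute>
      </object>
    </sourcefile>
  </data>
</viper>"""


def load_aois(xml_text):
    """Return {name: [(first_frame, last_frame, x, y, width, height), ...]}."""
    aois = {}
    for obj in ET.fromstring(xml_text).iterfind(".//v:object", NS):
        boxes = aois.setdefault(obj.get("name"), [])
        for bbox in obj.iterfind(".//d:bbox", NS):
            geom = tuple(int(bbox.get(k)) for k in ("x", "y", "width", "height"))
            # A framespan may list several "first:last" ranges separated by spaces.
            for span in bbox.get("framespan").split():
                first, last = (int(f) for f in span.split(":"))
                boxes.append((first, last) + geom)
    return aois


def aois_hit(aois, frame, x, y):
    """Names of all AOIs whose box contains the point (x, y) on the given frame."""
    return [name for name, boxes in aois.items()
            if any(f0 <= frame <= f1 and bx <= x < bx + w and by <= y < by + h
                   for f0, f1, bx, by, w, h in boxes)]


aois = load_aois(SAMPLE)
print(aois_hit(aois, 30, 130, 250))
```

Frame numbers are compared exactly as they are stored in the file, so gaze samples have to be mapped to the same frame numbering before the hit test.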
Filename convention: The video filenames consist of the stimulus ID and the name of the video. The XML file with the AOI annotations has the same name as the corresponding video. The names of the files with exported gaze data consist of the participant ID, followed by the group, the stimulus ID, and the stimulus name. For example, the file "P1A-01-car pursuit.tsv" contains the eye tracking data for stimulus S1 (car pursuit) from participant P1 in group A.
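The naming scheme of the gaze data files can be decoded with a regular expression. The pattern below is a sketch generalized from the single example filename given above; it assumes that the group is one uppercase letter and that the stimulus ID is numeric, which should be verified against the full file list. The names `GazeFile` and `parse_gaze_filename` are hypothetical.

```python
import re
from typing import NamedTuple


class GazeFile(NamedTuple):
    participant: str    # e.g. "P1"
    group: str          # e.g. "A"
    stimulus_id: int    # e.g. 1
    stimulus_name: str  # e.g. "car pursuit"


# Assumed pattern: <participant><group>-<stimulus ID>-<stimulus name>.tsv
_PATTERN = re.compile(r"^(P\d+)([A-Z])-(\d+)-(.+)\.tsv$")


def parse_gaze_filename(filename: str) -> GazeFile:
    """Split a gaze data filename into its components."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    participant, group, stimulus_id, name = match.groups()
    return GazeFile(participant, group, int(stimulus_id), name)


print(parse_gaze_filename("P1A-01-car pursuit.tsv"))
```

Raising an error on unexpected names makes deviations from the assumed pattern visible instead of silently skipping files.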
| Stimulus | Description | Task | Viewing behavior |
|---|---|---|---|
| 1 (Car pursuit) | Panning camera follows a red car while it was going through a roundabout. | Follow the red car. | Potential smooth pursuit with long time spans of attentional synchrony on the red car. |
| 2 | Camera follows turning car. The movement of the car describes the shape of an eight. | Recognize the shape that is described by the movement of the car. | [...] on the car with potential smooth pursuit eye [...] |
| 3 (Dialog) | Two persons talk to each other in front of the camera. | Follow the dialog attentively. | Switching focus between the faces of both persons. Label on shirt (right person) attracts additional attention. |
| 4 | A thimblerig with three cups and a marble. | Find the cup with the marble. | [...] mainly on the cup with the marble. |
| 5 | A 4x4 memory game. Pairwise flipping of cards is performed until all pairs are found. | After one card is flipped, focus on the corresponding card of the pair. | Increasing attention on matching cards after several turns and switching focus during the search. |
| 6 | Two persons play UNO card game until the right player wins. | For each player's turn, focus on the playable cards on the [...] | Switching focus and attention mainly distributed between both hands and the stack of played cards. |
| 7 (Kite) | Person on a meadow steers a kite. The kite repeatedly leaves the field of view. | Follow the flight path of the kite if possible. | Smooth pursuit if the kite is visible. Otherwise, the participants either tried to estimate the position of the kite, or focused on the person. |
| 8 | Various persons crossing the field of view while a text ribbon in the lower part is showing further information. | Task is provided by the text ribbon: Look for metal case. | [...] on the text ribbon until the metal case appears and the task is readable. |
| 9 | Three players with orange shirts and one player with a white shirt pass a ball around. | Task group A: Count ball contacts of the white player. Task group B: Count passes between orange players. | Attentional synchrony often on the ball, independent from the task. |
| 10 | Various persons carrying different bags are crossing the field of view. | Look for a specific bag. Two groups: two different search targets, presented before the [...] | Switching focus on new bags in the scene. Depending on the group, the search targets attract more attention. |
| 11 | People with different clothing cross the field of view. | Task group A: Find the person with a hooded sweater. Task group B: Find the person with a red shirt and a [...] | Switching focus on new persons. After identification, search targets become less important than new persons. |