When to Use What Data Set for Your Self-Driving Car Algorithm: An Overview of Publicly Available Driving Datasets


Data collection on public roads has been recognized as a valuable activity alongside the development of self-driving vehicles. The data collection vehicle is typically equipped with a variety of sensors such as cameras, LiDAR, radar, GPS, and IMU. The raw data from all sensors is logged to disk while the vehicle is driven manually. The logged data can subsequently be used for training and testing different algorithms for autonomous driving, e.g., vehicle/pedestrian detection and tracking, SLAM, and motion estimation. Data collection is time-consuming and can sometimes be avoided by directly using existing datasets that include sensor data collected by other researchers. A multitude of openly available datasets have been released to foster research on automated driving. These datasets vary widely in terms of traffic conditions, application focus, sensor setup, data format, size, tool support, and many other aspects. This paper presents an overview of 27 existing publicly available datasets containing data collected on public roads, compares them from different perspectives, and provides guidelines for selecting the most suitable dataset for different purposes.

Proceedings of the 20th IEEE International Conference on Intelligent Transportation Systems (ITSC)