Simulation testing for 3D object detection, collision prediction in AVs

Article by: Vidya Sagar and Leya Lakshmanan

Here is how to perform testing of the Point-GNN algorithm using LiDAR data in the simulated environment for autonomous vehicle designs.

There was a time when self-driving cars taking over the roads was a distant vision. Today, the situation has changed dramatically. Although autonomous vehicles (AVs) took longer than expected to make their first appearance, automotive experts believe that this technology is poised to mature quickly.

Automotive OEMs and their key technology partners are already in the process of developing and testing truly self-driving vehicles—Level 5 autonomous cars—that can cruise along open roads without human drivers. This is partly made possible through the technology of sensor fusion, powered by machine-learning algorithms.

Perception and decision-making

Two milestones must be reached on the way to full autonomy:

  • Perception
  • Decision-making

Perception is the ability of the machine-learning model to accurately identify and classify surrounding objects. Decision-making is the ability of the algorithm to predict an object's future position and the possibility of a collision. The vision of a Level 5 autonomous vehicle would be possible only if both these milestones are achieved with accuracy approaching 100%.

Both milestones pose a fair degree of challenge to the machine-learning engineer. To test the accuracy of the perception and decision-making processes, our engineering team designed a simulation environment for autonomous vehicles. Any algorithm can be tested in this simulated environment, and it can draw on inputs from LiDAR, camera, and radar (as necessary) to produce its predictions.

In this article, we discuss the testing of the Point-GNN algorithm in the simulated environment.

What’s Point-GNN?

Point-GNN is a machine-learning algorithm that processes the point-cloud data for object detection. Here, we discuss the use of LiDAR data and Point-GNN, as well as modified Resnet, for the detection and tracking of 3D objects across multiple frames. We also use this data to predict collisions and the time to collide—a crucial factor for avoiding accidents.

We used sensor fusion to improve the accuracy of the prediction. We also analyzed the performance of graphs and Point-GNN when compared with alternative methods.

Development of simulation environment

The work on this project started toward the end of 2020. In the initial days of the project, we focused on developing the simulation environment. Sensor-fusion algorithm development started after the simulation environment was ready for testing.

Carla Simulator

In our project, we used the Carla Simulator to develop the demonstrator. The underlying Unreal Engine of Carla manages the rendering, physics of objects, and their movements. It also controls the movement of non-player characters (NPCs) that are essential for realism. Additionally, Carla supports the simulation of various weather and lighting conditions to test our algorithms for ADAS/AD functionalities.

Carla also provides a suite of sensors, including radar, LiDAR, color/grayscale cameras, IMUs, and more. For this project, we configured a LiDAR sensor for 3D point-cloud data and a camera for visualization.

Carla also offers a data pipeline for training and validation of ADAS/AD algorithms. In our project, we used the classic data pipeline that comprises subsystems for perception, planning, and control.

Kitti vision benchmark suite

The need for simulated data in autonomous-driving applications has become increasingly important, both for validation of pretrained models and for training new models. For these models to generalize to real-world applications, it is critical that the underlying dataset contains a variety of driving scenarios and that the simulated sensor readings closely mimic real-world sensors. In this project, we used the Kitti dataset for training and testing.

Because each sensor has its own coordinate system, it is necessary to transform coordinates from one reference frame to another before fusing the data from multiple sensors. For instance, the bounding boxes of objects detected in point-cloud data (from the LiDAR sensor) must be transformed into the camera frame so that they are drawn accurately on the image data.
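The transformation from the LiDAR frame to image pixels can be sketched as follows. This is a minimal illustration rather than the project's code: the function name, the extrinsic matrix T_cam_from_lidar, and the intrinsic matrix K are hypothetical, and the axis conventions (LiDAR x forward, y left, z up; camera x right, y down, z forward) are assumptions.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """Project N x 3 LiDAR points to pixel coordinates.

    T_cam_from_lidar: 4 x 4 homogeneous transform (hypothetical extrinsics).
    K: 3 x 3 camera intrinsic matrix (hypothetical).
    Returns (pixels, in_front); rows of pixels are NaN for points behind the camera.
    """
    n = points_lidar.shape[0]
    homogeneous = np.hstack([points_lidar, np.ones((n, 1))])   # N x 4
    cam = (T_cam_from_lidar @ homogeneous.T).T[:, :3]          # N x 3, camera frame
    in_front = cam[:, 2] > 1e-6                                # positive depth only
    pixels = np.full((n, 2), np.nan)
    uvw = (K @ cam[in_front].T).T
    pixels[in_front] = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
    return pixels, in_front

# Assumed axis mapping: cam_x = -lidar_y, cam_y = -lidar_z, cam_z = lidar_x.
T = np.array([[0.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[10.0, 0.0, 0.0], [10.0, 1.0, 0.0], [-5.0, 0.0, 0.0]])
pixels, visible = project_lidar_to_image(corners, T, K)
```

The same function can be applied to the eight corners of a 3D bounding box; connecting the projected corners draws the box on the camera image.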

Solution architecture

A high-level architecture diagram of our project is shown below:

Figure 1 The block diagram shows the high-level view of the solution architecture. Source: Embitel

The following steps are performed to detect 3D objects from a point cloud:

  • Point cloud to graph conversion
  • A GNN of “T” iterations
  • Bounding box merging and scoring

A graph is a natural data structure for representing point-cloud data because a point cloud doesn’t have a fixed form. A graph consists of vertices/nodes and edges, as shown below:

Figure 2 Point cloud to graph conversion comprises vertices/nodes and edges. Source: Embitel

A graph G can be constructed for a point cloud as G = (P, E), where P is the set of points and E is the set of edges connecting each point to its neighbors within a fixed radius.
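A minimal sketch of the fixed-radius graph construction is shown here. The function name is hypothetical, self-loops are excluded as an assumption, and the brute-force distance matrix is only practical for small point clouds; a spatial index such as a k-d tree would normally replace it.

```python
import numpy as np

def build_radius_graph(points, radius):
    """Return directed edges (i, j) for every pair with ||p_i - p_j|| < radius, i != j."""
    diff = points[:, None, :] - points[None, :, :]   # N x N x 3 pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)             # N x N pairwise distances
    adjacency = dist < radius
    np.fill_diagonal(adjacency, False)               # assumption: no self-loops
    src, dst = np.nonzero(adjacency)
    return np.stack([src, dst], axis=1)

points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
edges = build_radius_graph(points, radius=1.5)
```

In this example, only the first two points are within the radius of each other, so the third point has no edges.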

We also applied voxel downsampling before constructing the initial graph to reduce computational complexity.
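Voxel downsampling can be sketched as follows, assuming that every point falling in the same cubic cell is replaced by the centroid of that cell's points. The function name and the voxel size are hypothetical; the source does not state which variant or cell size was used.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel by their centroid."""
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)            # flatten for consistency across NumPy versions
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)         # accumulate coordinates per voxel
    return sums / counts[:, None]

cloud = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.1, 0.1]])
reduced = voxel_downsample(cloud, voxel_size=1.0)
```

Here the first two points share a voxel and collapse into one point, so the graph built afterward has fewer vertices and edges.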

A GNN derives its output by repeatedly updating the state of each node from the states of its neighbors. So, in Figure 2, the state of A can be updated based on the states of B, E, and D and the features of edges AB, AE, and AD.
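One such update can be sketched as a single message-passing step. This is a simplified stand-in, not the Point-GNN implementation: it assumes single linear layers with ReLU in place of learned multilayer networks, max aggregation over neighbors, and a residual update, and all names and weight shapes are hypothetical.

```python
import numpy as np

def gnn_iteration(states, coords, edges, w_edge, w_node):
    """One message-passing step.

    states: N x F node states; coords: N x D node positions.
    edges:  M x 2 array of (receiver, sender) index pairs.
    w_edge: (D + F) x H weights; w_node: H x F weights (placeholders).
    """
    recv, send = edges[:, 0], edges[:, 1]
    rel = coords[send] - coords[recv]                     # edge feature: relative offset
    edge_in = np.concatenate([rel, states[send]], axis=1)
    messages = np.maximum(edge_in @ w_edge, 0.0)          # ReLU(linear)
    agg = np.full((states.shape[0], messages.shape[1]), -np.inf)
    np.maximum.at(agg, recv, messages)                    # max over each node's neighbors
    agg[np.isinf(agg)] = 0.0                              # nodes without neighbors
    return states + np.maximum(agg @ w_node, 0.0)         # residual state update

# Node 0 plays the role of A, with three neighbors as in Figure 2.
states = np.array([[0.0], [1.0], [3.0], [2.0]])
coords = np.zeros((4, 2))
edges = np.array([[0, 1], [0, 2], [0, 3]])
w_edge = np.array([[0.0], [0.0], [1.0]])   # toy weights: pass the sender state through
w_node = np.array([[1.0]])
updated = gnn_iteration(states, coords, edges, w_edge, w_node)
```

Running this step T times corresponds to the “T” iterations listed in the detection steps, with each iteration letting information travel one edge further across the graph.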

Prediction of collision and time to collide

After the object-detection stage, the next steps are prediction of collision and computation of time to collide. In this context, “ego vehicle” is the term used to refer to the vehicle that has the sensor suite fitted and is controlled by the user (manually or through API calls).

Prediction is done in two steps:

  • Based on the present speed of the ego vehicle, find all the vehicles with which a collision is possible within a time period, t seconds. This is done by extending the bounding boxes based on the current speed of the vehicle and finding possible overlap using a polygon intersection algorithm.
  • Once the possible collisions are identified, compute the time to collide for each of these objects by extending the bounding boxes in small time steps and checking for polygon intersection at each step.
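The two steps can be sketched as follows. This is an illustration under several assumptions, not the project's code: boxes are treated as 2D convex footprints in the ego frame, the ego vehicle drives straight along +x at constant speed, obstacles are treated as stationary, and a separating-axis test stands in for the polygon intersection algorithm. All function names and parameters are hypothetical.

```python
import numpy as np

def polygons_intersect(a, b):
    """Separating-axis test for two convex polygons given as K x 2 vertex arrays."""
    for poly in (a, b):
        edge_vecs = np.roll(poly, -1, axis=0) - poly
        normals = np.stack([-edge_vecs[:, 1], edge_vecs[:, 0]], axis=1)
        for axis in normals:
            pa, pb = a @ axis, b @ axis
            if pa.max() < pb.min() or pb.max() < pa.min():
                return False          # found a separating axis
    return True                       # touching counts as intersecting

def extended_ego_box(length, width, speed, t):
    """Ego footprint (heading +x), stretched forward by the distance speed * t."""
    x0, x1 = -length / 2, length / 2 + speed * t
    y = width / 2
    return np.array([[x0, -y], [x1, -y], [x1, y], [x0, y]])

def time_to_collide(length, width, speed, obstacle, horizon, dt=0.1):
    """Return the first time step at which a collision is found, or None."""
    # Step 1: is a collision possible at all within the horizon?
    if not polygons_intersect(extended_ego_box(length, width, speed, horizon), obstacle):
        return None
    # Step 2: extend the box in small time steps until it reaches the obstacle.
    steps = int(round(horizon / dt))
    for k in range(steps + 1):
        t = k * dt
        if polygons_intersect(extended_ego_box(length, width, speed, t), obstacle):
            return t
    return horizon

ahead = np.array([[22.5, -1.0], [26.5, -1.0], [26.5, 1.0], [22.5, 1.0]])
beside = np.array([[22.5, 5.0], [26.5, 5.0], [26.5, 7.0], [22.5, 7.0]])
ttc_ahead = time_to_collide(4.0, 2.0, 10.0, ahead, horizon=3.0)
ttc_beside = time_to_collide(4.0, 2.0, 10.0, beside, horizon=3.0)
```

The coarse check in step 1 filters out objects that cannot be reached within the horizon, so the finer time stepping in step 2 runs only for the remaining candidates. The resolution of the result is limited by the step size dt.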

Examples of bounding boxes before and after extension are shown below along with LiDAR data.

Figure 3 This image shows the detected bounding boxes. Source: Embitel

Testing and evaluation of different algorithms

After the installation of the necessary libraries for the programming environment, the 3D object-detection data from the Kitti dataset is used. For Point-GNN, a pretrained model is employed, and evaluation is run over all 7,500+ test samples.

For comparison, modified Resnet is trained and tested on Kitti data. We used pretrained Resnet parameters and added a layer of our own to tune the parameters on the Kitti training dataset.

For modified Resnet, training is performed over 7,400+ training samples. For testing, 7,500+ test samples are used.

The workflow of Point-GNN in Carla is shown below:

Figure 4 The Point-GNN workflow is shown in Carla Simulator. Source: Embitel

We found that Point-GNN has better mean precision than modified Resnet. However, its precision for pedestrians is significantly lower than for cars.

Figure 5 This is what 3D object detection with LiDAR looks like. Source: Embitel

Figure 6 The comparison shows the difference in achieving object detection through Point-GNN and modified Resnet approaches. Source: Embitel

The road ahead

  • We have successfully built a framework for testing ADAS/AD algorithms in a simulated environment and used it to test different algorithms.
  • We found that the use of LiDAR is effective in object detection in 3D space.
  • The Point-GNN algorithm gave the best results in a 3D environment, with close to 90% precision for detecting cars.
  • Modified Resnet delivered reasonable results when compared with Point-GNN. The model can be trained with more classes and with a larger detected area by modifying configurations.
  • We are exploring the reasons for low precision in detecting pedestrians.
  • There is a need to rewrite Point-GNN using fast libraries such as PyTorch to improve performance in terms of time taken for execution.
  • We will be developing APIs for easier integration of different algorithms with Carla.

Currently, our team is working on these aspects to fortify the capabilities of the system for perception and prediction.

This article was originally published on EDN.

Vidya Sagar Jampani is head of Embitel’s IoT Business Unit.

Leya Lakshmanan is head of marketing at Embitel Technologies.

