Industry leaders have all integrated SLAM technology into their AR/VR headsets using low-power embedded processors. Why can’t robot developers do the same?
Robots have the potential to help solve many of today’s challenges, from shortages of workers in key industries to fighting climate change. The robotics industry is evolving quickly to meet these challenges, but it is still best described as thousands of niche verticals, each with its own objectives and its own technical and commercial requirements. To really take off, it must move from siloed development to a supply chain model in which specialists contribute solutions to specific challenges like SLAM.
Simultaneous Localization and Mapping (SLAM) is a fundamental requirement for autonomous robots. It is the complex processing of data from multiple sensors that allows an autonomous system to estimate its position within an environment. It is also one of the most challenging aspects of creating robots that are able to navigate themselves. Whilst the past few years have seen an explosion in mobile robots able to move around factories, warehouses, hospitals and even shopping malls and private homes, most still have limited true autonomy. There are three main reasons for this.
Three questions that define autonomy
To be truly autonomous, a robot must be able to answer three fundamental questions: where is it, what is the layout of the space around it, and what are the objects in that space?
Incorrect answers to any of these questions cause the vast majority of robot failures. In most cases the robot simply freezes, and there are countless stories of humans needing to step in to reorientate ‘autonomous’ robots already working in today’s environments.
Pyramid of Spatial Intelligence (Source: SLAMcore)
Fully autonomous robots will answer these three questions as part of a spatial intelligence pyramid. At the base, they will know their position. At the next level, they will build effective maps. At the top, they will distinguish between the different objects in those maps.
The final stage not only builds on the previous two but helps improve them. A robot that can identify an object as a chair or table, rather than a wall, can ignore it as a long-term landmark (because it may move, it is not useful for positioning). This reduces the number of potential landmarks that need to be considered, lowering processing overheads. Of course, people and other important elements of a scene can also be identified, and robots can be programmed to act in specific ways around them.
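The idea of discarding movable objects as landmarks can be sketched in a few lines of Python. This is only an illustration: the Landmark type, the class labels and the MOVABLE_CLASSES set are hypothetical names chosen for the example, and a real system would take its labels from whatever object detector it uses.

```python
from dataclasses import dataclass

# Hypothetical set of object classes treated as movable. A real system would
# pick labels to match its own object detector and environment.
MOVABLE_CLASSES = {"chair", "table", "person"}


@dataclass
class Landmark:
    x: float
    y: float
    z: float
    label: str  # semantic class, assumed to come from an object detector


def static_landmarks(landmarks):
    """Keep only the landmarks whose class is unlikely to move."""
    return [lm for lm in landmarks if lm.label not in MOVABLE_CLASSES]


candidates = [
    Landmark(0.0, 1.0, 2.0, "wall"),
    Landmark(1.5, 0.2, 0.4, "chair"),
    Landmark(3.0, 0.0, 1.1, "door_frame"),
    Landmark(2.2, 0.7, 0.8, "table"),
]
kept = static_landmarks(candidates)
print([lm.label for lm in kept])
```

Only the wall and the door frame survive the filter, so the positioning stage has fewer candidates to match against in each frame.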
Different sensors – different data
The complexity of SLAM starts with the need to analyze data from numerous sensors that provide information about the space the robot is in. There are many types of sensor, each with its own pros and cons, and designers rarely rely on just one. For example, LiDAR is a popular type of sensor for robots, perhaps the most commonly used currently. It uses lasers to measure the distance between a robot and other objects. It is fast, accurate and reliable. However, most LiDARs used today provide only a thin ‘slice’ map of the environment, so several might be used together, or a single unit might be augmented with other sensors including odometers, gyroscopes and accelerometers.
Every new sensor adds cost and complexity. Each new feed of data must be integrated and calibrated. More data means more work for processors to handle. In turn, this means more powerful processors that consume more energy and need bigger batteries, and suddenly the whole robot design has changed again.
Learn from nature
A different approach is needed. One that delivers the benefits of robust and accurate SLAM even to those designers that don’t have access to world-class experts with PhDs in spatial intelligence.
The key to this approach is a visual SLAM solution optimized for the most commonly used sensors and processors. Most animals constantly calculate their position, map the world around them and understand what the objects around them are, using just two types of sensor: their eyes and inner ears. Visual Inertial SLAM systems do the same. With two simple cameras, like those found in most smartphones, and an inertial measurement unit (IMU) that tracks orientation and acceleration, these systems provide cost-effective yet accurate positioning, mapping and perception for robots.
Industry leaders including Microsoft, Google and Facebook have all integrated this technology into their AR/VR headsets for consumer and business use. They use low-power embedded processors to provide light, cost-effective wearable solutions that use SLAM capabilities to create accurate immersive worlds for their users. So why can’t robot developers do the same?
The optimization challenge
The answer lies in the way these systems are designed and the economics of their production. The hardware, sensors and algorithms are all heavily optimized to work together. Precise timing is essential for accurate SLAM estimates. Data feeds from the cameras and inertial measurements from the IMU are tightly integrated and time stamped to the millisecond, which fuses the sensor data into a consistent stream for the algorithms to process.
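To see why shared timestamps matter, consider that an IMU typically reports far more often than a camera, so a reading has to be estimated at the instant each frame was captured. The sketch below is a simplified illustration rather than how any particular headset does it: it linearly interpolates a single scalar IMU channel, and the function name, the microsecond timestamps and the sample values are all assumptions made for the example.

```python
from bisect import bisect_left


def interpolate_imu(imu_times, imu_values, t):
    """Linearly interpolate a scalar IMU reading at timestamp t.

    imu_times must be sorted in ascending order and t must lie inside
    their range. Timestamps here are integers (e.g. microseconds).
    """
    if not imu_times[0] <= t <= imu_times[-1]:
        raise ValueError("timestamp outside the IMU sample range")
    i = bisect_left(imu_times, t)  # first sample at or after t
    if imu_times[i] == t:
        return imu_values[i]
    t0, t1 = imu_times[i - 1], imu_times[i]
    v0, v1 = imu_values[i - 1], imu_values[i]
    weight = (t - t0) / (t1 - t0)
    return v0 + weight * (v1 - v0)


# Assumed example data: four IMU samples 5 ms apart and one camera frame
# captured halfway between the second and third samples.
imu_times_us = [0, 5000, 10000, 15000]
gyro_z = [1.0, 2.0, 3.0, 4.0]
frame_time_us = 7500
print(interpolate_imu(imu_times_us, gyro_z, frame_time_us))
```

If the camera and IMU clocks drifted apart by even a few milliseconds, the interpolated value would correspond to the wrong moment of motion, which is why tightly integrated products synchronize them in hardware.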
This high level of optimization means that the SLAM software will only work with that exact combination of hardware. Porting software from one of these products to another would not just yield substandard performance; it would not work at all. Whilst this is fine when you plan to manufacture millions of relatively low-cost devices that all do the same thing, it makes no sense for a typical robot developer who has specific hardware requirements but expects to sell in the low thousands at best.
But Visual Inertial SLAM does still have a bright future in the robotics industry. As mentioned, the sensors are low-cost and easy to source. Cameras also provide masses of useful spatial information. In fact, you can get more data for SLAM from a single VGA (roughly 0.3-megapixel) camera costing less than one dollar than from a top-of-the-range LiDAR costing thousands of dollars. The challenge is to process this data reliably in real time without using excessive computing power or energy.
The solution is to deploy specialist PhDs to create cutting-edge Visual Inertial SLAM software that can slot seamlessly into the autonomy stacks of a wide range of robots. By developing efficient algorithms that deliver accurate results using standard sensors and low-cost, low-power embedded processors, the whole journey to SLAM effectiveness can be accelerated.
But if, as noted above, the optimization of software and hardware is essential for good results, how can you create algorithms that are useful for all types of robot with different sensors? One solution is to optimize algorithms for a selection of popular and easily available hardware options, for example x86 processors and the Jetson range from NVIDIA, together with the Intel RealSense D435i and D455 depth cameras. Optimizing for these commonly used and highly regarded components will allow the majority of developers and designers to quickly integrate effective SLAM into their robot prototypes. This range can then be extended over time as more hardware options become commonly available.
The coding of three levels of SLAM
Visual SLAM algorithms build probabilistic models of the environment from a number of natural features selected in the scene captured by the camera. The position of these features relative to each other and to the robot can be calculated even with a single camera. As the robot and the camera move, the same features are seen from a new angle. Using the principle of parallax, the difference between the two views can be used to calculate distance. This single-camera SLAM principle was first demonstrated in a real-time system in 2003 by SLAMcore founder Professor Andrew Davison, in his seminal MonoSLAM work.
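The parallax principle can be illustrated with the simplest possible case. The sketch below assumes an idealized pinhole camera that moves purely sideways by a known distance without rotating, in which case the depth of a feature is the focal length times the camera displacement divided by the feature’s shift in the image. The function name and the numbers are illustrative assumptions; a real SLAM system estimates camera motion and feature depth jointly and probabilistically, and needs another source, such as an IMU or a second camera, to fix the scale of the displacement.

```python
def depth_from_parallax(focal_px, baseline_m, x_first_px, x_second_px):
    """Depth of a feature seen from two camera positions.

    Assumes a pinhole camera that has moved sideways (to the right) by
    baseline_m metres with no rotation, so the feature shifts left in the
    image. focal_px is the focal length expressed in pixels.
    """
    disparity = x_first_px - x_second_px  # image shift caused by the motion
    if disparity <= 0:
        raise ValueError("feature must shift against the direction of motion")
    return focal_px * baseline_m / disparity


# Assumed example: 500 px focal length, camera moved 0.25 m, and the feature
# moved from column 320 to column 300 between the two views.
print(depth_from_parallax(500.0, 0.25, 320.0, 300.0))
```

A nearby feature shifts a lot between the two views and a distant one barely moves, which is why small errors in measuring the shift matter most for far-away points.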
Architecture of a feature based visual-inertial SLAM algorithm (Source: SLAMcore)
Detecting specific features that are suitable for calculating position is at the heart of effective SLAM. That means transforming the rich and heavy data streams from the cameras into something that can be processed quickly, even on low-end processors. ‘Feature detection’ is an exercise in dimensionality reduction, going from millions of pixels to just hundreds of points useful for locating the robot. Feature detection is performed on every pixel in a scene, but the computation can be parallelized. Two of the world’s leaders in this area, SLAMcore co-founder Dr Leutenegger and CTO Dr Alcantarilla, are the authors of two of the most popular open-source feature detectors, BRISK and AKAZE, and have developed SLAMcore’s capabilities in this area.
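To make the idea of dimensionality reduction concrete, the sketch below runs a classic Harris-style corner detector on a tiny synthetic image. It is deliberately not BRISK or AKAZE, which are considerably more sophisticated; it only shows how a per-pixel score can turn a grid of 400 pixels into a handful of distinctive points. The function name, window size and constant k are assumptions chosen for the example.

```python
def harris_corners(image, max_features=4, k=0.04):
    """Return the (x, y) positions of the strongest corner responses."""
    h, w = len(image), len(image[0])
    # Central-difference image gradients; border pixels are left at zero.
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = image[y][x + 1] - image[y][x - 1]
            gy[y][x] = image[y + 1][x] - image[y - 1][x]
    scored = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum gradient products over a 3x3 window around the pixel.
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ix, iy = gx[y + dy][x + dx], gy[y + dy][x + dx]
                    sxx += ix * ix
                    syy += iy * iy
                    sxy += ix * iy
            # Harris response: positive at corners, negative along edges,
            # zero in flat regions.
            response = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
            if response > 0:
                scored.append((response, x, y))
    scored.sort(reverse=True)
    return [(x, y) for _, x, y in scored[:max_features]]


# Synthetic 20x20 image: a bright square on a dark background.
size = 20
image = [[1.0 if 5 <= x <= 14 and 5 <= y <= 14 else 0.0
          for x in range(size)] for y in range(size)]
features = harris_corners(image)
print(len(image) * len(image[0]), "pixels ->", len(features), "features")
print(sorted(features))
```

Each pixel’s score depends only on its own small neighbourhood, which is what makes this kind of computation easy to parallelize.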
Detecting the right features and positioning them accurately allows the creation of a sparse, point-cloud map of a robot’s surroundings. The highly efficient process means that maps of spaces from living-room to warehouse scale can be created with cm-level accuracy. These sparse maps are the foundation of the spatial intelligence pyramid allowing robots to accurately estimate their position in real-time. The efficiency of these algorithms means that with two cameras and an IMU these SLAM position maps can be processed on a Raspberry Pi.
Deploying SLAM in robots should also be quick and easy. For example, SLAMcore’s core positioning algorithms are packaged as a library that developers launch with a single command. As long as a compatible sensor is plugged in, all that is required is to open a terminal and run the slamcore_visualiser launch command, which also appears with optional flags in the examples later in this article.
SLAMcore Visualiser – 3D View Features (Source: SLAMcore)
A SLAM system has thousands of hyper-parameters, from the number of features that should be detected in each frame to the distance at which they start to be rejected. Each parameter can be adjusted to tune the performance of this complex system. Instead of forcing developers to wade through lines of source code and adjust parameters by trial and error, simple presets corresponding to specific planned use cases can be used, for example warehouse, office, drone, wheeled robot, indoor, outdoor, high accuracy or high speed. A preset is selected by adding an extra option to the launch command:
user@ubuntu:~$ slamcore_visualiser -c ~/preset_file.json
Depth image creation and 3D mapping (Source: SLAMcore)
Building on these maps, using the data from the same sensors, a depth image is created. This is similar to a regular image but instead of each pixel representing colour, it represents the distance away from the camera. Algorithms then combine this information with the position estimate previously described to create rich dense maps.
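The step from a depth image and a position estimate to a dense map can be sketched as follows. This is a minimal illustration under stated assumptions, not SLAMcore’s implementation: it assumes an ideal pinhole camera with intrinsics fx, fy, cx and cy, a pose given as a 3x3 rotation matrix and a translation vector, and depth values of zero meaning ‘no measurement’. All function and parameter names are invented for the example.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image into 3D points in the camera frame.

    depth[v][u] is the distance along the optical axis for pixel (u, v);
    values <= 0 are treated as missing measurements and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points, rotation, translation):
    """Apply the camera pose (rotation, then translation) to each point."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]


# Assumed toy data: a 2x2 depth image, unit focal lengths, principal point
# at the origin, and a camera that has moved 1 m along x without rotating.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera_points = backproject(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
world_points = to_world(camera_points, identity, (1.0, 0.0, 0.0))
print(world_points)
```

Repeating this for every frame, each with its own pose estimate, accumulates points from many viewpoints into a single dense map, which is why the accuracy of the position estimate directly limits the quality of the map.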
As with the positioning algorithms, the mapping algorithms are provided as a simple library, enabled by adding a flag to the launch command. With a compatible sensor plugged in, open the terminal and type the following:
user@ubuntu:~$ slamcore_visualiser -m 1
SLAMcore:Map 2.5D (Source: SLAMcore)
SLAMcore:Map 3D (Source: SLAMcore)
2.5D maps show volumes of space as well as heights, building into more detailed maps of the world around the robot. They indicate where the robot could move and which space is occupied, and to what height. 3D maps with increasing levels of detail can also be generated, depending on the amount of time and processing power available. Detailed 3D maps can be created, saved and uploaded to robots, which then use the faster, sparse maps to position themselves in real time in those environments.
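One simple way to picture a 2.5D map is as a grid over the floor in which each cell stores the greatest height observed inside it. The sketch below builds such a grid from a list of 3D points. It assumes a world frame with z pointing up and the floor at z = 0, and the function names, cell size and clearance threshold are illustrative assumptions rather than the format any particular product uses.

```python
import math


def height_map(points, cell_size):
    """Map each occupied floor cell to the maximum height seen in it."""
    cells = {}
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key] = max(z, cells.get(key, z))
    return cells


def traversable_cells(cells, max_obstacle_height):
    """Cells whose tallest point is low enough for the robot to drive over."""
    return sorted(k for k, h in cells.items() if h <= max_obstacle_height)


# Assumed example points (metres): floor, a 0.8 m object and a 1.5 m object.
points = [
    (0.1, 0.1, 0.0), (0.2, 0.3, 0.8),
    (1.2, 0.1, 0.0), (1.4, 0.4, 1.5),
    (2.6, 0.2, 0.0),
]
cells = height_map(points, cell_size=0.5)
print(cells)
print(traversable_cells(cells, max_obstacle_height=0.05))
```

Storing one height per cell is far cheaper than a full 3D model, yet it already tells a ground robot where it can drive and how tall the obstacles around it are.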
Essential for industry’s progress
Optimized algorithms that work out of the box with the most popular hardware combinations will dramatically reduce the barriers to entry for those looking to integrate visual inertial SLAM into their autonomous robots. With access to these algorithms, developers can free up time and resources to focus on the applications and functions that make their robots different and useful, rather than repeating a trial-and-error process just to get their robots to position themselves accurately and build the maps they need to get from A to B. They will get their robots to market faster and provide more cost-effective solutions.
The future is bright for robotics. By helping solve the complex challenges of SLAM and democratizing access to some of the most cutting edge research, practical application and world-leading PhDs in the field, we hope to bring this future closer for everyone.
This article was originally published on Embedded.
Owen Nicholson is Founder and CEO of SLAMcore. Nicholson’s early career saw him managing research and development projects for government and commercial organisations which ultimately led him to lead commercialisation at the Robotic Vision Lab, Imperial College London. Working alongside genuine world-leaders in the field of machine vision, Nicholson helped to transition leading edge academic research into applications that could quickly deliver benefits to the wider world. Seeing the potential for SLAM in autonomous robots, he founded SLAMcore in 2016 with the goal of democratising access to cutting edge, visual SLAM technologies.