Problems faced when using iPhone to capture 360° car images

Article By : Jai Chaudry and Greg Surma

A camera-heavy SDK lets engineers build a custom camera pipeline and capture high-resolution, fully interactive 360-degree images of cars.

Today’s iPhone has become extremely powerful. At Fyusion, we utilize that power to perform 360-degree imaging of cars and then ship SDKs that capture, analyze, view, and upload that imaging data. Even with powerful iPhone processors, we face certain problems in everyday development work.

Figure 1 Users can inspect vehicle damage with iPhone. Source: Fyusion

We have a camera-heavy SDK, so we build our own camera pipeline internally, which enables us to capture high-resolution, fully interactive 360-degree images of cars that can be augmented with a variety of machine learning (ML) capabilities. But of course, using a camera comes with a well-established set of problems.

Camera overhead and overheating

Our SDK has a use case that keeps the camera open for a long time, and that tends to heat up the phone. Hot summer days make this issue even more noticeable.

We utilize Apple’s MetalKit and AVFoundation APIs, which are GPU- and CPU-intensive. After capturing a 360-degree video, we apply in-house processing on top of the captured media using computer vision and tools like Core ML. All of this can contribute to overheating if not balanced properly, so we have come up with various ways to mitigate the problem.
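As a rough illustration of what such a pipeline involves (a minimal sketch, not Fyusion’s actual implementation), the code below wires an AVCaptureSession to a video-data delegate where frames could be handed off to Metal or Core ML. The class name and queue label are placeholders.

```swift
import AVFoundation

// Minimal sketch of a custom capture pipeline: an AVCaptureSession feeding
// video frames to a delegate, where they can be passed to Metal / Core ML.
// "CapturePipeline" and the queue label are illustrative names only.
final class CapturePipeline: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let session = AVCaptureSession()
    private let videoOutput = AVCaptureVideoDataOutput()
    private let queue = DispatchQueue(label: "capture.pipeline.queue")

    func configure() throws {
        session.beginConfiguration()
        session.sessionPreset = .hd1920x1080

        guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video,
                                                   position: .back) else {
            throw NSError(domain: "CapturePipeline", code: -1)
        }
        let input = try AVCaptureDeviceInput(device: device)
        if session.canAddInput(input) { session.addInput(input) }

        videoOutput.setSampleBufferDelegate(self, queue: queue)
        videoOutput.alwaysDiscardsLateVideoFrames = true
        if session.canAddOutput(videoOutput) { session.addOutput(videoOutput) }

        session.commitConfiguration()
    }

    func start() { session.startRunning() }

    // Each frame arrives here; the pixel buffer can be wrapped in a Metal
    // texture or fed to a Core ML model for on-device processing.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        _ = pixelBuffer // hand off to Metal / Core ML here
    }
}
```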

The most important aspect of on-device machine learning on iOS is to leverage the Neural Engine, which is designed explicitly for accelerating machine learning inference. It is on average about 10 times faster than the GPU, and with this performance boost you also get lower energy consumption. However, in order to use it, the model must be built only from layers the Neural Engine supports.
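A minimal sketch of how a Core ML model can be loaded so that it prefers the Neural Engine: the `.all` compute-unit setting lets Core ML schedule work on the Neural Engine where the layers allow it and fall back to GPU or CPU otherwise. The function and URL are illustrative, not part of Fyusion’s SDK.

```swift
import CoreML

// Ask Core ML to use the Neural Engine where the model's layers allow it.
// .all lets Core ML pick CPU, GPU, or Neural Engine per layer; unsupported
// layers fall back to CPU/GPU automatically.
func loadModel(at url: URL) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .all   // prefer the Neural Engine when possible
    return try MLModel(contentsOf: url, configuration: config)
}
```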

One valuable application for on-device machine learning is real-time visualization. To achieve smooth real-time visualizations without overheating the device, below are a few things to consider:

  1. Skipping frames: not running ML models on every frame. Usually, 25 fps is a good tradeoff between accuracy and smoothness (see the frame-gating sketch after this list).
  2. Using direct data pointers instead of subscripting arrays when reading ML outputs. The time difference between these two approaches may seem negligible at first glance, but when run thousands of times per frame, it really adds up. For example, unpacking a value with a pointer-offset approach in one of our apps takes 0.000000012 seconds (12 ns), compared to 0.00000076 seconds (760 ns) for an array-subscript approach. Both values may seem so small as to be negligible, but at the end of the day they are crucial for performance (a pointer-access sketch also follows this list).
  3. Using the smallest network that can get the job done. Since inference time is usually proportional to the network’s size, it’s better to be conservative and avoid large networks that take too long to run smoothly in real time.
  4. Using Metal for visualizations instead of UIKit. Since we strive to generate at least 25 frames per second, our code needs to be as fast as possible. Computing visualizations with Metal shaders is orders of magnitude faster than with UIKit. That’s simply because UIKit is a high-level framework primarily used for interfaces and UI elements that are mostly static. Metal, on the other hand, as the name suggests, sits close to the hardware and provides low-level, low-overhead APIs suited for high-performance graphics rendering. Apart from that, in order to present an overlay with UIKit, we would first need to compute a UIImage, which is done mostly on the relatively slow CPU. With Metal, we can provide shaders that calculate multiple pixel values at the same time, all computed on the GPU, so we can render our overlay much faster.
  5. Using low-level C++ and Objective-C++ instead of higher-level languages like Swift. Even though Swift is catching up in terms of performance, it is still not at the level of C++. The general rule of thumb: the more we care about performance, the closer to the hardware we need to go.
  6. Quantizing weights. While it won’t directly increase the inference speed, it will reduce the memory footprint of the ML model and its computations, which is beneficial for the general app performance.
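To make item 1 concrete, here is a minimal frame-gating sketch. The stride value is illustrative and would be tuned against the camera’s actual frame rate to land near the 25 fps target.

```swift
// Run the ML model only on every Nth frame so the camera preview stays
// smooth. A stride of 2 is an illustrative value (e.g. 50 fps -> 25 fps).
final class FrameGate {
    private var frameIndex = 0
    private let stride: Int

    init(stride: Int = 2) { self.stride = stride }

    /// Returns true when this frame should be sent to the ML model.
    func shouldProcess() -> Bool {
        defer { frameIndex += 1 }
        return frameIndex % stride == 0
    }
}
```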
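And for item 2, a sketch of reading an MLMultiArray output through its raw data pointer rather than the boxed subscript. It assumes the array holds 32-bit floats, and the function name is ours, not Fyusion’s.

```swift
import CoreML

// Reading an MLMultiArray output with a raw pointer instead of the
// subscript API. Assumes the array's data type is Float32.
func sumOfOutputs(_ array: MLMultiArray) -> Float {
    let count = array.count

    // Slow path (for comparison): subscripting boxes every element in an NSNumber.
    // var total: Float = 0
    // for i in 0..<count { total += array[i].floatValue }

    // Fast path: bind the underlying buffer once and read raw floats.
    let pointer = array.dataPointer.bindMemory(to: Float.self, capacity: count)
    var total: Float = 0
    for i in 0..<count {
        total += pointer[i]
    }
    return total
}
```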

Uploading tasks

While uploading might at first seem like a trivial task, it comes with its own problems. In our case, users were often using our app in parking lots, which meant tough network conditions. We also had to make sure that uploads continue even if the app is running in the background or is suspended by the OS. While Apple does provide specific APIs for such tasks, it’s not as straightforward as you might think.

Figure 2 Users can capture 3D images of vehicles on their smartphones, add audio or visual tags, and send them via email, text, or messaging apps. Source: Fyusion

In the past, we uploaded our data, stored in our proprietary file format called a .fyuse, over a series of multiple network calls. While that might have been a reasonable solution at first, it wasn’t the best idea once a queue of .fyuses built up. Remember, we need to support uploading in the background, and the system’s rate limiter always comes into the picture. You can think of the rate limiter as a mechanism that increases your app’s background launch delay every time you try to wake the app up. A wake-up can be triggered by anything: a deep link URL, a network call, or a push notification. A series of calls can pile up and increase the launch delay so much that nothing gets launched at all. Think about hundreds of uploads in the queue.

For this reason, we decided to zip the file’s contents into a single archive, which reduces the number of times the app has to wake up. We also decided to upload it with a presigned S3 URL. The only disadvantage is that if an upload fails at 98%, it still has to restart from 0%. To accommodate this situation, we recommend chunking sufficiently large files.
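A minimal sketch of that upload step under our assumptions: a background URLSession performing a PUT of the zipped file to a presigned S3 URL. The session identifier, class name, and error handling are placeholders, not Fyusion’s production code.

```swift
import Foundation

// Sketch of uploading a zipped capture to a presigned S3 URL using a
// background URLSession. Identifiers ("com.example.upload", UploadManager)
// are placeholders.
final class UploadManager: NSObject, URLSessionTaskDelegate {
    // A background configuration lets the OS keep transferring the file
    // even if the app is suspended or terminated.
    private lazy var session: URLSession = {
        let config = URLSessionConfiguration.background(withIdentifier: "com.example.upload")
        config.isDiscretionary = false
        config.sessionSendsLaunchEvents = true
        return URLSession(configuration: config, delegate: self, delegateQueue: nil)
    }()

    /// Uploads the zipped archive at `fileURL` to a presigned S3 URL.
    func upload(fileURL: URL, to presignedURL: URL) {
        var request = URLRequest(url: presignedURL)
        request.httpMethod = "PUT"
        // Background sessions require upload-from-file, not from memory.
        let task = session.uploadTask(with: request, fromFile: fileURL)
        task.resume()
    }

    // Called even if the app was relaunched in the background for this task.
    func urlSession(_ session: URLSession, task: URLSessionTask,
                    didCompleteWithError error: Error?) {
        if let error = error {
            // A failed presigned-URL upload restarts from zero, which is why
            // sufficiently large files may be worth splitting into chunks.
            print("Upload failed: \(error)")
        }
    }
}
```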

The above pipeline also comes with some limitations:

  • We need to make sure that all files are written to disk before we can begin uploading. That, in turn, increases I/O operations and thus increases CPU usage.
  • It’s very hard to debug this kind of pipeline built on Apple’s background URLSession APIs; the only way to see whether uploads are running in the background is to watch the macOS Console.

Debugging issues

When developing an SDK, it’s very important to be able to track how a user is using the app, collect logs of any errors and warnings generated while using the SDK, and check for crashes.

In this situation, it’s tempting to reach for a tried-and-true third-party framework. Unfortunately, third-party frameworks tend to be designed with application-level tracking in mind and leverage singleton patterns, which could cause unintended conflicts if embedded within an SDK. So we have come up with other mechanisms to help us track crashes; for instance, printing logs when someone is capturing a 360-degree image using our SDK. We can also use the atos tool to symbolicate a crash if we have its address in the SDK binary.

As mentioned earlier, we usually have important logs printed out when someone is using our 360-degree imaging SDK. These logs give us information about critical errors, warnings, and any other events that might be of use while debugging a bug or crash. We also use the OSLog framework provided by Apple to monitor logs in debug builds, as well as OSLog signposts to monitor time-consuming processes in our SDK. OSLog writes directly to the macOS Console, so it’s easy to debug issues.
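As a sketch of that setup (subsystem and category names are placeholders, and the Logger/OSSignposter APIs require iOS 14/15 or later), structured logging plus a signpost interval around a long-running step might look like this:

```swift
import os

// Structured logging for errors/warnings plus a signpost interval around a
// time-consuming SDK stage. Subsystem/category strings are placeholders.
let logger = Logger(subsystem: "com.example.sdk", category: "capture")
let signposter = OSSignposter(subsystem: "com.example.sdk", category: "processing")

func processCapture() {
    logger.info("Starting capture processing")

    // Signpost intervals show up in Instruments and Console, making it easy
    // to measure how long heavy SDK stages take.
    let id = signposter.makeSignpostID()
    let state = signposter.beginInterval("ProcessCapture", id: id)
    defer { signposter.endInterval("ProcessCapture", state) }

    // ... heavy computer-vision / Core ML work would run here ...

    logger.error("Example error path reached")
}
```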

This article was originally published on EDN.

Jai Chaudry is head of iOS engineering at Fyusion with a background in product development.

Greg Surma is senior software engineer at Fyusion with a background in computer vision, iOS, and machine learning.

 
