With the DIY maker kits, Google claims you can “build intelligent systems that see, speak, and understand.”
The first three entries in my “2020: A consumer electronics forecast for the year(s) ahead” piece, published back in January, all had to do with deep learning. Why? Here’s part of what I wrote back then:
The ability to pattern-match and extrapolate from already-identified data (“training”) to not-yet-identified data (“inference”) has transformed the means by which many algorithms are developed nowadays, with impact on numerous applications.
This transformation is already well underway, as even a casual perusal of the titles and coverage topics of content published at EDN, EE Times, and elsewhere will make clear. Don’t panic: there’s still time to “catch the wave,” especially if your focus is on resource-constrained implementations. But you don’t want to wait too long lest you end up stuck bobbing around in the water while more foresighted colleagues are already at the beach enjoying the AI “party.”
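The training/inference split described above can be sketched in a few lines of code: “training” ingests already-identified (labeled) data, and “inference” pattern-matches not-yet-identified data against what was learned. Here's a toy nearest-neighbor illustration; real deep learning models learn weights rather than memorizing examples, and all the names and data below are hypothetical, but the workflow is the same:

```python
# Toy illustration of "training" (ingest already-identified data) vs.
# "inference" (classify not-yet-identified data by pattern-matching).
# A 1-nearest-neighbor "model"; deep learning replaces the stored
# examples with learned weights, but follows the same two-phase flow.

def train(samples):
    """Training: store already-identified (features, label) pairs."""
    return list(samples)  # the "model" here is just the stored examples

def infer(model, features):
    """Inference: label new data via its closest training example."""
    def distance(example):
        stored, _ = example
        return sum((a - b) ** 2 for a, b in zip(stored, features))
    _, label = min(model, key=distance)
    return label

# Hypothetical training set: 2-D feature vectors with known labels.
model = train([((0.0, 0.0), "quiet"), ((1.0, 1.0), "loud")])

print(infer(model, (0.1, 0.2)))  # nearest to (0, 0) -> "quiet"
print(infer(model, (0.9, 0.8)))  # nearest to (1, 1) -> "loud"
```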
So where to start? There’s plenty of low-cost hardware out there suitable for implementing deep learning training and, especially, inference (the latter being your likely implementation focus), as well as plenty of open source (translation: free) and low-priced software, some tied to specific silicon and some more generic. Tying the two (hardware and software) together in a glitch-free and otherwise robust manner is the trick; select unwisely and you’ll waste an inordinate amount of time and effort wading through arcane settings and incomplete (or worse: incorrect) documentation, trying to figure out why puzzle pieces that should fit together perfectly aren’t.
That’s where Google’s AIY (which stands for “Artificial-Intelligence-Yourself,” a play on DIY, i.e., “Do-It-Yourself”) Project Kits come in. They’re targeted at hobbyists and professionals alike: in Google’s own words, “With our maker kits, build intelligent systems that see, speak, and understand. Then start tinkering. Take things apart, make things better. See what problems you can solve.” While the hardware and software included in each kit may not match what you end up using in your own designs after you get up the initial learning curve, the fundamentals you’ll “grok” in using them will stay with you and continue to apply. Reminiscent of Google’s DIY AR and VR headsets, low-cost cardboard dominates the chassis construction materials suite. And GitHub is their common software repository.
As I wrote back in January, “Clusters of image pixels aren’t the only thing that deep learning is good at pattern-matching, of course. What about phonemes and other units of sound?” Google describes its AIY Voice Kit this way: “With the Google Assistant built-in, build an intelligent speaker that can understand you, and respond when you ask it a question or tell it to do something. Create your own projects that use voice recognition to control robots, music, games, and more.”
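The “control robots, music, games, and more” projects Google describes all follow the same recognize-then-dispatch pattern: a speech recognizer (the cloud-hosted Google Speech API, in the v1 kit’s case) hands your code a transcript, and your project maps phrases to actions. The sketch below illustrates only the dispatch half; the recognizer is a hypothetical stand-in stub, not the kit’s actual API:

```python
# Sketch of the recognize-then-dispatch pattern a Voice Kit project
# uses. recognize_speech() is a hypothetical stub standing in for the
# kit's cloud speech-recognition call; only the dispatch logic is real.

def recognize_speech():
    """Stand-in for the kit's speech-to-text step."""
    return "play music"  # pretend this is what the user said

# Project-specific mapping of recognized phrases to actions.
INTENTS = {
    "play music": lambda: "starting playback",
    "stop music": lambda: "stopping playback",
    "move robot forward": lambda: "robot moving",
}

def handle_command(transcript):
    """Dispatch a recognized phrase to its project-specific action."""
    action = INTENTS.get(transcript.lower().strip())
    return action() if action else "sorry, I didn't understand"

print(handle_command(recognize_speech()))  # -> "starting playback"
```

A real project would replace the stub with the kit’s recognizer and the lambdas with GPIO, audio, or network calls; the dispatch structure stays the same.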
I’d pre-ordered my Voice Kit from Micro Center in late August 2017; it became available for pickup two months later. Mine’s the v1 version:
The v2 successor, which is what you’ll likely get if you purchase one today, is more feature-complete out-of-box:
The Raspberry Pi and SD card (micro SD in v2) that I had to supply myself with the v1 kit are included in the updated kit release, although you’ll still need to provide your own USB power adapter. This comprehensiveness enhancement is certainly nice from a convenience standpoint, although there’s an associated price uptick; mine cost $24.99, while the v2 kit is $30 more expensive than that. One other note bears mentioning: out of the box, the AIY Voice Kit leverages Google’s cloud-centric Assistant and Speech APIs, although it’s possible to upgrade the software suite to alternatively (or additionally, if you prefer) run the edge-centric (therefore offline-capable) TensorFlow framework.
One month after my v1 Voice Kit came in, I placed a pre-order for its v1 Vision Kit sibling:
Stepping back for a moment, what’s the opportunity for deep learning here? Building on my earlier point that clusters of image pixels aren’t the only thing that deep learning is good at pattern-matching, here’s what I wrote back in January:
Computer vision was one of the first disciplines to enthusiastically embrace deep learning, and for good reason: traditional algorithm development was tedious and narrow in applicability, not able to accurately handle “corner cases” such as off-axis object views, poor lighting conditions, and atmospheric and other distortions and obscurations. Plus, algorithms developed to identify one class of objects often required re-coding from scratch in order to identify a different object class.
Conversely, with deep learning, after you enter a sufficiently sized and robustly labeled data set into a training routine, the resultant deep learning model is able to robustly identify similar objects within that same class. Need to broaden the object suite to be identified? Incrementally train the model with more data. And even if you need to swap out models in order to deliver sufficient identification breadth, the underlying framework can remain the same.
As with its Voice Kit predecessor, the Vision Kit is now available in a more comprehensive v2 build, albeit again with an associated price uptick; mine was $44.99, while the newer version is $55 more than that:
The Raspberry Pi Zero WH (wireless, i.e., Wi-Fi-supportive) that I had to come up with myself is now included, and notably (at least for the soldering-averse) it comes with a pre-installed header connector (hence the added “H” suffix this time). Also in the box this go-around is the necessary Raspberry Pi Camera Module v2 (note: if you have a v1 kit like mine, the included software does not support the alternative Pi NoIR Camera Module v2; guess which one I bought the first go-around?). As with the Voice Kit evolution, the mass storage format switched from SD (not included) to micro SD (included). But also as with the Voice Kit evolution, you’ll still need to come up with your own USB power adapter (I suspect they assume most of us already have a sufficient-current-output one lying around).
Processors, coprocessors and next steps
Careful readers may understandably be confused at this point. Considering that vision applications are generally quite a bit more compute-demanding than their voice counterparts, why on earth is the Vision Kit based on a Raspberry Pi Zero board containing a Broadcom BCM2835 (integrating a 32-bit, 1 GHz, single-core ARM1176JZF-S CPU, with a VFPv2 FPU that doesn’t support the NEON instruction set), while the v1 Voice Kit is based on a Raspberry Pi 3 board containing a BCM2837 (integrating a 64-bit, 1.2 or 1.4 GHz, quad-core Arm Cortex-A53, with a NEON-supportive VFPv4 FPU)?
Part of the answer is that Google seemingly over-estimated the Voice Kit’s processing requirements, although the v2 kit’s transition to native TensorFlow support may have also been a factor. The v2 Voice Kit, like its Vision Kit sibling, is based on a Raspberry Pi Zero containing a BCM2835; both versions of the Voice Kit include a supplemental hardware board (the Voice HAT in v1, the smaller Voice Bonnet in v2), but their silicon content primarily consists of DACs, ADCs, microphone preamps, speaker amps, and the like.
Speaking of Bonnets, however, the supplemental hardware in the Vision Kits is significantly more substantial. The Vision Bonnet common to both Vision Kit versions is based on a Movidius (now Intel Movidius) Myriad 2 (MA2450) Vision Processor (VPU), thereby adding significant (and deep learning-centric, not to mention vision-optimized) processing horsepower to the mix.
My v1 kits may be obsolete, but they’re still eminently usable (although it’s now, embarrassingly, been three years since I first got them, and I still haven’t fired either one up; busy, don’cha know). Trust me: I really do still plan to do so in both cases, predictably providing plenty of fodder for future blog posts in the process. And those of you lucky enough to get your hands on upgraded v2 hardware don’t need to wait for my lead; feel free to post your hands- (and eyes-, and ears-) on impressions of the Voice and Vision Kits in the comments!
—Brian Dipert is Editor-in-Chief of the Edge AI and Vision Alliance, and a Senior Analyst at BDTI and Editor-in-Chief of InsideDSP, the company’s online newsletter.
This article was originally published on EDN.