When Google first announced Google Lens last year, it was described as a kind of search in reverse. Rather than type a text query to find image results, you could point your phone’s camera at an object, like a dog or a plant, to find text-based information. Lens was not only a statement about your camera as an input device but also a most Google-y expression of technology: It combined search, computer vision, AI, and AR, and put it all in apps that weren’t limited to one ecosystem.
At this year’s developers conference, Google announced the most significant update yet to Google Lens—one that emphasizes shopping, text-reading, and additional language support. And to make Lens more convenient for people to use, Google has convinced a bunch of handset partners to offer Lens as an option right in the native camera app.
The new features, which roll out at the end of May, represent Google’s next steps to make your smartphone camera “like a visual browser for the world around you,” says Aparna Chennapragada, vice president of product for AR, VR, and vision-based products at Google. “By now people have the muscle memory for taking pictures of all sorts of things—not just sunsets and selfies but the parking lot where you parked, business cards, books to read,” Chennapragada says. “That’s a massive behavior shift.”
In other words, Google’s vision of the future still involves searching for things. Now it’s just by whipping out your phone and pointing the camera at something, a behavior that’s become second nature to smartphone users. But Google knows it’s not the only tech company working on visual search, so it’s trying to wedge Lens into places you’re already active on your phone.
Earlier versions of Lens could be accessed through Google Assistant and Google Photos; the new version will be built directly into the camera on more than ten different Android phones. This includes Google’s Pixel phones; handsets from Asus, Motorola, Xiaomi, and OnePlus; the new LG G7 ThinQ; and more. On the G7 ThinQ, Lens will also have a physical button—press it twice and the Lens camera automatically opens—the same way that Bixby has a dedicated button on Samsung flagship phones.
In a demo of the new features, launching Lens with a physical button worked as it was supposed to on the LG G7 ThinQ. On phones without a dedicated Lens button, Lens appears as one of the main options in the camera app, the same way that video recording does.
Another thing that’s new about Lens: The camera app starts scanning the space around you as soon as you open it. “We realized that you don’t always know exactly the thing you want to get the answer on,” says Clay Bavor, Google’s vice president of virtual and augmented reality. “So instead of having Lens work where you have to take a photo to get an answer, we’re using Lens Real-Time, where you hold up your phone and Lens starts looking at the scene [then].” This scanning function appears as a series of AR dots, mapping the world around you, before a virtual button appears as a ready-to-go signal.
Both native camera access and the Lens Real-Time feature contribute to a faster visual search experience, but the latter also means Lens grabs information you may not need it to. In one instance, I pointed the new Lens at a pair of shoes only to get search results for the restaurant Nopalito, because the restaurant’s menu was sitting on a shelf below the shoes and Lens had picked up on it as I raised the camera. It also wasn’t 100 percent accurate when it came to shopping, one of the other key features of the new Lens. At one point, Lens identified a large gray sweater as an elephant.
But the version of the app I saw was still in beta, and Google says the misidentification will be fixed by the time it rolls out at the end of the month. And in general, the shopping results were impressive. An earlier version of Lens might simply identify the object as a sweater, or a pillow, or a pair of shoes. The new Lens has something Google calls “Style Match”: It found a match for all three items, showed options for where to buy them, and recommended similar items. It even knew the pillow I brought with me for the demo was from Etsy.com. If the first version of Lens was about pets and plants, this version might be defined by clothes and home decor.
The new Google Lens will also support Spanish, Portuguese, French, German, and Italian—which, it’s worth noting, is different from translation. Lens has always been able to translate languages supported by Google Translate. This update just means that if you’re a native speaker of one of those new languages, you can run a version of Lens that’s specific to that language.
Of course, Google already has all of that information indexed, whether it’s puppy breeds, restaurant menus, clothing inventory, or foreign languages. So why is it so hard to bring it all to Lens search? Chennapragada insists that it’s quite difficult to provide on-the-fly context for visual objects in what she calls a “very unstructured, noisy situation.”
“We’ve always used vision technology in our image recognition algorithms, but in a very measured way,” she says.
Bavor says it’s also the sheer number of objects that exist in the world that makes visual search a unique challenge. “In the English language there’s something like 180,000 words, and we only use 3,000 to 5,000 of them. If you’re trying to do voice recognition, there’s a really small set of things you actually need to be able to recognize. Think about how many objects there are in the world, distinct objects, billions, and they all come in different shapes and sizes,” Bavor says. “So the problem of search in vision is just vastly larger than what we’ve seen with text or even with voice.”
It’s a problem that many others are trying to tackle as well. Facebook, Amazon, and Apple have been building their own visual search platforms or acquiring technology companies that analyze photo content. Last February, Pinterest launched its own Lens tool, which lets users search the site using the Pinterest camera. Pinterest Lens also happens to power Samsung’s Bixby Vision. There are smaller competitors too: The AR app platform Blippar can recognize flowers, public faces, and famous landmarks through a smartphone’s camera. Even high schoolers are building “smart lens” apps.
For Google, though, expectations might be higher, given that the company defined online search as the world now knows it. Can it do the same for visual search? More importantly, can it do it without creating visual algorithms that are biased or even downright offensive? The sweater I saw misidentified as an elephant was a benign example, but it shows how a seemingly simple object could be mistaken for something else. One advantage text-based queries have is that they tend to be explicit, whereas object or person recognition is still open to a lot of algorithmic misinterpretation.
“A key approach we’ve taken in building Lens is to make the system identify why errors happen, and build improvements that help mitigate those errors,” Chennapragada wrote in an email when I asked what Google was doing to ensure accuracy in visual search. “This is a similar philosophy to what we have done with Search and Autocomplete.” She went on to write that Lens is solving a complex problem as part of a “multi-year journey,” and that it’s hard to recognize and understand the billions of objects in the world.
Still, it’s clear that Google executives are excited about visual search—and not just its potential, but what it can do right now. At Google’s offices the day of the Lens demo, Bavor pulled out his smartphone and showed me a photo he’d taken of a Datsun 1500 Roadster that he spotted from the back of his Lyft ride. “You think about how you’d formulate that query. ‘Old car with round headlights and a big grille and a curvy line on the side and it’s a convertible and it has silver pointy’… What is the query you would even write? And I Lensed it, and oh, it’s the Datsun 1500,” says Bavor. “There’s literally no query I could have written to figure that out.”