Google teased translation glasses at last week's Google I/O developer conference, holding out the promise that one day you could talk with someone speaking a foreign language and see the English translation in your glasses.
Company execs demonstrated the glasses in a video. It showed not only “closed captioning” (real-time text, in the same language, spelling out what another person is saying) but also translation to and from English and Mandarin or Spanish, enabling people speaking two different languages to carry on a conversation and letting hearing-impaired users see what others are saying to them.
As Google Translate hardware, the glasses would solve a major pain point with using Google Translate: if you rely on audio translation, the translated audio steps on the real-time conversation. By presenting the translation visually, you could follow conversations much more easily and naturally.
Unlike Google Glass, the translation-glasses prototype is augmented reality (AR). Let me explain what I mean.
Augmented reality happens when a device captures data from the world and, based on its recognition of what that data means, adds information that it makes available to the user.
Google Glass was not augmented reality — it was a heads-up display. The only contextual or environmental awareness it could deal with was location. Based on location, it could give turn-by-turn directions or location-based reminders. But it couldn’t normally harvest visual or audio data, then return to the user information about what they were seeing or hearing.
Google’s translation glasses are, in fact, AR: they take audio data from the environment and return to the user a transcript of what’s being said, in the wearer's language of choice.
As far as I could tell, audience members and the tech press reported on the translation function as the exclusive application for these glasses, without any analytical or critical exploration. The most glaring fact that should have been mentioned in every report is that translation is just an arbitrary choice for processing audio data in the cloud. There's so much more the glasses could do!
They could easily process any audio for any application and return any text or any audio to be consumed by the wearer. Isn’t that obvious?
In reality, the hardware sends noise to the cloud, and displays whatever text the cloud sends back. That’s all the glasses do. Send noise. Receive and display text.
The applications for processing audio and returning actionable or informational contextual information are practically unlimited. The glasses could send any noise, and then display any text returned from the remote application.
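To make that loop concrete, here's a minimal sketch in Python of what such a client could look like: grab a chunk of audio, post it to a cloud service, and display whatever text comes back. The endpoint URL, the response format, and the WAV file standing in for the glasses' microphone are all assumptions for illustration; none of this is a real Google API.

```python
# Minimal sketch of the loop described above: capture audio, ship it to a
# cloud service, display whatever text comes back. The endpoint and response
# shape are hypothetical placeholders, not a real Google API.
import json
import urllib.request


def read_audio_chunk(path: str) -> bytes:
    """Stand-in for the glasses' microphone: read raw audio bytes from a file."""
    with open(path, "rb") as f:
        return f.read()


def send_to_cloud(audio: bytes, endpoint: str) -> str:
    """POST the audio and return the text the service sends back."""
    req = urllib.request.Request(
        endpoint,
        data=audio,
        headers={"Content-Type": "audio/wav"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]  # assumed response shape


def show_on_display(text: str) -> None:
    """Stand-in for the glasses' heads-up display."""
    print(text)


if __name__ == "__main__":
    chunk = read_audio_chunk("speech.wav")
    show_on_display(send_to_cloud(chunk, "https://example.com/transcribe"))
```

Swap the transcription endpoint for a translation service, or anything else that turns sound into text, and the client side doesn't change at all, which is exactly the point.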
The noise could even be encoded, like an old-time modem. A noise-generating device or smartphone app could send R2D2-like beeps and whistles, which could be processed in the cloud like an audio QR code which, once interpreted by servers, could return any information to be displayed on the glasses. This text could be instructions for operating equipment. It could be information about a specific artifact in a museum. It could be information about a specific product in a store.
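Here's a toy sketch, again in Python, of what such an "audio QR code" encoder might look like: each nibble of a short payload is mapped to a tone, roughly in the spirit of DTMF signaling, and rendered to a WAV file that a beacon or smartphone app could play. The frequency table and framing are invented for illustration; a real scheme would need synchronization, error correction, and a matching decoder on the cloud side.

```python
# Toy "audio QR code": encode a short payload as a sequence of tones, one
# frequency per hex nibble. The frequency table and framing are made up for
# illustration; a real deployment would add sync tones and error correction.
import math
import struct
import wave

SAMPLE_RATE = 44_100
TONE_SECONDS = 0.08
FREQS = [1_000 + 100 * n for n in range(16)]  # one tone per hex nibble


def payload_to_tones(payload: bytes) -> list[int]:
    """Map each nibble of the payload to a tone frequency."""
    tones = []
    for byte in payload:
        tones.append(FREQS[byte >> 4])
        tones.append(FREQS[byte & 0x0F])
    return tones


def write_wav(tones: list[int], path: str) -> None:
    """Render the tone sequence to a mono 16-bit WAV file."""
    samples = []
    for freq in tones:
        for i in range(int(SAMPLE_RATE * TONE_SECONDS)):
            value = math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            samples.append(int(value * 32_000))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    # A hypothetical museum beacon announcing which exhibit the wearer is near.
    write_wav(payload_to_tones(b"exhibit-42"), "beacon.wav")
```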
These are the kinds of applications we’ll be waiting for visual AR to deliver in five years or more. In the interim, most of it could be done with audio.
One obviously powerful use for Google’s “translation glasses” would be to use them with Google Assistant. It would be just like using a smart display with Google Assistant — a home appliance that delivers visual data, along with the normal audio data, from Google Assistant queries. But that visual data would be available in your glasses, hands-free, no matter where you are. (That would be a heads-up display application, rather than AR.)
But imagine if the “translation glasses” were paired with a smartphone. With permission granted by others, Bluetooth transmissions of contact data could display (on the glasses) who you’re talking to at a business event, and also your history with them.
Why the tech press broke Google Glass
Google Glass critics slammed the product, mainly for two reasons. First, a forward-facing camera mounted on the headset made people uncomfortable. If you were talking to a Google Glass wearer, the camera was pointed right at you, making you wonder if you were being recorded. (Google didn’t say whether their “translation glasses” would have a camera, but the prototype didn’t have one.)
Second, the excessive and conspicuous hardware made wearers look like cyborgs.
The combination of these two hardware transgressions led critics to assert that Google Glass was simply not socially acceptable in polite company.
Google’s “translation glasses,” on the other hand, neither have a camera nor do they look like cyborg implants — they look pretty much like ordinary glasses. And the text visible to the wearer is not visible to the person they’re talking to. It just looks like they’re making eye contact.
The sole remaining point of social unacceptability for Google’s “translation glasses” hardware is the fact that Google would be essentially “recording” the words of others without permission, uploading them to the cloud for translation, and presumably retaining those recordings as it does with other voice-related products.
Still, the fact is that augmented reality and even heads-up displays are super compelling, if only makers can get the feature set right. Someday, we’ll have full visual AR in ordinary-looking glasses. In the meantime, the right AR glasses would have the following features:
- They look like regular glasses.
- They can accept prescription lenses.
- They have no camera.
- They process audio with AI and return data via text.
- They offer assistant functionality, returning results as text.
To date, there is no such product. But Google demonstrated it has the technology to do it.
While language captioning and translation might be the most compelling feature, it is (or should be) just a Trojan horse for many other business applications.
Google hasn’t announced when, or even if, “translation glasses” will ship as a commercial product. But if Google doesn’t build them, someone else will, and they will prove a killer category for business users.
Ordinary-looking glasses that give you visual results from AI interpretation of whom and what you hear, plus visual and audio results from assistant queries, would be a total game changer.
We’re in an awkward period in the development of technology where AR applications mainly exist as smartphone apps (where they don’t belong) while we wait for mobile, socially acceptable AR glasses that are many years in the future.
In the interim, the solution is clear: We need audio-centric AR glasses that capture sound and display words.
That's just what Google demonstrated.