Tea, Earl Grey, Lukewarm – Alasdair Watson

Bookmarking a link to an article in praise of Amazon’s Echo got me thinking about voice-controlled computing: where it is now, where it’s going, its upsides and its flaws.

My major experience of it is with Apple’s Siri – via watch, phone, and TV. And I’ve found that I do specific tasks with Siri on each one. On the watch, I turn my lights out at night. (Yes, that much of a nerd.) On the phone, I set reminders and alarms. On the TV, I skip back 30 seconds, or sometimes a minute.

Yes, I really do just those specific things. Mostly because those are the things I’m confident with – I know they work, they do so reliably and are useful. Other things I’ve tried tend to be hit-and-miss, or the technology just isn’t quite there yet.

Here’s an example: if I could say “Hey Siri, play Songs of Separation from the laptop via the speakers in the living room”, or “Hey Siri skip back 20 mins, then start playing last night’s audiobook on the bedroom speaker”, those I’d use a lot. They’re convenience tasks, which is what voice feels most natural for – things you want to do while also doing something else.

But those are currently too complicated for Siri. It simply doesn’t know enough (and I can’t teach it) to identify my home audio equipment, it’s not contextually aware enough, it’s not deeply integrated with 3rd party apps (or rather, 3rd party apps aren’t deeply enough integrated with it), and it’s not good enough at parsing my specific speech – I have to speak in a way it understands, rather than it simply understanding how I speak. Even just being able to say to my phone/watch “Hey Siri, do X on the laptop” (e.g. “open this link on the laptop for when I get home”) would be useful, but we’re still a while away from that.

So where do I see this all going?

An optimistic view: this feels very like the web felt in 1995/6. There’s definitely something here, and we’re a few years off it starting to get serious. Another few years after that, it’ll be everywhere, and ten years after that, we’ll barely remember a time when it wasn’t commonplace. My nieces are going to grow up shouting at their household appliances.

A less optimistic view: this isn’t the web. It’s not a set of interoperable standards that anyone can hook into, it’s a load of low-walled gardens. Siri works with Apple products. Cortana with Microsoft. Google have something or other, Facebook something else. They all talk to the everyday web to some extent, but for more advanced interactions, they only talk to some products provided by individually selected third parties, for commercial reasons.

The only one that makes me even a little optimistic is the Amazon one, Echo, because I can write my own back end for it – so Amazon do the speech to command translation, and fire something at my code, which does something in response. The problem is that that’s great for the web, but less useful for the home appliances.
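
For the curious, that “fire something at my code” arrangement might look roughly like this. It’s a minimal sketch in Python, assuming the JSON request and response shape used by Amazon’s custom skill interface; the function name handle_request and the intent name LightsOffIntent are made up for illustration, and a real back end would also need hosting and request verification, which are left out here.

```python
def handle_request(event):
    """Take an already-parsed voice command and return a spoken reply.

    `event` is assumed to be the decoded JSON body the voice service sends:
    the speech-to-command translation has already happened by this point.
    """
    request = event.get("request", {})

    if request.get("type") == "IntentRequest":
        # The service tells us which command it thinks it heard.
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "LightsOffIntent":  # hypothetical intent name
            text = "Turning the lights off."
        else:
            text = "Sorry, I don't know how to do that yet."
    else:
        # Anything else (e.g. the skill being opened with no command).
        text = "What would you like to do?"

    # Reply in the shape the service is assumed to expect.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```

Note what’s missing: nothing in there actually touches a light switch. The reply is just text to be spoken, which is rather the point about it being better suited to the web than to appliances.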

On that front, Siri and Apple’s HomeKit have a slight lead, but not much of one – it’s a closed system that requires hardware developers to work directly with Apple. We’re going to wind up with several sets of products that don’t entirely work well together. It’s going to be the vocal equivalent of trying to get a PC and Mac networked together in the early nineties, and we’re going to be stuck there for a decade and more.

An even less optimistic view: the idea that our homes are going to be full of internet-connected passive listening devices is unbelievably creepy, and a recipe for disaster – either from state surveillance or hacking (or more likely both at the same time). You could not pay me to put a Google or Facebook listener in my house – they’ll only make money off it by selling what it hears to advertisers, and I’m not up for that. I don’t like my web browsing being tracked, I am not having my casual conversation around the house being tracked.

Amazon – not sure, they’re at least a company who I pay for things, rather than a company that sell me to advertisers. Apple – most likely, they’re pretty good on privacy and encryption (assuming the US government doesn’t fuck that up for all of us), but they’re also the least interested in giving me something I can write my own commands for that exist separate to their devices. Either way: it’ll remain a problem that’s about 40% solved for ages.

I suspect we’ll get a little bit of all the above. The actual utility of voice control is only going to get better from here, but it’ll remain fractured, so you’ll be forced to pick a product family to set up in your home, and when you make that choice, you’ll be able to choose between a cheap surveillance bot that sells your data and a much more expensive suite of homeware that also comes with the luxury of privacy.
