Three Screens And A Cloud: Xbox, Windows Phone, and the Future of the Personal Computer

There are plenty of smartphone remote applications. But Windows Phone 7's does two smart things: it doesn't just change the channel on the Xbox; it can pull and identify content from it, depending on what you're already watching.


Last Wednesday I met with Microsoft's Andy Lees, President of the Windows Phone Division, to talk about the company's new smartphones. During our meeting, though, my eyes kept flickering over to the television set running Xbox 360 across the room. It was sporting the new media dashboard that's rolling out in November. And the Mango-powered smartphone in my hand was the perfect remote.

It's not only because Microsoft had just scored a big deal to bring a whole slew of TV content to Xbox, and I'm a remarkably typical television-loving American. I love smartphones, too; I covered the Windows Phone 7 launch event for Wired almost exactly a year ago. But I've always been more interested in how smartphones (including WinPhone7) pull together devices and services. I've also long been fascinated by how Xbox has helped reboot our assumptions for media interfaces in the living room. I've been closely tracking Windows 8's increasingly form-factor agnostic approach to desktop computing. But between the Xbox and the Windows Phone, here were two very different kinds of "personal computers." Both are inherently capable, and even more compelling used together — and there wasn't a traditional Windows machine in sight.

For example, there's a new Windows Phone app for Mango called "Xbox Companion." Now, there are already plenty of smartphone remote applications. But Microsoft's version does two smart things: it doesn't just change the channel on the Xbox; it can also pull and identify content from it, depending on what you're already watching.

Here's a scenario Microsoft's Derek Snyder demonstrated. You're watching an animated film on the Xbox. You can almost-but-not-quite recognize one of the voices, so you pull out your phone to look it up. So far, this is pretty typical of how most people use their smartphones or tablets while watching video. But instead of searching Google or IMDb, you open the companion app.

There you find a button for "what's happening now." This automatically pulls up a card with all the top-line information about the movie you're watching on the Xbox, including cast. Each of these entries is in turn hot-linked to a Bing search that finds all related content that can be played on the Xbox — mostly movies, music or television shows. Once you find something here, you can select it to begin playing on screen immediately.
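To make the shape of that flow concrete, here's a minimal sketch in Python. The names and the toy catalog are illustrative assumptions, not Microsoft's actual API; the point is just the two steps — pull a metadata card for the current title, then turn each cast entry into a search scoped to playable content.

```python
# Hypothetical sketch of the "what's happening now" flow.
# CATALOG stands in for the Xbox-playable content Bing would search.
CATALOG = {
    "An Animated Film": {"cast": ["Voice Actor A", "Voice Actor B"]},
    "Another Feature": {"cast": ["Voice Actor A"]},
}

def whats_happening_now(now_playing):
    """Return a card of top-line info for whatever the console is playing."""
    return {"title": now_playing, "cast": CATALOG[now_playing]["cast"]}

def search_playable(person):
    """Stand-in for a search limited to titles playable on the console."""
    return [title for title, info in CATALOG.items() if person in info["cast"]]

card = whats_happening_now("An Animated Film")
# Tapping a cast member runs a scoped search whose results can be
# selected and played on screen immediately.
results = search_playable(card["cast"][0])
print(results)  # -> ['An Animated Film', 'Another Feature']
```

The key design point is that the search results are filtered to things the console can actually play, so selection and playback are one step.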

What's more, it doesn't just do this over Wi-Fi when the Xbox and Windows Phone are on the same local network. It can do it entirely through the cloud, using the common Windows Live ID on both devices. If the Xbox had 3G/4G cellular data like the Windows Phone does, you wouldn't even need a local router. "It just knows the Xbox and the Xbox knows it," Snyder said.
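The pattern here is worth spelling out: the two devices never discover each other directly; a backend keyed by a shared account ID does the routing. Here's a minimal sketch, with hypothetical names (this is not Microsoft's actual service), of identity-based pairing through a cloud relay:

```python
# Toy model of cloud-relay device pairing: both devices register under
# the same account ID, and the relay routes commands between them.
class CloudRelay:
    """Stands in for the backend service keyed by a common account ID."""

    def __init__(self):
        self._devices = {}  # account_id -> {device_name: device}

    def register(self, account_id, device):
        self._devices.setdefault(account_id, {})[device.name] = device

    def send(self, account_id, target_name, command):
        # Route a command to a sibling device on the same account.
        return self._devices[account_id][target_name].handle(command)


class Device:
    def __init__(self, name):
        self.name = name
        self.now_playing = None

    def handle(self, command):
        if command[0] == "play":
            self.now_playing = command[1]
            return f"{self.name} playing {command[1]}"
        if command[0] == "query":
            return self.now_playing


# The phone never talks to the Xbox over the local network: both simply
# register under the same account ("it just knows the Xbox").
relay = CloudRelay()
relay.register("live-id-123", Device("xbox"))
relay.register("live-id-123", Device("phone"))

relay.send("live-id-123", "xbox", ("play", "An Animated Film"))
print(relay.send("live-id-123", "xbox", ("query",)))  # -> An Animated Film
```

Because the routing key is the account rather than a network address, it works the same whether the devices share a router or sit on separate cellular and broadband connections.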

This phrase caught my attention. Let me explain why I think it's important.

Actor-Network Theory for your living room

This is why identity services, whether through Microsoft, Facebook, Google, Apple, Amazon, Twitter or anyone else, are increasingly key to personal computing. All of the devices are networked communicators (so they need access to services like Skype or FaceTime or Google Talk, as well as your social graph), all of the devices are storefronts for apps, games and media (so they need access to your billing information), and all of them are personalized for each user and synchronized for each account, so they can port media and messaging and even simple information like a channel flip from one device to another.

From an input perspective, that distributed recognition on the network means everything on it hooks together, in a way that's arguably more seamless even than on or between Apple devices. There's no separate look-up or entry from machine to machine, and within each machine, no need to copy and paste from app to app.

("Although we do copy and paste!" Lees told me when I made this observation. "I just wanted to bring that up, you know," he laughed. "I'm sorry we didn't get it into this [first] version [of Windows Phone 7], I'm sorry.")

The point, however, is to make copy-and-paste move from impossible (remember how few smartphones could implement it with any facility just five years ago?) to unnecessary. Once you pair a smartphone, with all of its sensors, with a sophisticated game console like a Kinect-equipped Xbox, you have a staggering range of input possibilities: game controllers and traditional remotes, virtualized versions of both on the smartphone, and voice and gestures through either the smartphone or Kinect.

When you design the experience for those kinds of input, it quickly becomes impossible to assume file-and-folder-style nested hierarchies. Instead, you open up all sorts of other possibilities to push and pull information from one point to another. You open up touch, voice, gesture, and automated and artificial intelligence of all kinds. And the smartphone becomes a natural focal point for that, because its sensors and communication capabilities are already inherently so rich.

But there's also a bit of sorcery there. It just knows the Xbox and the Xbox knows it. "It knows" without my intervention, except as a first mover and witness. My Live ID — just an electronic impression in the data center — is enough to link the two directly. From a mile-high view, every device on the network is equally an agent/participant, including the user.


"Three screens and a cloud" was Ray Ozzie's mantra when he was chief software architect at Microsoft. The idea is that desktop, mobile, and entertainment/living room experiences each require their own form factors, tied together by backend services that pull those devices together — and furthermore, that all of them in unison serve the function formerly known as "personal computing."

Ozzie's vision is becoming realized, and not only at Microsoft. Whether you're talking about Apple's new iCloud, Google's push to bring thin-client operating systems to everything from enterprise notebooks to TV sets, or even Amazon's cloud-backed Kindle Fire tablet, you see the same logic at work.

The differences between companies like Microsoft, Google, Amazon, Apple and others can be found mostly in their experimentation with exactly how many screens are viable, and where those devices fit — in your house or office, on or around your body, or in the interstices of your day. Is an iPad-sized tablet a laptop-like computer, a mobile device, a living room machine, or a bit of all three? Where do e-readers and mobile gaming/music devices fit in? Are there still meaningful differences between portable laptops and stationary desktops, even if they run the same OS? How much ought office and enterprise hardware and software resemble the consumer experience, and in which direction(s) should innovation flow? How much computing power (rather than simple sync and storage) do you push from a local machine into the centralized backend? When even RIM's vaunted central services crash, and a demand spike makes Apple's new cloud-based backups and updates fail, how much ought users trust their cloud services for reliability, security or privacy?

But Microsoft, as you might expect, probably exemplifies the "three screens and a cloud" philosophy most fully. Between the consumer and enterprise markets, it arguably has the most robust cloud service, and between Windows 7/8, Windows Phone 7/7.5, and Xbox 360, it has three major screens and operating systems to run on them. All three increasingly borrow UI elements from each other and share backend and development resources, but they also pair and partner well.

I think this is crucial to understanding the metastasis both of the cloud metaphor and the idea of the personal computer as the "digital hub." There is no hub any more, no defined center of activity. Or if there is a center, it's no longer a workstation packed with ports, cables, a big processor and a bigger hard drive.

It's you. Plus the data center — which carries your electronic impression. And every interface in between.

Our multi-screen future

Already, according to Nielsen, 40 percent of smartphone and tablet owners are pairing them with a television. Over time, that ad hoc multi-screen network will become a formal one; Microsoft's is just one of several visions for how that formalization will work.

For instance, this is one reason many of us are particularly excited about the possibilities of Apple's new Siri virtual assistant. It already combines natural input, smart cloud processing, and non-hierarchical, non-app-dependent push and pull of information. It doesn't matter whether the AI moves beyond the iPhone to work natively on Apple's entire product line, or works through the phone as a secondary input device, the way Microsoft's Windows Phones and today's iPhone and iPad remote-control applications do. Either way, it opens up not just one node in the new personal computing network, but several.

The trick is that because of the cloud, designing for individual form factors — one or more of Microsoft's "three screens" — is both more and less important than ever before. It's more important because it helps determine the interface and targets both our behavior and our time. You could say it's less important than ever before, because all of the devices are more capable, and more of the storage and computing action happens off site.

But, once the devices start communicating effortlessly to each other, then it becomes something else, neither more nor less important, but transformative. Instead of designing for specific devices, you have to design for device networks. Increasingly, the design won't just enable communication between post-PC devices, but will assume it.

The precise nature of those assumptions still needs to be worked out. Some of these networks may be partial — not everyone will have every kind of device. Others may be heterogeneous. Right now, most of the synchronicity is between all-Microsoft, all-Apple or all-Google setups. History suggests that if the big companies don't find ways to get their devices talking to each other, users will.

But remember, the first GUI personal computers from Xerox, Apple and Microsoft required redesigning computing for that first "network" of computer, monitor, keyboard, mouse and printer. Those assumptions weren't automatic either. And in fact they're evolving still. We're still finding ways to push the definition of personal computing a little farther forward.
