A few weeks ago I attended an event at Microsoft Paris, NUIDay 2016. It's a mix of demo sessions, panels and, most importantly, hands-on time. Alongside trying on a pair of Hololens for a few minutes, I got to chat with developers who had been working with it in recent months. Here are some remarks from the notes I took.
The object itself
The industrial design and build quality were above what I expected for this (presumably relatively low-volume) 1.0 iteration.
It's a standalone product: the batteries, computer, scanning and projection systems, microphone and speakers are all integrated. Everything you see with them is computed on the device, it's not tethered to anything but the Internet.
It is comfortable to wear for an object of this size, even with prescription glasses like mine. I can't speak for long sessions, but from what I heard the battery dies before the headset becomes an itch to wear, around the two-hour mark depending on how intensive the tasks you're performing are.
It doesn't seem to heat up that much.
The projection area allowing you to see the holograms is made of two lenses sitting behind the main visor. They allow for a field of vision similar to a 30-inch screen placed 50 cm in front of you, from what I could estimate. Why couldn't it be wider? The explanation I got is cost. The lenses are supposedly the most expensive parts of the device, and to be able to sell it at an "affordable" price for devs, they couldn't have them any larger.
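Taking my own estimate at face value, a quick back-of-the-envelope computation (assuming a 16:9 panel) gives the angular field of view it implies:

```python
import math

# Rough angular field of view implied by my estimate: a 30-inch
# (diagonal) 16:9 screen seen from 50 cm away. All figures here are
# back-of-the-envelope assumptions, not device specs.
diagonal_in = 30
aspect_w, aspect_h = 16, 9
distance_cm = 50

diagonal_cm = diagonal_in * 2.54
scale = diagonal_cm / math.hypot(aspect_w, aspect_h)
width_cm, height_cm = aspect_w * scale, aspect_h * scale

def fov_deg(size_cm, dist_cm):
    """Angular size of a flat panel centered in front of the viewer."""
    return math.degrees(2 * math.atan((size_cm / 2) / dist_cm))

print(f"~{fov_deg(width_cm, distance_cm):.0f}° x "
      f"{fov_deg(height_cm, distance_cm):.0f}°")
# -> roughly 67° x 41°, noticeably wider than the ~30° x 17°
#    usually reported for the device, so treat my estimate as generous.
```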
What you see
This is one of the apps I tried: a carousel of holiday-season champagne packaging designs for Moët Hennessy brands (many thanks to the designer and developer who kindly gave their feedback).
I was impressed by the presence of the holograms in the environment: they have a real optical density. The light in the room where we tried the Hololens was mainly artificial, so it's possible it doesn't work as well in a sunlight-bathed environment. None of the holograms I saw cast shadows, even in a primitive way; whether this is a design decision or a computational limitation, I didn't think to ask.
The Hololens scans the environment it sees through its Kinect-derived depth sensors and memorizes it. Holograms are then anchored by the user to a point in this scanned space (all the demos I saw started with this anchoring step). This relation is memorized by the device; come back to the same place three months later, and apparently the holograms you left there will be waiting for you.
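To make the idea concrete, here is a minimal sketch of that anchoring-and-persistence pattern. It's my own illustration in Python, not the actual Hololens API; the anchor ids, file format and model names are all made up:

```python
import json
from dataclasses import dataclass, asdict

# Sketch of the anchoring idea: a hologram's pose is stored relative
# to an anchor the device has pinned to a point in its scanned model
# of the room, and the pairing is persisted so the scene can be
# restored on a later visit.

@dataclass
class Pose:
    position: tuple   # (x, y, z) in meters, relative to the anchor
    rotation: tuple   # quaternion (x, y, z, w)

@dataclass
class AnchoredHologram:
    anchor_id: str    # id of a point pinned in the scanned space
    model: str        # which 3D asset to display
    pose: Pose

def save_scene(holograms, path):
    with open(path, "w") as f:
        json.dump([asdict(h) for h in holograms], f)

def load_scene(path):
    with open(path) as f:
        return [AnchoredHologram(h["anchor_id"], h["model"],
                                 Pose(**h["pose"]))
                for h in json.load(f)]

carousel = AnchoredHologram(
    anchor_id="demo-table-01",           # invented anchor id
    model="champagne_carousel.glb",      # invented asset name
    pose=Pose(position=(0.0, 0.15, 0.0), rotation=(0, 0, 0, 1)),
)
save_scene([carousel], "scene.json")
# Months later: resolve "demo-table-01" against the scanned room
# and the carousel reappears exactly where it was left.
```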
Views are computed on the fly, so if your models are rich (textures, transparency or light refraction, for example) they will bring down the frame rate. But not the frame rate of the whole projection, as would happen if you overwhelmed your computer's graphics card. What positively surprised me is that if you put a simple object and a more complex one side by side in the projection, the former keeps a very fluid frame rate while the latter's drops independently.
For now, this makes a lean visual expression the safer bet to preserve a fluid experience.
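My interpretation of that behaviour, as a sketch: each hologram refreshes on its own schedule, bounded by its render cost, while the compositor keeps presenting at the device rate. The numbers below are invented for illustration:

```python
# Each hologram is re-rendered on its own schedule, so an expensive
# model updates at a lower rate while cheap ones stay fluid, instead
# of one heavy object dragging the whole projection down.

DEVICE_HZ = 60  # assumed compositor rate

class Hologram:
    def __init__(self, name, render_cost_ms):
        self.name = name
        self.render_cost_ms = render_cost_ms  # time to draw one frame

    def effective_fps(self):
        # An object can't refresh faster than it can be drawn,
        # and never faster than the compositor presents frames.
        return min(DEVICE_HZ, 1000 / self.render_cost_ms)

scene = [
    Hologram("flat button", 2),        # simple: capped at ~60 fps
    Hologram("champagne bottle", 40),  # refraction etc.: ~25 fps
]
for h in scene:
    print(f"{h.name}: ~{h.effective_fps():.0f} fps")
```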
The interaction models used in the different demos were based on 2D paradigms (flat rectangular buttons arranged in columns and/or rows) in a display medium that provides three dimensions. An inheritance from two decades or more of working on screen-based interfaces, I guess, but one that needs to be overcome if we want to get all the potential these devices offer. You can do anything: make things disappear, scale them up or down, move them 4 meters up or out into the distance. There is no excuse not to break free from screen constraints when it comes to information display.
What you do
Gaze: a small dot fixed right in front of you that follows the direction of your… gaze. It is basically an augmented-reality mouse cursor. I have doubts about it being the adequate way to interact with holographic targets. Kinesthetically, it induces continuous and precise neck movements, depending on target size and distance, whenever action is required. Apparently it takes an hour to get used to, so muscle memory might help here. That being said, one of the on-stage demos started with a missed hit launching the wrong app… We naturally act on three-dimensional objects by reaching for them with our arms and hands; interacting with them through an abstraction layer like the gaze will never feel as natural and right as what humans have been doing for thousands of generations. Also worth considering: this interaction generates a body language that is, well, not putting the user in a favorable light. Watching someone use gaze on the Hololens is like watching someone seeing things, fixing their eyes in the distance while making small head movements.
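For what it's worth, here is the general technique I assume sits behind a gaze cursor (not Microsoft's implementation): cast a ray from the head pose along the view direction and pick the nearest hologram it crosses. The targets and their sizes are made up:

```python
import math

# Gaze-target resolution as a ray cast: intersect the view ray with
# each hologram's bounding sphere and keep the nearest hit.

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))

def gaze_hit(origin, direction, targets):
    """direction: unit vector. targets: (name, center, radius) tuples.
    Returns the name of the nearest target the ray crosses, or None."""
    best = None
    for name, center, radius in targets:
        oc = sub(center, origin)
        t = dot(oc, direction)            # closest approach along the ray
        if t < 0:
            continue                      # target is behind the user
        closest = sub(oc, tuple(t * d for d in direction))
        if dot(closest, closest) <= radius ** 2:
            if best is None or t < best[0]:
                best = (t, name)
    return best[1] if best else None

targets = [("launch button", (0.0, 0.0, 2.0), 0.05),
           ("exit button",   (0.4, 0.0, 2.0), 0.05)]
print(gaze_hit((0, 0, 0), (0, 0, 1), targets))  # -> launch button
# A 10 cm-wide button 2 m away subtends only about 2.9°, which is why
# small, distant targets demand the precise neck movements described above.
```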
That brings us to the "tap", the way to trigger an active area you've successfully reached with the gaze. There are two ways to perform a tap: with a clicker, or with a gesture. In this demo from Boeing you can see it when they drag palettes across the scene. I called it the quack, as in duck quack. Gaze… quack… gaze… quack… quack. The touchscreen equivalent of this holographic tap would be hovering a finger 5 cm over a phone, moving it until it sits over the point to reach, hitting the screen in a fast down-and-up movement, and coming back to the initial hovering position. A gesture that feels a bit overacted. Again, it's neither natural nor smart-looking to go quack quack while wearing these glasses from the future. Touchscreens feel natural to use because we touch the elements we intend to modify and they react instantly under our fingers, as real matter would. Holograms won't be different: effortless interaction with them will come once they can react to the presence of our fingers on them.
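A tap like this is typically recognized as a small state machine over hand-tracking samples. Below is a hedged sketch of that idea, with an invented press-duration threshold; it is my reading of the gesture, not the platform's actual recognizer:

```python
from enum import Enum

# "Air tap" as a state machine: the hand hovers with the finger up,
# presses down, and must come back up within a short window for the
# event to count as a tap.

class State(Enum):
    IDLE = 0      # no hand tracked yet
    READY = 1     # hand tracked, finger up ("gaze…")
    PRESSED = 2   # finger down (the "quack")

TAP_WINDOW_S = 0.3  # assumed maximum press duration

class TapRecognizer:
    def __init__(self):
        self.state = State.IDLE
        self.pressed_at = 0.0

    def update(self, finger_up, timestamp):
        """Feed one hand-tracking sample; returns True on a tap."""
        if self.state == State.READY and not finger_up:
            self.state = State.PRESSED
            self.pressed_at = timestamp
        elif self.state == State.PRESSED and finger_up:
            self.state = State.READY
            return timestamp - self.pressed_at <= TAP_WINDOW_S
        elif finger_up:
            self.state = State.READY
        return False

r = TapRecognizer()
samples = [(True, 0.0), (False, 0.1), (True, 0.25)]  # up, down, up
print(any(r.update(up, t) for up, t in samples))     # -> True
```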
Last gesture, the bloom: opening your fingers like a flower in front of you invokes a navigation menu. It's a nice touch: easy to perform, not common (making it less likely to trigger the menu inadvertently), and expressive.
The Hololens also has Cortana, so voice commands are available when supported by the apps.
All the holograms have to be modelled and put together in an interactive environment where their role, behaviour and relations to one another are set. Most of the apps I saw relied on the Unity game engine for this last part.
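The pattern Unity uses for this wiring is entities with behaviours attached as components. Here is a minimal Python analogue of that pattern, purely illustrative (the Spin behaviour and asset name are made up, and this is not Unity's API):

```python
# Entity-component sketch: a hologram is an entity, and its role and
# behaviour come from the components attached to it.

class Component:
    def update(self, entity, dt): ...

class Spin(Component):
    """Rotate the entity around its vertical axis."""
    def __init__(self, deg_per_s): self.deg_per_s = deg_per_s
    def update(self, entity, dt):
        entity.rotation_y = (entity.rotation_y + self.deg_per_s * dt) % 360

class Entity:
    def __init__(self, name, model):
        self.name, self.model = name, model
        self.rotation_y = 0.0
        self.components = []

    def add(self, component):
        self.components.append(component)
        return self

    def update(self, dt):
        for c in self.components:
            c.update(self, dt)

carousel = Entity("carousel", "bottles.glb").add(Spin(deg_per_s=30))
carousel.update(dt=1 / 60)  # one frame: the carousel rotates by 0.5°
```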
One valuable piece of feedback from the developer who built the Moët Hennessy app: having 3D models already available is crucial to the pace of these projects. Modelling is very time-consuming, so if models can be handed to you, all the better.
Microsoft relying on third-party solutions like Unity or Vuforia to get the Hololens to display anything feels odd, strategically.
Potential of AR
Here's a quick list of situations where AR will help and where VR would be inferior:
- e-commerce (how would that chair fit in the kitchen, how does this coffee machine look on the counter, will my tablet fit in this bag,…)
- pointing to where I left my keys/bag/phone/charger/…
- way finding and cartography
- IKEA instructions
- first aid gesture assistance
- learning math concepts, chemistry, cell biology, history, computer science, ecology, dance,…
- 4-dimensional data charts
- userflow analytics
To finish these notes, here are some topics I feel will need to be addressed before AR can reach a wide audience and scale into an economically viable set of products.
To be truly helpful and efficient, projections need to be visible in the user's peripheral vision. You should be able to detect holographic content to your sides, and look at it by moving your eyes while your head stays still. As it is now, it feels as if you need to scan around to be sure you're not missing something, and everything needs to happen right in front of you for you to notice it. Hopefully this is a projection-lens size issue that money and time will solve.
It needs to allow for more natural postures and gestures, so that people feel good about using it, personally and socially. The AR system needs to read and understand our hands as finely and subtly as they can express meaning: their position and the timing of their movements are equally important. Here are some examples of the Leap Motion being used with a Hololens, pointing in this direction.
Speaking of semantics, identical gestures can have very different meanings around the world. Choosing the right gestures to trigger recurrent events for an AR app or platform means not only selecting a gesture that is meaningful for the resulting action, but also one compatible with all the cultures the product could address. Global brands still make questionable choices with words today, so I expect friction on this topic.
I wonder about the impact of having light sources so close to the eyes on one's vision. I'd wait for advice before engaging in repeated and/or long sessions.
An accessible and easy way to produce 3D elements will be necessary for AR adoption. Trimble's SketchUp (formerly Google's) is in a very good spot in this regard. Producing a 3D model from a smartphone camera will also become more valuable as people bring their own content to AR platforms.
Text readability is on the verge of getting decent on computer screens, and this thing comes along, throwing us back a few decades. Perspective distortion is inherent (try reading this with your screen turned sideways), and pixel density is low. A whole area of experimentation and optimisation lies ahead before reading text in these new spaces can be a decent experience.