I’ve been heads-down wrapping up my AU content, so I thought I’d publish at least some of it here. Here’s the handout for my “AU kick-off” class which is at 8am on the first day of the conference – the morning after the ADN party – so despite the fact the class is currently full, we’ll see if that turns out to be the case. ;-)
VR comes of age
Presenting 2D images that allow the brain to reconstruct 3D has its roots in the early stereoscopes invented in the late 1830s. These were eventually democratized by Sawyers and their iconic View-Master brand, which initially focused on virtual tourism but branched out into entertainment and education. The technology was used by the military during WWII to help train people to identify aircraft, for instance, as well as to help doctors learn anatomy. There are strong parallels between the use of stereoscopes in the 20th century and the use of VR today.
Static images evolved to be dynamic: the 1950s were the heyday of 3D cinema, and marked the early days of research into VR. Autodesk initially felt the time was right to democratize VR in the late 1980s: we were early innovators in the VR space with the Cyberspace Developers Kit.
For a sense of why this effort ultimately failed, it’s worth considering the Gartner Hype Cycle from 1995 and 2015.
The one from 1995 shows – along with the apparently lamentable state of software that drew splines, back then – that VR was already on its way into the “Trough of Disillusionment”.
The equivalent graphic from 2015 shows a different story…
Here we see VR climbing gradually towards the “Plateau of Productivity” – although, surprisingly for some, it’s still considered to be 5-10 years from mainstream adoption.
The drivers that have finally allowed VR to head along the path to viability have their roots in the explosion of smartphone technology: investments in high resolution, low latency displays and low power IMUs (Inertial Measurement Units) have been huge enablers for VR, in general.
It’s this technology that is powering all modern VR efforts, even the ones that are currently tethered. In many ways the most groundbreaking of these – in that it went a huge way towards democratizing VR – was the August 2012 Kickstarter campaign for Oculus Rift:
The next watershed moment for VR was at Google I/O in June 2014, when Google Cardboard was unveiled. The premise was simple: today’s smartphones have everything you need to do pretty decent VR (and guess what? Moore’s Law says that they’re only going to get better at it). Inserting a smartphone into a simple cardboard case with a couple of plastic lenses brings VR to the masses at the cost of just a few dollars.
Google Cardboard was laughed off by many, at first. Some changed their minds after trying it for the first time: there is something very compelling about the experience you get with close to no effort. For sure there is a way to go, but as smartphones get designed with VR in mind, things will improve quickly.
The first low-cost, Android-powered, dedicated VR headset has just been fully funded on Kickstarter. Powered, of course, by smartphone tech.
Using the View & Data API to implement VR
While I was intrigued by the Oculus Rift – and was happy to try both the DK1 and DK2, when they came out – it was only when Google Cardboard launched that I really became passionate about VR. For me the experience is going to have to be untethered – and ultimately web-based – for VR to be really compelling.
My first prototyping efforts for Google Cardboard started before I had access to a Cardboard: I coded up a simple web-page based on the Autodesk View & Data API, and sent the link across to a colleague in San Francisco to test. You can imagine my delight when I finally had the chance to test it – on the drive from SFO to the city – and found the experience to be genuinely three dimensional.
The initial prototype embedded the View & Data viewing component twice – once for each eye. The trick to creating a 3D effect is to have the two cameras positioned slightly apart but pointing at the same target. You can either offset them by the distance between a human’s eyes – 6-7 cm or so – or you can do as I did and use a percentage (in my case 4%) of the distance between the camera and the target. This has the advantage of being unit-independent and working well for all models.
Here’s the relevant JavaScript function (which you can find as part of this source file) that transfers the camera position from one viewer to the other:
function offsetCameraPos(source, pos, trg, leftToRight) {
  // Use a small fraction of the distance for the camera offset
  var disp = pos.distanceTo(trg) * 0.04;
  // Clone the camera and return its X translated position
  var clone = source.autocamCamera.clone();
  clone.translateX(leftToRight ? disp : -disp);
  return clone.position;
}
The very earliest implementation kept the views in sync by watching for “camera-changed” events on each viewer and then transferring the camera position to the other using the above function. For subsequent versions – while the original mode was left in – it made sense to have both views controlled by the tilting of the device.
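As a rough sketch of that earliest sync approach (the viewer variables, the event constant and the navigation calls here are assumptions based on the viewer API of the time, not a copy of the original source):

// Watch the left-hand viewer and transfer its (offset) camera to the right
function watchCameras() {
  _viewerLeft.addEventListener(
    Autodesk.Viewing.CAMERA_CHANGE_EVENT,
    function () {
      if (_updatingRight) return;   // avoid the two views ping-ponging
      _updatingRight = true;
      var nav = _viewerLeft.navigation;
      var pos = nav.getPosition();
      var trg = nav.getTarget();
      // offsetCameraPos() is the function shown above; here we assume the
      // viewer itself is passed as its 'source' argument
      _viewerRight.navigation.setView(
        offsetCameraPos(_viewerLeft, pos, trg, true), trg);
      _updatingRight = false;
    });
}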
HTML5 provides a very handy “deviceorientation” event that gives your application alpha, beta & gamma values (roughly yaw, pitch & roll) describing the orientation of the device. This allows applications to determine the movement of the head – directionally, at least, even if it isn’t enough to provide full positional head-tracking.
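Hooking this up is a one-liner (a minimal sketch; orb() is the handler shown further below):

// Feed orientation updates into our handler, where the browser supports them
if (window.DeviceOrientationEvent) {
  window.addEventListener('deviceorientation', orb, false);
}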
There were a couple of navigation options available: we could have placed the model somewhere in 3D space, so that when you happened to look in that direction you’d see it. This is probably better for architectural/site/cityscape navigation… it’s more of a walkthrough model of navigation.
As we’re fairly constrained with our model size, I opted instead for a widget-viewer navigation style: the object is on a virtual turntable and rotates as you turn your head to the left/right (or up/down). I chose to constrain the rotation to a turntable effect – effectively limiting it to two axes – because it gave a better user experience than allowing full 3-axis rotation.
Here’s the deviceorientation event handler. Don’t worry about the specifics: it just distills the device orientation data down to two values that get used to orbit the views themselves:
function orb(e) {
  if (!e.alpha || !e.beta || !e.gamma || _updatingLeft || _updatingRight) return;
  // Remove our handlers watching for camera updates,
  // as we'll make any changes manually
  // (we won't actually bother adding them back, afterwards,
  // as this means we're in mobile mode and probably inside
  // a Google Cardboard holder)
  unwatchCameras();
  // gamma is the front-to-back in degrees (with
  // this screen orientation) with +90/-90 being
  // vertical and negative numbers being 'downwards'
  // with positive being 'upwards'
  var ab = Math.abs(e.beta);
  var flipped = (ab < 90 && e.gamma < 0) || (ab > 90 && e.gamma > 0);
  var vert = ((flipped ? e.gamma : -e.gamma) + (ab < 90 ? 90 : -90)) * _deg2rad;
  // When the orientation changes, reset the base direction
  if (_wasFlipped != flipped) {
    // If the angle goes below/above the horizontal, we don't
    // flip direction (we let it go a bit further)
    if (Math.abs(e.gamma) < 45) {
      flipped = _wasFlipped;
    } else {
      // Our base direction allows us to make relative horizontal
      // rotations when we rotate left & right
      _wasFlipped = flipped;
      _baseDir = e.alpha;
    }
  }
  // alpha is the compass direction the device is
  // facing in degrees. This equates to the
  // left - right rotation in landscape
  // orientation (with 0-360 degrees)
  var horiz = (e.alpha - _baseDir) * _deg2rad;
  // Save the latest horiz and vert values for use in zoom
  _lastHoriz = horiz;
  _lastVert = vert;
  orbitViews(vert, horiz);
}
An interesting secondary – and very positive – effect of choosing this style of navigation is that, while it feels fairly natural, the brain doesn’t try to reconcile the view changes with the real world. Which means you really don’t feel “VR sickness” – something typically caused by a conflict between the motion your eyes perceive and the motion (or lack of it) sensed by your inner ear.
Once the basic prototype was working, I went and added support for multiple models. Placing these in a separate menu had an additional advantage: in HTML5 it’s not possible to force a page to be viewed full-screen without there being some kind of user-driven event (unless the full-screen API is called from a UI event handler, it doesn’t do anything).
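For example, the full-screen request has to live inside something like a click or touch handler (a minimal sketch – the element id is illustrative):

// Request full-screen from inside a user-driven event: calling this
// outside a click/touch handler is silently ignored by the browser
document.getElementById('model-menu').addEventListener('click', function () {
  var el = document.documentElement;
  var request = el.requestFullscreen ||
                el.webkitRequestFullscreen ||
                el.mozRequestFullScreen;
  if (request) {
    request.call(el);
  }
});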
The fact that this prototype uses two separate viewer instances is clearly less than ideal – it would be much better to have a single viewer with a stereo renderer – but the issue is really one of scalability: the prototype works well enough for smaller models, and with a stereo renderer we could presumably handle models approaching twice the complexity.
You can try the basic implementation here.
User input for VR
One of the major problems yet to be solved effectively for VR is how to manage user input. It’s clearly not possible to touch the screen of your smartphone while it’s inserted in a Google Cardboard holder, for instance.
People are attempting to address this in a number of ways, whether by measuring the time your gaze spends on a particular object, or by attaching external peripherals such as Leap Motion or Kinect to recognize hand gestures or body movements while you’re immersed in a virtual space.
One possibility that I believe to have largely been overlooked is voice: there are straightforward – even web-based – voice recognition APIs that allow you to add voice commands easily to your VR environment. The second major iteration of the VR prototype made use of the Google Speech Recognition API – via a handy JavaScript wrapper called Annyang – to add support for a number of different voice commands:
EXPLODE, COMBINE, zoom IN and OUT, FRONT, BACK, LEFT, RIGHT, TOP, BOTTOM
The code making use of Annyang to implement a series of voice commands can be found here. And here’s a YouTube video showing the implementation in action.
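To give a flavour of how this hangs together, here’s a rough sketch using Annyang’s addCommands()/start() API (the handler functions are hypothetical stand-ins for those in the linked source):

// Map spoken phrases to viewer operations
if (annyang) {
  annyang.addCommands({
    'explode':  function () { explodeModel(); },   // hypothetical helpers,
    'combine':  function () { combineModel(); },   // standing in for the real
    'zoom in':  function () { zoomIn(); },         // command implementations
    'zoom out': function () { zoomOut(); },
    'front':    function () { setView('front'); },
    'back':     function () { setView('back'); }
  });
  // Start listening for speech input
  annyang.start();
}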
Using the SDKs for Google Cardboard and Gear VR with our web-based viewer
There are various benefits to creating a native Android application for Google Cardboard. For instance, the SDK gives you access to an event telling you when the magnet-based trigger has been pulled (although not all holders come with a magnetic trigger, these days), and it automatically disables the display’s “auto off” feature – something you will otherwise need to handle manually when working with pure WebVR.
Other than that, it’s very feasible to have the Cardboard app load your web-page into an Android WebView – assuming the target device is running Android Lollipop or higher, as versions prior to this didn’t have WebGL support from their WebViews.
Here you can see some code inside Android Studio that shows how to respond to the trigger event:
The back() and enter() functions simply call JavaScript functions in the WebView using code such as this:
mWebView.loadUrl("javascript:openSelected()");
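On the web-page side, this just requires the corresponding function to be defined globally (a sketch – only openSelected() is named above; the helper it calls is hypothetical):

// Exposed for the native wrapper to call via loadUrl("javascript:...")
function openSelected() {
  // Open whichever model is currently highlighted in the carousel
  loadModel(_selectedIndex);   // hypothetical helper and state variable
}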
This is the general approach for native applications – whether targeting Cardboard or Gear VR – to call into the code behind their embedded web-pages.
Speaking of Gear VR, this is the Cardboard-like device being brought to market by Samsung. It benefits from having a runtime engine developed by Oculus – in fact it’s the way Oculus intends to address the needs of the broader consumer VR market – as well as some excellent hardware: the first-class optics, with their 96-degree field of view, are a real step up from the standard Cardboard offering.
There are also additional hardware buttons, as well as a separate Bluetooth gamepad:
When using the Oculus Mobile SDK, you can also hook up these UI components to functions in your embedded page’s implementation, much as we saw previously when using the Google Cardboard SDK.
Both SDKs present some interesting possibilities for applications, being more directly tied to a particular hardware platform. Here’s a prototype UI that displays a carousel of models, which can be selected and opened via the magnetic trigger on Google Cardboard or the touchpad on Gear VR.
Implementing collaborative features in web-based VR
As mentioned previously, one of the big issues with VR relates to user input. Beyond voice, another avenue I explored with some colleagues at another VR Hackathon, earlier in the year, was to enable someone to help guide you through the VR experience.
Imagine…
The architect of your dream home showing your whole family around the 3D model, changing the features – perhaps cosmetic, perhaps even structural – based on your real-time feedback. They control which room you happen to be looking at, but each family member can be looking in the direction that interests them.
Or a doctor showing you a blown up view of an MRI of your knee, highlighting the specifics of your surgical options.
Or a teacher taking their class on a virtual field trip, visiting the Great Wall of China and the bottom of the Pacific Ocean, all in one day. (This last one was announced the week after the Hackathon… expect Google Expeditions to come to a classroom near you, soon!)
We took this concept – and the initial prototype I’d created 6 months before – and fleshed it out during the course of the Hackathon weekend. The result was a master page hosting the full View & Data GUI viewer (with the full A360 UI) that controls an arbitrary number of clients connected to its session. We had initially targeted a single client, but realised that if we did it right we’d get as many clients as we wanted, basically for free.
Here’s the main page:
Clients with a compatible device – pretty much anything running Chrome, although WebGL on iOS continues to be a bit hit or miss – can connect into the session by scanning the dynamically generated QR Code, at which point they’ll see the same model as the master:
An important difference is that each client retains control over the viewing angle – you can inspect the model from whatever direction you please.
On the master page, any changes made by the controller are captured via events on the viewer component:
These then get propagated down to the various viewing clients:
Once the basic communications infrastructure was in place – we used Node.js on the server in conjunction with Socket.io for WebSocket communication between the master and the clients – it was trivial to extend support for the various operations. If you explode on the master or isolate components via the model tree, these changes will be seen on the various clients. The same is even true for sectioning.
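To give a feel for the plumbing, here’s a rough sketch of the relay (the event names, payloads and variable names are illustrative assumptions – the real implementation is in the Vrok-It source linked below): the master emits viewer state via Socket.io, the server broadcasts it to the session’s room, and each client applies it locally.

// Master page: capture a change on the viewer and send it to the session
_viewer.addEventListener(Autodesk.Viewing.EXPLODE_CHANGE_EVENT, function (e) {
  _socket.emit('explode', { sessionId: _sessionId, scale: e.scale });
});

// Server (Node.js + Socket.io): relay the change to every client in the session
io.on('connection', function (socket) {
  socket.on('join', function (data) { socket.join(data.sessionId); });
  socket.on('explode', function (data) {
    socket.to(data.sessionId).emit('explode', data);
  });
});

// Client page: apply the change without touching the local view direction
_socket.on('explode', function (data) {
  _viewer.explode(data.scale);
});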
The most difficult operation to implement ended up being zoom: we wanted the master to be able to control the zoom factor on the various clients, but not the view direction, so we had to jump through a few hoops to make that work properly.
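One approach (a sketch of the idea rather than the actual implementation, and assuming the viewer’s bundled three.js is available as THREE) is for the master to send just a zoom factor, which each client applies along its own current view direction:

// Client side: move the camera towards/away from the target by the master's
// factor, preserving whichever direction the local user is looking in
_socket.on('zoom', function (data) {
  var nav = _viewer.navigation;
  var pos = nav.getPosition();
  var trg = nav.getTarget();
  // Scale the target-to-camera vector by the received zoom factor
  var dir = new THREE.Vector3().subVectors(pos, trg).multiplyScalar(data.factor);
  nav.setView(trg.clone().add(dir), trg);
});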
One nice feature is the ability to upload your own models (although we set a cap at 2MB, to set expectations appropriately). For instance, you can head on over to the Herman Miller web-site and download models of their chairs (I found the best results, material-wise, with their 3ds Max models):
Here’s the Sayl chair loaded into Vrok-It:
We can see the model loaded onto our various clients, too:
Vrok-It is live, if you want to try it for yourself. You can also find the full source-code for the application posted to GitHub.
A big thank you to Lars Schneider and Oleg Dedkow for their part in developing the Vrok-It application.