WebXR: What you can do today & what's coming tomorrow, by Ada Rose Cannon (Samsung Internet)
Hello. My name's Ada Rose Cannon. I'm a developer advocate for the web browser, Samsung Internet.
I'm also co-chair of the W3C Immersive Web Groups. These are the groups that are developing the WebXR APIs that let you work with immersive technologies such as augmented reality and virtual reality. This talk is about WebXR, with a focus on what you can build today with the existing APIs and also what's coming in the near future. The WebXR Groups were formed in 2018 to develop the WebXR API, with the goal of bringing virtual and augmented reality features to the web. WebXR is a successor to the early experimental WebVR API, which was aimed at just VR headsets.
By the end of 2019, WebXR had wide support for virtual reality. This enabled us to deprecate the previously experimental WebVR API. Shortly after that, the augmented reality features started to land, which let it work with augmented reality headsets and also mobile phones.
And since then, work has continued on many new features. This work is being done in parallel, enabling many features to be developed at the same time. WebXR at its core is an API for accessing the hardware features of immersive technologies. So WebXR usage is heavily tied to the popularity of XR devices, for example, the Oculus Quest 2.
The chart below tracks the usage of virtual reality over the past year. In October last year, the Oculus Quest 2 was released. This was a very popular headset, and you can see the uptick in usage of the WebXR Device API as more people buy this hardware. As developer advocates and people who work in standards, we try to push the development and adoption of some of the new, more advanced, very exciting features. But surprisingly, it's some of the simplest hardware and simplest experiences that have had the widest adoption. Hardware-wise, handset-based hardware is the most commonly used.
So the simple cardboard headsets, which are supported by the WebXR API on Android, are some of the most common ways people use virtual reality. And phone-based augmented reality is the most widely available form of augmented reality you can get today. In addition, the experiences which are most built today and most popularly used are simple 360 or 360 3D videos, which work great for simple immersive hardware, even though the WebXR Device API is able to support much more powerful and immersive experiences.
The most powerful form of XR we see today is that which is powered by desktop computers. Currently, this is a more minimal use case. But as OpenXR gains more support, the WebXR implementations that are built on top of it will also gain support, and hopefully also gain more popularity. The main topic I want to talk about today is the features you can use today and how you can use them.
So for each of the listed features, I'll talk a little bit about the feature and what it does, and then I'm gonna show a code sample of how to use it. So hopefully this will help you get started working with the API.
So firstly, WebXR Core. This was the first WebXR module that was built. And it's the core module which is designed to be plugged into by all the other modules and act as a base for them. This modular approach is really important, because WebXR technology is still in its infancy and evolving very rapidly, at least compared to the Web, which does evolve pretty slowly. To use WebXR, you use the xr object on window.navigator. This lets you detect if a session is supported.
And then if it is, you can use that to request the particular session you want. WebXR Core only supports virtual reality. Augmented reality came later, in the augmented reality module. The AR module is very simple.
It allows WebXR scenes to display the virtual content they've rendered with WebXR on top of a passthrough camera or a transparent display. It doesn't have any of the advanced augmented reality features in it. Those are additional WebXR modules, many of which I'll talk about during this talk.
Using augmented reality is very similar to virtual reality. But instead of using an immersive-vr session, you use an immersive-ar session. Because of the similarities between the code paths used for virtual reality and augmented reality, it's very simple to design a single experience that works with both VR and AR, letting you reach the widest possible audience.
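As a rough sketch of the support check and session request described above: the local `XRSystemLike` type is only a stand-in for the shape of `navigator.xr`, and the function names and the choice to prefer AR over VR are assumptions made for illustration.

```typescript
// The two immersive session modes mentioned in the talk.
type ImmersiveMode = "immersive-vr" | "immersive-ar";

// Hypothetical local stand-in for the part of navigator.xr used here,
// declared so the sketch type-checks without extra WebXR typings.
interface XRSystemLike<S> {
  isSessionSupported(mode: ImmersiveMode): Promise<boolean>;
  requestSession(mode: ImmersiveMode): Promise<S>;
}

// Assumption for this sketch: try AR first, then fall back to VR.
const PREFERRED_MODES: readonly ImmersiveMode[] = ["immersive-ar", "immersive-vr"];

// Resolves to the first supported mode, or null if neither is supported.
async function pickImmersiveMode<S>(xr: XRSystemLike<S>): Promise<ImmersiveMode | null> {
  for (const mode of PREFERRED_MODES) {
    if (await xr.isSessionSupported(mode)) return mode;
  }
  return null;
}

// Requests a session in the chosen mode. In a real browser this has to be
// called from a user gesture, such as a button click handler.
async function startImmersiveSession<S>(xr: XRSystemLike<S>): Promise<S | null> {
  const mode = await pickImmersiveMode(xr);
  return mode === null ? null : xr.requestSession(mode);
}
```

In a page, `navigator.xr` would be passed as the `xr` argument, after checking that it exists.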
Next, I want to talk about DOM Overlays. DOM overlays are really interesting, because they let you use some HTML in the scene. Right now WebXR is entirely based around WebGL, with no in-built support for HTML or CSS. So this means any interfaces you want to build, you have to construct entirely out of WebGL primitives. DOM overlays let you use HTML and CSS as if they were overlaid full screen on the window. That currently only works for handheld augmented reality, because that's the only situation where it makes sense. The way it works is that when starting your session, you request the dom-overlay feature.
And you tell it the element which you want to be made full screen over your scene. This effectively gives you three layers on your handheld AR device, like a smartphone. So on the top, you have the HTML and CSS, displayed full screen from whatever element you picked.
Below that, you have your WebGL content, which is being rendered by WebGL and sent to the WebXR Device API. And below that, you have the camera feed from the device. This may seem simple and perhaps limited, but it's definitely a start, because using HTML and CSS in WebXR is a complicated problem to solve.
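The DOM overlay request described above might look something like this sketch. The `dom-overlay` feature name and the `domOverlay: { root }` option follow the DOM Overlays module; the helper names and local types are hypothetical, and listing the feature as optional rather than required is an assumption.

```typescript
// Session options for DOM overlay, declared locally so the sketch
// type-checks without WebXR typings.
interface DomOverlaySessionInit<E> {
  optionalFeatures: string[];
  domOverlay: { root: E };
}

// Builds the options: request the feature and name the element that
// should be shown full screen over the scene.
function domOverlayInit<E>(root: E): DomOverlaySessionInit<E> {
  return { optionalFeatures: ["dom-overlay"], domOverlay: { root } };
}

// Hypothetical stand-in for the part of navigator.xr used here.
interface ArSystemLike<E, S> {
  requestSession(mode: "immersive-ar", init: DomOverlaySessionInit<E>): Promise<S>;
}

// Starts a handheld AR session with the given element as the overlay.
function startArWithOverlay<E, S>(xr: ArSystemLike<E, S>, root: E): Promise<S> {
  return xr.requestSession("immersive-ar", domOverlayInit(root));
}
```

Because the feature is only optional here, a page would still need to cope with sessions where the overlay was not granted.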
The next feature we have is one for both virtual reality and augmented reality. And this is gamepads. So it lets you access the associated gamepad object for a WebXR input source. A WebXR input source is any kind of controller used to access the scene, and different devices have different kinds of controllers. So some actually have real pieces of hardware, kind of like a joystick that you hold in one hand, with some buttons on it. And it can detect the position and rotation of this.
Anyway, all of the buttons on these objects will be exposed as a gamepad. So you can read button presses and joystick movements. And the API is almost identical to the existing Gamepad API, so it should be very familiar to work with. To get the gamepad information, you loop over the input sources from the session.
And if the input source has a gamepad object, that means that it is probably a real piece of physical hardware with buttons on it. And the state of those buttons can be read using the gamepad object itself. Next we have Hand Input. Some headsets don't use controllers at all, and use cameras to look at your hands and work out their position, and use that to control the scene. Some headsets have a mixture of hands and gamepads, and will switch between them depending on what the user is doing. And they can even do that on the fly.
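The gamepad lookup described above could be sketched roughly as follows. The local types mirror only the fields used here, and `readControllers` and `ControllerState` are hypothetical names for this example.

```typescript
// Minimal local shapes for the fields read in this sketch.
interface GamepadButtonLike { pressed: boolean; value: number; }
interface GamepadLike {
  buttons: readonly GamepadButtonLike[];
  axes: readonly number[];
}
interface InputSourceLike {
  handedness: string;
  gamepad?: GamepadLike | null;
}

// Hypothetical summary of one controller's state.
interface ControllerState {
  handedness: string;
  pressedButtons: number[];
  axes: number[];
}

// Loops over the session's input sources and reads the button and axis
// state of every source that exposes a gamepad object.
function readControllers(inputSources: Iterable<InputSourceLike>): ControllerState[] {
  const states: ControllerState[] = [];
  for (const source of inputSources) {
    const pad = source.gamepad;
    if (!pad) continue; // no gamepad, for example a tracked hand
    const pressedButtons: number[] = [];
    pad.buttons.forEach((button, index) => {
      if (button.pressed) pressedButtons.push(index);
    });
    states.push({ handedness: source.handedness, pressedButtons, axes: [...pad.axes] });
  }
  return states;
}
```

This would typically be called once per frame with `session.inputSources`, since sources can appear and disappear while the session runs.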
So to tell if an input source is a real human hand, then like with gamepads, you loop over your input sources. And then if your source has a hand object, that's gonna be all the information you need about the hand. And what this is is a way to access the position of each joint in the hand, which you can use to animate a 3D model of the user's hands, or use it to interact with the user's environment.
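Reading the joints of a tracked hand might look like this sketch. It assumes the hand object can be iterated as pairs of joint name and joint space, and that the frame offers `getJointPose`, as in the Hand Input module; the local types and `collectJointPositions` are hypothetical.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Minimal local shapes for the parts of the frame and input source used here.
interface JointPoseLike { transform: { position: Vec3 }; }
interface HandFrameLike<J, R> {
  getJointPose(joint: J, baseSpace: R): JointPoseLike | null | undefined;
}
interface HandSourceLike<J> {
  hand?: Iterable<[string, J]> | null;
}

// For every input source that has a hand object, collects the position of
// each joint that is currently tracked, expressed in the reference space.
function collectJointPositions<J, R>(
  frame: HandFrameLike<J, R>,
  inputSources: Iterable<HandSourceLike<J>>,
  referenceSpace: R,
): Map<string, Vec3>[] {
  const hands: Map<string, Vec3>[] = [];
  for (const source of inputSources) {
    if (!source.hand) continue; // not a hand, for example a controller
    const joints = new Map<string, Vec3>();
    for (const [jointName, jointSpace] of source.hand) {
      const pose = frame.getJointPose(jointSpace, referenceSpace);
      if (pose) {
        const { x, y, z } = pose.transform.position;
        joints.set(jointName, { x, y, z });
      }
    }
    hands.push(joints);
  }
  return hands;
}
```

The resulting positions could then drive a hand model or simple grab and poke checks. Joints whose pose is not available in a given frame are skipped.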
Using the user's hands to interact with the scene, so they can grab stuff or poke stuff or point at things, is a fantastic way for users to interact with the virtual environment, and a really great thing to support. Next we have the Hit Test API. This one's for augmented reality. It lets your augmented reality scene cast a ray out into the real world geometry. So for example, in this GIF, the ray's reaching out and is hitting this picture.
It then gives me the position in 3D space where the ray intersected. And then I can place objects at that position. To use it, it's a little bit complicated. So I have to talk a little bit about spaces. So each piece of hardware, or each item the WebXR system is detecting, has some kind of space associated with it. For example, input sources have a targetRaySpace and a gripSpace.
The gripSpace is the place you would put something if the user was holding it. And the targetRaySpace is a space for if there was something being launched out the front of the controller. So in this case, we are firing a ray out of the front of the controller. And we have to declare this beforehand, because it needs to be ready for us to use when the frame lands. It can't be calculated on the fly.
So this is why you set up your hit test source in advance. On each frame, you request the hitTestResults from each hit test source, where it's already worked out the position of the hit test. You can then convert this position into the reference space you're using for placing your 3D objects. This gives you the position and orientation of this object in your virtual environment.
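The two steps described above, setting up the hit test source in advance and then reading results on each frame, could be sketched like this. The method names follow the Hit Test module; the local types and the helper names `createHitTestSource` and `firstHitPose` are hypothetical.

```typescript
// Minimal local shapes for the session, frame and result objects used here.
interface HitTestSessionLike<Space, Source> {
  requestHitTestSource(options: { space: Space }): Promise<Source>;
}
interface HitTestResultLike<Ref, Pose> {
  getPose(baseSpace: Ref): Pose | null | undefined;
}
interface HitTestFrameLike<Source, Ref, Pose> {
  getHitTestResults(source: Source): readonly HitTestResultLike<Ref, Pose>[];
}

// Step 1, done once in advance: declare the ray to test, here the
// target ray space of a controller.
function createHitTestSource<Space, Source>(
  session: HitTestSessionLike<Space, Source>,
  targetRaySpace: Space,
): Promise<Source> {
  return session.requestHitTestSource({ space: targetRaySpace });
}

// Step 2, done on every frame: take the results already computed for the
// source and express the first usable one in the scene's reference space.
function firstHitPose<Source, Ref, Pose>(
  frame: HitTestFrameLike<Source, Ref, Pose>,
  source: Source,
  referenceSpace: Ref,
): Pose | null {
  for (const result of frame.getHitTestResults(source)) {
    const pose = result.getPose(referenceSpace);
    if (pose) return pose;
  }
  return null; // the ray did not hit anything this frame
}
```

The returned pose carries the position and orientation at which a 3D object could be placed. A real session would also need to have been requested with the hit-test feature.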
And that's where you can place the 3D object. This may seem particularly complicated. But once you get used to the concept of spaces, it's not so bad. The next thing I want to talk about is Layers.
And Layers are incredibly useful. What they do is that they let you display static images and video elements in WebXR. Initially, this may not seem very useful. After all, we can already display images and videos using WebGL.
So we have no need for it in WebXR. But it has one big advantage: by letting the WebXR Device API handle it itself, you gain reprojection. So what this means is that text appears really crisp and clear. There's no fuzziness around it.
It gives a really clear image to the user. This is perfect for things like videos. It means you also don't have to continually be copying the video texture into the graphics. The browser will take care of that for you. This makes it much more performant, letting you use these precious cycles for other bits in your scene. If you have to do any kind of text rendering, Layers is where you want to do it.
Because viewing text on a WebGL texture in WebXR is a really poor experience. But putting it onto a WebXR Layer makes it significantly better. And because of this, in the future, Layers may act as the basis for using DOM content in WebXR. But that's still very new and something that might come in the future.
But I'll talk about that later. To use WebXR Layers, you use the XRMediaBinding. So you create a new XRMediaBinding, and then you have a video element in your DOM.
And you tell the XRMediaBinding to create a shape. You can use a plane, a cylinder, or a sphere. You give it the content, such as a particular video element. And you give it the shape, and you tell it whether it's a 3D video or not. It then takes this information and uses it to display the video in your scene. But you should be aware that it is displayed on top of any other content.
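Creating a video layer in one of the three shapes might look like the following sketch. The three `create...Layer` methods and the `layout` values follow the Layers module; the local types, the `createVideoLayer` helper and its shape names are hypothetical.

```typescript
// The three shapes mentioned in the talk, and a few video layouts
// ("mono" for flat video, the stereo ones for 3D video).
type LayerShape = "plane" | "cylinder" | "sphere";
type VideoLayout = "mono" | "stereo-left-right" | "stereo-top-bottom";

interface MediaLayerInit<Space> {
  space: Space;
  layout: VideoLayout;
}

// Minimal local stand-in for the XRMediaBinding methods used here.
interface MediaBindingLike<Video, Space, Layer> {
  createQuadLayer(video: Video, init: MediaLayerInit<Space>): Layer;
  createCylinderLayer(video: Video, init: MediaLayerInit<Space>): Layer;
  createEquirectLayer(video: Video, init: MediaLayerInit<Space>): Layer;
}

// Picks the binding method that matches the requested shape.
function createVideoLayer<Video, Space, Layer>(
  binding: MediaBindingLike<Video, Space, Layer>,
  video: Video,
  space: Space,
  shape: LayerShape,
  layout: VideoLayout = "mono",
): Layer {
  const init: MediaLayerInit<Space> = { space, layout };
  switch (shape) {
    case "plane":
      return binding.createQuadLayer(video, init);
    case "cylinder":
      return binding.createCylinderLayer(video, init);
    case "sphere":
      return binding.createEquirectLayer(video, init);
  }
}
```

In a browser, the binding would come from `new XRMediaBinding(session)` on a session requested with the layers feature, and the returned layer would still need to be added to the session's render state, for example with `session.updateRenderState({ layers: [layer] })`.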
The final API I want to talk about in detail is Light Estimation. Now this is really new. This actually just landed in Samsung Internet today, like the day I'm recording this. You can try it out in the Samsung Internet 15.0 Beta. And I recommend giving it a go because it's very cool.
It's a feature for augmented reality. And so the way it works is that as your AR scene is running, the AI uses computer vision to build up a rough picture of the 3D environment you're in, where the light sources are and the way the light is coming in. And it can even give you a cube map you can use for reflections. So you can make objects appear with the correct lighting in your scene. And you can even have shiny virtual objects that reflect the real environment. And that's incredibly cool.
It's very, very exciting. You use light estimation by requesting a lightProbe. You can then use your WebGL binding to generate a cube map texture from that lightProbe. And then you can use that as a texture for your 3D objects. When the "reflectionchange" event happens, you can then update that cube map.
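The light probe flow described above could be sketched as follows. `requestLightProbe`, `getReflectionCubeMap` and the `reflectionchange` event follow the Lighting Estimation module; the local types and the `trackReflectionCubeMap` helper with its callback are hypothetical.

```typescript
// Minimal local shapes for the light probe, session and WebGL binding.
interface LightProbeLike {
  addEventListener(type: "reflectionchange", listener: () => void): void;
}
interface LightSessionLike<Probe> {
  requestLightProbe(): Promise<Probe>;
}
interface CubeMapBindingLike<Probe, Texture> {
  getReflectionCubeMap(lightProbe: Probe): Texture | null;
}

// Requests a light probe, hands the first cube map to the callback, and
// hands over a fresh one every time the reflection changes.
async function trackReflectionCubeMap<Probe extends LightProbeLike, Texture>(
  session: LightSessionLike<Probe>,
  glBinding: CubeMapBindingLike<Probe, Texture>,
  onCubeMap: (cubeMap: Texture | null) => void,
): Promise<Probe> {
  const lightProbe = await session.requestLightProbe();
  const refresh = (): void => onCubeMap(glBinding.getReflectionCubeMap(lightProbe));
  lightProbe.addEventListener("reflectionchange", refresh);
  refresh();
  return lightProbe;
}
```

In a browser, the binding would be an `XRWebGLBinding` created for the session and its WebGL context, and the callback would assign the cube map as the environment texture of the shiny objects.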
You can also use this lightProbe to get the spherical harmonics of the light in your scene, which are designed to work out of the box with ThreeJS. So that's really cool. The next group of features I really wanted to talk about are still in development. So these may not land in browsers at all. Or if they do, the APIs may look very different to how they look today. So because of this, I won't show you any code samples.
But if you want to try them out, take a look at the repositories for them in the Immersive Web GitHub. There you'll find information on how to use them. They should have explainers that you can read through to see how the code works. And hopefully, you can find out where they're available.
The first feature I want to talk about is Depth Sensing. So Depth Sensing is the first step to writing believable interactions with the real world environment. What it does is provide a continuous depth map of what the camera can see to your code. So this enables virtual objects to interact with the real world. For example, you could use the depth map to do stuff with your physics engine. Or you could have a virtual object, such as a virtual cartoon character, walking behind the table legs of a table, or walking behind another person, or disappearing behind a wall.
So this gives a really believable sense that the object is part of the scene, rather than always being placed on top of it. So the use of Anchors is a very subtle one, but it's extremely powerful and very useful for giving a good, believable experience. And the problem it solves is this: augmented reality hardware doesn't know everything about the space you're in when it first starts.
The longer the session runs, the more information it gains, and it builds up a better picture of the environment. But what this means is that sometimes the assumptions it makes are incorrect. So it might think a table is smaller or larger than it is.
So when it corrects for this, objects you've placed on that table will now be out of position, because the table is a different size than was expected. But if you were to place an anchor on the table, then it would associate that anchor with some of the nearby physical features, for example, the corner of the table or its edge, or maybe a pattern on the table. And so by having these anchors, which are associated with a physical position, when the scene updates, the anchors may move around in relation to the (0, 0) coordinate of the scene.
But they will stay in the correct place relative to the real physical environment, which is exactly the behavior you want. The other thing that's really exciting about this is that anchors in theory can be stored and be made persistent, so that you could place an object, and you could come back later and work on it. The persistent features are something that definitely won't be happening in the initial release of anchors.
But it's something which could be developed on in the future. Another future feature, which is very exciting, is sharing anchors. So if you can share an anchor with another person, their device can also use the same physical descriptions to place the object in the correct place. So this lets two people interact with the same object at the same time, which I think is very, very cool. But I want to reiterate, the work that is currently being done on anchors is only for attaching virtual content to real spaces.
The sharing and the persistence is something that hasn't even begun to be worked on yet. The next topic I want to talk about is computer vision. So this is the general topic of letting the computer see the environment. And there are two ways we kind of want to take this. The first is to give the website raw access to the camera, which is the more powerful option.
But users probably don't want to do that, and that's a very scary capability to tell users a website can do. So that's a bit of a dodgy situation. So we're trying to work out the best way that we can balance this. The alternative is a less risky but less powerful solution, where we expose the native shape detection abilities of augmented reality hardware.
So it can detect things for common use cases, such as QR codes, text, faces, and certain images, so you can do image markers and stuff like that. This is also a popular approach because it will solve what many people want to do with computer vision and is a lot less dangerous to use, but it doesn't solve every problem. So we'll probably end up having both implemented, the way things are going.
Like one which is for tracking images and things like that, and one for giving raw camera access, which is a lot more dangerous and will have the appropriate warnings for using it. But right now, it's still early days. So the form these take may differ. Next, we have Real World Geometry. This is very similar to depth sensing.
But instead of just giving you a depth map of the world, we give you more usable, processed information. So you'll get something like a 3D mesh of the environment, which you could feed directly into your physics simulation. Or we tell you where planes for various objects are, so you can do better physics or occlusion. So yeah, this is an interesting one, with similar use cases to depth sensing, but a very different approach.
But also, some use cases much prefer real world geometry, and some much prefer depth sensing. So we're probably gonna end up with both of these eventually. Next, we have geographic alignment. So, this one's kind of cool. So you can use the real world coordinate systems in your augmented reality scene. So you could place a 3D object on a real location.
So if you're somewhere like a museum, or a place of geographic interest, or a historical monument, you could place objects around it, so that users can find them and interact with them together by traveling in the real world. And that's really cool. There are lots of technical issues with this and also privacy issues. But it is a popular feature for some very powerful experiences. So this one is very exciting.
So here I've saved the best for last, Navigation, the ability to navigate from one web page to another whilst remaining in virtual reality. Many consider this the most fundamental feature you'll need to make the metaverse, the slang name for the imaginary internet that's made of virtual reality, the very sci-fi concept. And it is incredibly cool and very interesting. Unfortunately, it does have some huge security issues, because when you're in virtual reality, you can't see any of the browser information.
So you can't tell what URL you are on. And even if you could see the browser, you can't tell if it's the real browser or a fake browser being rendered by the WebXR environment. So this is a very difficult security problem to solve.
And right now, navigation is being experimented with in the Oculus Browser, only for traveling between web pages in the same first-party domain. So for example, you go from mywebsite.com/page1 to mywebsite.com/page2.
And a really nice thing about this is that you can have multiple people building different kinds of WebXR experiences on the same domain. And then you can move in between them very simply, without needing to develop some kind of switching mechanism, which is really nice. And I really hope that we can solve these problems with navigation to make it work for traveling between any websites. But this is very cool and very exciting. And I look forward to more features getting developed for WebXR. And the very final thing I want to talk about is the next steps for WebXR. So one of our current really high priority tasks is to take some of the more completed modules, such as WebXR Core, the augmented reality module, the gamepad module, and some of the other really stable ones, to candidate recommendation.
The other focus is to bring more web features to WebXR, so features that allow you to access DOM content and do web-like things such as navigation. So that's really important for us. And the other thing that's really important for us is developing the advanced XR features. These features are coming out all the time as XR technology evolves and develops.
People are gonna want to use them in WebXR as well. So we have to make sure that WebXR is always ready to work on the next new thing. So thank you so much for listening. I hope this has inspired you to check out WebXR and give it a go.
It's built into many 3D libraries such as ThreeJS or AFrame or BabylonJS, or PlayCanvas. So if you want to have a go at working with any of these, check out the website, immersiveweb.dev, which has some getting started guides with a bunch of different frameworks. If you want to take part in the immersive web standardization effort, you can check out our work on GitHub.
It's all done publicly in the open, at github.com/immersive-web. You can subscribe to the Immersive Web Weekly newsletter that comes out on a Tuesday. Or you can join the Immersive Web Community Group or Working Group, where these standards are developed. Thank you so much for listening. I've been Ada Rose Cannon from Samsung, and I hope you enjoyed the talk.