How Owlchemy Adapted Its VR Titles for Apple Vision Pro

Game Developer Deep Dives is an ongoing series that aims to shed light on specific design, artistic, or technical features within a video game, to show how seemingly simple, fundamental design decisions are actually not so simple at all.

Previous episodes covered topics such as How GOG Perfected the Imperfect with the Re-release of Alpha Protocol, How Ishtar Games Designed a New Race of Dwarves in The Last Spell, and How Krillbite Studio Created a Tasty Food Prep Experience in Fruit Bus.

In this edition, the team at Owlchemy Labs detail the technical challenges they faced in porting their VR titles to the Apple Vision Pro.

The launch of the Apple Vision Pro in February 2024 marked a milestone for the VR community: it was the first six-degrees-of-freedom headset to ship without controllers.

Senior platform engineer Phillip Johnson explains how we brought Job Simulator and Vacation Simulator to the Apple Vision Pro. We'll go over the techniques we used to implement hand tracking and the challenges we faced with the shader and audio systems. By sharing our experience, we hope to see more amazing, fully immersive titles come to the visionOS platform.

Hand tracking at 30Hz in a 90Hz game

Probably the biggest challenge we faced during the entire production of this port was compensating for the 30Hz hand tracking refresh rate. Both Job Simulator and Vacation Simulator are deeply interactive experiences, and updating hand poses only once every three frames had real consequences when we first started working on these ports. Grabbing and throwing objects was nearly impossible. Hand speeds would be exaggerated, causing destructible objects like plates to break in our hands. We also regularly lost tracking when looking away from our hands. Our titles were unplayable, and it was unclear when a platform update would come, so our team set out to fix our hand tracking issues with what was available at the time.

Senior gameplay engineer Greg Tamargo explains how we made hand tracking smoother through extrapolation.

Since hand tracking updates at 30Hz while the rest of the game runs at 90Hz, for every frame containing fresh hand pose data there are at least two frames without it. Because of this, we had to modify Unity's visionOS XR package to report whether the data was "fresh" or "stale" and compensate accordingly. We found that simply covering up the "stale" frames by blending toward the most recent "fresh" hand poses was too slow and felt unresponsive, so we opted for extrapolation: predicting where the hands would be, and where the next "fresh" hand pose would land, before it arrives. Anyone with experience programming online multiplayer games will be familiar with this technique. By keeping track of at least two recent fresh hand poses, we can calculate a velocity and angular velocity and use them to infer what the current pose should be, given how much time has passed since the most recent frame of fresh data. Implementing this technique hugely improved the functionality and feel of the game. It's also worth noting that by tracking additional frames of fresh hand data we could build more complex extrapolations, but when we implemented this it wasn't immediately obvious that it actually improved the feel of the game any further.
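
As a rough illustration, here is a minimal sketch of that two-pose extrapolation in Unity C#. The class and member names are our own for this article, not Owlchemy's actual code, and a production version would also need to clamp predictions and handle tracking loss.

```csharp
using UnityEngine;

// Sketch: fresh 30Hz wrist poses come in via PushFreshPose();
// Predict() is called on every 90Hz frame, including "stale" ones.
public class HandPoseExtrapolator
{
    Vector3 _prevPos, _lastPos;
    Quaternion _prevRot = Quaternion.identity, _lastRot = Quaternion.identity;
    float _prevTime, _lastTime;

    // Call whenever the tracking layer reports a "fresh" pose.
    public void PushFreshPose(Vector3 position, Quaternion rotation, float time)
    {
        _prevPos = _lastPos;  _prevRot = _lastRot;  _prevTime = _lastTime;
        _lastPos = position;  _lastRot = rotation;  _lastTime = time;
    }

    public void Predict(float now, out Vector3 position, out Quaternion rotation)
    {
        float dt = _lastTime - _prevTime;
        if (dt <= 0f) { position = _lastPos; rotation = _lastRot; return; }

        float ahead = now - _lastTime; // time elapsed since the last fresh pose

        // Linear velocity from the two most recent fresh poses.
        Vector3 velocity = (_lastPos - _prevPos) / dt;
        position = _lastPos + velocity * ahead;

        // Angular velocity: the rotation from the previous fresh pose to the
        // latest one, replayed forward for 'ahead' seconds.
        Quaternion delta = _lastRot * Quaternion.Inverse(_prevRot);
        delta.ToAngleAxis(out float angle, out Vector3 axis);
        if (angle > 180f) angle -= 360f; // take the shorter way around
        rotation = Mathf.Abs(angle) < 1e-4f
            ? _lastRot
            : Quaternion.AngleAxis(angle * (ahead / dt), axis) * _lastRot;
    }
}
```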

However, simply implementing this pose extrapolation on the hand wrist poses yielded a huge improvement. When we attempted to continue this approach with the rest of the hand data to smooth out the motion of each individual finger, the results were much less promising. So, we decided to try something else to smooth out the poses of all the finger joints.

Marc Huet, senior systems engineer and hand tracking specialist, provides some insights into the decisions we made regarding poses.

For hand poses, we wanted to avoid creating unnatural poses, so we worked with real pose data from the device as much as possible rather than generating our own.

To accommodate the low update rate, we introduced a delay so that we could interpolate joint rotations between the two most recent poses while waiting for the next one. To ensure that this delay does not negatively impact gameplay, we only use the most recently received pose when detecting actions like “grab” and “release”, while the smoothed pose is reserved for presentation via the hand model.
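
A minimal sketch of that split, with names of our own choosing: the raw latest rotation feeds gameplay checks like grab detection, while a deliberately delayed slerp between the two most recent poses feeds the rendered hand.

```csharp
using UnityEngine;

// Sketch: one smoother per joint, fed parent-relative rotations at 30Hz.
public class JointSmoother
{
    const float PoseInterval = 1f / 30f;    // hand tracking update period

    Quaternion _older = Quaternion.identity; // two most recent fresh rotations
    Quaternion _newer = Quaternion.identity;
    float _newerTime;

    public Quaternion LatestRaw => _newer;   // use this for grab/release logic

    public void PushFreshRotation(Quaternion localRotation, float time)
    {
        _older = _newer;
        _newer = localRotation;
        _newerTime = time;
    }

    // Presentation pose, delayed by one update so there is always a newer
    // pose to interpolate toward while we wait for the next one.
    public Quaternion Smoothed(float now)
    {
        float t = (now - _newerTime) / PoseInterval; // 0..1 across the gap
        return Quaternion.Slerp(_older, _newer, Mathf.Clamp01(t));
    }
}
```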

We also take a conservative approach to filling gaps when individual joints lose tracking. Instead of attempting to fabricate new data for the missing joint with IK, we copy the missing joint's parent-relative orientation from the previous pose while leaving the rest of the parent-child relationships along the chain intact (see the figure below).

Both of these techniques were made significantly easier by storing and working with joint orientations relative to each joint's parent, rather than relative to the wrist or the world origin.

[Figure: tracked hand joints]
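
As a minimal sketch of that gap-filling rule, assuming parent-relative joint rotations and types of our own invention:

```csharp
using UnityEngine;

public struct JointPose
{
    public Quaternion localRotation; // relative to the parent joint
    public bool tracked;
}

public static class HandPoseRepair
{
    // 'current' is this update's pose, 'previous' is the last accepted pose;
    // both arrays are indexed by joint and store parent-relative rotations.
    public static void FillGaps(JointPose[] current, JointPose[] previous)
    {
        for (int i = 0; i < current.Length; i++)
        {
            if (!current[i].tracked)
            {
                // Reuse only the missing joint's parent-relative orientation;
                // every joint further down the chain composes on top of it
                // unchanged, so the rest of the hand stays intact.
                current[i].localRotation = previous[i].localRotation;
            }
        }
    }
}
```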

Apple has since announced that visionOS 2.0 will support 90Hz hand tracking, and we'll be sure to update our titles as soon as the update becomes available.

Compiling shaders and writing jokes

Unity compiles and caches shaders the first time they are rendered. This compilation causes brief frame rate drops, which is unacceptable on spatial platforms because it can cause motion sickness. The spatial nature of visionOS also imposes restrictions that forced us to rethink how and when we could compile shaders: visionOS requires an application to draw a frame at least once every two seconds, or the app is killed. This makes sense in a spatial environment where users may have multiple applications running, but games commonly hide shader compilation behind loading sequences. With the two-second restriction we couldn't use the standard warm-up process, so we had to develop a new method from scratch.

Our lead graphics engineer, Ben Hopkins, pioneered our solution. To properly warm up shaders, every unique combination of shader variant and vertex layout had to be rendered once, off-screen, during the boot sequence. To do this, we developed a simple tool that collects and logs the vertex layouts of every mesh in the game. These logs feed into our warm-up system: the first time players run Vacation Simulator, they encounter a large shader warm-up sequence that dynamically creates a quad for each vertex layout and runs our shader variants through each one. Admittedly, it takes a painful three or four minutes to complete, so we tried to soften the experience with the best jokes the porting team could write in an hour to keep the player occupied. Once the shaders are compiled and cached, the game launches immediately from then on.
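
A heavily simplified sketch of what such a warm-up pass might look like in Unity is shown below. The structure (a coroutine that draws warm-up quads into an off-screen target and yields each frame, so visionOS always sees a presented frame well inside its two-second window) is our illustration of the idea; the names, the variant list, and the quad construction are assumptions, not Owlchemy's tool.

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Rendering;

public class ShaderWarmup : MonoBehaviour
{
    public List<Material> materials;                         // one per variant to warm
    public List<VertexAttributeDescriptor[]> vertexLayouts;  // logged from every mesh

    IEnumerator Start()
    {
        var rt = RenderTexture.GetTemporary(64, 64);
        var cb = new CommandBuffer { name = "ShaderWarmup" };

        foreach (var layout in vertexLayouts)
        {
            Mesh quad = BuildQuad(layout);
            foreach (var mat in materials)
            {
                cb.Clear();
                cb.SetRenderTarget(rt);
                for (int pass = 0; pass < mat.passCount; pass++)
                    cb.DrawMesh(quad, Matrix4x4.identity, mat, 0, pass);
                Graphics.ExecuteCommandBuffer(cb);

                // Yield so a frame is presented before the watchdog fires.
                yield return null;
            }
            Destroy(quad);
        }

        cb.Release();
        RenderTexture.ReleaseTemporary(rt);
    }

    // Builds an off-screen quad whose vertex buffer matches a logged layout.
    // The vertex contents are irrelevant for warm-up; only the layout matters.
    static Mesh BuildQuad(VertexAttributeDescriptor[] layout)
    {
        var mesh = new Mesh();
        mesh.SetVertexBufferParams(4, layout);
        mesh.SetIndexBufferParams(6, IndexFormat.UInt16);
        mesh.SetIndexBufferData(new ushort[] { 0, 1, 2, 2, 1, 3 }, 0, 0, 6);
        mesh.SetSubMesh(0, new SubMeshDescriptor(0, 6));
        mesh.bounds = new Bounds(Vector3.zero, Vector3.one);
        return mesh;
    }
}
```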

Spatialization

Daniel Perry, Audio Director at Owlchemy Labs, explains how we solved the audio issues with our visionOS ports.

The biggest challenge we had to solve in audio is that fully immersive apps didn't have access to Apple's spatializer in Unity, and spatialized audio is critical to our experiences: it brings out the environment and creates a vivid, responsive sound field. We had to find a solution compatible with the audio architecture of both Job Simulator and Vacation Simulator. Apple offers PHASE (Physical Audio Spatialization Engine), which works with Unity, but using it would have required significant changes to our audio pipeline, including routing, processing, and file loading.

Currently, the market is still short on spatializer solutions for Unity, and most of the existing ones do not support visionOS.

The Resonance Audio spatializer is open source and cross-platform, but it has seen little maintenance for a while and had not been compiled for visionOS. Fortunately, because the source code is available, we were able to modify it so that it compiles for visionOS.

Due to Resonance's limited routing options, we had to create a custom solution for reverb. For performance reasons on mobile platforms, we've always used a minimal set of simple reverb algorithms with presets for different rooms and environments, plus multiple audio mixer groups to sum the effects in-game. While we couldn't replicate the full audio mixer group chain, maintaining the overall atmosphere and feel of the world was critical, so we built our own pre-spatialized send/receive system: audio from every source is sent to summed streaming audio sources, which are then routed into a non-spatialized reverb AudioMixer.
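
The sketch below illustrates one way such a send/receive bus could be wired up in Unity. The component names and the single shared buffer are our simplifications, not Owlchemy's actual code; a real implementation would need thread-safe ring buffers, proper channel handling, and latency compensation.

```csharp
using UnityEngine;
using UnityEngine.Audio;

// Shared mono-ish sum of all reverb sends for one room preset.
public static class ReverbBus
{
    public static readonly float[] Sum = new float[4096];

    // Called from each sender on the audio thread.
    public static void Send(float[] data, float gain)
    {
        int n = Mathf.Min(data.Length, Sum.Length);
        for (int i = 0; i < n; i++) Sum[i] += data[i] * gain;
    }
}

// Attach next to each AudioSource: forwards a copy of the signal passing
// through this point in the DSP chain to the shared reverb bus.
public class ReverbSend : MonoBehaviour
{
    [Range(0f, 1f)] public float sendLevel = 0.3f;

    void OnAudioFilterRead(float[] data, int channels)
    {
        ReverbBus.Send(data, sendLevel); // pass-through; 'data' is unchanged
    }
}

// One non-spatialized AudioSource routed into a reverb AudioMixer group;
// it emits the summed send signal so the mixer's reverb processes it once.
[RequireComponent(typeof(AudioSource))]
public class ReverbReceive : MonoBehaviour
{
    public AudioMixerGroup reverbGroup; // mixer group hosting the reverb effect

    void Start() { GetComponent<AudioSource>().outputAudioMixerGroup = reverbGroup; }

    void OnAudioFilterRead(float[] data, int channels)
    {
        int n = Mathf.Min(data.Length, ReverbBus.Sum.Length);
        for (int i = 0; i < n; i++)
        {
            data[i] = ReverbBus.Sum[i]; // emit the summed sends
            ReverbBus.Sum[i] = 0f;      // clear for the next audio block
        }
    }
}
```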

While not the ideal processing order, this allowed us to keep using Resonance, retain capabilities similar to our post-processing groups, and stay close to how the game sounds on other platforms, all while keeping audio processing performance optimized. In the end, Resonance was also a better fit for the way our audio system is structured.

Conclusion

When we started porting to Apple Vision Pro, we had no idea whether the issues holding us back from launching would be resolved in a month or a year, but we knew we wanted to be there as soon as possible. Apple shares our passion for hand-tracking-only experiences, which we believe are more accessible to a mainstream audience. Because we were able to build our own tools to solve these issues, our titles launched on Apple Vision Pro months before the visionOS 2.0 update. We are proud of the work we have done bringing Job Simulator and Vacation Simulator to visionOS, and we can't wait for new players to experience our award-winning titles.
