If you have read my previous posts, you’ll know that I am a fan of cheap, accessible technology. So far, I have explained why I believe it’s important and where it can lead us, but I haven’t covered the work I have done to push toward that future. Today, I will walk through one of our more recent experiments in low-cost positional tracking.
A couple of months ago, Oculus released the Oculus Go, a standalone VR headset for about $200. It requires no external phone (aside from initial setup and management), making it one of the cheapest consumer VR products to date. However, a low price can come with limitations: in the Go’s case, we only get head rotation (3DOF) tracking instead of rotation-plus-position (6DOF) tracking. Most positional tracking systems are quite expensive! Depending on the system, positional tracking can cost anywhere from several hundred to several thousand dollars, and sometimes even more. This makes sense: robust positional tracking is needed to prevent motion sickness in VR.
But does it always need to be expensive?
Is there a solution out there that uses nothing but a stock webcam and a sheet of paper?
Hint: I would not be writing this article if there wasn’t.
When analyzing the problem in front of us, we decided that most positional tracking systems were overkill for our purposes. They are designed to let users move through a relatively large space, using specialized equipment to monitor every last millimeter of it. This is usually the right call, as dead zones in your VR space can quickly kill the experience for the user. However, we imagined a more constrained scenario: I am sitting at my desk, working in VR. I do not need to move or look around much, but a bit of positional flexibility allows for a deeper sense of immersion. For example, a user could peer around a digital clay model they are sculpting to gain new perspectives on it, or two people video conferencing could take advantage of the subtle head movements that would otherwise be lost in a simple 3DOF tracking system.
Our initial experiments tested a tried and true system. If you have spent much time working in AR, you have probably seen tracking codes. They look like this:
There are many established algorithms for tracking the position of these codes. While they are not very robust for orientation, we are okay with that because the Oculus Go already gives us very high-quality rotation tracking. Using OpenCV’s ArUco library, we were able to write a quick program that tracked the code pictured above.
This worked well… under certain conditions. It required a combination of good computer hardware, a quality camera, and proper lighting. The main issue we ran into when any of those conditions were not met was motion blur: the algorithms we used could not detect a moving tracker consistently. Since we relied entirely on this tracking algorithm for position, with no other source of position data, every untracked frame froze the user’s head position! From a VR user-experience perspective, this was completely unacceptable.
We began to discuss potential alternatives. Should we look towards de-blurring algorithms? Turn to the world of machine learning? Both are certainly possibilities, and ones we may pursue later. However, we were trying to put together a proof of concept quickly. This is where we remembered our specific conditions: a single user sitting in a small space with limited head movement. Even AR codes were overkill; we only needed to track one thing!
So, what is even easier to track than an AR code? What fundamental shape is incredibly easy for a computer to track (and also very easy to design)? This is what we came up with:
Yes, that is just a black circle. So, how does it work?
We started with OpenCV’s method for finding ellipses. We needed to track ellipses, not just circles, as the angle of the user’s head could deform the printed circle into an ellipse. This, of course, produced many false positives: elliptical contours are everywhere! Even blobby shadows that were vaguely circular were being reported as the tracked dot. So, we needed a set of rules to find the right circle:
- The circle and its contents should be as dark as possible. This filtered out the many other-colored circles in the scene. We did experiment with other colors, since black is fairly common in everyday scenes, but we found that black performed best across different lighting conditions.
- The circle must be within a certain size range. A circle that is too small would be too far away for our application (remember, the user is sitting at their desk), and similarly, a circle that is too large means the tracker is too close. In either case, it is probably not the circle we are looking for.
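To make those rules concrete, here is a minimal, library-free sketch of the filtering stage. The candidate ellipses themselves would come from OpenCV’s contour and ellipse fitting; the thresholds below are illustrative assumptions, not the values we actually tuned.

```python
def pick_tracked_circle(candidates,
                        max_mean_intensity=60,   # rule 1: dark (0 = black, 255 = white)
                        min_radius_px=10,        # rule 2: too small = too far away...
                        max_radius_px=120):      # ...too large = too close
    """Pick the best candidate ellipse, or None if nothing passes the rules.

    Each candidate is a dict with:
      'center'         -- (x, y) pixel coordinates
      'radius'         -- mean of the ellipse's semi-axes, in pixels
      'mean_intensity' -- average grayscale value inside the ellipse
    """
    passing = [c for c in candidates
               if c["mean_intensity"] <= max_mean_intensity
               and min_radius_px <= c["radius"] <= max_radius_px]
    if not passing:
        return None
    # If several candidates survive, prefer the darkest one.
    return min(passing, key=lambda c: c["mean_intensity"])
```

Note that returning `None` on a frame with no passing candidate is the same dropped-frame situation as before, which is why the constrained desk scenario (slow, small head movements) matters so much here.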
Applying these constraints, we were able to track the circle’s position quite well! From there, we used the camera’s intrinsic properties, such as field of view and resolution, to convert the circle’s position from screen space to world space, and finally, we used a simple socket connection to report that position over WiFi to the headset. On the headset, we applied a low-pass filter to smooth out some of the positional jitter, which was particularly noticeable along the z-axis (the axis pointing out of the screen).
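Putting rough numbers on that last step: under a pinhole camera model, the horizontal field of view and resolution give a focal length in pixels, the apparent radius of the known-size printed circle gives depth, and an exponential filter is one simple form of low-pass smoothing. All constants below are illustrative assumptions, not our actual calibration.

```python
import math

def screen_to_world(cx, cy, radius_px,
                    img_w=640, img_h=480,
                    hfov_deg=60.0,          # assumed horizontal field of view
                    circle_radius_m=0.05):  # assumed printed circle radius (5 cm)
    """Convert a detected circle (pixels) into a camera-space position (meters)."""
    # Focal length in pixels, from the horizontal FOV and resolution.
    f_px = (img_w / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Similar triangles: apparent radius shrinks linearly with distance,
    # so the known physical radius gives us depth (the jittery z-axis).
    z = circle_radius_m * f_px / radius_px
    # Back-project the pixel offset from the image center.
    x = (cx - img_w / 2) * z / f_px
    y = (cy - img_h / 2) * z / f_px
    return (x, y, z)

class LowPassFilter:
    """Exponential smoothing; smaller alpha = smoother but laggier."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.state = None

    def update(self, pos):
        # Blend the new sample with the previous filtered state.
        if self.state is None:
            self.state = tuple(pos)
        else:
            self.state = tuple(self.alpha * n + (1 - self.alpha) * s
                               for n, s in zip(pos, self.state))
        return self.state
```

A circle detected dead-center in the frame maps to x = y = 0, and halving its apparent radius doubles the reported depth, which matches the intuition that the z estimate leans entirely on the measured radius and is therefore the noisiest axis.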
In the end, we got a reasonable demo that excited our friends at Oculus Research when they came to visit. There are a lot of directions we could explore in the future, such as the machine learning route for faster, more robust tracking, but I believe this was a neat enough idea to share with the world. I hope you agree and had fun reading!