Researchers from ETH Zurich and Microsoft present “PixLoc”: a neural network for aligning features with a 3D model of the environment



Estimating the pose of a camera in a known scene is a core problem of 3D geometry that many learning algorithms have recently tackled. Many of these techniques directly regress geometric quantities such as poses or 3D points, which can be precise enough on the scenes they were trained on – but there is no guarantee that they will generalize beyond what has been seen before.

Researchers at ETH Zurich and Microsoft have created an end-to-end solution for estimating camera pose. Unlike previous approaches, they do not have a deep network regress geometric quantities or encode the 3D map. Instead, the team goes back to features: they show that learning robust, generic features is sufficient for precise localization when combined with classical image alignment against an existing 3D map. The researchers developed this new trainable algorithm, ‘PixLoc’, which localizes an image against a 3D model using a CNN (convolutional neural network). In other words, PixLoc is a scene-agnostic neural network that estimates a precise 6-degree-of-freedom pose from an image and a 3D model.


Thanks to classical geometric optimization, the network does not need to learn pose regression itself; it only has to extract suitable features, which makes it accurate in any scene. PixLoc is trained end to end, from pixels to pose: the team unrolls the direct alignment and supervises training with poses only.
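To make the idea of direct feature alignment concrete, here is a minimal toy sketch (an illustration under simplified assumptions, not the authors' code): the camera pose is reduced to a 3D translation, the CNN feature map is replaced by a smooth analytic function, and Gauss-Newton minimizes the residual between features sampled at projected 3D model points and the reference features stored with the map.

```python
import numpy as np

def project(points, t, f=100.0):
    """Pinhole projection of Nx3 points after translating the camera by t."""
    p = points - t
    return f * p[:, :2] / p[:, 2:3]

def feature(uv):
    """Stand-in for a CNN feature map: a smooth function of pixel position."""
    return np.stack([np.sin(0.05 * uv[:, 0]), np.cos(0.05 * uv[:, 1])], axis=1)

# Synthetic 3D model points and the ground-truth camera translation.
rng = np.random.default_rng(0)
points = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))
t_true = np.array([0.2, -0.1, 0.0])
ref = feature(project(points, t_true))  # reference features from the 3D map

# Gauss-Newton on the translation, with numerical Jacobians for brevity.
t = np.zeros(3)
for _ in range(20):
    r = (feature(project(points, t)) - ref).ravel()  # feature residuals
    J = np.empty((r.size, 3))
    eps = 1e-5
    for k in range(3):
        dt = np.zeros(3)
        dt[k] = eps
        J[:, k] = ((feature(project(points, t + dt)) - ref).ravel() - r) / eps
    # Damped normal equations (a basic Levenberg-Marquardt-style step).
    t -= np.linalg.solve(J.T @ J + 1e-6 * np.eye(3), J.T @ r)

print(np.round(t, 3))  # converges toward t_true
```

In PixLoc the same principle applies, but the residuals are computed on learned multi-scale CNN features, the full 6-DoF pose is optimized, and the optimization is unrolled so that gradients flow back into the feature extractor during training.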

The proposed formulation yields simple yet precise localization models that can compete with more complex state-of-the-art approaches when trained per scene. PixLoc can also serve as a lightweight post-processing step to refine the poses estimated by any existing system.

According to the researchers, “PixLoc is the first end-to-end trainable approach that can be deployed in new scenes vastly different from its training data without retraining or fine-tuning.”



