Single View Metrology: In The Wild
For decades, the golden rule of metrology (the science of measurement) was simple: you cannot measure what you cannot touch.

If you wanted to know the height of a doorway, the width of a warehouse, or the distance between two streetlamps, you needed a physical tool: a laser, a tape measure, or at least a stereo camera rig. Then came the constraint of "controlled environments." Labs with checkerboard patterns. Studios with calibrated lighting. Clean, tidy, obedient data.

The classical approach (think Antonio Criminisi's seminal work at Microsoft Research in the late 1990s) relied on a clever hack: if you can identify three orthogonal vanishing points in an image (say, the X, Y, and Z axes of a building), you can recover the camera's intrinsic parameters and, crucially, set up a 3D coordinate system.

But the real world is neither clean nor obedient. Single view metrology in the wild means measuring from an ordinary photograph: no checkerboards, no lab, no calibration target in sight.

Large-scale deep learning models have now seen millions of images. They don't "calculate" depth so much as recognize it. A model knows that a door is usually 2 meters tall, a car tire is roughly 70 cm in diameter, and a human torso is about 45 cm wide. In the wild, the model uses these semantic anchors as a virtual tape measure.
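The classical vanishing-point hack mentioned above has a compact closed form. Under the common simplifying assumptions of square pixels, zero skew, and a principal point at the image center, two orthogonal vanishing points already pin down the focal length. A minimal sketch (the function name and the assumptions are mine, not from any particular system):

```python
import math

def focal_from_vanishing_points(v1, v2, principal_point):
    """Estimate focal length (in pixels) from two orthogonal vanishing points.

    Assumes square pixels, zero skew, and a known principal point c
    (usually the image center). Orthogonality of the two 3D directions
    implies (v1 - c) . (v2 - c) + f^2 = 0, so f = sqrt(-(v1 - c) . (v2 - c)).
    """
    cx, cy = principal_point
    d1 = (v1[0] - cx, v1[1] - cy)
    d2 = (v2[0] - cx, v2[1] - cy)
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    if dot >= 0:
        # The dot product must be negative for a real focal length;
        # otherwise the points cannot come from orthogonal directions.
        raise ValueError("vanishing points inconsistent with orthogonal directions")
    return math.sqrt(-dot)
```

With a third orthogonal vanishing point, the same constraints also recover the principal point itself rather than assuming it.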
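The "virtual tape measure" idea reduces to a toy calculation: once one semantic anchor fixes a meters-per-pixel scale, other pixel measurements at a similar depth convert directly. This is a deliberately simplified sketch; real systems fuse many anchors with learned dense depth, and the sizes below are illustrative assumptions:

```python
def estimate_size(ref_pixels, ref_meters, target_pixels):
    """Convert a pixel measurement to meters using a semantic anchor.

    A detected reference object of assumed real-world size (e.g. a door
    taken to be ~2 m tall) fixes the meters-per-pixel scale in its image
    region; a nearby object at similar depth can then be measured.
    """
    scale = ref_meters / ref_pixels  # meters per pixel at this depth
    return target_pixels * scale

# A door imaged at 400 px tall, assumed 2.0 m in reality, gives
# 0.005 m/px; a 330 px window beside it then measures about 1.65 m.
window_height = estimate_size(400, 2.0, 330)
```

The catch, of course, is that the anchor sizes are priors, not measurements: an unusually tall door silently corrupts every estimate that leans on it.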
Here is how state-of-the-art systems (like those from Meta, Google Research, or academic labs at ETH Zurich) operate in the wild today: