Researchers at the University of Surrey have successfully trained an artificial intelligence (AI) system to predict the 3D pose of a dog from a single 2D image. Using virtual dogs created within the video game Grand Theft Auto V, the team, led by postgraduate research student Moira Shooter, has achieved a breakthrough with potential applications across a range of fields.
Shooter emphasized the versatility of this innovation, stating, “From ecology to animation, this clever solution holds promise for numerous applications.”
Traditionally, teaching AI to interpret 3D information from 2D images involves presenting it with photographs alongside corresponding 3D spatial data. However, while humans can wear motion-capture suits to supply that data, such methods are far less practical for animals like dogs. To overcome this hurdle, the researchers modified the code of Grand Theft Auto V, replacing the game's protagonist with dogs of various breeds, a process known as modding.
The team then generated a diverse array of videos showing dogs in different actions and environmental conditions, producing the DigiDogs dataset, which comprises 27,900 frames, each paired with ground-truth 3D pose data exported from the game (a structure sketched in the code below).
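The article does not describe how the synthetic frames and their 3D annotations are stored, but a dataset of this kind is typically a set of rendered images paired with per-frame joint coordinates. The following is a minimal sketch in PyTorch, assuming a hypothetical layout of one PNG per frame with a sibling JSON file holding a `joints_3d` array; the actual DigiDogs format may differ.

```python
import json
from pathlib import Path

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class SyntheticDogPoseDataset(Dataset):
    """Pairs rendered game frames with 3D joint positions (hypothetical layout)."""

    def __init__(self, root: str):
        self.root = Path(root)
        # Assumed layout: root/frames/000001.png, root/frames/000001.json, ...
        self.frames = sorted(self.root.glob("frames/*.png"))
        self.tf = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        frame = self.frames[idx]
        image = self.tf(Image.open(frame).convert("RGB"))
        # Assumption: one JSON file per frame with a "joints_3d" list of (x, y, z).
        label = json.loads(frame.with_suffix(".json").read_text())
        joints = torch.tensor(label["joints_3d"], dtype=torch.float32)
        return image, joints
```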
Their next step involves refining the system using Meta's DINOv2 model to improve the accuracy of its 3D pose predictions on photographs of real dogs. Shooter explained, "Although our model was initially trained on CGI dogs, we have successfully applied it to generate 3D skeletal models from photographs of real animals." This capability could have significant implications, enabling conservationists to identify injured wildlife and helping artists craft more lifelike animals in virtual environments.
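The article does not specify how DINOv2 is incorporated, but a common pattern is to use it as a frozen feature extractor with a small regression head that maps each image embedding to a set of 3D keypoints. Here is a minimal sketch under those assumptions; the `dinov2_vits14` variant and the `NUM_KEYPOINTS = 33` skeleton size are illustrative choices, not details from the research.

```python
import torch
import torch.nn as nn

# Load a pretrained DINOv2 backbone from torch.hub. The small ViT-S/14
# variant is an assumption; the team's actual choice is not stated.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False  # keep the pretrained features frozen

NUM_KEYPOINTS = 33  # hypothetical skeleton size; the DigiDogs rig may differ

# A small regression head mapping the image embedding to 3D joint positions.
head = nn.Sequential(
    nn.Linear(384, 256),  # 384 = embedding dim of dinov2_vits14
    nn.ReLU(),
    nn.Linear(256, NUM_KEYPOINTS * 3),  # (x, y, z) per joint
)


def predict_pose(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, 224, 224), normalized. Returns (B, NUM_KEYPOINTS, 3)."""
    with torch.no_grad():
        feats = backbone(images)  # (B, 384) CLS-token embedding
    return head(feats).view(-1, NUM_KEYPOINTS, 3)
```

In this setup only the head would be trained on the synthetic frames, which is one plausible way a model trained on CGI dogs could transfer to real photographs: the frozen DINOv2 features were learned from real images in the first place.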
Shooter concluded by emphasizing that a 3D pose carries far richer information than a 2D photograph.