Microsoft published a research paper this week highlighting a new AI model called VASA-1 that can transform a single picture and an audio clip of a person into a realistic video of them lip-syncing, complete with facial expressions and head movements.
The AI model was trained on AI-generated images from generators such as DALL·E-3, which the researchers then paired with audio clips. The result: still images turned into videos of talking faces.
The researchers built on technology from competitors such as Runway and Nvidia, but state in the paper that their approach is higher quality, more realistic, and "significantly outperforms" existing methods.
Read the complete article at Entrepreneur.