Microsoft's New AI Can Make Photographs Sing and Talk, and It Already Has the Mona Lisa Lip-Syncing

The VASA-1 AI model was not trained on the Mona Lisa but could animate it anyway.

By Sherin Shibu

Key Takeaways

  • Microsoft researchers have come up with a way to turn an image of someone into a video of them lip-syncing to an unrelated audio clip.
  • The framework they came up with is called VASA-1.
  • They wrote that they were "opposed to any behavior to create misleading or harmful contents of real persons."
Microsoft published a research paper this week highlighting a new AI model called VASA-1 that can transform a single picture of a person and an audio clip into a realistic video of them lip-syncing, complete with facial expressions and head movements.

The researchers demonstrated the model on AI-generated images from generators like DALL·E-3, which they paired with audio clips. The results are still images turned into videos of talking faces.

The researchers built on technology from competitors such as Runway and Nvidia, but state in the paper that their method produces higher-quality, more realistic results and "significantly outperforms" existing methods.

The researchers said the model can take in audio of any length and generate a talking face that matches the clip.

The only image the researchers experimented with that wasn't AI-generated was the Mona Lisa. They made the iconic image lip-sync to Anne Hathaway's "Paparazzi," which starts with the lines "Yo I'm a paparazzi, I don't play no yahtzee."

The Mona Lisa was one example of a photo input that the AI model was not trained on but could animate anyway. The model could also transform artistic photos, handle singing audio, and process speech in languages other than English.

The researchers emphasized that the model could work in real time, pointing to a demo video that showed it instantly animating images with head movements and facial expressions.

Deepfakes, or digitally altered media of a person that could spread misinformation or take someone's likeness without permission, are a risk posed by advanced AI that can generate digital media with relatively few reference points.

Microsoft addressed that concern generally in the paper, with the researchers stating, "We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection."

The researchers stated that their technique had potentially positive applications too, like improving accessibility and enhancing educational efforts.

Google demoed a similar research project last month, showcasing an AI capable of taking a photo and creating a video from it that the user can then control with their voice. The AI was able to add head movements, blinks, and hand gestures.
Sherin Shibu

Entrepreneur Staff

News Reporter

Sherin Shibu is a business news reporter at Entrepreneur.com. She previously worked for PCMag, Business Insider, The Messenger, and ZDNET as a reporter and copyeditor. Her areas of coverage encompass tech, business, strategy, finance, and even space. She is a Columbia University graduate.
