Is China pulling ahead in AI video synthesis? We put Minimax to the test


With China’s AI video generators pushing memes into strange territory, it was time to check one out.

A still shot from an AI-generated Minimax video-01 video with the prompt: “A highly-intelligent person reading ‘Ars Technica’ on their computer when the screen explodes.” Credit: Minimax

If 2022 was the year AI image generators went mainstream, 2024 has arguably been the year that AI video synthesis models exploded in capability. These models, while not yet perfect, can generate new videos from text descriptions known as prompts, still images, or existing videos. After OpenAI made waves with Sora in February, two major AI models emerged from China: Kuaishou Technology’s Kling and Minimax’s video-01.

Both Chinese models have already powered several viral AI-generated video projects, accelerating meme culture in strange new ways, including a recent shot-for-shot translation of the Princess Mononoke trailer made with Kling that inspired death threats, and a series of videos created with Minimax’s platform that show a synthesized version of TV chef Gordon Ramsay doing ridiculous things.

After 22 million views and thousands of death threats, I felt like I needed to take this post down for my own mental health.
This trailer was an EXPERIMENT to show my 300 followers on X how far we’ve come in 16 months.
I’m putting it back up to keep the conversation going. 🧵 pic.twitter.com/tFpRPm9BMv

— PJ Ace (@PJaccetturo) October 8, 2024 (https://twitter.com/PJaccetturo/status/1843737222031519910)

Kling first emerged in June, and it can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherency that some believe surpasses Sora. It is currently only available to people with a Chinese phone number, and we haven’t yet used it ourselves.

Around September 1, Minimax debuted the aforementioned video-01 as part of its Hailuo AI platform. That site lets anyone generate videos based on a prompt, and initial results looked similar to Kling’s, so we decided to run some of our Runway Gen-3 prompts through it to see what happens.

Putting Minimax to the test

We generated each of the six-second-long 720p videos seen below using Minimax’s free Hailuo AI platform. Each video generation took five to ten minutes to complete, likely due to being in a queue with other free video users. (At one point, the whole thing froze up on us for a few days, so we didn’t get the chance to generate a flaming cheeseburger.)

In the spirit of not cherry-picking any results, everything you see was the first generation we received for the prompt listed above it.

“A highly intelligent person reading ‘Ars Technica’ on their computer when the screen explodes”

“A cat in a car drinking a can of beer, beer commercial”

“Will Smith eating spaghetti”

“Robotic humanoid animals with vaudeville costumes roam the streets collecting protection money in tokens”

“A basketball player in a haunted passenger train car with a basketball court, and he is playing against a team of ghosts”

“A herd of one million cats running on a hillside, aerial view”

“Video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy”

“A muscular barbarian breaking a CRT television set with a weapon, cinematic, 8K, studio lighting”

Limitations of video synthesis models

Overall, the Minimax video-01 results seen above feel fairly similar to Gen-3’s outputs, with some differences, like the lack of a celebrity filter on Will Smith (who sadly did not actually eat the spaghetti in our tests) and the more realistic cat hands and licking motion. Some results were far worse, like the 1 million cats and the Ars Technica reader.

As we explained in our hands-on test of Runway’s Gen-3 Alpha, text-to-video models generally excel at combining concepts present in their training data (the existing video samples used to create the model), allowing for creative mashups of existing themes and styles. However, these AI models often struggle with generalization, which means they have trouble applying learned knowledge to entirely new scenarios not represented in their training data.

This limitation can lead to surprising or unintended results when users request scenarios that deviate too far from the model’s training examples. While we saw a very silly result for the cat drinking beer in the Gen-3 test, Minimax rendered a more realistic-looking result, which could come down to better parsing of the prompt, different training data, more compute spent training the model, or a different model architecture. In the end, there’s still a lot of trial and error involved in producing a coherent result.

It’s worth noting that while China’s models appear to match US video synthesis models from earlier this year, American tech companies are not standing still. Google showed off Veo in May with some very impressive-looking demos. And last week, we reported on Meta’s Movie Gen model, which appears (without our having used Meta’s model ourselves) to likely be a step ahead of Minimax and Kling. But China’s servers are undoubtedly cranking away at training new AI video models as we speak, so this deepfake arms race likely won’t slow down any time soon.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He is also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
