DynamiCrafter: Animating Open-domain
Images with Video Diffusion Priors

Supplementary Material

Unless stated otherwise, video results are produced by our DynamiCrafter at a resolution of 256×256.


Showcases produced by our DynamiCrafter_1024 (1024×576)

 


Showcases produced by our DynamiCrafter_512 (512×320)

Hover over to view the input still images and text prompts.

time-lapse of a blooming flower on a stem
a train traveling through a field of flowers and grasses
pouring honey onto some slices of bread
a lighthouse with waving ocean
a woman in a hat walking down a path in a forest
a person riding a motorcycle on a city street at night
a mouse wearing sunglasses playing dj
a bonfire is lit in the middle of a field

 


Showcases produced by our DynamiCrafter (256×256)

Hover over to view the input still images and text prompts.

bear playing guitar happily, snowing
boy walking on the street
cat dancing
cowboy riding a bull over a fence
zoom-in, a landscape, springtime
two people dancing
explode colorful smoke coming out
A blonde woman rides on top of a moving washing machine into the sunset.
girl talking and blinking
sailing ship in the ocean, waves are surging
a woman in a moving boat with lanterns
man riding a motocycle down the street
man playing piano
two rabits playing
robot walking in a field
A girl talking
man playing violin in the rain
A fit track and field female athlete is seen stretching on the field in anime style.
A regal Great Dane wearing a golden crown stands in front of a neon-lit, cyberpunk cityscape, adorned in metallic armor. The Meta Fonts advertisement in the starry sky above indicates the city is the hub of the digital currency universe, with a high-quality and futuristic design.
A bird on the tree branch.
An old house is being demolished at a construction site.
Some people walks on a road with pedestrian crossing.
explode, colorful smoke
A burger, fries, and a soda from a fast food restaurant.
Man with fire buring on his head
a beautiful landscape, springtime
girl dancing, red smoke behind
a man with fire burning
A flying city filled with airships, contraptions, cogs, and gears, all illuminated by the dim gaslight.
A futuristic protogen with special abilities poses in a masculine and epic mid-battle stance.
A futuristic, steampunk-style planet with a bustling city and a large, industrial ship dominates the scene amidst narrow and winding streets, while a giant, mysterious mechanical structure looms in the distance with flying transportation vehicles dotting the sky.
An old couple takes a peaceful stroll through a blooming cherry blossom field alongside a serene pond in this detailed oil on canvas painting created in Caravaggio style during the Baroque period.
A little boy sits by a small river, crying, while a Belgian shepherd dog with good eyes looks at him.
A veteran holding a plant that represents the hope and healing.
An enraged nerd in Pixar Style is featured in a digital art cartoon, captured in a photoshoot under the bright and glowing nightclub lighting.
A robot soldier is in the process of decimating his human enemy in a dynamic pose.
A hip hop dancer performing in Madrid.
A cute Alice model is portrayed wearing a blue dress with a bored expression against a pink background.
A Viking is talking on a mobile phone.
A Formula One driver walks towards an exploding car.
a robot walking
A girl wearing a white top and a gold emerald necklace in the Renaissance era.
An obese raccoon wielding a sledgehammer performs a song by The Scorpions at a rock festival, looking into the camera lens, exuding lazy badassery in a photorealistic style.
cat riding a scooter in the heavy rain
A male Viking God warrior wielding an enormous axe fights in Valhalla.
A wealthy bull smoking a cigar.
An attractive female cyborg is holding a machine gun, looking ready for action.
A modern city with a neighborhood in the center, featuring pristine white buildings and skyscrapers. In the heart of the neighborhood lies a lush and beautiful park, creating a bright and friendly atmosphere with a touch of futurism.
A blond guy wearing green overalls, a black t-shirt, and green kitty ears headphones, giving a thumbs up. He is standing in front of an amusement park, exuding a traditional line, fun, and colorful style reminiscent of Jojo's Bizarre Adventure.
A tall man with soft features wearing a light-colored sweater stands in the middle of an empty train station platform during golden hour in autumn.
horse running in a field
A Ford Mustang drives on a road through rain and snow.
Man dancing and performing in front of a crowd
Voldemort, a prisoner cook, prepares food while wearing an apron, dimly-lit prison kitchen.
robot mecha dancing

 


Comparisons with baseline methods (1024×576)

 


Comparisons with baseline methods (256×256)

We compare our method against existing methods using still images with a wide range of content (e.g., landscape, human, animal, vehicle) and style (e.g., real-life, AI-generated, painting, clay, anime).

 


Motion control using text

"Man talking" | PikaLabs | Gen-2 | DynamiCrafter (Ours) | DynamiCrafter_DCP (Ours)
"Man waving hands"
"Man clapping"

 


Applications

Storytelling with shots. We can use ChatGPT (powered by DALL·E 3) to create several shots of a story and then generate storytelling videos by animating these shots.

"A disheartened bear sat by the lake, hanging its head." "He is meeting a girl and introducing himself."
"He chatted happily with that girl by the lake." "Before leaving, the girl told him to be positive."

Generative frame interpolation (@512×320 resolution).

Input starting frame | Input ending frame | Generated video

Looping video generation (@512×320 resolution).
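
Because looping video generation is listed directly after frame interpolation, one plausible reading is that a looping video is the interpolation case in which the starting and ending frames are the same image. The page does not state this, so treat it as an assumption. The sketch below only shows how such per-frame conditioning could be laid out; `keyframe_conditions` and `looping_conditions` are hypothetical helper names, not functions from the DynamiCrafter code.

```python
from typing import List, Optional


def keyframe_conditions(num_frames: int, start: str, end: str) -> List[Optional[str]]:
    """Return one entry per output frame: the known image at the first and
    last positions, None for frames the model would have to generate.
    (Hypothetical layout, assumed for illustration.)"""
    if num_frames < 2:
        raise ValueError("need at least a starting and an ending frame")
    frames: List[Optional[str]] = [None] * num_frames
    frames[0] = start
    frames[-1] = end
    return frames


def looping_conditions(num_frames: int, image: str) -> List[Optional[str]]:
    """Assumed looping setup: interpolation whose two endpoints coincide."""
    return keyframe_conditions(num_frames, image, image)
```

Under this assumption, the only difference between the two applications is whether the two input frames differ.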


 


Other controls

FPS control.

"An anime scene with windmills standing tall in a field and blue sky" | FPS = 30 | FPS = 10 | FPS = 5
"A boat moving on the sea" | FPS = 30 | FPS = 10 | FPS = 5

Multi-condition classifier-free guidance. Higher s_txt and s_img indicate a stronger influence of the text prompt and the image condition, respectively.

"A statue of two men with wings are dancing" | s_txt = s_img = 7.5 | s_txt = 1.2, s_img = 7.5 | s_txt = 7.5, s_img = 1.2
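
To make the role of the two scales concrete, here is a minimal sketch of a two-condition classifier-free guidance rule in the style popularized by InstructPix2Pix. That the text and image scales on this page enter in exactly this form, and that the image condition is applied before the text condition, are assumptions. The function name `multi_cond_cfg` and its arguments are hypothetical, and plain Python lists stand in for the noise-prediction tensors.

```python
from typing import List


def multi_cond_cfg(
    eps_uncond: List[float],   # prediction with no image and no text
    eps_img: List[float],      # prediction with image only
    eps_img_txt: List[float],  # prediction with image and text
    s_img: float,
    s_txt: float,
) -> List[float]:
    """Combine three noise predictions elementwise:
    eps = eps_uncond
          + s_img * (eps_img - eps_uncond)
          + s_txt * (eps_img_txt - eps_img)
    (assumed formulation, for illustration only)."""
    return [
        u + s_img * (i - u) + s_txt * (it - i)
        for u, i, it in zip(eps_uncond, eps_img, eps_img_txt)
    ]
```

With both scales equal to 1 the rule reduces to the fully conditioned prediction. Raising one scale amplifies only the difference attributed to that condition, which is consistent with the three settings compared above.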

 


Ablation study

Dual-stream image injection.

"A camel in a zoo enclosure" | Ours | w/o ctx | w/o VDG | w/o λ | Ours_G

Training paradigm. Visual comparison of the context conditioning stream learned with one-stage training and with our two-stage adaptation strategy.

"A man hiking in the mountains with a backpack" | One-stage | Our adaptation

Training paradigm.

"A girl with short blue and pink hair speaking" | Ours | Fine-tuning entire. | 1st frame condition

 


Limitations

Challenging case in terms of image content understanding.

"Moving clouds in an anime scene" | Output

Inability to generate specific motions since the dataset lacks precise motion descriptions.

"Girl rubbing her eyes" | Output