Unless stated otherwise, video results are mainly produced by our DynamiCrafter at a resolution of 256×256.
Hover over to view the input still images and text prompts.
| time-lapse of a blooming flower on a stem | a train traveling through a field of flowers and grasses |
|---|---|
| pouring honey onto some slices of bread | a lighthouse with waving ocean |
| a woman in a hat walking down a path in a forest | a person riding a motorcycle on a city street at night |
| a mouse wearing sunglasses playing dj | a bonfire is lit in the middle of a field |
| bear playing guitar happily, snowing | boy walking on the street | cat dancing | cowboy riding a bull over a fence | zoom-in, a landscape, springtime |
|---|---|---|---|---|
| two people dancing | explode colorful smoke coming out | A blonde woman rides on top of a moving washing machine into the sunset. | girl talking and blinking | sailing ship in the ocean, waves are surging |
| a woman in a moving boat with lanterns | man riding a motocycle down the street | man playing piano | two rabits playing | robot walking in a field |
| A girl talking | man playing violin in the rain | A fit track and field female athlete is seen stretching on the field in anime style. | A regal Great Dane wearing a golden crown stands in front of a neon-lit, cyberpunk cityscape, adorned in metallic armor. The Meta Fonts advertisement in the starry sky above indicates the city is the hub of the digital currency universe, with a high-quality and futuristic design. | A bird on the tree branch. |
| An old house is being demolished at a construction site. | Some people walks on a road with pedestrian crossing. | explode, colorful smoke | A burger, fries, and a soda from a fast food restaurant. | Man with fire buring on his head |
| a beautiful landscape, springtime | girl dancing, red smoke behind | a man with fire burning | A flying city filled with airships, contraptions, cogs, and gears, all illuminated by the dim gaslight. | A futuristic protogen with special abilities poses in a masculine and epic mid-battle stance. |
| A futuristic, steampunk-style planet with a bustling city and a large, industrial ship dominates the scene amidst narrow and winding streets, while a giant, mysterious mechanical structure looms in the distance with flying transportation vehicles dotting the sky. | An old couple takes a peaceful stroll through a blooming cherry blossom field alongside a serene pond in this detailed oil on canvas painting created in Caravaggio style during the Baroque period. | A little boy sits by a small river, crying, while a Belgian shepherd dog with good eyes looks at him. | A veteran holding a plant that represents the hope and healing. | An enraged nerd in Pixar Style is featured in a digital art cartoon, captured in a photoshoot under the bright and glowing nightclub lighting. |
| A robot soldier is in the process of decimating his human enemy in a dynamic pose. | A hip hop dancer performing in Madrid. | A cute Alice model is portrayed wearing a blue dress with a bored expression against a pink background. | A Viking is talking on a mobile phone. | A Formula One driver walks towards an exploding car. |
| a robot walking | A girl wearing a white top and a gold emerald necklace in the Renaissance era. | An obese raccoon wielding a sledgehammer performs a song by The Scorpions at a rock festival, looking into the camera lens, exuding lazy badassery in a photorealistic style. | cat riding a scooter in the heavy rain | A male Viking God warrior wielding an enormous axe fights in Valhalla. |
| A wealthy bull smoking a cigar. | An attractive female cyborg is holding a machine gun, looking ready for action. | A modern city with a neighborhood in the center, featuring pristine white buildings and skyscrapers. In the heart of the neighborhood lies a lush and beautiful park, creating a bright and friendly atmosphere with a touch of futurism. | A blond guy wearing green overalls, a black t-shirt, and green kitty ears headphones, giving a thumbs up. He is standing in front of an amusement park, exuding a traditional line, fun, and colorful style reminiscent of Jojo's Bizarre Adventure. | A tall man with soft features wearing a light-colored sweater stands in the middle of an empty train station platform during golden hour in autumn. |
| horse running in a field | A Ford Mustang drives on a road through rain and snow. | Man dancing and performing in front of a crowd | Voldemort, a prisoner cook, prepares food while wearing an apron, dimly-lit prison kitchen. | robot mecha dancing |
We compare our method against existing methods using still images with a wide range of content (e.g., landscape, human, animal, vehicle) and style (e.g., real-life, AI-generated, painting, clay, anime).
| Input prompt | PikaLabs | Gen-2 | DynamiCrafter (Ours) | DynamiCrafterDCP (Ours) |
|---|---|---|---|---|
| "Man talking" | | | | |
| "Man waving hands" | | | | |
| "Man clapping" | | | | |
Storytelling with shots. We can use ChatGPT (powered by DALL·E 3) to create several shots of a story and then generate storytelling videos by animating these shots.
| "A disheartened bear sat by the lake, hanging its head." | "He is meeting a girl and introducing himself." |
|---|---|
| "He chatted happily with that girl by the lake." | "Before leaving, the girl told him to be positive." |
Generative frame interpolation (@512×320 resolution).
| Input starting frame | Input ending frame | Generated video |
|---|---|---|
Looping video generation (@512×320 resolution).
FPS control.
| Input prompt | FPS = 30 | FPS = 10 | FPS = 5 |
|---|---|---|---|
| "An anime scene with windmills standing tall in a field and blue sky" | | | |
| "A boat moving on the sea" | | | |
Multi-condition classifier-free guidance. Higher s_txt and s_img give the text prompt and the image condition, respectively, a stronger influence on the result.
| Input prompt | s_txt = s_img = 7.5 | s_txt = 1.2, s_img = 7.5 | s_txt = 7.5, s_img = 1.2 |
|---|---|---|---|
| "A statue of two men with wings are dancing" | | | | 
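One common way to realize two separate guidance scales is the two-condition classifier-free guidance rule popularized by InstructPix2Pix. The denoiser is evaluated three times (no condition, image only, image plus text), and the differences are scaled independently. The sketch below assumes that formulation. This page does not state the exact formula, so the function name, the argument names, and the order in which the conditions are dropped are illustrative assumptions, and plain Python lists stand in for latent tensors.

```python
from typing import List, Sequence


def multi_cond_cfg(
    eps_uncond: Sequence[float],   # prediction with no condition (assumed input)
    eps_img: Sequence[float],      # prediction with the image condition only
    eps_img_txt: Sequence[float],  # prediction with image and text conditions
    s_img: float,                  # image guidance scale
    s_txt: float,                  # text guidance scale
) -> List[float]:
    """Combine three noise predictions using separate image and text scales.

    Assumed rule (InstructPix2Pix-style), applied element-wise:
        eps = eps_uncond
              + s_img * (eps_img - eps_uncond)
              + s_txt * (eps_img_txt - eps_img)
    """
    if not (len(eps_uncond) == len(eps_img) == len(eps_img_txt)):
        raise ValueError("all three predictions must have the same length")
    return [
        u + s_img * (i - u) + s_txt * (f - i)
        for u, i, f in zip(eps_uncond, eps_img, eps_img_txt)
    ]


# Toy values only; a real sampler would pass model outputs at each denoising step.
guided = multi_cond_cfg([0.0, 0.5], [1.0, 0.5], [3.0, 1.5], s_img=7.5, s_txt=1.2)
```

Under this rule, setting both scales to 1 reproduces the fully conditioned prediction. Raising one scale amplifies only the corresponding difference term, which matches the qualitative behaviour described in the caption above.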
Dual-stream image injection.
| Input prompt | Ours | w/o ctx | w/o VDG | w/o λ | OursG |
|---|---|---|---|---|---|
| "A camel in a zoo enclosure" | | | | | |
Training paradigm. Visual comparison of the context conditioning stream learned with one-stage training and with our two-stage adaption strategy.
| Input prompt | One-stage | Our adaption |
|---|---|---|
| "A man hiking in the mountains with a backpack" | | |
Training paradigm.
| Input prompt | Ours | Fine-tuning entire. | 1st frame condition |
|---|---|---|---|
| "A girl with short blue and pink hair speaking" | | | |
Challenging case in terms of image content understanding.
| Input prompt | Output |
|---|---|
| "Moving clouds in an anime scene" | |
Inability to generate specific motions since the dataset lacks precise motion descriptions.
| Input prompt | Output |
|---|---|
| "Girl rubbing her eyes" | |