🏆 #1 on Video Arena · Launched 2026-04-26 · NEW · Happy Horse 1.0 by Alibaba · 2026-04

Happy Horse 1.0 AI Video Generator with Lip-Sync

Alibaba's newest AI video model — ranked #1 on Artificial Analysis Video Arena (Text-to-Video Elo 1333, Image-to-Video Elo 1392), above Sora 2, Veo 3.1, and Kling.

Joint audio-video generation in a single pass. 1080p output. Multilingual lip-sync across 7 languages: English, Mandarin, Cantonese, Japanese, Korean, German, French.

3-15s Range
Native Audio + Lip-Sync
5 Aspect Ratios

Audio

Happy Horse 1.0 generates audio + video together with lip-sync — no separate audio toggle. Output always includes native synced audio.


Happy Horse 1.0 Model

Alibaba's #1 AI video model with joint audio-video and 7-language lip-sync.

#1 VIDEO ARENA

Happy Horse 1.0

Joint audio-video generation with multilingual lip-sync

  • #1 on Video Arena
  • Joint audio-video output (single pass)
  • 3-15 second range
  • Multilingual lip-sync (7 languages)
  • Up to 1080p resolution
From 36 credits / 3s @ 720p

Key Capabilities

Why Happy Horse 1.0 ranks #1 on Artificial Analysis Video Arena across both Text-to-Video and Image-to-Video benchmarks.

Native Audio + Lip-Sync

Joint diffusion of audio and video in a single forward pass — no post-production merge. Multilingual lip-sync across 7 languages for character dialogue.

Text to Video

Transform text descriptions into 3-15 second cinematic videos with native synced sound and lip-sync for spoken dialogue in any of the 7 supported languages.

Image to Video

Animate still images with natural motion and synced audio. Upload a reference image and describe the motion + dialogue you want.

5 Aspect Ratios

Support for 16:9 (YouTube), 9:16 (TikTok / Reels), 1:1 (Instagram), 4:3 (legacy), and 3:4 (portrait). Pick at generation time.
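As a quick illustration of choosing a ratio, here is a minimal Python sketch that groups the five ratios listed above by orientation. The ratios and their suggested uses come from this page; the helper names are hypothetical and are not part of any Plykit or Happy Horse API.

```python
# Ratios and suggested uses are taken from the list above.
# The helper names below are hypothetical, for illustration only.
ASPECT_RATIOS = {
    "16:9": "YouTube",
    "9:16": "TikTok / Reels",
    "1:1": "Instagram",
    "4:3": "legacy",
    "3:4": "portrait",
}


def orientation(ratio: str) -> str:
    """Classify a supported 'W:H' ratio as landscape, portrait, or square."""
    if ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio}")
    width, height = (int(part) for part in ratio.split(":"))
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


def ratios_by_orientation(kind: str) -> list[str]:
    """Return the supported ratios that match the given orientation."""
    return [ratio for ratio in ASPECT_RATIOS if orientation(ratio) == kind]


if __name__ == "__main__":
    for ratio, use in ASPECT_RATIOS.items():
        print(f"{ratio:>5}  {orientation(ratio):<9}  {use}")
```

For a vertical social clip, `ratios_by_orientation("portrait")` narrows the choice to 9:16 and 3:4.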

Feature Deep Dive

How Happy Horse 1.0 delivers joint audio-video generation in a single forward pass.

Text to Video

Text-to-Video Generation

Create scenes with character dialogue, ambient sound, and expressive motion from text alone. Specify the spoken language in the prompt and Happy Horse aligns lip motion to it; 7 languages are supported.

Prompt example

A barista in Tokyo welcomes a customer in Japanese ("いらっしゃいませ"), warm cafe ambience, soft jazz, slow dolly forward.
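The prompt above follows a loose pattern: action, spoken language, the quoted line, then ambience and camera notes. A minimal Python sketch of assembling prompts that way is shown below. The model takes free text, so this structure is only a convention, and `build_prompt` is a hypothetical helper rather than a Plykit function; the language list is the one given on this page.

```python
# The seven lip-sync languages listed on this page.
SUPPORTED_LANGUAGES = {
    "English", "Mandarin", "Cantonese", "Japanese",
    "Korean", "German", "French",
}


def build_prompt(action: str, language: str, line: str, details: list[str]) -> str:
    """Join action, spoken language, quoted dialogue, and scene details.

    Hypothetical helper: the layout mirrors the example prompt and is an
    assumption about what reads clearly, not a required format.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"lip-sync is not listed for: {language}")
    parts = [f'{action} in {language} ("{line}")', *details]
    return ", ".join(parts) + "."


if __name__ == "__main__":
    print(build_prompt(
        "A barista in Tokyo welcomes a customer",
        "Japanese",
        "いらっしゃいませ",
        ["warm cafe ambience", "soft jazz", "slow dolly forward"],
    ))
```

Naming the language explicitly and quoting the spoken line keeps the dialogue separate from the ambience and camera notes.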

Image to Video

Image-to-Video Animation

Bring a still photo to life with natural motion and synced audio. Drop in any reference image and describe the action + dialogue.

Prompt example

The street vendor smiles and says "Hello, my friend!" in English, neon signs flicker, drizzle catches the light.

Audio + Lip-Sync

Joint Audio-Video Generation

Audio is co-generated, not bolted on. Lip-sync alignment lands in 7 languages: English, Mandarin, Cantonese, Japanese, Korean, German, French.

Prompt example

A French chef explains a recipe in French ("On commence par le beurre…"), pan sizzles, knife taps cutting board, hand-held camera.

Aspect Ratios

Five Aspect Ratios

Pick a ratio at generation time. Optimized for the major social platforms — no cropping, no letterbox.

Prompt example

9:16 vertical clip — a skateboarder lands a kickflip, ambient street, cheering crowd.


Credits Pricing

12 credits per second @ 720p · 24 credits per second @ 1080p. 1 credit ≈ $0.06 on Plykit Pro.

Audio is always included: Happy Horse generates audio + video in one pass.
Duration   720p           1080p
3s         36 credits     72 credits
5s         60 credits     120 credits
8s         96 credits     192 credits
10s        120 credits    240 credits
12s        144 credits    288 credits
15s        180 credits    360 credits
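The per-second rates make the cost of any clip easy to estimate. The Python sketch below multiplies the stated rate by the duration and converts to an approximate dollar figure using the ≈ $0.06 per credit quoted for Plykit Pro. The function names are hypothetical and the dollar amounts are estimates only.

```python
# Rates and the per-credit estimate are taken from the pricing text above.
# Function names are hypothetical, not part of any Plykit API.
CREDITS_PER_SECOND = {"720p": 12, "1080p": 24}
USD_PER_CREDIT = 0.06  # approximate, Plykit Pro


def credits_for(seconds: int, resolution: str) -> int:
    """Credits for one clip: per-second rate times duration."""
    if not 3 <= seconds <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    return CREDITS_PER_SECOND[resolution] * seconds


def approx_usd(credits: int) -> float:
    """Rough dollar estimate, rounded to cents."""
    return round(credits * USD_PER_CREDIT, 2)


if __name__ == "__main__":
    for seconds in (5, 10, 15):
        for resolution in ("720p", "1080p"):
            credits = credits_for(seconds, resolution)
            print(f"{seconds}s @ {resolution}: {credits} credits "
                  f"(~${approx_usd(credits):.2f})")
```

Because 1080p costs exactly twice the 720p rate, a 720p draft adds half the cost of the final 1080p render at the same duration.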

How to Use Happy Horse 1.0

Generate your first Happy Horse video in three steps.

Step 1

Pick a mode

Text-to-Video starts from scratch. Image-to-Video animates a reference image you upload. For both, write the action + dialogue (specify language for lip-sync).

Step 2

Configure size + duration

Pick 3-15 seconds and 720p or 1080p. Pick aspect ratio for your target platform. We suggest a 5s 720p test first, then re-run at 1080p for the keeper.
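The draft-then-final workflow can be sketched in Python as a small settings object that enforces the limits stated on this page (3-15 seconds, 720p or 1080p, five aspect ratios). `ClipSettings` is a hypothetical structure for illustration; it assumes whole-second durations and does not correspond to any documented Plykit request format.

```python
from dataclasses import dataclass, replace

# Limits taken from this page.
RESOLUTIONS = ("720p", "1080p")
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for the options chosen in Step 2."""
    seconds: int
    resolution: str
    aspect_ratio: str

    def __post_init__(self) -> None:
        # Validate against the ranges stated on the page.
        if not 3 <= self.seconds <= 15:
            raise ValueError("seconds must be between 3 and 15")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")


if __name__ == "__main__":
    # Cheap 5s 720p test first, then the same settings at 1080p.
    draft = ClipSettings(seconds=5, resolution="720p", aspect_ratio="9:16")
    final = replace(draft, resolution="1080p")
    print(draft)
    print(final)
```

Deriving the final settings from the draft with `replace` keeps duration and aspect ratio identical between the two runs, so only the resolution changes.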

Step 3

Generate and download

Click Generate Video. A 1080p clip takes about 38 seconds to generate, and the result includes native synced audio with aligned lip-sync.

Gallery

A sample of videos created with Happy Horse 1.0.

Tokyo Barista — Japanese Dialogue

A barista in Tokyo welcomes a customer in Japanese, warm cafe ambience, soft jazz, slow dolly forward.

Multilingual lip-sync: Japanese dialogue lands cleanly with native ambience.

Hong Kong Skateboarder

9:16 vertical clip — a skateboarder lands a kickflip on a Hong Kong rooftop, ambient street, cheering crowd.

Human motion: skating physics + crowd reaction in one pass.

French Chef Tutorial

A French chef explains a recipe in French, pan sizzles, knife taps cutting board, hand-held camera.

Audio sync: the pan sizzle and knife taps stay frame-accurate with the on-screen motion.

New York Street Vendor

A street vendor smiles and says "Hello, my friend!" in English, neon signs flicker, drizzle catches the light.

I2V: still photograph animated with motion, weather, and synced English greeting.

Creators Love Happy Horse 1.0

Early feedback from creators using Happy Horse on Plykit.

The lip-sync in Mandarin is shockingly clean — better than anything I've used. No post-production matching needed.

Lin — Travel Vlogger

Joint audio-video is a game-changer. I can prototype tutorial videos with native French dialogue in under a minute.

Marc — French Cooking Channel

1080p in 38 seconds with synced audio at this price beats every API I've tested.

Asha — Indie Filmmaker

Explore More Video Models

Compare Happy Horse 1.0 with other AI video generators on Plykit.

  • Kling (Video): Cost-effective AI video with native audio by Kuaishou.
  • Sora 2 (Video): OpenAI's advanced video model with cinematic quality.
  • Veo 3.1 (Video): Google DeepMind's video model with best-in-class audio.
  • Flux 2 (Image): Top open-source image model by Black Forest Labs with high fidelity.
  • Nano Banana (Image): Our flagship image model powered by Gemini for creative magic.

FAQ

Common questions about Happy Horse 1.0 on Plykit.

Ready to Create Videos with Synced Audio?

Generate AI videos with native audio + multilingual lip-sync using Happy Horse 1.0 — Alibaba's #1 video model.