POST /vendors/alibaba/v1/wan2/video/generation
Wan2 Video Generation
curl --request POST \
  --url https://api.mulerouter.ai/vendors/alibaba/v1/wan2/video/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "wan2.5-t2v-preview",
  "prompt": "<string>",
  "negative_prompt": "<string>",
  "audio": true,
  "audio_url": "<string>",
  "size": "1280*720",
  "duration": 5,
  "prompt_extend": true,
  "seed": 1073741823
}
'

Example Response

{
  "task_info": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "status": "pending",
    "created_at": "2023-11-07T05:31:56Z",
    "updated_at": "2023-11-07T05:31:56Z",
    "error": {
      "code": 123,
      "title": "<string>",
      "detail": "<string>"
    }
  }
}
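
The curl call above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_generation_request` and `submit` are hypothetical, and `YOUR_TOKEN` is a placeholder. The request is built separately from sending so that it can be inspected without touching the network.

```python
import json
import urllib.request

API_URL = "https://api.mulerouter.ai/vendors/alibaba/v1/wan2/video/generation"


def build_generation_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply, which should contain task_info."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build a request for the default 720P, 5-second text-to-video case.
request = build_generation_request(
    "YOUR_TOKEN",  # placeholder, replace with a real auth token
    {
        "model": "wan2.5-t2v-preview",
        "prompt": "A small cat running on a grassy field in the moonlight",
        "size": "1280*720",
        "duration": 5,
    },
)
```

Calling `submit(request)` would perform the actual HTTP call; it is left out of the sketch so the block runs without credentials.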
This API supports multiple Alibaba Tongyi Wanxiang (Wan2) video generation models. Please refer to Alibaba Cloud’s official documentation for more details.

Overview

Generate videos from text prompts or images using various Wan2 models, each optimized for different use cases.

Supported Models

| Model | Description | Audio | First Frame | First & Last Frame | Resolution | Duration | FPS | Format |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| wan2.5-t2v-preview | Text-to-video with auto sound or custom audio | | | | 480P/720P/1080P | 5s/10s | 24fps | MP4 |
| wan2.5-i2v-preview | Image-to-video with auto sound or custom audio | | | | 480P/720P/1080P | 5s/10s | 24fps | MP4 |
| wan2.2-i2v-flash | Fast version, 50% speed improvement | | | | 480P/720P/1080P | 5s | 30fps | MP4 |
| wan2.2-i2v-plus | Professional version, enhanced stability | | | | 480P/1080P | 5s | 30fps | MP4 |
| wan2.1-vace-plus | Multi-modal support, video editing | | | | 720P | 5s | 30fps | MP4 |
| wan2.1-kf2v-plus | First & last frame (keyframe-to-video) | | | | 720P | 5s | 30fps | MP4 |

Resolution Options

480P

  • 832×480 (16:9)
  • 480×832 (9:16)
  • 624×624 (1:1)

720P

  • 1280×720 (16:9)
  • 720×1280 (9:16)
  • 960×960 (1:1)
  • 1088×832 (4:3)
  • 832×1088 (3:4)

1080P

  • 1920×1080 (16:9)
  • 1080×1920 (9:16)
  • 1440×1440 (1:1)
  • 1632×1248 (4:3)
  • 1248×1632 (3:4)
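
The tiers listed above map naturally onto a lookup table. The sketch below is one way to turn a tier and aspect ratio into the "width*height" string that the size parameter expects (note the asterisk separator used in request bodies rather than ×). The names `SIZES` and `size_for` are illustrative and not part of the API.

```python
# Tier -> aspect ratio -> value accepted by the "size" parameter.
SIZES = {
    "480P": {"16:9": "832*480", "9:16": "480*832", "1:1": "624*624"},
    "720P": {
        "16:9": "1280*720",
        "9:16": "720*1280",
        "1:1": "960*960",
        "4:3": "1088*832",
        "3:4": "832*1088",
    },
    "1080P": {
        "16:9": "1920*1080",
        "9:16": "1080*1920",
        "1:1": "1440*1440",
        "4:3": "1632*1248",
        "3:4": "1248*1632",
    },
}


def size_for(tier: str, aspect_ratio: str) -> str:
    """Return the size string for a tier and aspect ratio, or raise ValueError."""
    try:
        return SIZES[tier][aspect_ratio]
    except KeyError:
        raise ValueError(f"unsupported combination: {tier} {aspect_ratio}") from None
```

For example, `size_for("720P", "16:9")` yields the default size `1280*720`, while `size_for("480P", "4:3")` raises because 480P lists no 4:3 option.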

Audio Features (Wan2.5 only)

Auto-generated Audio

  • Enabled by default for wan2.5-t2v-preview and wan2.5-i2v-preview
  • Automatically generates synchronized audio based on video content

Custom Audio

  • Supported formats: WAV, MP3
  • Duration: 3-30 seconds
  • Max file size: 15MB
  • Behavior: If audio is shorter than video, remaining portion is silent; if longer, it’s truncated
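
The custom-audio limits and the pad-or-truncate behavior can be checked client-side before submitting. The following is a rough sketch under two assumptions that the page does not state: the format is inferred from the URL's file extension, and "15MB" is read as 15 × 1024 × 1024 bytes. The function names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: "15MB" interpreted as 15 MiB


def audio_problems(url: str, seconds: float, size_bytes: int) -> list[str]:
    """Return a list of violations of the documented custom-audio limits."""
    problems = []
    # Assumption: the file extension in the URL path reflects the format.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix not in {".wav", ".mp3"}:
        problems.append("format must be WAV or MP3")
    if not 3 <= seconds <= 30:
        problems.append("duration must be 3-30 seconds")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("file must be at most 15MB")
    return problems


def audio_coverage(audio_seconds: float, video_seconds: float) -> tuple[float, float]:
    """Return (seconds of audio used, seconds of trailing silence)."""
    used = min(audio_seconds, video_seconds)  # longer audio is truncated
    return used, video_seconds - used  # shorter audio leaves a silent tail
```

With a 3-second clip and a 5-second video, `audio_coverage(3, 5)` gives `(3, 2)`: three seconds of sound followed by two of silence.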

Example Requests

Text-to-Video (wan2.5-t2v-preview)

{
  "model": "wan2.5-t2v-preview",
  "prompt": "A small cat running on a grassy field in the moonlight",
  "size": "1920*1080",
  "duration": 10,
  "audio": true
}

Image-to-Video (wan2.5-i2v-preview)

{
  "model": "wan2.5-i2v-preview",
  "prompt": "The cat starts running forward",
  "image": "https://example.com/cat.jpg",
  "size": "1280*720",
  "duration": 5
}

Fast Generation (wan2.2-i2v-flash)

{
  "model": "wan2.2-i2v-flash",
  "prompt": "Gentle motion, camera slowly pans right",
  "image": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
  "size": "1920*1080"
}

Keyframe Interpolation (wan2.1-kf2v-plus)

{
  "model": "wan2.1-kf2v-plus",
  "prompt": "A black cat looks curiously at the sky, camera gradually rises from eye-level to overhead",
  "image": "https://example.com/first_frame.jpg",
  "last_frame": "https://example.com/last_frame.jpg",
  "size": "1280*720"
}

With Custom Audio (wan2.5-t2v-preview)

{
  "model": "wan2.5-t2v-preview",
  "prompt": "A person walking through a forest, birds chirping",
  "audio_url": "https://example.com/forest_sounds.mp3",
  "size": "1920*1080",
  "duration": 10
}

With Video Effect Template (wan2.1)

{
  "model": "wan2.1-vace-plus",
  "prompt": "Magical levitation effect",
  "image": "https://example.com/subject.jpg",
  "template": "flying",
  "size": "1280*720"
}

Image Requirements (for i2v and kf2v models)

| Property | Requirement |
| --- | --- |
| Formats | JPEG, JPG, PNG (no transparency), BMP, WEBP |
| Dimensions | Width and height each within [360, 2000] pixels |
| File Size | Max 10MB |
| Input | Public URL or Base64-encoded data |
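
These requirements can be verified locally before an image is uploaded or encoded. The sketch below assumes the caller already knows the image's format, dimensions, byte size, and whether it carries an alpha channel (for example from an imaging library); `image_problems` and its parameters are illustrative names, and "10MB" is assumed to mean 10 × 1024 × 1024 bytes.

```python
ALLOWED_FORMATS = {"JPEG", "JPG", "PNG", "BMP", "WEBP"}
MIN_SIDE, MAX_SIDE = 360, 2000
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB


def image_problems(
    fmt: str, width: int, height: int, size_bytes: int, has_alpha: bool = False
) -> list[str]:
    """Return a list of violations of the documented image requirements."""
    problems = []
    fmt = fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if fmt == "PNG" and has_alpha:
        problems.append("PNG must not use transparency")
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} must be within [{MIN_SIDE}, {MAX_SIDE}] pixels")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file must be at most 10MB")
    return problems
```

An empty list means the image passes every check listed in the table.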

Parameters

size vs resolution

  • Text-to-video models use the size parameter with exact dimensions (e.g., “1920*1080”)
  • Image-to-video models may use the resolution parameter with a quality tier (e.g., “1080P”)
  • The model automatically scales the output or matches its aspect ratio based on the input

duration

Available options depend on model:
  • wan2.5: 5 or 10 seconds
  • wan2.2: 5 seconds (fixed)
  • wan2.1: 3, 4, or 5 seconds (varies by model)
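
The per-family duration rules can be expressed as a small lookup keyed on the model name prefix. This is a sketch only: `allowed_durations` is a hypothetical helper, and for wan2.1 it returns the full documented set (3, 4, 5) even though individual wan2.1 models may accept only some of those values.

```python
# Durations in seconds, keyed by model family (the part before the first hyphen).
DURATIONS_BY_FAMILY = {
    "wan2.5": (5, 10),
    "wan2.2": (5,),
    "wan2.1": (3, 4, 5),  # superset; the exact values vary by wan2.1 model
}


def allowed_durations(model: str) -> tuple[int, ...]:
    """Return the documented duration options for a model name."""
    family = model.split("-", 1)[0]
    if family not in DURATIONS_BY_FAMILY:
        raise ValueError(f"unknown model family: {family}")
    return DURATIONS_BY_FAMILY[family]
```

For instance, `allowed_durations("wan2.2-i2v-flash")` returns `(5,)`, reflecting the fixed 5-second length.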

prompt_extend

  • Default: true
  • Effect: Uses AI to enhance short prompts
  • Trade-off: Better results but increases processing time

Prompt Tips

For best results when describing motion:
  • Specify camera movement (pan left, zoom in, dolly shot)
  • Describe subject motion (walks forward, turns around)
  • Include environment details (windy, foggy, sunlit)
  • For keyframe interpolation, describe the transition between frames

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Select the Wan2 video generation model you want to call. Each model exposes a tailored parameter set.

model
enum<string>
required

Fixed model name.

Available options: wan2.5-t2v-preview

prompt
string
required

Text description for the desired video content (max 2000 characters).

Maximum string length: 2000
negative_prompt
string

Negative prompt describing unwanted content (max 500 characters).

Maximum string length: 500
audio
boolean | null
default:true

Enable automatic audio generation. Set to false to force a silent output.

audio_url
string<uri> | null

Custom audio file URL (wav/mp3, 3-30s, ≤15MB). Overrides the audio flag.
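
The interaction between audio and audio_url can be summarized as a precedence rule. The sketch below derives the effective audio behavior from a request body; `audio_mode` and its return labels are illustrative, and treating an explicit null for audio the same as omitting it (falling back to the default of true) is an assumption.

```python
def audio_mode(body: dict) -> str:
    """Return "custom", "auto", or "silent" for a wan2.5 request body."""
    # A custom audio URL takes precedence over the audio flag.
    if body.get("audio_url"):
        return "custom"
    audio = body.get("audio")
    if audio is None:
        # Omitted (or null, by assumption): the documented default is true.
        return "auto"
    return "auto" if audio else "silent"
```

So a body with `"audio": false` and an `audio_url` still resolves to `"custom"`, matching the statement that the URL overrides the flag.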

size
enum<string>
default:1280*720

Output resolution ("width*height"). Supported tiers:

  • 480P: 832*480 (16:9), 480*832 (9:16), 624*624 (1:1)
  • 720P: 1280*720 (16:9), 720*1280 (9:16), 960*960 (1:1), 1088*832 (4:3), 832*1088 (3:4)
  • 1080P: 1920*1080 (16:9), 1080*1920 (9:16), 1440*1440 (1:1), 1632*1248 (4:3), 1248*1632 (3:4)

Available options: 832*480, 480*832, 624*624, 1280*720, 720*1280, 960*960, 1088*832, 832*1088, 1920*1080, 1080*1920, 1440*1440, 1632*1248, 1248*1632

duration
enum<integer>

Video duration in seconds (24 fps). Supported values 5 or 10.

Available options: 5, 10

prompt_extend
boolean
default:true

Enable intelligent prompt rewriting (slightly longer latency, better detail).

seed
integer

Random seed [0, 2147483647].

Required range: 0 <= x <= 2147483647
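
The body constraints listed above for wan2.5-t2v-preview (string lengths, enum values, seed range) lend themselves to a pre-flight check. The following is a minimal sketch rather than the server's actual validation logic; `validate_t2v_body` is a hypothetical helper and the error wording is invented for illustration.

```python
ALLOWED_SIZES = {
    "832*480", "480*832", "624*624",
    "1280*720", "720*1280", "960*960", "1088*832", "832*1088",
    "1920*1080", "1080*1920", "1440*1440", "1632*1248", "1248*1632",
}
MAX_SEED = 2147483647


def validate_t2v_body(body: dict) -> list[str]:
    """Check a request body against the documented wan2.5-t2v-preview limits."""
    errors = []
    if body.get("model") != "wan2.5-t2v-preview":
        errors.append("model must be wan2.5-t2v-preview")
    prompt = body.get("prompt")
    if not isinstance(prompt, str):
        errors.append("prompt is required")
    elif len(prompt) > 2000:
        errors.append("prompt exceeds 2000 characters")
    negative = body.get("negative_prompt")
    if negative is not None and len(negative) > 500:
        errors.append("negative_prompt exceeds 500 characters")
    if body.get("size", "1280*720") not in ALLOWED_SIZES:
        errors.append("size is not a supported width*height value")
    if "duration" in body and body["duration"] not in (5, 10):
        errors.append("duration must be 5 or 10")
    seed = body.get("seed")
    if seed is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int) or not 0 <= seed <= MAX_SEED:
            errors.append("seed must be an integer in [0, 2147483647]")
    return errors
```

An empty result indicates the body satisfies the documented limits; it does not guarantee the server will accept the request.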

Response

202 - application/json

Accepted - Task created successfully

task_info
object
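
A client will typically keep the task id from the 202 response for later reference. The sketch below reads the fields that appear in the example response earlier on this page; the `TaskInfo` dataclass and `parse_task_info` are illustrative names, and timestamps are kept as strings rather than parsed. Only the "pending" status appears in the example, so no other status values are assumed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInfo:
    id: str
    status: str
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
    error: Optional[dict] = None  # holds code, title, detail when present


def parse_task_info(payload: dict) -> TaskInfo:
    """Extract the task_info object from a decoded 202 response body."""
    info = payload["task_info"]
    return TaskInfo(
        id=info["id"],
        status=info["status"],
        created_at=info.get("created_at"),
        updated_at=info.get("updated_at"),
        error=info.get("error"),
    )


# Sample taken from the example response on this page (error omitted).
sample = {
    "task_info": {
        "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
        "status": "pending",
        "created_at": "2023-11-07T05:31:56Z",
        "updated_at": "2023-11-07T05:31:56Z",
    }
}
task = parse_task_info(sample)
```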