2026-03-30

Alibaba’s Qwen team has launched the Qwen3.5-Omni multimodal model series, which includes Plus, Flash and Light Instruct versions. The models support a 256K-token context window, more than 10 hours of audio input and more than 400 seconds of 720p video. Trained on large-scale text and visual data plus more than 100 million hours of audio-video data, the series delivers strong multimodal perception and generation. Compared with Qwen3-Omni, it significantly improves multilingual capabilities, supporting speech recognition in 113 languages and speech generation in 36.