Alibaba’s Qwen team launched the Qwen3.5-Omni multimodal model series, including
Plus, Flash and Light Instruct versions. The models support a 256K-token context
window, more than 10 hours of audio input and more than 400 seconds of 720p video.
Trained on massive amounts of text and visual data, plus more than 100 million
hours of audio-video data, the series delivers strong multimodal
perception and generation. Compared with Qwen3-Omni, it significantly improves
multilingual capabilities, supporting speech recognition in 113 languages and
speech generation in 36.