
Multimodal AI

4 articles in this category

AI News · Multimodal AI · Small Language Model

Microsoft Phi-4-Reasoning-Vision-15B: A 15B Parameter Multimodal Model for GUI and Math Reasoning

Microsoft launches Phi-4-Reasoning-Vision-15B, a compact 15B-parameter multimodal model optimized for GUI grounding and scientific reasoning.

AI News · Multimodal AI · Computer Vision

Meta AI Open-Sourced Perception Encoder Audiovisual (PE-AV): The Audiovisual Encoder Powering SAM Audio and Large-Scale Multimodal Retrieval

Meta AI released PE-AV, a multimodal encoder achieving state-of-the-art performance on audio and video benchmarks with a 10.4 R@1 improvement on AudioCaps.

AI News · Vision Language Model · Multimodal AI

Jina AI Releases Jina-VLM: A 2.4B Multilingual Vision Language Model Focused on Token Efficient Visual QA

Jina AI released Jina-VLM, a 2.4B-parameter multilingual vision language model that achieves state-of-the-art results on multilingual benchmarks such as MMMB and Multilingual MMBench.

AI News · Multimodal AI · Video Analysis

MMCTAgent enables multimodal reasoning over large video collections

Microsoft's MMCTAgent boosts video analysis accuracy by 14% on MM-Vet, using a Planner-Critic architecture for iterative reasoning.
