Multimodal AI
4 articles in this category
AI News, Multimodal AI, Computer Vision
Meta AI Open-Sourced Perception Encoder Audiovisual (PE-AV): The Audiovisual Encoder Powering SAM Audio And Large Scale Multimodal Retrieval
Meta AI released PE-AV, a multimodal encoder that achieves state-of-the-art performance on audio and video benchmarks, including a 10.4-point R@1 (recall at rank 1) improvement on AudioCaps.
AI News, Vision Language Model, Multimodal AI
Jina AI Releases Jina-VLM: A 2.4B Multilingual Vision Language Model Focused on Token Efficient Visual QA
Jina AI released Jina-VLM, a 2.4B-parameter multilingual vision language model that achieves state-of-the-art results on multilingual benchmarks such as MMMB and Multilingual MMBench.