Multimodal AI

Models that combine images, text, audio, and clinical signals. We cover fusion architectures, missing-modality robustness, and cross-modal alignment, with an emphasis on what actually improves when modalities are combined.

RideJudge: How an 8B Model Outperforms 32B Baselines at Ride-Hailing Dispute Resolution

LLM Reasoning · Applied AI · arXiv:2603.17328 · Nanjing University & Didi Chuxing (2026) · 19 min read

RideJudge: Teaching an 8B Model to Out-Think 32B Rivals on the Hardest Calls in Ride-Hailing […]

Think Before You Segment: How TGS-Agent Teaches AI to Reason About Sound Before Picking Up a Brush

Audio-Visual AI · AAAI 2026 · Mohamed Bin Zayed University of AI · 26 min read
