RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning
RETTA: Revolutionizing Zero-Shot Video Captioning with Retrieval-Enhanced Test-Time Adaptation

In vision-language modeling, video captioning, the automatic generation of accurate and contextually relevant descriptions of video content, has become a cornerstone for applications ranging from assistive technology for the visually impaired to intelligent video search engines. While supervised models have […]










