Gvenet And Alice (2025): Pure text pre-training does not adapt well to visual grounding; the AG-ALICE integration requires careful tuning of the attention temperature.
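To illustrate why attention temperature is sensitive to tuning, here is a minimal sketch of temperature-scaled softmax attention weights. It assumes the common convention of dividing the raw scores by a temperature before the softmax; the source does not say how AG-ALICE applies its temperature, and the function name `attention_weights` and the example scores are hypothetical.

```python
import math


def attention_weights(scores, temperature=1.0):
    """Softmax over raw attention scores after dividing by a temperature.

    Assumption: temperature divides the scores before the softmax. This is a
    generic convention, not a description of the AG-ALICE integration.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical query-key scores for three candidate regions.
scores = [2.0, 1.0, 0.1]

sharp = attention_weights(scores, temperature=0.5)  # concentrates on the top score
flat = attention_weights(scores, temperature=2.0)   # spreads weight more evenly
```

A lower temperature concentrates the weight on the highest-scoring entry, while a higher one spreads it across entries, so a small change in temperature can noticeably shift which regions receive attention.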