Multi-Agent Game 🎮 Generation and Evaluation via Audio-Visual Recordings 📹

Paper | Code

Generating interactive multimedia content with sound, such as video games or animations, is a challenging task. In this work, I propose an evaluation metric for assessing such content and a multi-agent system for generating it.

AVR-Eval is an evaluation metric for multimedia content (video games, animations, etc.). It uses an omni-modal model (one that processes text, video, and audio) to compare the Audio-Visual Recordings (AVRs) of two pieces of content. Unlike metrics such as FVD or JEDi, it does not require a dataset. Unlike WebDev Arena, it does not require human evaluators.
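To make the pairwise idea concrete, here is a minimal sketch of how a comparison-based score could be aggregated into a win rate. It is not the AVR-Eval implementation: the `Judge` signature, the `win_rate` function, and `stub_judge` are hypothetical names introduced for illustration, and the stub stands in for a real omni-modal model call.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge interface (an assumption, not the paper's API):
# it receives a text description and the paths of two recordings,
# and returns "A" if the first recording is preferred, otherwise "B".
Judge = Callable[[str, str, str], str]


def win_rate(judge: Judge, pairs: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of pairs in which the candidate recording is preferred.

    Each pair is (description, candidate_recording, baseline_recording).
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    wins = 0
    for description, candidate, baseline in pairs:
        # The candidate is always passed in position "A".
        if judge(description, candidate, baseline) == "A":
            wins += 1
    return wins / len(pairs)


def stub_judge(description: str, rec_a: str, rec_b: str) -> str:
    # Placeholder for an omni-modal model: it simply prefers the longer
    # file name, so the example runs without any model or video files.
    return "A" if len(rec_a) >= len(rec_b) else "B"


if __name__ == "__main__":
    demo_pairs = [
        ("pong", "candidate_long.mp4", "b.mp4"),
        ("snake", "a.mp4", "baseline_long.mp4"),
    ]
    print(win_rate(stub_judge, demo_pairs))
```

Because the score depends only on the two recordings and the judge, no reference dataset or human rater appears anywhere in the loop, which is the property the paragraph above highlights.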

AVR-Agent is a multi-agent framework that combines coding and omni-modal models to generate video games, using AVR feedback and a bank of artist-made multimedia assets (images, sounds, music, 3D models).
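The feedback idea can be sketched as a simple refinement loop. This is an illustrative outline under stated assumptions, not the AVR-Agent code: `refine` and the three callables `generate`, `record`, and `critique` are hypothetical, the stopping rule (empty feedback) is an assumption, and the asset bank is left out for brevity.

```python
from typing import Callable


def refine(
    prompt: str,
    generate: Callable[[str, str, str], str],  # (prompt, previous_code, feedback) -> code
    record: Callable[[str], str],              # code -> path or handle of an AVR
    critique: Callable[[str, str], str],       # (prompt, recording) -> feedback text
    rounds: int = 3,
) -> str:
    """Generate content, then improve it using feedback on its recording."""
    # Initial attempt: no previous code and no feedback yet.
    code = generate(prompt, "", "")
    for _ in range(rounds):
        recording = record(code)               # capture audio and video of the content
        feedback = critique(prompt, recording)
        if not feedback:                       # assumed convention: empty means "good enough"
            break
        code = generate(prompt, code, feedback)
    return code


if __name__ == "__main__":
    # Toy stand-ins so the loop runs without any model:
    # each revision appends "v", and the critic is satisfied at length 3.
    result = refine(
        "a pong clone",
        generate=lambda p, prev, fb: prev + "v",
        record=lambda code: code,
        critique=lambda p, rec: "" if len(rec) >= 3 else "improve",
    )
    print(result)
```

In a real system, `generate` would wrap a coding model and `critique` an omni-modal model. Passing them in as callables keeps the loop itself independent of any particular provider.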

Below, I show examples of generated games. Read the paper for more details. The code for AVR-Eval and AVR-Agent is available here.

AVR-Agent