Multi-Agent Game 🎮 Generation and Evaluation via Audio-Visual Recordings 📹
Generating interactive multimedia content such as video games or animations (with sound) is a challenging task. In this work, I propose an evaluation metric for assessing such content and a multi-agent system for generating it.
AVR-Eval is an evaluation metric for multimedia content (video games, animations, etc.) that uses an omni-modal model (processing text, video, and audio) to compare the Audio-Visual Recordings (AVRs) of two pieces of content. Unlike metrics such as FVD or JEDi, it does not require any dataset. Unlike WebDev Arena, it does not require human evaluators.
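One natural way to turn such pairwise comparisons into a score is a win rate over many pairs of recordings. The Python sketch below illustrates that idea under stated assumptions and is not the AVR-Eval implementation. `judge` is a hypothetical callable standing in for the omni-modal model, and recordings are identified by file name. Asking for a verdict in both presentation orders, and counting a disagreement as a tie, is an assumed safeguard against a judge that favours one position.

```python
from dataclasses import dataclass
from typing import Callable, Literal

# Hypothetical judge signature (an assumption, not a real API): it receives two
# recordings and answers "A" (first preferred) or "B" (second preferred).
# In AVR-Eval this role is played by an omni-modal model; here it is a stub.
Verdict = Literal["A", "B"]
Judge = Callable[[str, str], Verdict]


@dataclass
class PairwiseResult:
    wins: int = 0
    losses: int = 0
    ties: int = 0

    @property
    def win_rate(self) -> float:
        total = self.wins + self.losses + self.ties
        # A tie counts as half a win; with no comparisons, report 0.5 (no preference).
        return (self.wins + 0.5 * self.ties) / total if total else 0.5


def compare(judge: Judge, rec_x: str, rec_y: str) -> int:
    """+1 if x is preferred in both orders, -1 if y is, 0 if the verdict flips."""
    x_wins_first = judge(rec_x, rec_y) == "A"   # x shown first
    x_wins_second = judge(rec_y, rec_x) == "B"  # x shown second
    if x_wins_first and x_wins_second:
        return 1
    if not x_wins_first and not x_wins_second:
        return -1
    return 0


def avr_win_rate(judge: Judge, pairs: list[tuple[str, str]]) -> PairwiseResult:
    """Aggregate order-swapped pairwise verdicts over (x, y) recording pairs."""
    result = PairwiseResult()
    for rec_x, rec_y in pairs:
        outcome = compare(judge, rec_x, rec_y)
        if outcome > 0:
            result.wins += 1
        elif outcome < 0:
            result.losses += 1
        else:
            result.ties += 1
    return result


def toy_judge(a: str, b: str) -> Verdict:
    # Toy stand-in: answers "A" only when the first recording is tagged "agent"
    # and the second is not; in every other case it answers "B", which makes it
    # position-biased when neither recording is tagged.
    return "A" if "agent" in a and "agent" not in b else "B"


if __name__ == "__main__":
    pairs = [
        ("agent_1.mp4", "oneshot_1.mp4"),
        ("agent_2.mp4", "oneshot_2.mp4"),
        ("oneshot_3.mp4", "oneshot_4.mp4"),
    ]
    print(f"{avr_win_rate(toy_judge, pairs).win_rate:.3f}")  # prints 0.833
```

With this toy judge, the first two pairs are clear wins. The third becomes a tie because the verdict flips with the presentation order, which gives a win rate of 2.5/3. Under these assumptions, plugging in a real model would only require a function with the same `(recording_a, recording_b) -> "A" | "B"` shape.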
AVR-Agent is a multi-agent framework that combines coding and omni-modal models to generate video games, using AVR feedback and a bank of artist-made multimedia assets (images, sounds, music, 3D models).
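The iterative part of such a framework can be pictured as a write, record, critique loop. The sketch below is a minimal illustration under assumptions and is not the AVR-Agent code. `write_code`, `record`, and `critique` are hypothetical callables standing in for the coding model, the recording step, and the omni-modal model. Assets are passed as plain file names. When AVR feedback is disabled, the coding model is assumed to revise its previous version without any critique.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical component signatures (assumptions, not a real API):
#   write_code(prompt, assets, previous_code, feedback) -> new source code
#   record(code) -> path to an audio-visual recording of the running content
#   critique(recording, prompt) -> textual feedback from an omni-modal model
WriteCode = Callable[[str, list[str], str, str], str]
Record = Callable[[str], str]
Critique = Callable[[str, str], str]


@dataclass
class AgentTrace:
    """Keeps every code version and every piece of feedback for inspection."""
    versions: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)


def avr_agent_loop(
    prompt: str,
    assets: list[str],
    write_code: WriteCode,
    record: Record,
    critique: Critique,
    steps: int = 10,
    use_avr_feedback: bool = True,
) -> tuple[str, AgentTrace]:
    trace = AgentTrace()
    code, feedback = "", ""  # start from an empty program and no feedback
    for _ in range(steps):
        # The coding model writes (or revises) the program.
        code = write_code(prompt, assets, code, feedback)
        trace.versions.append(code)
        if use_avr_feedback:
            # Record the running content and ask the omni-modal model to review it.
            feedback = critique(record(code), prompt)
            trace.feedback.append(feedback)
    return code, trace
```

Passing `assets=[]` and `use_avr_feedback=False` roughly mirrors the ablation in the examples that uses neither assets nor AVR feedback, and `steps=10` matches the step count used there. A one-shot baseline would presumably correspond to a single step.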
Below are examples of generated games; see the paper for more details. The code for AVR-Eval and AVR-Agent is publicly available.
Example games made with Kimi-K2
Beat 'em up game
One-shot with assets
AVR-Agent (10 steps) with assets and AVR feedback
AVR-Agent (10 steps) without assets or AVR feedback
Incremental game
One-shot with assets
AVR-Agent (10 steps) with assets and AVR feedback
AVR-Agent (10 steps) without assets or AVR feedback
Platformer game
One-shot with assets
AVR-Agent (10 steps) with assets and AVR feedback
AVR-Agent (10 steps) without assets or AVR feedback
Example games made with Qwen3-Coder
Beat 'em up game
One-shot with assets
AVR-Agent (10 steps) with assets and AVR feedback
AVR-Agent (10 steps) without assets or AVR feedback
Incremental game
One-shot with assets
AVR-Agent (10 steps) with assets and AVR feedback
AVR-Agent (10 steps) without assets or AVR feedback
Bowling game
One-shot with assets
AVR-Agent (10 steps) with assets and AVR feedback
AVR-Agent (10 steps) without assets or AVR feedback
Solitaire game
One-shot with assets
AVR-Agent (10 steps) with assets and AVR feedback
AVR-Agent (10 steps) without assets or AVR feedback
Evaluation metric: AVR-Eval
Multi-agent framework: AVR-Agent