The open benchmark for AI image generation — ranked by humans, not automated metrics.
Automated metrics like FID and CLIP score measure statistical properties of images (distributional similarity to a reference set, text-image alignment), not whether humans actually prefer them. AIMomentz closes this gap by collecting real human preference signals through head-to-head battles, four-axis quality ratings, and behavioral engagement data.
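To make the three signal types concrete, here is a minimal sketch of what a single evaluation record could contain. The schema is illustrative only; the field and axis names are our assumptions, not the actual AIMomentz data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvaluationRecord:
    """One human evaluation event (hypothetical schema for illustration)."""
    prompt: str                  # text prompt shown to both models
    model_a: str                 # the two models in the battle
    model_b: str
    winner: Optional[str]        # "a", "b", or None for a tie/skip
    # Four-axis quality ratings; axis names are assumptions, e.g. on a 1-5 scale
    ratings: dict = field(default_factory=dict)  # {"fidelity": 4, "aesthetics": 5, ...}
    # Behavioral engagement signals captured alongside the explicit vote
    time_to_vote_ms: int = 0     # how quickly the voter decided
    zoomed_in: bool = False      # whether the voter inspected image details
```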
This is the same methodology that made LMArena (Chatbot Arena) the industry standard for text model evaluation ($1.7B valuation, 5M+ monthly users) — applied to AI image generation, where open human-preference data is still scarce.
AI image models ranked by win rate from pairwise human evaluation. Updated in real time.
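As a sketch of the win-rate computation behind the leaderboard: a model's win rate is its wins divided by its decided battles. Excluding ties from the denominator is one common convention and an assumption here, not a statement of how AIMomentz counts them.

```python
from collections import defaultdict

def win_rates(battles):
    """Per-model win rate from (model_a, model_b, winner) tuples,
    where winner is "a", "b", or None for a tie. Ties are excluded
    from the denominator -- an illustrative convention."""
    wins = defaultdict(int)
    decided = defaultdict(int)
    for model_a, model_b, winner in battles:
        if winner is None:
            continue  # a tie carries no win/loss for either side
        decided[model_a] += 1
        decided[model_b] += 1
        wins[model_a if winner == "a" else model_b] += 1
    return {m: wins[m] / decided[m] for m in decided}

# Worked example: model-x wins 2 of 3 decided battles against model-y.
print(win_rates([("model-x", "model-y", "a"),
                 ("model-x", "model-y", "a"),
                 ("model-x", "model-y", "b")]))
# {'model-x': 0.666..., 'model-y': 0.333...}
```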
Head-to-head comparison data for every model pair: the direct preference signal that RLHF and Diffusion-DPO training pipelines consume.
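Each decided battle maps directly onto the (prompt, chosen, rejected) triples that Diffusion-DPO-style training consumes. A minimal sketch, with record key names assumed for illustration:

```python
def to_dpo_pair(record):
    """Convert one decided battle into a DPO-style preference pair.
    The record is assumed to hold the prompt, both generated images,
    and the human verdict; key names here are illustrative."""
    if record["winner"] not in ("a", "b"):
        return None  # ties and skips carry no preference signal
    chosen, rejected = (
        (record["image_a"], record["image_b"])
        if record["winner"] == "a"
        else (record["image_b"], record["image_a"])
    )
    return {"prompt": record["prompt"], "chosen": chosen, "rejected": rejected}
```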
AI image quality varies dramatically by domain. Our category benchmarks reveal which models excel in specific visual styles.
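A category leaderboard is the same win-rate computation restricted to battles tagged with that category. A sketch, assuming each battle carries a category label and reusing the win_rates helper from the earlier sketch:

```python
from collections import defaultdict

def category_win_rates(tagged_battles):
    """Per-category, per-model win rates from
    (category, model_a, model_b, winner) tuples;
    the category tag is an assumed field."""
    by_category = defaultdict(list)
    for category, model_a, model_b, winner in tagged_battles:
        by_category[category].append((model_a, model_b, winner))
    return {cat: win_rates(rows) for cat, rows in by_category.items()}
```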
Our evaluation combines three signal types (pairwise battle outcomes, four-axis quality ratings, and behavioral engagement data) that together provide richer feedback than any single metric.
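Purely to illustrate how three signals could be blended into one score, here is a hypothetical aggregation. The weights, the linear form, and the pre-normalization are all assumptions, not AIMomentz's actual method:

```python
def composite_score(win_rate, mean_axis_rating, engagement_rate,
                    weights=(0.6, 0.3, 0.1)):
    """Blend battle, rating, and engagement signals into one 0-1 score.
    Inputs are assumed pre-normalized to [0, 1]; the weights are
    arbitrary illustrative values, not a production formula."""
    w_battle, w_rating, w_engage = weights
    return (w_battle * win_rate
            + w_rating * mean_axis_rating
            + w_engage * engagement_rate)
```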
AIMomentz collects preference data and exports it in formats compatible with Diffusion-DPO, RichHF-18K, and UltraFeedback. Available via API for research and commercial use.
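An export call might look like the sketch below. The endpoint URL, auth scheme, parameters, and response shape are all assumptions; only the supported formats are stated above.

```python
import requests

# Hypothetical endpoint -- illustrative only, not the real API.
API_URL = "https://api.aimomentz.example/v1/preferences"

def fetch_preferences(api_key, fmt="diffusion-dpo", limit=100):
    """Fetch exported preference records in the requested format.
    Endpoint, auth, and field names are assumptions for illustration."""
    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        params={"format": fmt, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["records"]
```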
Every vote improves AI image generation. No registration required — vote in under 1 second.