UI-Bench: A Benchmark for Evaluating Design Capabilities of AI Text-to-App Tools

Introducing UI-Bench by AfterQuery.

The first and only rigorous eval of vibe coding tools. We tested 10 AI website builders with 194 professional designers.

After 4,000+ blind comparisons, we discovered something nobody talks about: The same AI models produce wildly different designs.
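This post does not say how the blind comparisons were turned into a ranking. As a minimal sketch of one simple option, assuming each comparison yields a single winner and loser, the votes can be reduced to per-tool win rates. The function name `rank_by_win_rate`, the placeholder tool names, and the votes below are hypothetical illustrations, not the study's method or data.

```python
from collections import defaultdict


def rank_by_win_rate(comparisons):
    """Rank tools by the share of pairwise comparisons they won.

    `comparisons` is an iterable of (winner, loser) pairs, one per blind vote.
    This is an illustrative aggregation, not the method used in the paper.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = {tool: wins[tool] / games[tool] for tool in games}
    # Highest win rate first; ties are broken alphabetically by tool name.
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical votes with placeholder tool names (not real results).
votes = [
    ("tool_a", "tool_b"),
    ("tool_a", "tool_c"),
    ("tool_b", "tool_c"),
    ("tool_c", "tool_a"),
]
ranking = rank_by_win_rate(votes)
for tool, rate in ranking:
    print(f"{tool}: {rate:.2f}")
```

Raw win rates ignore how strong each opponent was, so a rating model that accounts for opponent strength is usually a better fit when tools are not compared equally often.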

Some vibe coding tools generated client-ready websites while others generated generic templated designs. Orchids, Figma Make, and Lovable took the lead, while v0 by Vercel and Replit ranked dead last.

Performance gaps are not due to the underlying AI model (most tools likely use the same LLMs). It's the layer most people ignore:

- How they orchestrate agents

- Their prompt handling

- Template library quality

- Post-processing and asset pipelines

Check out our paper and full study at https://arxiv.org/abs/2508.20410