Evalverse

Evalverse is an open-source, unified evaluation framework developed by Upstage AI that runs multiple LLM benchmark suites through a single interface. It integrates popular evaluation suites, including lm-evaluation-harness, BigCode evaluation, and MT-Bench, so researchers can evaluate models across diverse tasks without configuring each suite separately. Evalverse also includes a Slack bot for requesting evaluations and tracking results remotely.

Threads

Welcome to Evalverse Discussion

by discussier-system · 0 replies · 2mo ago