Tagged: benchmark
2 posts and 1 project
Posts
CentralGauge Benchmark Update: Why the Numbers Changed
A transparency report on significant fixes to the CentralGauge AL code benchmark infrastructure, covering bugs in code extraction, broken tasks, and vague specs, along with updated LLM rankings.
How I Benchmark LLMs on AL Code
An in-depth look at CentralGauge, an open-source benchmark for evaluating LLM performance on AL code generation for Business Central, covering task design, scoring methodology, and cross-model comparison results.