SapientTech.dev

Tag: ai-benchmarks

All articles tagged "ai-benchmarks".

The Frikadeller Trial: How AI Judges Art
21 Apr, 2026
Before migrating production, I put Claude Opus 4.7 on trial against Opus 4.6 using the Delphi method and a three-judge LLM panel. Here is what the protocol found.