Benchmark · AVeriTeC dev

The AVeriTeC sample, claim by claim.

The exact stratified-100 sample from the AVeriTeC dev set, the gold evidence each claim is judged on, and, once the run completes, the engine's verdict beside the ground-truth label. Golden-evidence condition: the engine argues only from the evidence shown here, with no live web access.
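For readers who want to see what a stratified sample of this kind involves, here is a minimal sketch. It is not the procedure used to build this page: it assumes stratification by ground-truth label with proportional allocation, and the function name `stratified_sample`, the `label` field, the seed, and the toy label counts are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(claims, k, key="label", seed=0):
    """Draw k claims, giving each stratum a share proportional to its size.

    Assumption: strata are defined by claims[i][key]. The real sample may
    have been stratified differently.
    """
    if k > len(claims):
        raise ValueError("sample size exceeds population")

    strata = defaultdict(list)
    for claim in claims:
        strata[claim[key]].append(claim)

    total = len(claims)
    # Largest-remainder allocation: floor each proportional quota, then give
    # the leftover slots to the strata with the largest fractional parts.
    quotas = {name: k * len(items) / total for name, items in strata.items()}
    alloc = {name: int(q) for name, q in quotas.items()}
    leftover = k - sum(alloc.values())
    by_remainder = sorted(
        quotas, key=lambda n: (quotas[n] - alloc[n], n), reverse=True
    )
    for name in by_remainder[:leftover]:
        alloc[name] += 1

    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], alloc[name]))
    return sample


# Toy population: the counts below are invented for illustration only and
# are NOT the real label distribution of the AVeriTeC dev set.
toy_counts = {
    "Supported": 150,
    "Refuted": 250,
    "Not Enough Evidence": 40,
    "Conflicting Evidence/Cherrypicking": 60,
}
population = [
    {"id": f"{label}-{i}", "label": label}
    for label, n in toy_counts.items()
    for i in range(n)
]

sample = stratified_sample(population, k=100, seed=0)
per_label = Counter(claim["label"] for claim in sample)
```

Fixing the seed is what makes a sample "exact" in the sense of being reproducible: rerunning the draw over the same population returns the same claims.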
