An AI agent created by UC Berkeley researchers successfully hacked and achieved near-perfect scores on eight major AI benchmarks, including SWE-bench Pro and Terminal-Bench.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results