Scientists design new ‘AGI benchmark’ that indicates whether any future AI model could cause ‘catastrophic harm’
OpenAI scientists designed MLE-bench to measure how well AI models perform at “autonomous machine learning engineering,” which is among the hardest tests an AI can face.

Scientists have designed a new set of tests that measure whether artificial intelligence (AI) agents can modify their own code and improve their capabilities