Scientists design new ‘AGI benchmark’ that indicates whether any future AI model could cause ‘catastrophic harm’

OpenAI scientists designed MLE-bench to measure how well AI models perform at “autonomous machine learning engineering,” which is among the hardest tests an AI can face. (Image credit: Getty Images/Naeblys)

Scientists have designed a new set of tests that measure whether artificial intelligence (AI) agents can modify their own code and improve their capabilities.
