Large Language Models (LLMs) and Artificial General Intelligence have been the talk of the town lately. But have you ever wondered how these advanced AIs would perform on real-world problems (e.g., construction accounting) that require fundamental human abilities? The GAIA benchmark (Dec 2023) challenges the narrative of AI outperforming humans: human respondents scored 92% on GAIA's questions, versus 15% for GPT-4 equipped with plugins. This performance gap contrasts sharply with the current trend of LLMs matching or outperforming humans on tasks requiring professional skills, such as accounting and law. The advent of Artificial General Intelligence (AGI) hinges on a system exhibiting robustness comparable to that of the average human on exactly this kind of question. If you're interested in learning more, check out GAIA's leaderboard, built on its 466 questions and their answers. Link to the breakthrough paper.