
How do humans perform against AI on close-to-real-world problems?

By Rishi Srivastava posted 01-05-2024 11:04

  

Large Language Models (LLMs) and Artificial General Intelligence (AGI) have been the talk of the town lately. But have you ever wondered how these advanced AIs would perform on real-world problems (e.g. construction accounting) that require fundamental human abilities? GAIA's benchmark dataset (Dec '23) debunks the myth of AI outperforming humans: human respondents scored 92% on GAIA's questions, versus 15% for GPT-4 equipped with plugins. This disparity contrasts with the current trend of LLMs outperforming humans on tasks requiring professional skills, such as accounting and law. The advent of AGI hinges on a system's ability to exhibit the same robustness as the average human on such questions. If you're interested in learning more, check out GAIA's leaderboard with 466 questions and their answers. Link to the breakthrough paper.


Comments

01-28-2024 17:30

Thanks for your comment, David.

01-28-2024 16:54

Hi Rishi

Good find with respect to GAIA. However, like you, we are interested in real-world performance that helps financial managers in construction. Not everything is suited to LLMs; some things are about the money (numbers). AI can, and does, improve financial forecasting accuracy by up to 100%, and it can also give accurate early warning of future profit margin risk (about 40% earlier than is achievable by humans and traditional methods alone). To me, that is a direct, tangible benefit to financial managers. This functionality is available now to pretty much any construction financial manager.