AI vs. Humans: Which Performs Certain Skills Better?
With ChatGPT’s explosive rise, AI has made its presence felt among the general public, especially in areas long considered bastions of human capability: reading comprehension, speech recognition, and image identification. In fact, the chart above makes clear that AI has already surpassed human performance in quite a few of these areas, and looks set to overtake humans elsewhere.
How Performance Gets Tested
Using data from Contextual AI, we visualize how quickly AI models have begun to beat benchmark datasets, as well as whether or not they have yet reached human levels of skill.
Each dataset is built around a specific skill, such as handwriting recognition, language understanding, or reading comprehension, and each percentage score is measured against two reference points:
- 0% or “maximally performing baseline”: This is equal to the best-known performance by AI at the time of dataset creation.
- 100%: This mark is equal to human performance on the dataset.
Creating a scale between these two points makes it possible to track the progress of AI models on each dataset. Each point on a line marks a new best result, and as the line trends upward, AI models get closer and closer to matching human performance.
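The article does not spell out the normalization formula, but the 0% and 100% anchors imply a simple linear rescaling. The sketch below is an assumption based on that description; the function name, parameters, and example numbers are hypothetical, not taken from Contextual AI's data.

```python
def normalized_progress(model_score: float, baseline: float, human: float) -> float:
    """Map a raw benchmark score onto the article's scale, where the best AI
    performance at dataset creation sits at 0% and human performance at 100%."""
    if human == baseline:
        raise ValueError("human and baseline scores must differ")
    return (model_score - baseline) / (human - baseline) * 100.0

# Hypothetical example: with a baseline of 70.0 and human performance of 91.2,
# a model scoring 88.0 sits roughly 85% of the way to human level.
print(round(normalized_progress(88.0, baseline=70.0, human=91.2), 1))  # 84.9
```

Under this reading, a value above 100% simply means the model has exceeded the human reference score on that dataset.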
Datasets
Skill | Year AI Matched Human Performance | Dataset Used
---|---|---
Handwriting Recognition | 2018 | MNIST
Speech Recognition | 2017 | Switchboard
Image Recognition | 2015 | ImageNet
Reading Comprehension | 2018 | SQuAD 1.1, 2.0
Language Understanding | 2020 | GLUE
Common Sense Completion | 2023 | HellaSwag
Grade School Math | N/A | GSM8K
Code Generation | N/A | HumanEval
Data sources
*For each benchmark, the maximally performing baseline reported in the benchmark paper is taken as the starting point and set at 0%; human performance on the same benchmark is set at 100%.