Unless you’ve been living under a digital rock for the last few months, you’ve probably heard of or already used ChatGPT, the much-hyped next-gen chatbot from OpenAI.
Since it was launched in November 2022, its ‘revolutionary’ capabilities have been written about and tested extensively. Most recently, it made headlines for “passing” some of the toughest exams in the United States, including Wharton’s MBA exam, law school exams, and even the US medical licensing exam.
So we were curious to see how the AI would fare on one of Singapore’s most notorious exams: the Primary School Leaving Examination (PSLE) taken by students in their 6th year of primary school, when they’re around 12 years old, before entering secondary school.
The most feared section of the PSLE is the math section. Singapore is renowned for the quality of its math education, with its students regularly ranking among the top in international comparisons, so it’s not surprising that the PSLE features some fiendishly difficult questions. The math exams were so tough in 2021 that they reportedly left some students in tears.
So how did ChatGPT perform when we fed it a sampling of questions from recent PSLE exams?
Question #1: Ivan and Helen’s coins
We decided not to take it easy on the AI, so we started it off with a PSLE question from 2021 that was so notoriously difficult it became a viral meme.
To be clear, we would also probably start crying if somebody asked us to answer that question on an exam. But here’s how ChatGPT responded:
And those answers are… WRONG.
ChatGPT made a mistake right off the bat by ignoring the part of the question that says Ivan and Helen have the same number of coins.
As you can see from this helpful diagram by Facebook user Ming Sui, the fact that they have the same number of coins is the key to getting the right answer.
Since Helen has 64 20-cent coins and Ivan has 104 20-cent coins, we know Helen’s first 64 20-cent coins and Ivan’s first 64 20-cent coins cancel each other out.
Both of their 50-cent coins beyond the 104 coins also cancel each other out.
This leaves Helen with 40 50-cent coins, worth $20, and Ivan with 40 20-cent coins, worth $8.
So the correct answer to a) is that Helen has $12 more in coins.
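If you'd rather not trust the cancelling trick on faith, here is a minimal Python sketch of it. The coin counts (64 and 104) come from the explanation above; the function names and the range of totals tried in the check are our own assumptions, made up for illustration.

```python
# Coin counts taken from the explanation above.
HELEN_20C = 64    # Helen's 20-cent coins
IVAN_20C = 104    # Ivan's 20-cent coins


def value_difference_cents(helen_20c: int, ivan_20c: int) -> int:
    """How much more Helen's coins are worth than Ivan's, in cents.

    Both have the same number of coins, all 20-cent or 50-cent, so the
    only coins that don't cancel are the ones where Helen holds a
    50-cent coin and Ivan holds a 20-cent coin.
    """
    unmatched = ivan_20c - helen_20c      # 40 coins that differ
    return unmatched * (50 - 20)          # each one is worth 30 cents more


def total_value_cents(coins_20c: int, total_coins: int) -> int:
    """Value of a pile with the given number of 20-cent coins; the rest are 50-cent."""
    return coins_20c * 20 + (total_coins - coins_20c) * 50


diff = value_difference_cents(HELEN_20C, IVAN_20C)
print(f"Helen has ${diff / 100:.2f} more")

# Sanity check: the gap is the same whatever the shared total is
# (the totals tried here are arbitrary, not from the exam question).
gaps = {
    total_value_cents(HELEN_20C, n) - total_value_cents(IVAN_20C, n)
    for n in range(IVAN_20C, 500)
}
```

The set `gaps` ends up holding a single value, which is the point of the diagram: you never need to know how many coins they each have in total.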
For question b), Ming Sui has another helpful diagram.
So we know Helen’s coins weigh 1.134kg, or 1134g, and that Helen and Ivan have the same number of coins.
So the difference in weight between Helen’s and Ivan’s coins is the difference between Helen’s 40 50-cent coins and Ivan’s 40 20-cent coins, since the weight of the other coins cancels each other out.
Since we know a 20-cent coin is 2.7g lighter than a 50-cent coin, you can surmise that 40 20-cent coins weigh 40 x 2.7g = 108g less than 40 50-cent coins.
So 1134 – 108 = 1026g or 1.026kg
So the correct answer to b) is 1.026kg
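The same cancelling idea works for the weights. Below is a rough Python sketch of part b), using only the figures quoted above (1134g for Helen's coins, a 2.7g gap per coin, 40 unmatched coins). The variable and function names are our own, and `Decimal` is used simply to keep 2.7 exact.

```python
from decimal import Decimal

HELEN_TOTAL_G = Decimal("1134")   # Helen's coins weigh 1.134kg
GAP_PER_COIN_G = Decimal("2.7")   # a 20-cent coin is 2.7g lighter than a 50-cent coin
UNMATCHED_COINS = 104 - 64        # the 40 coins that don't cancel out


def ivan_total_grams(helen_total_g: Decimal, gap_g: Decimal, unmatched: int) -> Decimal:
    """Ivan's pile is lighter by one gap for every unmatched coin."""
    return helen_total_g - gap_g * unmatched


ivan_g = ivan_total_grams(HELEN_TOTAL_G, GAP_PER_COIN_G, UNMATCHED_COINS)
print(f"{ivan_g:.0f}g = {ivan_g / 1000:.3f}kg")
```

Note that the actual weight of either coin never enters the calculation, only the difference between them.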
Here’s a video explaining both answers in case you need it (we certainly did!)
Question #2: Jessie’s ribbons
This is a tricky question from the 2017 PSLE:
Here’s how ChatGPT responded:
So very WRONG. Right off the bat, the AI makes a very basic conversion error: 110 cm = 1.1 meters, not 0.11 meters!
But even if it had got the conversion correct, this one is tricky in a way the AI seems unlikely to grasp.
You might think 1.1m x 200 = 220m, so you need to buy 8 rolls, plus 1 more to cover the remaining 20m. So 9 rolls.
But the correct answer is 10. Since 25m ÷ 1.1m ≈ 22.7, you can only get 22 whole ribbons out of each roll. 9 x 22 is only 198 ribbons, so you actually need 10 rolls.
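The trap is easy to see side by side in code. This is a small Python sketch, assuming the figures implied above (200 ribbons of 110cm each, sold in 25m rolls); the constant and function names are ours.

```python
ROLL_CM = 2500         # one 25m roll, in centimetres
RIBBON_CM = 110        # length of one ribbon
RIBBONS_NEEDED = 200   # ribbons required


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, with no floating point involved."""
    return -(-a // b)


# Tempting but wrong: treat all the ribbon as one continuous length.
naive_rolls = ceil_div(RIBBONS_NEEDED * RIBBON_CM, ROLL_CM)

# Correct: count whole ribbons per roll first, since the leftover
# piece at the end of each roll is too short to use.
ribbons_per_roll = ROLL_CM // RIBBON_CM
rolls_needed = ceil_div(RIBBONS_NEEDED, ribbons_per_roll)

print(naive_rolls, ribbons_per_roll, rolls_needed)
```

Working in centimetres keeps everything in whole numbers, which also sidesteps the kind of unit-conversion slip ChatGPT made.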
Question #3: Easy beady
Okay, clearly we need to go a little easier on the AI to give it a fighting chance. So here’s a relatively easy one from 2021.
Here’s ChatGPT’s answer:
And, once again, ChatGPT is WRONG. It seemed to totally forget about the brown beads after its first two steps.
Rather than assuming 100 beads, you can just remember the ratio of 3 brown beads to 2 green beads (3:2 = 60% to 40%).
Then, you can use the ratio of brown to green and the fact that there are 26% green beads in the end to find the percentage of brown beads in the end:
26 x (3/2) = 39% brown beads at the end
100% – (39% brown beads + 26% green beads) = 35% yellow beads.
So the correct answer is 35%.
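Here is a short Python sketch of that ratio shortcut. It assumes, as the 26 x (3/2) step does, that there are 3 brown beads for every 2 green ones and that this ratio stays the same at the end; the function name is our own invention.

```python
from fractions import Fraction


def yellow_percent(green_percent_end: int, brown_per_green: Fraction) -> Fraction:
    """Percentage of yellow beads at the end.

    The brown-to-green ratio is assumed unchanged, so the final brown
    share is just the final green share scaled by that ratio.
    """
    green = Fraction(green_percent_end)
    brown = green * brown_per_green
    return 100 - (green + brown)


result = yellow_percent(26, Fraction(3, 2))
print(f"{result}% yellow")
```

Using `Fraction` rather than floats keeps the percentages exact, so the answer comes out as a clean whole number.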
So could ChatGPT pass a PSLE Math exam?
ChatGPT was able to get really basic questions right, the kind that appear early in PSLE exams, such as this one from 2020.
But for any questions much more difficult than that, those requiring multiple steps or inductive reasoning, like the questions above, the vaunted AI proved less than impressive. And, of course, it couldn’t answer any questions that require analyzing an image.
If it managed to get all of the easy text-based questions correct it could perhaps pass, but likely with an embarrassingly low score.
We should note that we’re not the first to test ChatGPT’s potential PSLE prowess. A few weeks ago Redditor u/gabrielwu84 did a similar experiment and got similar results, with the AI managing to only get one out of six questions right.
But on Jan 30, OpenAI said in the release notes for the latest iteration of the program that it had “upgraded the ChatGPT model with improved factuality and mathematical capabilities,” which is why we thought it might fare better this time around. Not so much the case.
Despite all of its other impressive capabilities, math is one subject that ChatGPT and other LLMs (large language models) have a surprising amount of trouble with. Although they are able to call upon a massive wealth of information, they are bad at abstracting all of that knowledge into logic and rules consistently.
So while ChatGPT may be smart enough to be a lawyer in the United States, it seems like Singapore’s top primary school students can feel safe in their mathematical superiority, for now.