Can the College Board Fix the August SAT to Avoid Another June Scoring Debacle?
The College Board is in hot water. After administering an easy June SAT with a brutal curve, critics came out of the woodwork to voice their discontent, and tens of thousands of students signed an online petition demanding that their tests be rescored. June was not the first time an easier SAT resulted in a harsher curve; some variation of this kind occurs naturally, and I initially attributed the June scoring debacle to it. I’ve since come to understand that the scoring issues on the June SAT were not a reflection of natural variation, but resulted from specific decisions made by the College Board. The wound was essentially self-inflicted, and the College Board is at risk of repeating its errors on subsequent SATs. Will the College Board be able to address the underlying issues before it administers another problematic test in August?
Shifting away from ETS
As it undertook the process of designing and writing the new SAT, the College Board decided to distance itself from its historic partner, the Educational Testing Service (ETS). ETS, a non-profit organization, has for decades designed the SAT, the GRE, the TOEFL, and many other assessments. With the dawning of the Redesigned SAT, the College Board shifted ETS from its test creation role to a consulting and test administration role, and the College Board took responsibility for generating and pre-testing questions.
Dropping the traditional method of pre-testing questions
To shorten the length of the SAT, putting it in line with the ACT, the College Board decided to retire its time-honored experimental section. For decades the College Board pre-tested items using experimental testing sections that appeared on every official SAT. Students across the country knew that one of their ten test sections would not count towards their score, but would help build future SATs.
Because students could not be certain which of the SAT test sections was experimental, they approached every section as if it counted. This provided a robust national sample of students to pre-test questions to gauge levels of difficulty and to ensure that questions functioned in a similar way for all students, without bias. When the SAT dropped the experimental section, it lost an effective means of pre-testing questions using a nationally representative sample of students.
Adding a flawed pre-testing method
To compensate for the loss of its experimental section, the College Board decided to pre-test questions on a subset of students who had opted out of the SAT essay. This decision was criticized, as pundits questioned whether the sample of pre-tested students would be representative and whether the data gathered would be robust. How seriously would students treat this new section, knowing that it would have zero bearing on their test scores? Would a subset of students who elected not to take the SAT’s optional essay be representative of the full national sample of students?
Although there are statistical methods to identify students who did not take the questions seriously and remove their data from the sample, the approach raises concerns. Would the students who took this test section seriously provide enough data to prepare questions for a national sample?
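The College Board has not published how it screens this data, but the general idea behind such methods can be sketched. The toy example below (all thresholds and numbers are invented for illustration, not the College Board's actual procedure) flags response patterns whose accuracy sits near the level expected from pure random guessing, one simple screen among the techniques psychometricians use:

```python
# Hypothetical illustration: flag pretest response patterns whose
# accuracy is statistically indistinguishable from random guessing.
# All parameters here are invented for demonstration purposes.
import random

random.seed(7)
N_ITEMS = 20          # pretest items, four answer choices each
GUESS_RATE = 0.25     # expected accuracy from pure guessing

def looks_like_guessing(num_correct, n_items=N_ITEMS, threshold=0.35):
    """Flag a response pattern whose accuracy is near chance level."""
    return num_correct / n_items < threshold

# Simulated cohort: most students try; some guess at random because
# the section has no bearing on their scores.
engaged = [sum(random.random() < 0.7 for _ in range(N_ITEMS))
           for _ in range(900)]
guessers = [sum(random.random() < GUESS_RATE for _ in range(N_ITEMS))
            for _ in range(100)]

kept = [s for s in engaged + guessers if not looks_like_guessing(s)]
print(f"responses kept for item calibration: {len(kept)} of 1000")
```

Even with such a screen in place, the filtered sample is only as good as the population it came from, which is the deeper problem discussed below.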
Students applying to the most selective colleges would be much more likely to take the SAT essay and avoid the experimental section. The College Board reported that 70% of SAT takers took the essay in 2017, and it’s safe to say that students shooting for the most selective schools in the nation were included in that 70%. Until recently, schools such as Harvard, Yale, Duke, Stanford, Princeton and others required the SAT essay for admission. Students applying to those top schools would naturally take the SAT essay and would never be exposed to the experimental SAT sections. Losing this group of students from the pre-testing process was a major factor in the College Board creating a test that was too easy.
Another issue may have affected the sampling strategy. Within the limited sample of students who received the experimental section, a higher proportion of male students may have dismissed the section. Given the research on male and female student behavior in academia, this would not be remotely surprising. Losing more male students in the pre-testing process would be problematic, especially for assessing the relative difficulty of math items. Male students are overrepresented at the top of the SAT math scale. In the 2016 Profile Report, the College Board reported that of the 117,067 students with SAT math scores in the 700-800 point range, males comprised 61.5% of the total and females 38.5%. If too many male students ignored the experimental section, the College Board’s appraisal of the relative difficulty of math items might have been compromised.
So the College Board spent several years pre-testing questions on a sample of students that was not representative of the broader population of test takers. It administered numerous SATs without incident between March 2016 and May 2018, but the June 2018 SAT revealed a fundamental weakness in the pre-testing strategy. The difficulty level of questions was established using a sample of students that did not include the strongest test takers. When students across the country took the June SAT, the pre-tested “hard” items did not function as expected, and too many students answered too many problems correctly. This resulted in a very steep scoring curve and a wave of student protests and widespread criticism. The challenge for the College Board is that the August SAT has been pre-tested in a similar fashion, and has the same potential weakness in terms of measured difficulty level. The College Board is aware of this risk and is working to ensure the scoring fiasco does not repeat.
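To see why an easier test forces a steeper curve, note that equating pins scaled scores to percentiles of ability rather than to raw-score counts. When raw scores bunch up near the top, each missed question must cost more scaled points. A back-of-the-envelope sketch, using invented raw-to-scaled numbers rather than any actual College Board conversion table:

```python
# Illustration with invented numbers: why an easier test produces a
# steeper curve. Equating anchors scaled scores to ability percentiles,
# so when an easy test compresses raw scores toward the maximum, each
# raw point lost must translate into more scaled points lost.

def points_lost_per_miss(raw_max, raw_at_700, scaled_top=800, scaled_mid=700):
    """Average scaled points lost per raw point, between a perfect raw
    score and the raw score that equates to a 700 scaled score."""
    return (scaled_top - scaled_mid) / (raw_max - raw_at_700)

# Hypothetical "typical" math section: 58 questions, raw 49 equates to 700.
normal = points_lost_per_miss(58, 49)
# Hypothetical easier section: stronger raw performance, raw 54 equates to 700.
easier = points_lost_per_miss(58, 54)

print(f"typical curve: ~{normal:.1f} scaled points per miss near the top")
print(f"easier test:   ~{easier:.1f} scaled points per miss near the top")
```

Under these invented figures, a single miss near the top of the easier test costs more than twice as many scaled points, which is exactly the steepness students experienced in June.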
Meanwhile, ACT, Inc. just announced that it is adding an experimental section to the ACT, required of all students on every test. Pre-testing is serious business. Experimental sections have been a staple of standardized tests for years, and for good reason. If test writers don’t properly determine how test questions function, downstream scoring issues like the one we saw in June can occur with greater frequency. Will the College Board fix the August test in time? Most likely yes. There’s too much at stake to miss again. And we’ll know within a matter of weeks.