Once Upon a Time in the Test: Sex Differences in the Prediction of Academic Achievement and Job Performance
2013, Schult, Johannes
The present thesis answers several open questions regarding the gender fairness of scholastic aptitude tests and provides practical advice on how to assess test fairness and minimize predictive bias.
There are several reasons to use aptitude tests in the college admission process: they offer standardized scores, provide incremental validity over high school records, and may influence the educational decisions of applicants. Despite the usefulness of these tests in practice, their construct validity, the reasons for group differences and other psychometric aspects usually remain unclear. Consequently, a closer look at the fairness of admission tools reveals many gray areas where improper test use and imprecise conceptualization cannot be easily distinguished.
Test fairness in a narrow, psychometric sense is based on the lack of systematic bias. Three types of bias are generally distinguished: differential item functioning (DIF), in which an item is more difficult or easier for a particular subgroup after controlling for the ability it is supposed to measure; differential validity, in which criterion validities differ between subgroups; and differential prediction, in which the performance of a subgroup is systematically underpredicted or overpredicted.
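One common way to check for differential prediction is to fit a single regression line to all test takers and then compare the mean residual (observed minus predicted performance) per subgroup. The following is a minimal sketch of that idea, not the procedure used in the thesis; the function names and the toy data are invented for illustration, and the toy grades are coded so that higher values mean better performance.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_residuals_by_group(scores, grades, groups):
    """Mean of (observed - predicted) per group under one common line."""
    intercept, slope = fit_line(scores, grades)
    residuals = {}
    for score, grade, group in zip(scores, grades, groups):
        predicted = intercept + slope * score
        residuals.setdefault(group, []).append(grade - predicted)
    return {group: mean(values) for group, values in residuals.items()}


# Toy data, invented for illustration only (higher grade = better).
scores = [1, 2, 3, 4, 1, 2, 3, 4]
grades = [2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0]
groups = ["women"] * 4 + ["men"] * 4

result = mean_residuals_by_group(scores, grades, groups)
print(result)
```

A positive mean residual indicates that the common line underpredicts a group's performance; a negative one indicates overprediction. In this toy data set the common line underpredicts the first group and overpredicts the second by the same amount.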
Four studies were conducted to shed light on the extent of, and possible explanations for, sex-specific bias associated with scholastic aptitude testing and the prediction of academic and vocational performance. In the first two studies, special attention was given to the role of intelligence facets, because general mental ability (g) and scholastic aptitude overlap conceptually (reasoning is among the constructs assessed by most college admission tests) and are highly correlated.
Study 1 provides a detailed look at the situation in Germany. Three student samples show various levels of differential prediction. Across all samples, mathematical reasoning yields the most favorable predictions for men (i.e., men's college grades are overpredicted). High School Grade Point Average (HSGPA; "Abiturnote"), on the other hand, is the least favorable predictor for men's academic performance, although it still underpredicts women's performance in two of the samples.
Study 2 explores the construct validity of two German tests of subject-specific scholastic aptitude. The link between intelligence and aptitude test scores is confirmed. Small sex differences in validities suggest a stronger relationship between verbal reasoning and scholastic aptitude for women than for men.
Study 3 broadens the scope by looking at the careers of university students two years after their graduation. Valid predictors for success at work include personal interests, occupational status, math grades, and conscientiousness. The gender pay gap remains even after controlling for socio-economic status and motivational factors.
Study 4 demonstrates the aggregation of differential prediction findings with meta-analytical methods. The underprediction of women's college grades by aptitude tests can be reduced (but not eliminated) by using HSGPA and test scores as predictors. Graduate tests do not show differential prediction.
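The abstract does not state which meta-analytic model was used. As one common option, a fixed-effect aggregation weights each study's differential prediction estimate by the inverse of its sampling variance. The sketch below assumes that model; the function name and the numbers are hypothetical and do not come from the thesis.

```python
from math import sqrt


def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical per-study estimates of underprediction and their variances.
effects = [0.10, 0.20, 0.30]
variances = [0.01, 0.02, 0.01]

estimate, standard_error = pooled_fixed_effect(effects, variances)
print(estimate, standard_error)
```

More precise studies (smaller variance) contribute more to the pooled estimate. A random-effects model would additionally add a between-study variance component to each study's variance before weighting.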
Based on these findings, two promising explanations for differential prediction are scrutinized. On the one hand, there are sex differences in vocational interests that are associated with choice of major and career path. On the other hand, women appear to approach academic challenges in a more holistic way than men, which interferes with their admission test performance but eventually facilitates their academic performance.
Although some topics still need further attention (e.g., construct validity of grades, availability of large-scale data sets, socio-economic consequences of admission testing), my findings clarify the psychometric properties of scholastic aptitude tests and provide immediate suggestions for weighting subscales in order to maximize gender fairness.