Once Upon a Time in the Test : Sex Differences in the Prediction of Academic Achievement and Job Performance

2013, Schult, Johannes

The present thesis answers several open questions regarding the gender fairness of scholastic aptitude tests and provides practical advice on how to assess test fairness and minimize predictive bias.

There are several reasons to use aptitude tests in the college admission process: they offer standardized scores, provide incremental validity over high school records, and may influence the educational decisions of applicants. Despite the usefulness of these tests in practice, their construct validity, the reasons for group differences and other psychometric aspects usually remain unclear. Consequently, a closer look at the fairness of admission tools reveals many gray areas where improper test use and imprecise conceptualization cannot be easily distinguished.

Test fairness in a narrow, psychometric sense is based on the lack of systematic bias. Three types of bias are generally distinguished: differential item functioning (DIF), where an item is more difficult or easier for a particular subgroup after controlling for the ability it is supposed to measure; differential validity, where criterion validities differ between subgroups; and differential prediction, where the performance of a subgroup is systematically underpredicted or overpredicted.
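To make the notion of differential prediction concrete, the following is a minimal sketch, not the procedure used in the thesis. It assumes one common way of checking for differential prediction: fit a single regression line to the pooled sample and compare the mean residual per subgroup. A positive mean residual means the group's actual performance exceeds the prediction (underprediction); a negative one means overprediction. The function names and the toy numbers are hypothetical and serve illustration only; the criterion is assumed to be scaled so that higher values mean better performance.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_residual_by_group(x, y, group):
    """Fit one common line to the pooled data, then average the
    residuals (actual minus predicted) separately for each group."""
    slope, intercept = fit_line(x, y)
    residuals = {}
    for xi, yi, gi in zip(x, y, group):
        residuals.setdefault(gi, []).append(yi - (intercept + slope * xi))
    return {g: mean(r) for g, r in residuals.items()}


# Hypothetical toy data: test scores, a performance criterion, group labels.
scores = [1, 2, 3, 4, 1, 2, 3, 4]
performance = [1.5, 2.5, 3.5, 4.5, 0.5, 1.5, 2.5, 3.5]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

bias = mean_residual_by_group(scores, performance, groups)
for name, value in sorted(bias.items()):
    direction = "underpredicted" if value > 0 else "overpredicted"
    print(f"group {name}: mean residual {value:+.2f} ({direction})")
```

In this toy sample both groups have the same slope but different intercepts, so the common line underpredicts group "a" and overpredicts group "b" by the same amount.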

Four studies have been conducted to shed light on the extent and possible explanations of sex-specific bias associated with scholastic aptitude testing and the prediction of academic and vocational performance. In the first two studies, special attention was given to the role of intelligence facets, because general mental ability (g) and scholastic aptitude overlap conceptually - reasoning is among the constructs assessed by most college admission tests - and are highly correlated.

Study 1 provides a detailed look at the situation in Germany. Three student samples show various levels of differential prediction. Across all samples, mathematical reasoning yields the most favorable predictions for men (i.e., men's college grades are overpredicted). High School Grade Point Average (HSGPA; "Abiturnote"), on the other hand, is the least favorable predictor for men's academic performance, although it still underpredicts women's performance in two of the samples.

Study 2 explores the construct validity of two German tests of subject-specific scholastic aptitude. The link between intelligence and aptitude test score is confirmed. Small sex differences in validities suggest a stronger relationship between verbal reasoning and scholastic aptitude for women than for men.

Study 3 broadens the scope by looking at the careers of university students two years after their graduation. Valid predictors for success at work include personal interests, occupational status, math grades, and conscientiousness. The gender pay gap remains even after controlling for socio-economic status and motivational factors.

Study 4 demonstrates the aggregation of differential prediction findings with meta-analytical methods. The underprediction of women's college grades by aptitude tests can be reduced (but not eliminated) by using HSGPA and test scores as predictors. Graduate tests do not show differential prediction.
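The abstract does not state which meta-analytical model was used. As an illustration only, the sketch below assumes the simplest option, a fixed-effect model with inverse-variance weights, to pool per-study estimates of differential prediction (for example, mean residuals for one subgroup). The function name and the numbers are hypothetical and are not results from the thesis.

```python
from math import sqrt


def pooled_effect(effects, variances):
    """Fixed-effect pooling: weight each study by 1 / variance.

    Returns the weighted mean effect and its standard error.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, sqrt(1.0 / total)


# Hypothetical per-study underprediction estimates and sampling variances.
study_effects = [0.10, 0.30]
study_variances = [0.01, 0.01]

estimate, std_error = pooled_effect(study_effects, study_variances)
print(f"pooled estimate: {estimate:.3f}, standard error: {std_error:.3f}")
```

A random-effects model would additionally estimate the between-study variance and add it to each study's variance before weighting.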

Based on these findings, two promising explanations for differential prediction are scrutinized. On the one hand, there are sex differences in vocational interests, which are associated with choice of major and career paths. On the other hand, women appear to approach academic challenges in a more holistic way than men, which interferes with their admission test performance but eventually facilitates their academic performance.

Although some topics still need further attention (e.g., construct validity of grades, availability of large-scale data sets, socio-economic consequences of admission testing), my findings clarify the psychometric properties of scholastic aptitude tests and provide immediate suggestions for weighting subscales in order to maximize gender fairness.


The Influence of Social Capital and Tolerance on Democratic Performance

2008, Schult, Johannes

The present paper examines to what extent social capital influences democratic values directly and whether there is an indirect influence through interpersonal and political tolerance. Data from the German and the Swiss part of the European Values Survey (EVS) mid-90s wave are used for the empirical analysis. Conceptual and methodological effects influence the structural equation model. Individual social capital appears to have a moderate influence on democratic attitude, whereas the role of tolerance is of minor importance in Germany and negligible in Switzerland. The results support the notion that social trust and civic associations foster democratic values. Personal and political tolerance can be regarded as separate concepts with rather weak relationships in the given framework. The lack of influence of political tolerance on democratic attitude is remarkable, but can partly be attributed to the respective tolerance items, which are binary and do not differentiate between degrees of tolerance.


Categorial Differences in Affective Picture Perception

2007, Schult, Johannes

Studies on the affective modulation of behavioral and physiological parameters often show a processing advantage for arousing pleasant and unpleasant stimuli over neutral stimuli. On this basis, recognition performance for pictures was examined while attempting to minimize possible perceptual differences between the categories. A selection of 180 pleasant, 180 neutral, and 180 unpleasant black-and-white stimuli was used. The average brightness and complexity of the pictures in these valence categories were controlled. A sandwich-masked target stimulus was presented (13, 27, or 40 ms). Participants then had to decide whether or not a control picture was the target picture and to indicate their subjective confidence in that judgment. There was a linear effect of presentation duration on recognition performance for all picture categories: the longer the target stimulus was shown, the more correct responses there were and the shorter the response times. For the individual presentation durations, there was no clear effect of target picture valence on recognition performance. Three of the 19 participants showed significant differences between the valence categories in at least one presentation duration condition, but these differences had no consistent direction. Trials with highly arousing pictures yielded fewer correct responses than trials with low-arousal pictures. This result suggests that emotional processes, which presumably run optimally in the presence of many stimuli competing for attention and processing, hinder picture identification in the present experiment.