Determining Perceived Text Complexity: An Evaluation of German Sentences Through Student Assessments

author/s: Boris Thome, Friederike Hertweck, Stefan Conrad
type: Inproceedings
booktitle: Proceedings of the 17th International Conference on Educational Data Mining
pages: 714-721
month: July
year: 2024
keywords: text complexity, education, dataset, readability
Abstract

Tailoring written texts to a specific audience is of particular importance in settings where the embedded information affects decision-making. Existing methods for measuring text complexity commonly rely on quantitative linguistic features and ignore differences in the readers’ backgrounds. In this paper, we evaluate several machine learning models that determine the complexity of texts as perceived by teenagers in high school prior to deciding on their postsecondary pathways. The models are trained on data collected at German schools, where a total of 3262 German sentences were annotated by 157 students with different demographic characteristics, school grades, and language abilities. In contrast to existing methods of determining text complexity, we build a model that is specialized to behave like the target audience, thereby accounting for the diverse backgrounds of the readers. We show that text complexity models benefit from including person-related features and that K-Nearest Neighbors and ensemble models perform well in predicting the subjectively perceived text complexity. Furthermore, SHapley Additive exPlanation (SHAP) values reveal that these perceptions differ not only by the text’s linguistic features but also by the students’ math and language skills and by gender.
