Determining Perceived Text Complexity: An Evaluation of German Sentences Through Student Assessments
author/s: | Boris Thome, Friederike Hertweck, Stefan Conrad |
type: | Inproceedings |
booktitle: | Proceedings of the 17th International Conference on Educational Data Mining |
pages: | 714-721 |
month: | July |
year: | 2024 |
keywords: | text complexity, education, dataset, readability |
Tailoring written texts to a specific audience is of particular
importance in settings where the embedded information affects decision-making. Existing methods for measuring text
complexity commonly rely on quantitative linguistic features
and ignore differences in the readers’ backgrounds. In this
paper, we evaluate several machine learning models that determine the complexity of texts as perceived by teenagers in
high school prior to deciding on their postsecondary pathways. The models are trained on data collected at German
schools where a total of 3262 German sentences were annotated by 157 students with different demographic characteristics, school grades, and language abilities. In contrast to
existing methods of determining text complexity, we build
a model that is specialized to behave like the target audience, thereby accounting for the diverse backgrounds of
the readers. We show that text complexity models benefit
from including person-related features and that K-Nearest-Neighbors and ensemble models perform well in predicting
the subjectively perceived text complexity. Furthermore,
SHapley Additive exPlanation (SHAP) values reveal that
these perceptions differ not only by the text’s linguistic features but also by the students’ math and language skills and
by gender.
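
The idea of adding person-related features to a text complexity model can be illustrated with a minimal sketch. It is not the authors' implementation: the feature choices, the 1 to 5 rating scale, the grade encoding, and all data values below are hypothetical, and the nearest-neighbor routine is a plain unscaled toy version written only to show how reader features change the prediction for the same sentence.

```python
import math

# Hypothetical toy data (not from the paper): one row per (sentence, student) rating.
# Sentence features: [word count, mean word length]
# Person features:   [language grade, math grade], assumed lower = better
# Label: perceived complexity on an assumed scale from 1 (easy) to 5 (hard)
RATINGS = [
    ([8, 4.5], [1, 2], 1),
    ([8, 4.5], [4, 4], 2),
    ([24, 6.8], [1, 1], 3),
    ([24, 6.8], [4, 5], 5),
    ([25, 7.0], [1, 2], 3),
    ([25, 7.0], [4, 4], 5),
]


def build_rows(use_person_features):
    """Concatenate sentence and (optionally) person features into one vector."""
    rows = []
    for sentence_feats, person_feats, label in RATINGS:
        vector = sentence_feats + person_feats if use_person_features else sentence_feats
        rows.append((vector, label))
    return rows


def knn_predict(rows, query, k=2):
    """Average the labels of the k rows closest to `query` (Euclidean distance)."""
    nearest = sorted(rows, key=lambda row: math.dist(row[0], query))[:k]
    return sum(label for _, label in nearest) / k


# The same long sentence, seen by two hypothetical readers.
sentence = [25, 7.0]
strong_reader = [1, 2]
weak_reader = [4, 4]

# A text-only model must give every reader the same answer.
text_only = knn_predict(build_rows(False), sentence)

# With person features, the prediction follows ratings from similar readers.
for_strong = knn_predict(build_rows(True), sentence + strong_reader)
for_weak = knn_predict(build_rows(True), sentence + weak_reader)

print(text_only, for_strong, for_weak)
```

In this toy setup the text-only prediction averages over readers who disagreed, while the combined feature vector lets the neighbors be both similar sentences and similar readers. A realistic pipeline would at least scale the features before computing distances.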