
Dealing With Data Bias in Classification: Can Generated Data Ensure Representation and Fairness?

authors: Manh Khoi Duong, Stefan Conrad
booktitle: Big Data Analytics and Knowledge Discovery: 25th International Conference, DaWaK 2023, Penang, Malaysia, August 28-30, 2023, Proceedings
publisher: Springer Nature
location: Penang, Malaysia

Fairness is a critical consideration in data analytics and knowledge discovery because biased data can perpetuate inequalities in downstream pipelines. In this paper, we propose a novel pre-processing method that addresses fairness issues in classification tasks by adding synthetic data points to make the data more representative. Our approach uses a statistical model to generate new data points, which are evaluated for fairness using discrimination measures. These measures quantify the disparities between demographic groups that bias in the data may induce. Our experimental results demonstrate that the proposed method effectively reduces bias for several machine learning classifiers without compromising prediction performance. Moreover, our method outperforms existing pre-processing methods on multiple datasets by Pareto-dominating them in terms of performance and fairness. Our findings suggest that our method can be a valuable tool for data analysts and knowledge discovery practitioners who seek to obtain fair, diverse, and representative data.
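
The idea of generating candidate points and scoring them with a discrimination measure can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes statistical parity difference as the discrimination measure, two demographic groups coded 0 and 1, and binary labels. Resampling existing rows stands in for the statistical model mentioned in the abstract, and feature columns are left out for brevity. The function names `statistical_parity_difference` and `add_fair_samples` are hypothetical.

```python
import numpy as np


def statistical_parity_difference(y, group):
    """Absolute gap in positive-label rates between groups 0 and 1."""
    y = np.asarray(y)
    group = np.asarray(group)
    rate_0 = y[group == 0].mean()
    rate_1 = y[group == 1].mean()
    return abs(rate_0 - rate_1)


def add_fair_samples(y, group, n_new, n_candidates=10, seed=0):
    """Greedily append candidate (label, group) pairs that do not widen the gap.

    Assumption: candidates are drawn by resampling existing rows, which
    is only a placeholder for a fitted statistical model.
    """
    rng = np.random.default_rng(seed)
    y = list(y)
    group = list(group)
    for _ in range(n_new):
        current_gap = statistical_parity_difference(y, group)
        # Draw a small batch of candidate rows.
        candidates = rng.integers(0, len(y), size=n_candidates)

        # Score each candidate by the gap it would produce if added.
        def gap_if_added(i):
            return statistical_parity_difference(y + [y[i]], group + [group[i]])

        best = min(candidates, key=gap_if_added)
        # Accept the best candidate only if it does not increase the gap.
        if gap_if_added(best) <= current_gap:
            y.append(y[best])
            group.append(group[best])
    return np.array(y), np.array(group)


# Toy data: group 0 has 8 of 10 positive labels, group 1 has 2 of 10.
labels = np.array([1] * 8 + [0] * 2 + [1] * 2 + [0] * 8)
groups = np.array([0] * 10 + [1] * 10)
new_labels, new_groups = add_fair_samples(labels, groups, n_new=20)
print(statistical_parity_difference(labels, groups))
print(statistical_parity_difference(new_labels, new_groups))
```

Because a candidate is accepted only when it does not increase the measured gap, the gap on the augmented data can never exceed the gap on the original data. Whether a classifier trained on the augmented data keeps its prediction performance has to be checked separately, as the abstract reports for the actual method.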

Heinrich Heine Universität

Datenbanken und Informationssysteme

