The Unseen Targets of Hate : A Systematic Review of Hateful Communication Datasets

Yu, Zehui; Sen, Indira; Assenmacher, Dennis; Samory, Mattia; Fröhling, Leon; Dahn, Christina; Nozza, Debora; Wagner, Claudia

doi:10.1177/08944393241258771

dc.contributor.author	Yu, Zehui
dc.contributor.author	Sen, Indira
dc.contributor.author	Assenmacher, Dennis
dc.contributor.author	Samory, Mattia
dc.contributor.author	Fröhling, Leon
dc.contributor.author	Dahn, Christina
dc.contributor.author	Nozza, Debora
dc.contributor.author	Wagner, Claudia
dc.date.accessioned	2024-07-04T08:40:17Z
dc.date.available	2024-07-04T08:40:17Z
dc.date.issued	2025-10
dc.description.abstract	Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet ML tools can only be as capable as the quality of the data they are trained on allows them. While there is increasing evidence that they underperform in detecting hateful communications directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and mismatches between the targets that research conceptualizes and ultimately includes in datasets. Yet, by contextualizing these findings in the language and location of origin of the datasets, we highlight a positive trend towards the broadening and diversification of this research space.
dc.description.version	published	deu
dc.identifier.doi	10.1177/08944393241258771
dc.identifier.ppn	193761252X
dc.identifier.uri	https://kops.uni-konstanz.de/handle/123456789/70309
dc.language.iso	eng
dc.rights	Attribution 4.0 International
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	data quality
dc.subject	hateful online communication
dc.subject	systematic review
dc.subject	hate targets
dc.subject	multilinguality
dc.subject.ddc	320
dc.title	The Unseen Targets of Hate : A Systematic Review of Hateful Communication Datasets	eng
dc.type	JOURNAL_ARTICLE
dspace.entity.type	Publication
kops.citation.bibtex	@article{Yu2025-10Unsee-70309, title={The Unseen Targets of Hate : A Systematic Review of Hateful Communication Datasets}, year={2025}, doi={10.1177/08944393241258771}, number={5}, volume={43}, issn={0894-4393}, journal={Social Science Computer Review}, pages={1114--1144}, author={Yu, Zehui and Sen, Indira and Assenmacher, Dennis and Samory, Mattia and Fröhling, Leon and Dahn, Christina and Nozza, Debora and Wagner, Claudia} }
kops.citation.iso690	YU, Zehui, Indira SEN, Dennis ASSENMACHER, Mattia SAMORY, Leon FRÖHLING, Christina DAHN, Debora NOZZA, Claudia WAGNER, 2025. The Unseen Targets of Hate : A Systematic Review of Hateful Communication Datasets. In: Social Science Computer Review. Sage. 2025, 43(5), S. 1114-1144. ISSN 0894-4393. eISSN 1552-8286. Verfügbar unter: doi: 10.1177/08944393241258771	deu
kops.citation.iso690	YU, Zehui, Indira SEN, Dennis ASSENMACHER, Mattia SAMORY, Leon FRÖHLING, Christina DAHN, Debora NOZZA, Claudia WAGNER, 2025. The Unseen Targets of Hate : A Systematic Review of Hateful Communication Datasets. In: Social Science Computer Review. Sage. 2025, 43(5), pp. 1114-1144. ISSN 0894-4393. eISSN 1552-8286. Available under: doi: 10.1177/08944393241258771	eng
kops.citation.rdf	<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/70309"> <dc:contributor>Wagner, Claudia</dc:contributor> <dc:language>eng</dc:language> <dc:creator>Nozza, Debora</dc:creator> <dc:contributor>Assenmacher, Dennis</dc:contributor> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/70309/1/Yu_2-16dpi2waj87j76.pdf"/> <dc:creator>Sen, Indira</dc:creator> <dc:contributor>Yu, Zehui</dc:contributor> <dcterms:title>The Unseen Targets of Hate : A Systematic Review of Hateful Communication Datasets</dcterms:title> <dc:creator>Assenmacher, Dennis</dc:creator> <dc:rights>Attribution 4.0 International</dc:rights> <dcterms:issued>2025-10</dcterms:issued> <dc:contributor>Samory, Mattia</dc:contributor> <dc:creator>Fröhling, Leon</dc:creator> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/> <dcterms:abstract>Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet ML tools can only be as capable as the quality of the data they are trained on allows them. While there is increasing evidence that they underperform in detecting hateful communications directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. 
To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and mismatches between the targets that research conceptualizes and ultimately includes in datasets. Yet, by contextualizing these findings in the language and location of origin of the datasets, we highlight a positive trend towards the broadening and diversification of this research space.</dcterms:abstract> <dc:creator>Samory, Mattia</dc:creator> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/70309"/> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/70309/1/Yu_2-16dpi2waj87j76.pdf"/> <dc:creator>Yu, Zehui</dc:creator> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/> <dc:creator>Wagner, Claudia</dc:creator> <dc:contributor>Fröhling, Leon</dc:contributor> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-07-04T08:40:17Z</dcterms:available> <dc:contributor>Nozza, Debora</dc:contributor> <dc:contributor>Sen, Indira</dc:contributor> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2024-07-04T08:40:17Z</dc:date> <dc:creator>Dahn, Christina</dc:creator> <dc:contributor>Dahn, Christina</dc:contributor> </rdf:Description> </rdf:RDF>
kops.description.openAccess	openaccesshybrid
kops.flag.isPeerReviewed	true
kops.flag.knbibliography	true
kops.identifier.nbn	urn:nbn:de:bsz:352-2-16dpi2waj87j76
kops.sourcefield	Social Science Computer Review. Sage. 2025, <b>43</b>(5), S. 1114-1144. ISSN 0894-4393. eISSN 1552-8286. Verfügbar unter: doi: 10.1177/08944393241258771	deu
kops.sourcefield.plain	Social Science Computer Review. Sage. 2025, 43(5), S. 1114-1144. ISSN 0894-4393. eISSN 1552-8286. Verfügbar unter: doi: 10.1177/08944393241258771	deu
kops.sourcefield.plain	Social Science Computer Review. Sage. 2025, 43(5), pp. 1114-1144. ISSN 0894-4393. eISSN 1552-8286. Available under: doi: 10.1177/08944393241258771	eng
relation.isAuthorOfPublication	a83c9eb2-9f4e-4db5-a517-6aa9c09e5811
relation.isAuthorOfPublication.latestForDiscovery	a83c9eb2-9f4e-4db5-a517-6aa9c09e5811
source.bibliographicInfo.fromPage	1114
source.bibliographicInfo.issue	5
source.bibliographicInfo.toPage	1144
source.bibliographicInfo.volume	43
source.identifier.eissn	1552-8286
source.identifier.issn	0894-4393
source.periodicalTitle	Social Science Computer Review
source.publisher	Sage

Files

Original bundle

Name: Yu_2-16dpi2waj87j76.pdf
Size: 2.53 MB
Format: Adobe Portable Document Format
Downloads: 148

Collections

Politics and Public Administration: Publications