Improving the selection of news reports for event coding using ensemble classification

dc.contributor.author: Croicu, Mihai
dc.contributor.author: Weidmann, Nils B.
dc.date.accessioned: 2016-02-02T14:58:41Z
dc.date.available: 2016-02-02T14:58:41Z
dc.date.issued: 2015 (eng)
dc.description.abstract: Manual coding of political events from news reports is extremely expensive and time-consuming, whereas completely automatic coding has limitations when it comes to the precision and granularity of the data collected. In this paper, we introduce an alternative strategy by establishing a semi-automatic pipeline, where an automatic classification system eliminates irrelevant source material before further coding is done by humans. Our pipeline relies on a high-performance supervised heterogeneous ensemble classifier working on extremely unbalanced training classes. Deployed to the Mass Mobilization on Autocracies database on protest, the system is able to reduce the number of source articles to be human-coded by more than half, while keeping over 90% of the relevant material. (eng)
dc.description.version: published (eng)
dc.identifier.doi: 10.1177/2053168015615596 (eng)
dc.identifier.ppn: 455042357
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/32815
dc.language.iso: eng (eng)
dc.rights: Attribution-NonCommercial 3.0 Unported
dc.rights.uri: http://creativecommons.org/licenses/by-nc/3.0/
dc.subject.ddc: 320 (eng)
dc.title: Improving the selection of news reports for event coding using ensemble classification (eng)
dc.type: JOURNAL_ARTICLE (eng)
dspace.entity.type: Publication
kops.citation.bibtex:
@article{Croicu2015Impro-32815,
  year={2015},
  doi={10.1177/2053168015615596},
  title={Improving the selection of news reports for event coding using ensemble classification},
  number={4},
  volume={2},
  journal={Research and Politics},
  author={Croicu, Mihai and Weidmann, Nils B.}
}
kops.citation.iso690: CROICU, Mihai, Nils B. WEIDMANN, 2015. Improving the selection of news reports for event coding using ensemble classification. In: Research and Politics. 2015, 2(4). eISSN 2053-1680. Available under: doi: 10.1177/2053168015615596 (deu, eng)
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/32815">
    <dc:creator>Croicu, Mihai</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2016-02-02T14:58:41Z</dc:date>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
    <dcterms:abstract xml:lang="eng">Manual coding of political events from news reports is extremely expensive and time-consuming, whereas completely automatic coding has limitations when it comes to the precision and granularity of the data collected. In this paper, we introduce an alternative strategy by establishing a semi-automatic pipeline, where an automatic classification system eliminates irrelevant source material before further coding is done by humans. Our pipeline relies on a high-performance supervised heterogeneous ensemble classifier working on extremely unbalanced training classes. Deployed to the Mass Mobilization on Autocracies database on protest, the system is able to reduce the number of source articles to be human-coded by more than half, while keeping over 90% of the relevant material.</dcterms:abstract>
    <dc:creator>Weidmann, Nils B.</dc:creator>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/32815/3/Croicu_0-309570.pdf"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
    <dcterms:issued>2015</dcterms:issued>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/32815/3/Croicu_0-309570.pdf"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Croicu, Mihai</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/52"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/52"/>
    <dcterms:title>Improving the selection of news reports for event coding using ensemble classification</dcterms:title>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/32815"/>
    <dc:rights>Attribution-NonCommercial 3.0 Unported</dc:rights>
    <dc:language>eng</dc:language>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2016-02-02T14:58:41Z</dcterms:available>
    <dc:contributor>Weidmann, Nils B.</dc:contributor>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc/3.0/"/>
  </rdf:Description>
</rdf:RDF>
kops.description.openAccess: openaccessgold (eng)
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-0-309570
kops.relation.uniknProjectTitle: Sofja Kovalevskaja-Preis: The Web as a Curse or Blessing? Ethnic Mobilization in the Information Age
kops.sourcefield: Research and Politics. 2015, <b>2</b>(4). eISSN 2053-1680. Available under: doi: 10.1177/2053168015615596 (deu)
kops.sourcefield.plain: Research and Politics. 2015, 2(4). eISSN 2053-1680. Available under: doi: 10.1177/2053168015615596 (deu, eng)
relation.isAuthorOfPublication: 0d17e0e1-ceb4-4f29-b742-158f78d0aa95
relation.isAuthorOfPublication.latestForDiscovery: 0d17e0e1-ceb4-4f29-b742-158f78d0aa95
source.bibliographicInfo.issue: 4 (eng)
source.bibliographicInfo.volume: 2 (eng)
source.identifier.eissn: 2053-1680 (eng)
source.periodicalTitle: Research and Politics (eng)

Files

Original bundle

Name: Croicu_0-309570.pdf
Size: 544.6 KB
Format: Adobe Portable Document Format
Description: Croicu_0-309570.pdf
Downloads: 467

License bundle

Name: license.txt
Size: 3.88 KB
Format: Item-specific license agreed upon to submission
Description: license.txt
Downloads: 0