Publication:

Decoupling State Representation Methods from Reinforcement Learning in Car Racing

Files

There are no files associated with this document.

Date

2021

Authors

Montoya, Juan
Daunhawer, Imant
Vogt, Julia
Wiering, Marco

Editors

Contact

Journal ISSN

Electronic ISSN

ISBN

Bibliographic data

Publisher

Series

Edition

URI (citable link)
ArXiv-ID

International patent number

Research funding information

Project

Open Access publication
Core Facility of the University of Konstanz

Embargoed until

Title in another language

Publication type
Contribution to conference proceedings
Publication status
Published

Published in

ROCHA, Ana Paula, ed., Luc STEELS, ed., Jaap VAN DEN HERIK, ed.. Proceedings of the 13th International Conference on Agents and Artificial Intelligence. Volume 2: ICAART. Setúbal, Portugal: SciTePress, 2021, pp. 752-759. eISSN 2184-433X. ISBN 9789897584848. Available under: doi: 10.5220/0010237507520759

Abstract

In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. To achieve this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI's car racing environment. This procedure thus decouples state representations from RL controllers. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than directly learning from pixel inputs; however, it has a more stable learning curve, a substantial reduction of the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.
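To make the decoupled setup described in the abstract concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' released code: a small variational autoencoder is pre-trained on raw frames in an unsupervised manner, its encoder is then frozen, and the latent features are fed to a compact actor-critic head of the kind PPO would optimize. The 96x96 frame size, the 32-dimensional latent, the discretised action set, and the random stand-in frames are all assumptions made purely for illustration.

# Illustrative sketch only (not the authors' code): pre-train a VAE encoder on
# raw frames, freeze it, and feed its latent features to a small actor-critic
# head, mirroring the decoupled pipeline described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 32  # assumed latent dimension

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten())
        # 96x96 RGB input -> 64 x 22 x 22 feature map after the two conv layers
        self.mu = nn.Linear(64 * 22 * 22, LATENT)
        self.logvar = nn.Linear(64 * 22 * 22, LATENT)
        self.dec_fc = nn.Linear(LATENT, 64 * 22 * 22)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2), nn.Sigmoid())

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        recon = self.dec(self.dec_fc(z).view(-1, 64, 22, 22))
        return recon, mu, logvar

class ActorCritic(nn.Module):
    """Small policy/value heads that operate on the frozen latent features."""
    def __init__(self, n_actions=5):  # discretised action set: an assumption
        super().__init__()
        self.pi = nn.Sequential(nn.Linear(LATENT, 64), nn.Tanh(), nn.Linear(64, n_actions))
        self.v = nn.Sequential(nn.Linear(LATENT, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, z):
        return torch.distributions.Categorical(logits=self.pi(z)), self.v(z)

# Stage 1: unsupervised pre-training on frames (random tensors stand in for
# frames that would be collected from the CarRacing environment).
vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
frames = torch.rand(16, 3, 96, 96)
recon, mu, logvar = vae(frames)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = F.mse_loss(recon, frames) + 1e-3 * kl
opt.zero_grad()
loss.backward()
opt.step()

# Stage 2: freeze the encoder and hand its latent features to the RL controller.
for p in vae.parameters():
    p.requires_grad_(False)
policy = ActorCritic()
with torch.no_grad():
    z, _ = vae.encode(frames)        # state representation consumed by PPO
dist, value = policy(z)
action = dist.sample()               # PPO updates would use dist and value as usual

In this decoupled setup only the small policy/value heads are updated by the RL algorithm; the pre-trained encoder stays fixed, which is consistent with the much smaller number of optimized parameters reported in the abstract.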

Abstract in another language

Subject (DDC)
004 Computer Science

Keywords

Conference

13th International Conference on Agents and Artificial Intelligence : ICAART, Feb 4, 2021 - Feb 6, 2021, Vienna, Austria
Review

Research project

Organisational units

Journal issue

Related records in KOPS

Cite

ISO 690
MONTOYA, Juan, Imant DAUNHAWER, Julia VOGT, Marco WIERING, 2021. Decoupling State Representation Methods from Reinforcement Learning in Car Racing. 13th International Conference on Agents and Artificial Intelligence : ICAART. Vienna, Austria, Feb 4, 2021 - Feb 6, 2021. In: ROCHA, Ana Paula, ed., Luc STEELS, ed., Jaap VAN DEN HERIK, ed.. Proceedings of the 13th International Conference on Agents and Artificial Intelligence. Volume 2: ICAART. Setúbal, Portugal: SciTePress, 2021, pp. 752-759. eISSN 2184-433X. ISBN 9789897584848. Available under: doi: 10.5220/0010237507520759
BibTex
@inproceedings{Montoya2021Decou-54329,
  year={2021},
  doi={10.5220/0010237507520759},
  title={Decoupling State Representation Methods from Reinforcement Learning in Car Racing},
  isbn={9789897584848},
  publisher={SciTePress},
  address={Setúbal, Portugal},
  booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence. Volume 2: ICAART},
  pages={752--759},
  editor={Rocha, Ana Paula and Steels, Luc and van den Herik, Jaap},
  author={Montoya, Juan and Daunhawer, Imant and Vogt, Julia and Wiering, Marco}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/54329">
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-07-16T12:48:47Z</dcterms:available>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-07-16T12:48:47Z</dc:date>
    <dcterms:title>Decoupling State Representation Methods from Reinforcement Learning in Car Racing</dcterms:title>
    <dc:creator>Vogt, Julia</dc:creator>
    <dcterms:issued>2021</dcterms:issued>
    <dc:creator>Montoya, Juan</dc:creator>
    <dc:contributor>Daunhawer, Imant</dc:contributor>
    <dc:contributor>Wiering, Marco</dc:contributor>
    <dc:contributor>Vogt, Julia</dc:contributor>
    <dc:creator>Wiering, Marco</dc:creator>
    <dc:language>eng</dc:language>
    <dc:creator>Daunhawer, Imant</dc:creator>
    <dc:contributor>Montoya, Juan</dc:contributor>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/54329"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:abstract xml:lang="eng">In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. For achieving this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing Open AI’s car racing environment. Hence, such procedure permits to decouple state representations from RL-controllers. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than directly learning from pixel inputs; however, it has a more stable learning curve, a substantial reduction of the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.</dcterms:abstract>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
  </rdf:Description>
</rdf:RDF>

Internal note


Contact
URL of the original publication

URL checked on

Date of dissertation examination

Type of funding

Comment on the publication

Alliance licence
Corresponding authors from the University of Konstanz
International co-authors
University bibliography
No
Peer-reviewed