Publication:

Dance-to-Music Generation with Encoder-based Textual Inversion

Files

Li_2-9tl19evibu7i0.pdf (size: 8.05 MB, downloads: 15)

Date

2024

Authors

Li, Sifei
Dong, Weiming
Zhang, Yuxin
Tang, Fan
Ma, Chongyang
Deussen, Oliver
Lee, Tong-Yee
Xu, Changsheng

Research funding

National Natural Science Foundation of China: U20B2070
National Natural Science Foundation of China: 62102162
Deutsche Forschungsgemeinschaft (DFG): 508324734

Open access publication
Open Access Bookpart

Publication type
Contribution to a conference proceedings
Publication status
Published

Published in

IGARASHI, Takeo, Ariel SHAMIR, Hao (Richard) ZHANG, eds. SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available at: doi: 10.1145/3680528.3687562

Abstract

The seamless integration of music with dance movements is essential for communicating the artistic intent of a dance piece. This alignment also significantly improves the immersive quality of gaming experiences and animation productions. Although there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly focus on modulating overall characteristics such as genre and emotional tone. They often overlook the nuanced management of temporal rhythm, which is indispensable in crafting music for dance, since it intricately aligns the musical beats with the dancers’ movements. Recognizing this gap, we propose an encoder-based textual inversion technique to augment text-to-music models with visual control, facilitating personalized music generation. Specifically, we develop dual-path rhythm-genre inversion to effectively integrate the rhythm and genre of a dance motion sequence into the textual space of a text-to-music model. Contrary to traditional textual inversion methods, which directly update text embeddings to reconstruct a single target object, our approach utilizes separate rhythm and genre encoders to obtain text embeddings for two pseudo-words, adapting to the varying rhythms and genres. We collect a new dataset called In-the-wild Dance Videos (InDV) and demonstrate that our approach outperforms state-of-the-art methods across multiple evaluation metrics. Furthermore, our method is able to adapt to changes in tempo and effectively integrates with the inherent text-guided generation capability of the pre-trained model. Our source code and demo videos are available at https://github.com/lsfhuihuiff/Dance-to-music_Siggraph_Asia_2024.
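To make the dual-path rhythm-genre inversion described above more concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation (their code is available at the GitHub link above). The encoder architectures, the embedding and motion feature dimensions, and the prompt template built around the two pseudo-words are all assumptions for illustration.

# Minimal sketch (not the authors' code) of dual-path rhythm-genre inversion:
# two separate encoders map a dance motion sequence to text embeddings for two
# pseudo-words ("<rhythm>", "<genre>"), which are spliced into the prompt
# embeddings of a frozen text-to-music model. All sizes and names are assumed.

import torch
import torch.nn as nn

TEXT_EMBED_DIM = 768   # assumed dimensionality of the music model's text embeddings
MOTION_FEAT_DIM = 139  # assumed per-frame dance motion feature size


class RhythmEncoder(nn.Module):
    """Maps a frame-level motion sequence to one pseudo-word embedding
    intended to carry beat/tempo information."""
    def __init__(self):
        super().__init__()
        self.temporal = nn.GRU(MOTION_FEAT_DIM, 256, batch_first=True)
        self.proj = nn.Linear(256, TEXT_EMBED_DIM)

    def forward(self, motion):            # motion: (B, T, MOTION_FEAT_DIM)
        _, h = self.temporal(motion)      # h: (1, B, 256)
        return self.proj(h[-1])           # (B, TEXT_EMBED_DIM)


class GenreEncoder(nn.Module):
    """Maps the same motion sequence to a second pseudo-word embedding
    intended to capture global style/genre."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(MOTION_FEAT_DIM, 256), nn.ReLU(),
            nn.Linear(256, TEXT_EMBED_DIM),
        )

    def forward(self, motion):
        return self.mlp(motion.mean(dim=1))   # average-pool over time, then project


def build_prompt_embeddings(word_embeds, rhythm_tok, genre_tok):
    """Splice the two predicted pseudo-word embeddings into a fixed prompt
    template, e.g. "<genre> music with <rhythm> beats"; the surrounding word
    embeddings would come from the frozen text encoder of the music model."""
    return torch.cat([word_embeds["prefix"], genre_tok.unsqueeze(1),
                      word_embeds["middle"], rhythm_tok.unsqueeze(1),
                      word_embeds["suffix"]], dim=1)


if __name__ == "__main__":
    batch, frames = 2, 120
    motion = torch.randn(batch, frames, MOTION_FEAT_DIM)   # dummy dance motion

    rhythm_tok = RhythmEncoder()(motion)
    genre_tok = GenreEncoder()(motion)

    # Placeholder context embeddings standing in for the frozen text encoder.
    word_embeds = {k: torch.randn(batch, n, TEXT_EMBED_DIM)
                   for k, n in [("prefix", 1), ("middle", 3), ("suffix", 1)]}
    prompt = build_prompt_embeddings(word_embeds, rhythm_tok, genre_tok)
    print(prompt.shape)   # (2, 7, 768) -> conditioning for the text-to-music model

In the approach the abstract describes, only the two encoders would be trained while the pre-trained text-to-music model stays frozen, so ordinary text prompts and the two motion-derived pseudo-words can be combined at generation time.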

Subject (DDC)
004 Computer Science

Keywords

Dance-to-music generation, Textual inversion, Diffusion models, Pre-trained music generative models

Conference

SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia, Dec 3, 2024 - Dec 6, 2024, Tokyo, Japan

Cite

ISO 690
LI, Sifei, Weiming DONG, Yuxin ZHANG, Fan TANG, Chongyang MA, Oliver DEUSSEN, Tong-Yee LEE, Changsheng XU, 2024. Dance-to-Music Generation with Encoder-based Textual Inversion. SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia. Tokyo, Japan, Dec 3, 2024 - Dec 6, 2024. In: IGARASHI, Takeo, Ariel SHAMIR, Hao (Richard) ZHANG, eds. SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available at: doi: 10.1145/3680528.3687562
BibTeX
@inproceedings{Li2024-12-03Dance-71846,
  year={2024},
  doi={10.1145/3680528.3687562},
  title={Dance-to-Music Generation with Encoder-based Textual Inversion},
  isbn={979-8-4007-1131-2},
  publisher={ACM},
  address={New York, NY, USA},
  booktitle={SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings},
  editor={Igarashi, Takeo and Shamir, Ariel and Zhang, Hao (Richard)},
  author={Li, Sifei and Dong, Weiming and Zhang, Yuxin and Tang, Fan and Ma, Chongyang and Deussen, Oliver and Lee, Tong-Yee and Xu, Changsheng},
  note={Article Number: 135}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/71846">
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Tang, Fan</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Li, Sifei</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/71846"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/71846/4/Li_2-9tl19evibu7i0.pdf"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-14T09:06:18Z</dc:date>
    <dc:contributor>Li, Sifei</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:rights>terms-of-use</dc:rights>
    <dc:contributor>Xu, Changsheng</dc:contributor>
    <dc:contributor>Deussen, Oliver</dc:contributor>
    <dc:contributor>Ma, Chongyang</dc:contributor>
    <dc:contributor>Lee, Tong-Yee</dc:contributor>
    <dc:creator>Ma, Chongyang</dc:creator>
    <dc:creator>Dong, Weiming</dc:creator>
    <dc:contributor>Dong, Weiming</dc:contributor>
    <dc:creator>Tang, Fan</dc:creator>
    <dc:creator>Lee, Tong-Yee</dc:creator>
    <dc:creator>Deussen, Oliver</dc:creator>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/71846/4/Li_2-9tl19evibu7i0.pdf"/>
    <dc:creator>Zhang, Yuxin</dc:creator>
    <dcterms:issued>2024-12-03</dcterms:issued>
    <dc:creator>Xu, Changsheng</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-14T09:06:18Z</dcterms:available>
    <dc:contributor>Zhang, Yuxin</dc:contributor>
    <dcterms:title>Dance-to-Music Generation with Encoder-based Textual Inversion</dcterms:title>
    <dcterms:abstract>The seamless integration of music with dance movements is essential for communicating the artistic intent of a dance piece. This alignment also significantly improves the immersive quality of gaming experiences and animation productions. Although there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly focus on modulating overall characteristics such as genre and emotional tone. They often overlook the nuanced management of temporal rhythm, which is indispensable in crafting music for dance, since it intricately aligns the musical beats with the dancers’ movements. Recognizing this gap, we propose an encoder-based textual inversion technique to augment text-to-music models with visual control, facilitating personalized music generation. Specifically, we develop dual-path rhythm-genre inversion to effectively integrate the rhythm and genre of a dance motion sequence into the textual space of a text-to-music model. Contrary to traditional textual inversion methods, which directly update text embeddings to reconstruct a single target object, our approach utilizes separate rhythm and genre encoders to obtain text embeddings for two pseudo-words, adapting to the varying rhythms and genres. We collect a new dataset called In-the-wild Dance Videos (InDV) and demonstrate that our approach outperforms state-of-the-art methods across multiple evaluation metrics. Furthermore, our method is able to adapt to changes in tempo and effectively integrates with the inherent text-guided generation capability of the pre-trained model. Our source code and demo videos are available at https://github.com/lsfhuihuiff/Dance-to-music_Siggraph_Asia_2024.</dcterms:abstract>
  </rdf:Description>
</rdf:RDF>


Alliance licence
Corresponding authors at the University of Konstanz
International co-authors
University bibliography
Yes
Peer-reviewed