Dance-to-Music Generation with Encoder-based Textual Inversion

dc.contributor.author: Li, Sifei
dc.contributor.author: Dong, Weiming
dc.contributor.author: Zhang, Yuxin
dc.contributor.author: Tang, Fan
dc.contributor.author: Ma, Chongyang
dc.contributor.author: Deussen, Oliver
dc.contributor.author: Lee, Tong-Yee
dc.contributor.author: Xu, Changsheng
dc.date.accessioned: 2025-01-14T09:06:18Z
dc.date.available: 2025-01-14T09:06:18Z
dc.date.issued: 2024-12-03
dc.description.abstract: The seamless integration of music with dance movements is essential for communicating the artistic intent of a dance piece. This alignment also significantly improves the immersive quality of gaming experiences and animation productions. Although there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly focus on modulating overall characteristics such as genre and emotional tone. They often overlook the nuanced management of temporal rhythm, which is indispensable in crafting music for dance, since it intricately aligns the musical beats with the dancers’ movements. Recognizing this gap, we propose an encoder-based textual inversion technique to augment text-to-music models with visual control, facilitating personalized music generation. Specifically, we develop dual-path rhythm-genre inversion to effectively integrate the rhythm and genre of a dance motion sequence into the textual space of a text-to-music model. Contrary to traditional textual inversion methods, which directly update text embeddings to reconstruct a single target object, our approach utilizes separate rhythm and genre encoders to obtain text embeddings for two pseudo-words, adapting to the varying rhythms and genres. We collect a new dataset called In-the-wild Dance Videos (InDV) and demonstrate that our approach outperforms state-of-the-art methods across multiple evaluation metrics. Furthermore, our method is able to adapt to changes in tempo and effectively integrates with the inherent text-guided generation capability of the pre-trained model. Our source code and demo videos are available at https://github.com/lsfhuihuiff/Dance-to-music_Siggraph_Asia_2024.
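
The abstract describes the architecture only at a high level; the following is a minimal sketch, assuming a PyTorch-style implementation, of how dual-path rhythm-genre inversion could look: two separate encoders map a dance motion sequence to text embeddings for two pseudo-words, which overwrite placeholder tokens in the prompt embeddings of a frozen text-to-music model. All dimensions, module choices, and the placeholder prompt are illustrative assumptions, not the authors' implementation; the actual code is at the GitHub link above.

# Hypothetical sketch of dual-path rhythm-genre inversion (not the authors' code).
# Dimensions, module choices, and token positions are illustrative assumptions.
import torch
import torch.nn as nn

class RhythmGenreInversion(nn.Module):
    def __init__(self, motion_dim: int = 263, text_dim: int = 768):
        super().__init__()
        # Temporal path: a recurrent encoder for beat-level rhythm structure.
        self.rhythm_encoder = nn.GRU(motion_dim, text_dim, batch_first=True)
        # Global path: a pooled MLP for sequence-level genre character.
        self.genre_encoder = nn.Sequential(
            nn.Linear(motion_dim, text_dim),
            nn.ReLU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, motion: torch.Tensor):
        # motion: (batch, frames, motion_dim)
        _, hidden = self.rhythm_encoder(motion)
        rhythm_emb = hidden[-1]                         # (batch, text_dim)
        genre_emb = self.genre_encoder(motion.mean(1))  # (batch, text_dim)
        return rhythm_emb, genre_emb

def splice_pseudo_words(prompt_embs, rhythm_emb, genre_emb, rhythm_pos, genre_pos):
    """Overwrite the embeddings of two placeholder tokens (e.g. in a prompt
    like 'a <genre> track with <rhythm> beats') with the encoder outputs;
    the pre-trained text-to-music model itself stays frozen."""
    out = prompt_embs.clone()
    out[:, rhythm_pos] = rhythm_emb
    out[:, genre_pos] = genre_emb
    return out

Because the pseudo-word embeddings come from encoders rather than from directly optimized vectors (as in classic textual inversion), the same two tokens can adapt per input to varying rhythms and genres, which is the distinction the abstract draws.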
dc.description.version: published
dc.identifier.doi: 10.1145/3680528.3687562
dc.identifier.ppn: 1914528786
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/71846
dc.language.iso: eng
dc.rights: terms-of-use
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/
dc.subject: Dance-to-music generation
dc.subject: Textual inversion
dc.subject: Diffusion models
dc.subject: Pre-trained music generative models
dc.subject.ddc: 004
dc.title: Dance-to-Music Generation with Encoder-based Textual Inversion
dc.type: INPROCEEDINGS
dspace.entity.type: Publication
kops.citation.bibtex
@inproceedings{Li2024-12-03Dance-71846,
  year={2024},
  doi={10.1145/3680528.3687562},
  title={Dance-to-Music Generation with Encoder-based Textual Inversion},
  isbn={979-8-4007-1131-2},
  publisher={ACM},
  address={New York, NY, USA},
  booktitle={SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings},
  editor={Igarashi, Takeo and Shamir, Ariel and Zhang, Hao (Richard)},
  author={Li, Sifei and Dong, Weiming and Zhang, Yuxin and Tang, Fan and Ma, Chongyang and Deussen, Oliver and Lee, Tong-Yee and Xu, Changsheng},
  note={Article Number: 135}
}
kops.citation.iso690: LI, Sifei, Weiming DONG, Yuxin ZHANG, Fan TANG, Chongyang MA, Oliver DEUSSEN, Tong-Yee LEE, Changsheng XU, 2024. Dance-to-Music Generation with Encoder-based Textual Inversion. SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia. Tokyo, Japan, Dec 3, 2024 - Dec 6, 2024. In: IGARASHI, Takeo, ed., Ariel SHAMIR, ed., Hao (Richard) ZHANG, ed. SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available under: doi: 10.1145/3680528.3687562
kops.citation.rdf
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/71846">
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Tang, Fan</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Li, Sifei</dc:creator>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/71846"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/71846/4/Li_2-9tl19evibu7i0.pdf"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-14T09:06:18Z</dc:date>
    <dc:contributor>Li, Sifei</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:language>eng</dc:language>
    <dc:rights>terms-of-use</dc:rights>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Xu, Changsheng</dc:contributor>
    <dc:contributor>Deussen, Oliver</dc:contributor>
    <dc:contributor>Ma, Chongyang</dc:contributor>
    <dc:contributor>Lee, Tong-Yee</dc:contributor>
    <dc:creator>Ma, Chongyang</dc:creator>
    <dc:creator>Dong, Weiming</dc:creator>
    <dc:contributor>Dong, Weiming</dc:contributor>
    <dc:creator>Tang, Fan</dc:creator>
    <dc:creator>Lee, Tong-Yee</dc:creator>
    <dc:creator>Deussen, Oliver</dc:creator>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/71846/4/Li_2-9tl19evibu7i0.pdf"/>
    <dc:creator>Zhang, Yuxin</dc:creator>
    <dcterms:issued>2024-12-03</dcterms:issued>
    <dc:creator>Xu, Changsheng</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-14T09:06:18Z</dcterms:available>
    <dc:contributor>Zhang, Yuxin</dc:contributor>
    <dcterms:title>Dance-to-Music Generation with Encoder-based Textual Inversion</dcterms:title>
    <dcterms:abstract>The seamless integration of music with dance movements is essential for communicating the artistic intent of a dance piece. This alignment also significantly improves the immersive quality of gaming experiences and animation productions. Although there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly focus on modulating overall characteristics such as genre and emotional tone. They often overlook the nuanced management of temporal rhythm, which is indispensable in crafting music for dance, since it intricately aligns the musical beats with the dancers’ movements. Recognizing this gap, we propose an encoder-based textual inversion technique to augment text-to-music models with visual control, facilitating personalized music generation. Specifically, we develop dual-path rhythm-genre inversion to effectively integrate the rhythm and genre of a dance motion sequence into the textual space of a text-to-music model. Contrary to traditional textual inversion methods, which directly update text embeddings to reconstruct a single target object, our approach utilizes separate rhythm and genre encoders to obtain text embeddings for two pseudo-words, adapting to the varying rhythms and genres. We collect a new dataset called In-the-wild Dance Videos (InDV) and demonstrate that our approach outperforms state-of-the-art methods across multiple evaluation metrics. Furthermore, our method is able to adapt to changes in tempo and effectively integrates with the inherent text-guided generation capability of the pre-trained model. Our source code and demo videos are available at https://github.com/lsfhuihuiff/Dance-to-music_Siggraph_Asia_2024.</dcterms:abstract>
  </rdf:Description>
</rdf:RDF>
kops.conferencefield: SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia, Dec 3, 2024 - Dec 6, 2024, Tokyo, Japan
kops.date.conferenceEnd: 2024-12-06
kops.date.conferenceStart: 2024-12-03
kops.description.funding: {"first":"nsfc","second":"U20B2070"}
kops.description.funding: {"first":"nsfc","second":"62102162"}
kops.description.funding: {"first":"dfg","second":"508324734"}
kops.description.openAccess: openaccessbookpart
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-2-9tl19evibu7i0
kops.location.conference: Tokyo, Japan
kops.sourcefield: IGARASHI, Takeo, ed., Ariel SHAMIR, ed., Hao (Richard) ZHANG, ed. <i>SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings</i>. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available under: doi: 10.1145/3680528.3687562
kops.sourcefield.plain: IGARASHI, Takeo, ed., Ariel SHAMIR, ed., Hao (Richard) ZHANG, ed. SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available under: doi: 10.1145/3680528.3687562
kops.title.conference: SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia
relation.isAuthorOfPublication: b73b5935-736c-45ce-b7c0-bdeaecbca1f0
relation.isAuthorOfPublication: 4e85f041-bb89-4e27-b7d6-acd814feacb8
relation.isAuthorOfPublication.latestForDiscovery: b73b5935-736c-45ce-b7c0-bdeaecbca1f0
source.bibliographicInfo.articleNumber: 135
source.contributor.editor: Igarashi, Takeo
source.contributor.editor: Shamir, Ariel
source.contributor.editor: Zhang, Hao (Richard)
source.identifier.isbn: 979-8-4007-1131-2
source.publisher: ACM
source.publisher.location: New York, NY, USA
source.title: SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings

Files

Original bundle

Name: Li_2-9tl19evibu7i0.pdf
Size: 8.05 MB
Format: Adobe Portable Document Format
Downloads: 197

License bundle

Name: license.txt
Size: 3.96 KB
Format: Item-specific license agreed upon to submission
Description: license.txt
Downloads: 0