Dance-to-Music Generation with Encoder-based Textual Inversion

Li, Sifei; Dong, Weiming; Zhang, Yuxin; Tang, Fan; Ma, Chongyang; Deussen, Oliver; Lee, Tong-Yee; Xu, Changsheng

doi:10.1145/3680528.3687562

dc.contributor.author	Li, Sifei
dc.contributor.author	Dong, Weiming
dc.contributor.author	Zhang, Yuxin
dc.contributor.author	Tang, Fan
dc.contributor.author	Ma, Chongyang
dc.contributor.author	Deussen, Oliver
dc.contributor.author	Lee, Tong-Yee
dc.contributor.author	Xu, Changsheng
dc.date.accessioned	2025-01-14T09:06:18Z
dc.date.available	2025-01-14T09:06:18Z
dc.date.issued	2024-12-03
dc.description.abstract	The seamless integration of music with dance movements is essential for communicating the artistic intent of a dance piece. This alignment also significantly improves the immersive quality of gaming experiences and animation productions. Although there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly focus on modulating overall characteristics such as genre and emotional tone. They often overlook the nuanced management of temporal rhythm, which is indispensable in crafting music for dance, since it intricately aligns the musical beats with the dancers’ movements. Recognizing this gap, we propose an encoder-based textual inversion technique to augment text-to-music models with visual control, facilitating personalized music generation. Specifically, we develop dual-path rhythm-genre inversion to effectively integrate the rhythm and genre of a dance motion sequence into the textual space of a text-to-music model. Contrary to traditional textual inversion methods, which directly update text embeddings to reconstruct a single target object, our approach utilizes separate rhythm and genre encoders to obtain text embeddings for two pseudo-words, adapting to the varying rhythms and genres. We collect a new dataset called In-the-wild Dance Videos (InDV) and demonstrate that our approach outperforms state-of-the-art methods across multiple evaluation metrics. Furthermore, our method is able to adapt to changes in tempo and effectively integrates with the inherent text-guided generation capability of the pre-trained model. Our source code and demo videos are available at https://github.com/lsfhuihuiff/Dance-to-music_Siggraph_Asia_2024.
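The abstract's core mechanism can be pictured as a toy sketch. Two encoders map a dance-motion sequence to text embeddings for two pseudo-words, and those embeddings sit inside an otherwise ordinary prompt. This is not the authors' implementation. Everything in the sketch is a hypothetical assumption:

- the encoders are single random linear maps standing in for trained networks;
- the velocity and mean-pooling features are invented for illustration;
- the pseudo-word strings `<R>` and `<G>`, the dimensions, and the helper names (`LinearEncoder`, `invert`, `embed_prompt`) are placeholders.

The sketch only shows the data flow. Ordinary tokens come from a frozen embedding table. The pseudo-word slots are filled by embeddings that the encoders predict for each input, instead of one embedding optimized per target as in classic textual inversion.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]

EMBED_DIM = 8          # hypothetical text-embedding width (real models are far wider)
FEAT_DIM = 4           # hypothetical per-frame motion feature size
PSEUDO_RHYTHM = "<R>"  # hypothetical pseudo-word for rhythm
PSEUDO_GENRE = "<G>"   # hypothetical pseudo-word for genre


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average each feature dimension over time."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def velocity(frames: Sequence[Vector]) -> List[Vector]:
    """Absolute frame-to-frame change, a crude proxy for movement energy."""
    return [[abs(b - a) for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


class LinearEncoder:
    """Stand-in for a learned encoder: one bias-free linear map into the
    text-embedding space, with fixed random weights instead of trained ones."""

    def __init__(self, in_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(EMBED_DIM)]

    def __call__(self, x: Sequence[float]) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


rhythm_encoder = LinearEncoder(FEAT_DIM, seed=1)  # path 1: motion dynamics
genre_encoder = LinearEncoder(FEAT_DIM, seed=2)   # path 2: overall motion style


def invert(frames: Sequence[Vector]) -> Dict[str, Vector]:
    """Predict one embedding per pseudo-word from a motion sequence."""
    return {
        PSEUDO_RHYTHM: rhythm_encoder(mean_pool(velocity(frames))),
        PSEUDO_GENRE: genre_encoder(mean_pool(frames)),
    }


def embed_prompt(tokens: Sequence[str], vocab: Dict[str, Vector],
                 pseudo: Dict[str, Vector]) -> List[Vector]:
    """Look up ordinary tokens in the frozen table; substitute pseudo-words."""
    return [pseudo[t] if t in pseudo else vocab[t] for t in tokens]


# Usage: a synthetic 30-frame motion clip and a prompt containing both pseudo-words.
rng = random.Random(0)
frames = [[math.sin(0.3 * t + j) for j in range(FEAT_DIM)] for t in range(30)]
prompt = "a <G> song with <R> beats".split()
vocab = {w: [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
         for w in prompt if w not in (PSEUDO_RHYTHM, PSEUDO_GENRE)}
sequence = embed_prompt(prompt, vocab, invert(frames))
print(len(sequence), len(sequence[0]))  # 6 8
```

In a real pipeline, the resulting embedding sequence would condition the pre-trained text-to-music model, and the encoders would be trained rather than random. Training is omitted here. The authors' actual code is in the repository linked in the abstract.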
dc.description.version	published	deu
dc.identifier.doi	10.1145/3680528.3687562
dc.identifier.ppn	1914528786
dc.identifier.uri	https://kops.uni-konstanz.de/handle/123456789/71846
dc.language.iso	eng
dc.rights	terms-of-use
dc.rights.uri	https://rightsstatements.org/page/InC/1.0/
dc.subject	Dance-to-music generation
dc.subject	Textual inversion
dc.subject	Diffusion models
dc.subject	Pre-trained music generative models
dc.subject.ddc	004
dc.title	Dance-to-Music Generation with Encoder-based Textual Inversion	eng
dc.type	INPROCEEDINGS
dspace.entity.type	Publication
kops.citation.bibtex	@inproceedings{Li2024-12-03Dance-71846, year={2024}, doi={10.1145/3680528.3687562}, title={Dance-to-Music Generation with Encoder-based Textual Inversion}, isbn={979-8-4007-1131-2}, publisher={ACM}, address={New York, NY, USA}, booktitle={SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings}, editor={Igarashi, Takeo and Shamir, Ariel and Zhang, Hao (Richard)}, author={Li, Sifei and Dong, Weiming and Zhang, Yuxin and Tang, Fan and Ma, Chongyang and Deussen, Oliver and Lee, Tong-Yee and Xu, Changsheng}, note={Article Number: 135} }
kops.citation.iso690	LI, Sifei, Weiming DONG, Yuxin ZHANG, Fan TANG, Chongyang MA, Oliver DEUSSEN, Tong-Yee LEE, Changsheng XU, 2024. Dance-to-Music Generation with Encoder-based Textual Inversion. SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia. Tokyo, Japan, Dec 3, 2024 - Dec 6, 2024. In: IGARASHI, Takeo, ed., Ariel SHAMIR, ed., Hao (Richard) ZHANG, ed.. SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available under: doi: 10.1145/3680528.3687562	eng
kops.conferencefield	SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia, Dec 3, 2024 - Dec 6, 2024, Tokyo, Japan	eng
kops.date.conferenceEnd	2024-12-06
kops.date.conferenceStart	2024-12-03
kops.description.funding	{"first":"nsfc","second":"U20B2070"}
kops.description.funding	{"first":"nsfc","second":"62102162"}
kops.description.funding	{"first":"dfg","second":"508324734"}
kops.description.openAccess	openaccessbookpart
kops.flag.knbibliography	true
kops.identifier.nbn	urn:nbn:de:bsz:352-2-9tl19evibu7i0
kops.location.conference	Tokyo, Japan
kops.sourcefield	IGARASHI, Takeo, ed., Ariel SHAMIR, ed., Hao (Richard) ZHANG, ed.. SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available under: doi: 10.1145/3680528.3687562	eng
kops.sourcefield.plain	IGARASHI, Takeo, ed., Ariel SHAMIR, ed., Hao (Richard) ZHANG, ed.. SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings. New York, NY, USA: ACM, 2024, 135. ISBN 979-8-4007-1131-2. Available under: doi: 10.1145/3680528.3687562	eng
kops.title.conference	SIGGRAPH-ASIA '24 : Computer Graphics and Interactive Techniques-Asia
relation.isAuthorOfPublication	b73b5935-736c-45ce-b7c0-bdeaecbca1f0
relation.isAuthorOfPublication	4e85f041-bb89-4e27-b7d6-acd814feacb8
relation.isAuthorOfPublication.latestForDiscovery	b73b5935-736c-45ce-b7c0-bdeaecbca1f0
source.bibliographicInfo.articleNumber	135
source.contributor.editor	Igarashi, Takeo
source.contributor.editor	Shamir, Ariel
source.contributor.editor	Zhang, Hao (Richard)
source.identifier.isbn	979-8-4007-1131-2
source.publisher	ACM
source.publisher.location	New York, NY, USA
source.title	SIGGRAPH Asia 2024 Conference Papers (SA '24), Proceedings

Files

Original bundle

Name: Li_2-9tl19evibu7i0.pdf
Size: 8.05 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.96 KB
Format: Item-specific license agreed upon to submission

Collections

Computer and Information Science: Publications