Publication: ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models
Funding information
Deutsche Forschungsgemeinschaft (DFG): 413891298
Abstract
Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.
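The per-stage idea in the abstract (one prompt per group of consecutive diffusion steps) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stage_index`, `conditioning_schedule`, and `swap_stages`, the near-equal stage widths, and the use of strings as stand-ins for token embeddings are all assumptions made for illustration. The abstract does not say which stages govern which attribute, so the stage choice in the usage lines is arbitrary.

```python
from typing import Iterable, List, Sequence, TypeVar

E = TypeVar("E")  # stand-in for an inverted token embedding (hypothetical)


def stage_index(step: int, num_steps: int, num_stages: int) -> int:
    """Return the stage that denoising step `step` (0-based) falls into.

    Stages are near-equal groups of consecutive steps; this partition is an
    assumption of the sketch, not something stated in the abstract.
    """
    if not 1 <= num_stages <= num_steps:
        raise ValueError("need 1 <= num_stages <= num_steps")
    if not 0 <= step < num_steps:
        raise ValueError("step out of range")
    return step * num_stages // num_steps


def conditioning_schedule(stage_embeddings: Sequence[E], num_steps: int) -> List[E]:
    """Expand one embedding per stage into one conditioning entry per step."""
    num_stages = len(stage_embeddings)
    return [
        stage_embeddings[stage_index(t, num_steps, num_stages)]
        for t in range(num_steps)
    ]


def swap_stages(base: Sequence[E], donor: Sequence[E], stages: Iterable[int]) -> List[E]:
    """Take the listed stages from `donor` and every other stage from `base`."""
    if len(base) != len(donor):
        raise ValueError("base and donor must have the same number of stages")
    chosen = set(stages)
    return [donor[i] if i in chosen else base[i] for i in range(len(base))]


# Usage: four stages over eight steps, with the last two stages taken from
# a second reference image (an arbitrary choice for illustration).
content = ["c0", "c1", "c2", "c3"]
reference = ["r0", "r1", "r2", "r3"]
mixed = swap_stages(content, reference, stages=[2, 3])
schedule = conditioning_schedule(mixed, num_steps=8)
print(schedule)  # ['c0', 'c0', 'c1', 'c1', 'r2', 'r2', 'r3', 'r3']
```

In a real pipeline each entry of the schedule would be passed as the text conditioning for the corresponding denoising step; how the embeddings are obtained by inversion is outside the scope of this sketch.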
Cite
ISO 690
ZHANG, Yuxin, Weiming DONG, Fan TANG, Nisha HUANG, Haibin HUANG, Chongyang MA, Tong-Yee LEE, Oliver DEUSSEN, Changsheng XU, 2023. ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models. In: ACM Transactions on Graphics. Association for Computing Machinery (ACM). 2023, 42(6), 244. ISSN 0730-0301. eISSN 1557-7368. Available under: doi: 10.1145/3618342
BibTeX
@article{Zhang2023ProSp-68699,
  year={2023},
  doi={10.1145/3618342},
  title={ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models},
  number={6},
  volume={42},
  issn={0730-0301},
  journal={ACM Transactions on Graphics},
  author={Zhang, Yuxin and Dong, Weiming and Tang, Fan and Huang, Nisha and Huang, Haibin and Ma, Chongyang and Lee, Tong-Yee and Deussen, Oliver and Xu, Changsheng},
  note={Article Number: 244}
}
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/68699"> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68699/1/Zhang_2-6ixd6hrt29k91.pdf"/> <dc:contributor>Tang, Fan</dc:contributor> <dc:contributor>Ma, Chongyang</dc:contributor> <dc:creator>Lee, Tong-Yee</dc:creator> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68699/1/Zhang_2-6ixd6hrt29k91.pdf"/> <dc:creator>Huang, Nisha</dc:creator> <dc:creator>Ma, Chongyang</dc:creator> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dc:contributor>Xu, Changsheng</dc:contributor> <dcterms:title>ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models</dcterms:title> <dc:creator>Deussen, Oliver</dc:creator> <dcterms:abstract>Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. 
We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.</dcterms:abstract> <dcterms:issued>2023</dcterms:issued> <dc:rights>terms-of-use</dc:rights> <dc:creator>Huang, Haibin</dc:creator> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-12-13T07:45:33Z</dcterms:available> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dc:creator>Xu, Changsheng</dc:creator> <dc:contributor>Huang, Haibin</dc:contributor> <dc:language>eng</dc:language> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dc:creator>Dong, Weiming</dc:creator> <dc:contributor>Deussen, Oliver</dc:contributor> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/68699"/> <dc:creator>Tang, Fan</dc:creator> <dc:contributor>Huang, Nisha</dc:contributor> <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/> <dc:creator>Zhang, Yuxin</dc:creator> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-12-13T07:45:33Z</dc:date> <dc:contributor>Zhang, Yuxin</dc:contributor> 
<dc:contributor>Lee, Tong-Yee</dc:contributor> <dc:contributor>Dong, Weiming</dc:contributor> </rdf:Description> </rdf:RDF>