Publication: ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models
Funding information
Deutsche Forschungsgemeinschaft (DFG): 413891298
Abstract
Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.
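The per-stage idea described in the abstract, one prompt per group of consecutive diffusion steps, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stage_for_step` and `prompts_per_step` are hypothetical, the equal-sized split of steps into stages is an assumption, and plain strings stand in for the learned token embeddings.

```python
from typing import List


def stage_for_step(step: int, num_steps: int, num_stages: int) -> int:
    """Return the index of the stage (a group of consecutive steps) containing `step`.

    Assumption: steps are split into stages of (nearly) equal size.
    """
    if not 1 <= num_stages <= num_steps:
        raise ValueError("num_stages must be between 1 and num_steps")
    if not 0 <= step < num_steps:
        raise ValueError("step must be in the range [0, num_steps)")
    # Integer scaling keeps each stage a contiguous run of steps.
    return step * num_stages // num_steps


def prompts_per_step(stage_prompts: List[str], num_steps: int) -> List[str]:
    """Expand one conditioning entry per stage into one entry per diffusion step."""
    num_stages = len(stage_prompts)
    return [
        stage_prompts[stage_for_step(step, num_steps, num_stages)]
        for step in range(num_steps)
    ]


if __name__ == "__main__":
    # Hypothetical placeholders; a real system would hold embeddings here.
    schedule = prompts_per_step(["<stage_0>", "<stage_1>", "<stage_2>"], num_steps=10)
    for step, prompt in enumerate(schedule):
        print(step, prompt)
```

Under these assumptions, replacing the entry for a single stage changes the conditioning only for that stage's steps, which is the kind of per-stage control the abstract refers to.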
Cite
ISO 690
ZHANG, Yuxin, Weiming DONG, Fan TANG, Nisha HUANG, Haibin HUANG, Chongyang MA, Tong-Yee LEE, Oliver DEUSSEN, Changsheng XU, 2023. ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models. In: ACM Transactions on Graphics. Association for Computing Machinery (ACM). 2023, 42(6), 244. ISSN 0730-0301. eISSN 1557-7368. Available under: doi: 10.1145/3618342
BibTeX
@article{Zhang2023ProSp-68699,
year={2023},
doi={10.1145/3618342},
title={ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models},
number={6},
volume={42},
issn={0730-0301},
journal={ACM Transactions on Graphics},
author={Zhang, Yuxin and Dong, Weiming and Tang, Fan and Huang, Nisha and Huang, Haibin and Ma, Chongyang and Lee, Tong-Yee and Deussen, Oliver and Xu, Changsheng},
note={Article Number: 244}
}
RDF
<rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
<rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/68699">
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68699/1/Zhang_2-6ixd6hrt29k91.pdf"/>
<dc:contributor>Tang, Fan</dc:contributor>
<dc:contributor>Ma, Chongyang</dc:contributor>
<dc:creator>Lee, Tong-Yee</dc:creator>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68699/1/Zhang_2-6ixd6hrt29k91.pdf"/>
<dc:creator>Huang, Nisha</dc:creator>
<dc:creator>Ma, Chongyang</dc:creator>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dc:contributor>Xu, Changsheng</dc:contributor>
<dcterms:title>ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models</dcterms:title>
<dc:creator>Deussen, Oliver</dc:creator>
<dcterms:abstract>Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.</dcterms:abstract>
<dcterms:issued>2023</dcterms:issued>
<dc:rights>terms-of-use</dc:rights>
<dc:creator>Huang, Haibin</dc:creator>
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-12-13T07:45:33Z</dcterms:available>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
<dc:creator>Xu, Changsheng</dc:creator>
<dc:contributor>Huang, Haibin</dc:contributor>
<dc:language>eng</dc:language>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
<dc:creator>Dong, Weiming</dc:creator>
<dc:contributor>Deussen, Oliver</dc:contributor>
<bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/68699"/>
<dc:creator>Tang, Fan</dc:creator>
<dc:contributor>Huang, Nisha</dc:contributor>
<dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
<dc:creator>Zhang, Yuxin</dc:creator>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-12-13T07:45:33Z</dc:date>
<dc:contributor>Zhang, Yuxin</dc:contributor>
<dc:contributor>Lee, Tong-Yee</dc:contributor>
<dc:contributor>Dong, Weiming</dc:contributor>
</rdf:Description>
</rdf:RDF>