Publication:

ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models

Files

Zhang_2-6ixd6hrt29k91.pdf (Size: 16.11 MB, Downloads: 43)

Date

2023

Authors

Zhang, Yuxin
Dong, Weiming
Tang, Fan
Huang, Nisha
Huang, Haibin
Ma, Chongyang
Lee, Tong-Yee
Deussen, Oliver
Xu, Changsheng

Funding information

National Natural Science Foundation of China: 61832016, 62102162, U20B2070
Deutsche Forschungsgemeinschaft (DFG): 413891298

Open access publication
Open Access Hybrid
Publication type
Journal article
Publication status
Published

Published in

ACM Transactions on Graphics. Association for Computing Machinery (ACM). 2023, 42(6), 244. ISSN 0730-0301. eISSN 1557-7368. Available under: doi: 10.1145/3618342

Abstract

Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.

Subject (DDC)
004 Computer science

Cite

ISO 690
ZHANG, Yuxin, Weiming DONG, Fan TANG, Nisha HUANG, Haibin HUANG, Chongyang MA, Tong-Yee LEE, Oliver DEUSSEN, Changsheng XU, 2023. ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models. In: ACM Transactions on Graphics. Association for Computing Machinery (ACM). 2023, 42(6), 244. ISSN 0730-0301. eISSN 1557-7368. Available under: doi: 10.1145/3618342
BibTeX
@article{Zhang2023ProSp-68699,
  year={2023},
  doi={10.1145/3618342},
  title={ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models},
  number={6},
  volume={42},
  issn={0730-0301},
  journal={ACM Transactions on Graphics},
  author={Zhang, Yuxin and Dong, Weiming and Tang, Fan and Huang, Nisha and Huang, Haibin and Ma, Chongyang and Lee, Tong-Yee and Deussen, Oliver and Xu, Changsheng},
  note={Article Number: 244}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/68699">
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68699/1/Zhang_2-6ixd6hrt29k91.pdf"/>
    <dc:contributor>Tang, Fan</dc:contributor>
    <dc:contributor>Ma, Chongyang</dc:contributor>
    <dc:creator>Lee, Tong-Yee</dc:creator>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/68699/1/Zhang_2-6ixd6hrt29k91.pdf"/>
    <dc:creator>Huang, Nisha</dc:creator>
    <dc:creator>Ma, Chongyang</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Xu, Changsheng</dc:contributor>
    <dcterms:title>ProSpect : Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models</dcterms:title>
    <dc:creator>Deussen, Oliver</dc:creator>
    <dcterms:abstract>Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.</dcterms:abstract>
    <dcterms:issued>2023</dcterms:issued>
    <dc:rights>terms-of-use</dc:rights>
    <dc:creator>Huang, Haibin</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-12-13T07:45:33Z</dcterms:available>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Xu, Changsheng</dc:creator>
    <dc:contributor>Huang, Haibin</dc:contributor>
    <dc:language>eng</dc:language>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Dong, Weiming</dc:creator>
    <dc:contributor>Deussen, Oliver</dc:contributor>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/68699"/>
    <dc:creator>Tang, Fan</dc:creator>
    <dc:contributor>Huang, Nisha</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:creator>Zhang, Yuxin</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-12-13T07:45:33Z</dc:date>
    <dc:contributor>Zhang, Yuxin</dc:contributor>
    <dc:contributor>Lee, Tong-Yee</dc:contributor>
    <dc:contributor>Dong, Weiming</dc:contributor>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes
Peer reviewed
Yes