Column subset selection with applications to neuroimaging data

Thumbnail Image
Date
2014
Editors
Contact
Journal ISSN
Electronic ISSN
ISBN
Bibliographical data
Publisher
Series
URI (citable link)
DOI (citable link)
ArXiv-ID
International patent number
Link to the license
EU project number
Project
Open Access publication
Restricted until
Title in another language
Research Projects
Organizational Units
Journal Issue
Publication type
Dissertation
Publication status
Published in
Abstract
Column (subset) selection problems require to select a subset of the columns from a matrix in an unsupervised fashion and such that a matrix norm error criterion is minimised. Motivations for column selection are 1) data interpretation through identifying relevant columns, 2) speedup by performing computationally expensive operations on a small column subset. This work introduces structural and algorithmic improvements regarding both aspects, along with demonstrating applications to neuroimaging data.


Column selection for data interpretation: NNCX and Convex_cone

For CX factorisation (Drineas et al., SIAM J. Matrix Analysis and Applications, 2008), a $c$-subset of the columns of matrix $A$ ($m \times n$) should be selected into the $m \times c$ matrix $C$, and combined linearly with coefficients in $X$ ($c \times n$), such that the CX norm error $\left\| A - CX \right\|^2_{Fr}$ is minimised in the Frobenius norm. For non-negative CX (NNCX), the coefficients in $X$ are constrained to be non-negative, which has advantages with respect to data interpretation (Hyvönen et al., ACM SIGKDD, 2008).

The goal is to find good column selection strategies for NNCX, and to analyse the interpretability aspect of CX/NNCX column selection. To this end, a generative model for NNCX is introduced, where the columns of $A$ contain either one of $s$ generating pure signal columns or a linear combination (with non-negative coefficients) of several pure signal columns. An algorithm, Convex_cone, is proposed as a heuristic for selecting the extreme columns of $A$. These extreme columns correspond to the generating columns and they span a convex cone that contains the data points of $A$. The extreme columns are interpretable in the sense that they allow to understand how $A$ has been constructed, and they also serve to reduce the NNCX norm error $\left\| A - CX^{0+} \right\|^2_{Fr}$ (non-negativity indicated by $^{0+}$).

Empirical evaluation is performed against state-of-the-art algorithms for column selection. With respect to recovering the generating columns and to reducing the NNCX norm error, Convex_cone performs better than established algorithms for CX that have been modified to compute a NNCX factorisation.


Column selection for speedup: Weighted norm error and FastPCA

A fast approximation to Principal Component Analysis (PCA) is proposed: FastPCA. Similar to the paradigm of the Nyström extension (Williams & Seeger, NIPS, 2000), the rank-$k$ reconstruction $A_k$ of matrix $A$, that is based on the top-$k$ principal components of $A$, is approximated by $\widehat{A}_k$. The approximation $\widehat{A}_k$ is based on the top-$k$ principal components of a column subset $C \in A$ to be chosen such that the norm error $\left\| A_k - \widehat{A}_k \right\|_{Fr}$ is small.

FastPCA implements a column sampling scheme to quickly find a good $C$, in particular in the light of a priori information about the data type. A weighted norm error criterion is introduced where column weights specify column importance. The a priori information helps to set up appropriate column weights, and it gives rise to a data-dependent probability distribution over the columns that assigns relevant columns a high probability of being sampled.


Applications to neuroimaging data

Both algorithms can be applied to calcium imaging movies of insect brain activity. The imaging movies turn out to have a NNCX-like structure: They can be modelled by linear combination (with non-negative coefficients) of the pure signals of neural units. Among a large number of columns (pixels), Convex_cone selects a subset of columns that contain the pure signals of identifiable neural units. The NNCX factorisation computed by Convex_cone proves to be useful in practical data analysis and visualisation as demonstrated by biological publications.

For FastPCA, a priori information about the spatial aspect of neuroimaging data enables a column importance sampling that takes the spatial relationship of columns (pixels) into account. This is used to explicitly reduce the approximation error for columns with biological signals that are known to occur in spatially contiguous clusters. FastPCA is employed as a preprocessing step for fast dimensionality and noise reduction, ensuring scalability to large data sizes in neuroimaging.
Summary in another language
Subject (DDC)
004 Computer Science
Keywords
column subset selection, column-based matrix factorisation, neuroimaging data
Conference
Review
undefined / . - undefined, undefined. - (undefined; undefined)
Cite This
ISO 690STRAUCH, Martin, 2014. Column subset selection with applications to neuroimaging data [Dissertation]. Konstanz: University of Konstanz
BibTex
@phdthesis{Strauch2014Colum-29539,
  year={2014},
  title={Column subset selection with applications to neuroimaging data},
  author={Strauch, Martin},
  address={Konstanz},
  school={Universität Konstanz}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/29539">
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/29539/3/Strauch_0-269383.pdf"/>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-01-19T08:28:51Z</dcterms:available>
    <dcterms:issued>2014</dcterms:issued>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/29539"/>
    <dc:creator>Strauch, Martin</dc:creator>
    <dc:contributor>Strauch, Martin</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-01-19T08:28:51Z</dc:date>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/29539/3/Strauch_0-269383.pdf"/>
    <dcterms:title>Column subset selection with applications to neuroimaging data</dcterms:title>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:abstract xml:lang="eng">Column (subset) selection problems require to select a subset of the columns from a matrix in an unsupervised fashion and such that a matrix norm error criterion is minimised. Motivations for column selection are 1) data interpretation through identifying relevant columns, 2) speedup by performing computationally expensive operations on a small column subset. This work introduces structural and algorithmic improvements regarding both aspects, along with demonstrating applications to neuroimaging data.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Column selection for data interpretation: NNCX and Convex_cone&lt;br /&gt;&lt;br /&gt;For CX factorisation (Drineas et al., SIAM J. Matrix Analysis and Applications, 2008), a $c$-subset of the columns of matrix $A$ ($m \times n$) should be selected into the $m \times c$ matrix $C$, and combined linearly with coefficients in $X$ ($c \times n$), such that the CX norm error $\left\| A - CX \right\|^2_{Fr}$ is minimised in the Frobenius norm.  For non-negative CX (NNCX), the coefficients in $X$ are constrained to be non-negative, which has advantages with respect to data interpretation (Hyvönen et al., ACM SIGKDD, 2008).&lt;br /&gt;&lt;br /&gt;The goal is to find good column selection strategies for NNCX, and to analyse the interpretability aspect of CX/NNCX column selection. To this end, a generative model for NNCX is introduced, where the columns of $A$ contain either one of $s$ generating pure signal columns or a linear combination (with non-negative coefficients) of several pure signal columns. An algorithm, Convex_cone, is proposed as a heuristic for selecting the extreme columns of $A$. These extreme columns correspond to the generating columns and they span a convex cone that contains the data points of $A$. The extreme columns are interpretable in the sense that they allow to  understand how $A$ has been constructed, and they also serve to reduce the NNCX norm error $\left\| A - CX^{0+} \right\|^2_{Fr}$ (non-negativity indicated by $^{0+}$).&lt;br /&gt;&lt;br /&gt;Empirical evaluation is performed against state-of-the-art algorithms for column selection. With respect to recovering the generating columns and to reducing the NNCX norm error, Convex_cone performs better than established algorithms for CX that have been modified to compute a NNCX factorisation.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Column selection for speedup: Weighted norm error and FastPCA&lt;br /&gt;&lt;br /&gt;A fast approximation to Principal Component Analysis (PCA) is proposed: FastPCA. Similar to the paradigm of the Nyström extension (Williams &amp; Seeger, NIPS, 2000), the rank-$k$ reconstruction $A_k$ of matrix $A$, that is based on the top-$k$ principal components of $A$, is approximated by $\widehat{A}_k$. The approximation $\widehat{A}_k$ is based on the top-$k$ principal components of a column subset $C \in A$ to be chosen such that the norm error $\left\| A_k - \widehat{A}_k \right\|_{Fr}$ is small.&lt;br /&gt;&lt;br /&gt;FastPCA implements a column sampling scheme to quickly find a good $C$, in particular in the light of a priori information about the data type. A weighted norm error criterion is introduced where column weights specify column importance. The a priori information helps to set up appropriate column weights, and it gives rise to a data-dependent probability distribution over the columns that assigns relevant columns a high probability of being sampled.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Applications to neuroimaging data&lt;br /&gt;&lt;br /&gt;Both algorithms can be applied to calcium imaging movies of insect brain activity. The imaging movies turn out to have a NNCX-like structure: They can be modelled by linear combination (with non-negative coefficients) of the pure signals of neural units. Among a large number of columns (pixels), Convex_cone selects a subset of columns that contain the pure signals of identifiable neural units. The NNCX factorisation computed by Convex_cone proves to be useful in practical data analysis and visualisation as demonstrated by biological publications.&lt;br /&gt;&lt;br /&gt;For FastPCA, a priori information about the spatial aspect of neuroimaging data enables a column importance sampling that takes the spatial relationship of columns (pixels) into account. This is used to explicitly reduce the approximation error for columns with biological signals that are known to occur in spatially contiguous clusters. FastPCA is employed as a preprocessing step for fast dimensionality and noise reduction, ensuring scalability to large data sizes in neuroimaging.</dcterms:abstract>
    <dc:language>eng</dc:language>
  </rdf:Description>
</rdf:RDF>
Internal note
xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter
Contact
URL of original publication
Test date of URL
Examination date of dissertation
December 3, 2014
University note
Konstanz, Univ., Doctoral dissertation, 2014
Method of financing
Comment on publication
Alliance license
Corresponding Authors der Uni Konstanz vorhanden
International Co-Authors
Bibliography of Konstanz
No
Refereed