Clustering gene expression data with a penalized graph-based metric

Bibliographic Details
Main authors: Bayá, Ariel E.; Granitto, Pablo M.
Format: Article
Language: English
Published: BioMed Central 2012
Online access: http://hdl.handle.net/2133/1859
id I15-R121-2133-1859
record_format dspace
institution Universidad Nacional de Rosario
institution_str I-15
repository_str R-121
collection Repositorio Hipermedial de la Universidad Nacional de Rosario (UNR)
description Background: The search for cluster structure in microarray datasets is a fundamental problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that is not shaped as compact clouds of points but instead forms arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets. Results: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first, it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value; then, it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as several penalization functions. We show clustering results on several public gene expression datasets and on simulated artificial problems to evaluate the behavior of the new metric. Conclusions: In all cases the PKNNG metric shows promising clustering results. Using the PKNNG metric can raise the performance of commonly used pairwise-distance-based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method; they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
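The two-step construction summarized in the abstract can be illustrated with a short, stdlib-only Python sketch. It is not the authors' implementation, and several choices in it are assumptions rather than details from this record: Euclidean distance as the base metric, an undirected k-NN graph, a Kruskal-style scheme that joins subgraphs through their closest pair of points (the abstract only says several connection schemes are discussed), a penalty of w * exp(w / mu) with mu the mean k-NN edge length (one plausible penalization function), and shortest-path distances on the final graph as the resulting metric. The function name pknng_distances is hypothetical.

```python
import heapq
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance (assumed base metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pknng_distances(points: List[Sequence[float]], k: int = 2) -> List[List[float]]:
    """Hypothetical sketch of a PKNNG-style distance matrix (needs len(points) >= 2)."""
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]

    # Step 1: undirected k-NN graph built with a low k.
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = d[i][j]

    # mu: mean k-NN edge length, used to scale the penalty (assumption).
    knn_weights = [w for i in graph for w in graph[i].values()]
    mu = sum(knn_weights) / len(knn_weights)

    # Union-find to track the connected subgraphs left by step 1.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in graph:
        for j in graph[i]:
            parent[find(i)] = find(j)

    # Step 2: join subgraphs Kruskal-style through their closest point pairs,
    # giving each joining edge a heavily penalized weight w * exp(w / mu).
    candidates = sorted(
        (d[i][j], i, j) for i in range(n) for j in range(i + 1, n) if find(i) != find(j)
    )
    for w, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:
            graph[i][j] = graph[j][i] = w * math.exp(w / mu)
            parent[ri] = rj

    # Step 3: geodesic (shortest-path) distances via Dijkstra from every node.
    dist = []
    for s in range(n):
        best = [math.inf] * n
        best[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > best[u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                if du + w < best[v]:
                    best[v] = du + w
                    heapq.heappush(heap, (best[v], v))
        dist.append(best)
    return dist


if __name__ == "__main__":
    # Two well-separated groups of three points on a line.
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
    D = pknng_distances(pts, k=1)
    print(D[0][2])  # 2.0 (path inside one group)
    print(D[0][5])  # crosses the single penalized edge between the groups
```

In this toy example, k = 1 leaves two subgraphs, and the only joining edge (length 8) receives the weight 8 * exp(8), so any distance between the groups is dominated by the penalty while distances inside each group stay at their ordinary path length. The resulting matrix could then be handed to a standard pairwise-distance clustering method, for instance by converting it with scipy.spatial.distance.squareform and passing it to scipy.cluster.hierarchy.linkage to obtain a dendrogram.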