| Title: | MST-kNN Clustering Algorithm |
|---|---|
| Description: | Implements the MST-kNN clustering algorithm proposed by Inostroza-Ponta (2008) <https://trove.nla.gov.au/work/28729389>. The algorithm determines the number of clusters automatically by recursively intersecting the Minimum Spanning Tree (MST) and the k-Nearest Neighbor (kNN) proximity graphs constructed from a pairwise distance matrix. The value of k is selected via a connectivity criterion (the smallest k such that the kNN graph is connected, bounded by floor(log(n))). The package requires only a distance matrix as input and returns cluster assignments, an igraph network, and partition metadata. |
| Authors: | Jorge Parraga-Alava [aut, cre] (ORCID: <https://orcid.org/0000-0001-8558-9122>), Pablo Moscato [aut], Mario Inostroza-Ponta [aut] |
| Maintainer: | Jorge Parraga-Alava <[email protected]> |
| License: | GPL-2 |
| Version: | 1.0.0 |
| Built: | 2026-05-12 03:17:51 UTC |
| Source: | https://github.com/jorgeklz/package-mstknnclust |
It contains the distances between 84 Indo-European languages based on the mean percent difference in cognacy, using the 200 Swadesh words.
data(dslanguages)
A data frame with 84 rows and 84 columns containing a distance matrix.
Once the data set is loaded, it can be accessed as an object of class data.frame called dslanguages.
Dyen, I., Kruskal, J., and Black, P. (1992). An Indoeuropean classification: A lexicostatistical experiment. Transactions of the American Philosophical Society, 82(5).
It contains the expression levels of 2467 genes on 79 samples corresponding to 8 different experiments of the budding yeast: alpha factor (18 samples), cdc15 (15 samples), cold shock (4 samples), diauxic shift (7 samples), DTT shock (4 samples), elutriation (14 samples), heat shock (6 samples) and sporulation (11 samples).
data(dsyeastexpression)
A data frame with 2467 rows and 79 columns.
Once the data set is loaded, it can be accessed as an object of class data.frame called dsyeastexpression.
https://www.pnas.org/content/suppl/1998/12/08/95.25.14863.DC1/3917data.xls
Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863–14868.
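Because the clustering function takes a square distance matrix rather than raw measurements, an expression table of this shape has to be converted first. The following is a minimal sketch on a small random stand-in table (the object `expr` is hypothetical and is not the packaged data set). The two distance choices shown, correlation distance between rows and Euclidean distance between columns, are assumptions for illustration; the source does not say which distance was used with this data set.

```r
set.seed(1)

# Hypothetical stand-in for an expression table:
# 50 "genes" in rows, 8 "samples" in columns.
expr <- as.data.frame(matrix(rnorm(50 * 8), nrow = 50, ncol = 8))
expr_mat <- as.matrix(expr)

# Distance between genes (rows): 1 - Pearson correlation of their profiles.
# cor() correlates columns, so the matrix is transposed first.
gene_dist <- 1 - cor(t(expr_mat))

# Distance between samples (columns): Euclidean distance.
# dist() works on rows, so the matrix is transposed first.
sample_dist <- as.matrix(dist(t(expr_mat)))

# Both results are square and could be passed on as a distance matrix.
print(dim(gene_dist))
print(dim(sample_dist))
```

Which of the two matrices is appropriate depends on whether genes or samples are the objects being clustered.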
Performs the MST-kNN clustering algorithm, which produces a clustering solution and determines the number of clusters automatically by recursively intersecting the Minimum Spanning Tree (MST) and the k-Nearest Neighbor (kNN) graphs.
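The intersection step can be illustrated without the package. The base-R sketch below performs a single round on six points on a line: it builds the MST with Prim's algorithm, builds a symmetric kNN graph, keeps only the edges present in both graphs, and labels the connected components. The helper names (`mst_adjacency`, `knn_adjacency`, `component_labels`) are hypothetical, and the kNN rule used here (an edge exists when either endpoint is among the other's k nearest neighbours) is an assumption; the package's internal implementation may differ.

```r
# Prim's algorithm on a dense distance matrix; returns a logical adjacency matrix.
mst_adjacency <- function(d) {
  n <- nrow(d)
  adj <- matrix(FALSE, n, n)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best_dist <- d[1, ]       # cheapest known link from the tree to each vertex
  best_from <- rep(1L, n)   # tree endpoint of that link
  for (step in seq_len(n - 1)) {
    candidates <- which(!in_tree)
    v <- candidates[which.min(best_dist[candidates])]
    u <- best_from[v]
    adj[v, u] <- TRUE
    adj[u, v] <- TRUE
    in_tree[v] <- TRUE
    closer <- !in_tree & d[v, ] < best_dist
    best_dist[closer] <- d[v, closer]
    best_from[closer] <- v
  }
  adj
}

# Symmetric kNN graph (assumed rule: edge if either endpoint lists the other).
knn_adjacency <- function(d, k) {
  n <- nrow(d)
  adj <- matrix(FALSE, n, n)
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    nearest <- others[order(d[i, others])][seq_len(min(k, n - 1))]
    adj[i, nearest] <- TRUE
  }
  adj | t(adj)
}

# Breadth-first labelling of connected components.
component_labels <- function(adj) {
  n <- nrow(adj)
  labels <- integer(n)
  current <- 0L
  for (start in seq_len(n)) {
    if (labels[start] != 0L) next
    current <- current + 1L
    labels[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & labels == 0L)
      labels[nb] <- current
      queue <- c(queue, nb)
    }
  }
  labels
}

# Two well-separated groups of three points each.
x <- c(0, 1, 2.5, 100, 101, 102.5)
d <- as.matrix(dist(x))

# Keep only edges present in both graphs; the long MST edge bridging
# the two groups is not a kNN edge for k = 1, so it is dropped.
both <- mst_adjacency(d) & knn_adjacency(d, k = 1)
labels <- component_labels(both)
print(labels)  # 1 1 1 2 2 2
```

A recursive version would apply the same step to the sub-matrix of each component and stop once a round produces no further split.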
mst.knn(distance.matrix, suggested.k)
| Argument | Description |
|---|---|
| distance.matrix | A numeric matrix or data.frame with equal numbers of rows and columns representing pairwise distances between objects. |
| suggested.k | Optional. A numeric value representing the suggested number of nearest neighbours. |
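When no k is suggested, the package description states that k is the smallest value for which the kNN graph is connected, bounded by floor(log(n)). A minimal base-R sketch of that rule follows. The function names (`knn_adjacency`, `is_connected`, `choose_k`) are hypothetical, the natural logarithm and the symmetric kNN rule are assumptions, and the sketch does not model how a user-supplied `suggested.k` is handled, since the source does not specify it.

```r
# Symmetric kNN graph as a logical adjacency matrix
# (assumed rule: edge if either endpoint lists the other).
knn_adjacency <- function(d, k) {
  n <- nrow(d)
  adj <- matrix(FALSE, n, n)
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    adj[i, others[order(d[i, others])][seq_len(min(k, n - 1))]] <- TRUE
  }
  adj | t(adj)
}

# TRUE when every vertex can be reached from vertex 1.
is_connected <- function(adj) {
  reached <- c(TRUE, rep(FALSE, nrow(adj) - 1))
  repeat {
    grown <- reached | (colSums(adj[reached, , drop = FALSE]) > 0)
    if (all(grown == reached)) break
    reached <- grown
  }
  all(reached)
}

# Smallest k giving a connected kNN graph, capped at floor(log(n)).
choose_k <- function(d) {
  k_max <- max(1L, floor(log(nrow(d))))
  for (k in seq_len(k_max)) {
    if (is_connected(knn_adjacency(d, k))) return(k)
  }
  k_max
}

# Four tight pairs of points: the 1-NN graph only links each pair,
# while the 2-NN graph also links neighbouring pairs.
x <- c(0, 1, 3, 4, 6, 7, 9, 10)
d <- as.matrix(dist(x))
k <- choose_k(d)
print(k)  # 2
```

With n = 8 the cap is floor(log(8)) = 2, so in this example the connectivity criterion and the cap coincide.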
A list with the elements cnumber, cluster, partition, csize, and network.
Mario Inostroza-Ponta, Jorge Parraga-Alava, Pablo Moscato
set.seed(1987)

# Generate n = 100 random points in m = 15 dimensions
n <- 100; m <- 15
x <- matrix(runif(n * m, min = -5, max = 10), nrow = n, ncol = m)

# Pairwise Euclidean distance matrix
d <- base::as.matrix(stats::dist(x, method = "euclidean"))

# Cluster with MST-kNN
library("mstknnclust")
results <- mst.knn(d)

# Plot the resulting network, colouring vertices by connected component
library("igraph")
plot(results$network,
     vertex.size = 8,
     vertex.color = igraph::components(results$network)$membership,
     layout = igraph::layout_with_fr(results$network, niter = 10000),
     main = paste("MST-kNN | clusters =", results$cnumber))