hclust can be run with many combinations of distance and linkage methods. dend_expend goes through the requested combinations and scores how "similar" each resulting tree is to the distance matrix it was built from. find_dend returns the tree judged "optimal" by that score.

dend_expend(
  x,
  dist_methods = c("euclidean", "maximum", "manhattan", "canberra", "binary",
    "minkowski"),
  hclust_methods = c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty",
    "median", "centroid"),
  hclust_fun = hclust,
  optim_fun = cor_cophenetic,
  ...
)

find_dend(x, ...)

Arguments

x

A matrix or a data.frame. Can also be a dist object.

dist_methods

A vector of possible dist methods.

hclust_methods

A vector of possible hclust methods.

hclust_fun

The hierarchical clustering function to use. By default hclust.

optim_fun

A function that accepts a dend and a dist and returns a measure of how well the two agree. Default is cor_cophenetic.

...

Options passed from find_dend to dend_expend.
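
As an illustration of the kind of function optim_fun describes, here is a minimal base R sketch of a rank-based agreement score. The name spearman_agreement is hypothetical, and the argument order (dend first, dist second) is an assumption taken from the description above. It is shown on its own rather than passed to dend_expend, and it assumes the dist object carries labels.

```r
# Hypothetical agreement function: Spearman correlation between a tree's
# cophenetic distances and the distance matrix it is compared against.
spearman_agreement <- function(dend, d) {
  labs <- attr(d, "Labels")
  # cophenetic() on a dendrogram follows leaf order, so realign it to d.
  coph <- as.matrix(stats::cophenetic(dend))[labs, labs]
  stats::cor(stats::as.dist(coph), d, method = "spearman")
}

d <- stats::dist(datasets::mtcars)
dend <- stats::as.dendrogram(stats::hclust(d, method = "average"))
score <- spearman_agreement(dend, d)
print(score)
```

The realignment step matters because the cophenetic distances of a dendrogram are ordered by leaf position, not by the row order of the original data.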

Value

dend_expend: A list with three items. The first item is called "dends" and includes a dendlist with all the possible dendrogram combinations. The second is "dists" and includes a list with all the possible distance matrix combinations. The third, "performance", is a data.frame with three columns: dist_methods, hclust_methods, and optim. optim is calculated (by default) as the cophenetic correlation (see: cor_cophenetic) between the distance matrix and the cophenetic distance of the hclust object.

find_dend: A dendrogram which is "optimal" based on the output from dend_expend.
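
The search described above can be sketched with base R alone. This is not the package's implementation: it assumes the default score (cophenetic correlation), treats the highest score as "optimal", and uses a reduced set of methods to keep the example short.

```r
# Base R sketch of the search over distance and linkage methods.
x <- datasets::mtcars
dist_methods <- c("euclidean", "manhattan")
hclust_methods <- c("single", "complete", "average")

# One row per (distance method, linkage method) combination.
grid <- expand.grid(
  dist_methods = dist_methods,
  hclust_methods = hclust_methods,
  stringsAsFactors = FALSE
)

# Score each combination: correlation between the distance matrix
# and the cophenetic distances of the tree built from it.
grid$optim <- vapply(seq_len(nrow(grid)), function(i) {
  d <- stats::dist(x, method = grid$dist_methods[i])
  hc <- stats::hclust(d, method = grid$hclust_methods[i])
  stats::cor(d, stats::cophenetic(hc))
}, numeric(1))

# Keep the combination with the highest score and rebuild its tree.
best <- grid[which.max(grid$optim), ]
best_dend <- stats::as.dendrogram(
  stats::hclust(
    stats::dist(x, method = best$dist_methods),
    method = best$hclust_methods
  )
)
print(grid)
```

The resulting grid has the same three columns as the "performance" data.frame, which makes it easy to compare against the output of dend_expend.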

Examples


x <- datasets::mtcars
out <- dend_expend(x, dist_methods = c("euclidean", "manhattan"))
out$performance
#>    dist_methods hclust_methods     optim
#> 1     euclidean         ward.D 0.7627152
#> 2     manhattan         ward.D 0.7943439
#> 3     euclidean        ward.D2 0.7778775
#> 4     manhattan        ward.D2 0.8037024
#> 5     euclidean         single 0.6834445
#> 6     manhattan         single 0.7029393
#> 7     euclidean       complete 0.8110543
#> 8     manhattan       complete 0.8051400
#> 9     euclidean        average 0.7935237
#> 10    manhattan        average 0.8182836
#> 11    euclidean       mcquitty 0.8114282
#> 12    manhattan       mcquitty 0.8146632
#> 13    euclidean         median 0.7888620
#> 14    manhattan         median 0.8056585
#> 15    euclidean       centroid 0.7917852
#> 16    manhattan       centroid 0.8177614

dend_expend(dist(x))$performance
#>   dist_methods hclust_methods     optim
#> 1      unknown         ward.D 0.7627152
#> 2      unknown        ward.D2 0.7778775
#> 3      unknown         single 0.6834445
#> 4      unknown       complete 0.8110543
#> 5      unknown        average 0.7935237
#> 6      unknown       mcquitty 0.8114282
#> 7      unknown         median 0.7888620
#> 8      unknown       centroid 0.7917852

best_dend <- find_dend(x, dist_methods = c("euclidean", "manhattan"))
plot(best_dend)
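
To see which combination scored highest, the performance table can be indexed by its optim column. The sketch below uses a hypothetical helper, best_combination, and applies it to three rows copied from the output above so that it runs without the package. It assumes that a higher optim means better agreement.

```r
# Hypothetical helper: return the row of a performance table with the
# highest optim value.
best_combination <- function(performance) {
  performance[which.max(performance$optim), , drop = FALSE]
}

# Three rows copied from the dend_expend output shown above.
perf <- data.frame(
  dist_methods = c("euclidean", "euclidean", "manhattan"),
  hclust_methods = c("single", "complete", "average"),
  optim = c(0.6834445, 0.8110543, 0.8182836),
  stringsAsFactors = FALSE
)

top <- best_combination(perf)
print(top)
```

The same helper can be applied directly to out$performance from the example above.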