LAMSAS in the Netherlands



Clustering with noise

Clustering is sensitive to noise: small changes in the differences can change which clusters are formed. We can exploit this fact by deliberately adding noise before clustering, repeating this many times, and seeing what the effect is. A strong cluster border will not be affected as easily as a weak cluster border.

There are several choices to make in this procedure. After testing, these seem to work well in most cases:

  • a noise level of 0.5 times the standard deviation of the differences
  • using both Group Average clustering and Weighted Average clustering, combining the results
  • repeating the noisy clustering 50 times
  • using the average cophenetic differences as the result
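
The choices listed above can be sketched in Python roughly as follows. This is a minimal sketch, not the implementation used to make the map. It assumes SciPy's linkage methods "average" and "weighted" as stand-ins for Group Average and Weighted Average clustering, assumes Gaussian noise (the shape of the noise is not stated here), and assumes that each of the 50 repeats clusters once with each method. The function name noisy_cophenetic and its parameters are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage


def noisy_cophenetic(diff, noise=0.5, runs=50, seed=0):
    """Average cophenetic differences over repeated noisy clusterings.

    diff: condensed 1-D array of pairwise differences (the layout
    produced by scipy.spatial.distance.pdist or squareform).
    """
    diff = np.asarray(diff, dtype=float)
    rng = np.random.default_rng(seed)
    # Noise level: a fraction of the standard deviation of the differences.
    scale = noise * diff.std()
    methods = ("average", "weighted")
    total = np.zeros_like(diff)
    for _ in range(runs):
        for method in methods:
            # Assumption: Gaussian noise, clipped so that no difference
            # becomes negative.
            noisy = np.clip(diff + rng.normal(0.0, scale, diff.shape), 0.0, None)
            # cophenet() without a second argument returns the condensed
            # cophenetic differences of the clustering.
            total += cophenet(linkage(noisy, method=method))
    # Average over every clustering that was run.
    return total / (runs * len(methods))
```

The result has the same layout as the input, so it can be treated as a new, smoothed set of differences between the same items.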

The cophenetic difference of two items is the difference between the two clusters they belonged to at the point where those two clusters were joined into a single cluster containing both items. A dendrogram shows these cophenetic differences on the x-axis. See image at middle right.
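
A small worked example may make the definition concrete. The four items and their differences below are made up for illustration, and SciPy's "average" method is assumed to correspond to Group Average clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

# Hypothetical differences between four items A, B, C, D.
D = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])

# Group Average clustering on the condensed form of D.
Z = linkage(squareform(D), method="average")

# Square matrix of cophenetic differences.
C = squareform(cophenet(Z))

# A and B are joined first, at difference 1.0, so C[0, 1] is 1.0.
# C and D are joined next, at difference 2.0, so C[2, 3] is 2.0.
# The clusters {A, B} and {C, D} are joined last, at the average of
# 4, 5, 3 and 6, so every pair across the two clusters gets 4.5.
print(C)
```

Note that the original differences A-C (4.0) and B-D (6.0) both become 4.5: the cophenetic difference depends only on where in the tree two items end up together.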


Though this is probably the most honest map you can get, it is not the clearest one. You may want to extend this procedure with multidimensional scaling (MDS).
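
As a rough illustration of that extension, the sketch below applies classical (Torgerson) MDS to a square matrix of differences, such as averaged cophenetic differences. Which MDS variant suits this procedure best is not stated here, so classical MDS is an assumption, and the function name classical_mds is hypothetical.

```python
import numpy as np


def classical_mds(D, dims=3):
    """Classical MDS: place items in `dims` dimensions so that their
    Euclidean distances approximate the differences in D.

    D: square, symmetric matrix of differences with a zero diagonal.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared differences.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)
```

Each row of the result holds the coordinates of one item. Items separated by a strong cluster border should end up far apart, while items separated only by weak borders should stay close together.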