Clustering Options

This is an unsupervised classification scheme that uses a k-means algorithm.
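
For readers unfamiliar with k-means, the following is a minimal Python sketch of the general technique, not the program's actual code. The function name `kmeans`, the `max_iter` parameter, and the choice to start from the first k distinct points are all assumptions made for illustration.

```python
import math

def kmeans(points, k, max_iter=20):
    """Minimal k-means on a list of equal-length tuples; returns (centers, labels)."""
    # Assumed initialization: the first k distinct points become the starting centers.
    # If there are fewer than k distinct points, fewer centers are created.
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its member points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        if new_labels == labels and new_centers == centers:
            break  # nothing changed, so the solution has converged
        labels, centers = new_labels, new_centers
    return centers, labels

centers, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

In this sketch the loop stops either at convergence or after `max_iter` passes, and asking for more clusters than there are distinct points simply yields fewer clusters.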

Select clustering options:
  • Initialization: the first two options will probably give the best results.
  • Clusters: the maximum number of clusters you will get. The program may return fewer clusters if there are not enough distinct clusters in n-dimensional space.
  • Iterations: how many iterations the algorithm runs before it stops.
  • Sampling: the clustering can use only a limited number of points, so the data set is thinned by this factor in both x and y for grids, or by taking every nth point for databases. The original default should give the largest number of points allowed; you can pick a larger value of this parameter, but not a smaller one.
  • Optional outputs (these can greatly slow operations):
    • Scatterplots by cluster: get 2D graphs from each pair of variables, colored by cluster.
    • Scatterplots by mask
    • Histograms by cluster: colored by cluster.
    • Histograms by mask
  • Create grid: put the classification into a grid.
  • Classification distance power: the Euclidean distance uses the square (a power of 2), but other powers can also be used. The larger the power, the greater the effect of single outliers.
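
The sampling and distance power options above can be illustrated with a short Python sketch. The helper names `thin_grid`, `thin_records`, and `power_distance` are hypothetical, and the distance form shown (a sum of absolute differences raised to the chosen power) is an assumption about how such a power might be applied, not the program's confirmed formula.

```python
def thin_grid(grid, factor):
    """Keep every factor-th row and column of a 2D list (thinning in both x and y)."""
    return [row[::factor] for row in grid[::factor]]

def thin_records(records, n):
    """Keep every n-th record of a database-like list."""
    return records[::n]

def power_distance(a, b, power=2.0):
    """Assumed form: sum of |a_i - b_i| ** power over all variables."""
    return sum(abs(x - y) ** power for x, y in zip(a, b))

# One variable differs much more than the others (a single outlier).
point, center = (0, 0, 0), (1, 1, 4)
d1 = power_distance(point, center, power=1)   # 1 + 1 + 4
d2 = power_distance(point, center, power=2)   # 1 + 1 + 16
```

With a power of 1 the outlying variable contributes 4 of the total 6, while with a power of 2 it contributes 16 of 18, which is the growing influence of single outliers described above.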

Last revision 6/4/2015