org.openimaj.ml.clustering.kmeans
Class HFloatKMeans

java.lang.Object
  extended by org.openimaj.ml.clustering.kmeans.HFloatKMeans
All Implemented Interfaces:
Readable, ReadableASCII, ReadableBinary, ReadWriteable, Writeable, WriteableASCII, WriteableBinary, Cluster<HFloatKMeans,float[]>, ReadWriteableCluster

public class HFloatKMeans
extends Object
implements Cluster<HFloatKMeans,float[]>

Hierarchical Float K-Means clustering (HFloatKMeans) is a simple hierarchical version of FastFloatKMeansCluster. The algorithm recursively applies FastFloatKMeansCluster to create more refined partitions of the data.
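
The recursion described above can be illustrated with a minimal, self-contained sketch. This is not the HFloatKMeans implementation: the class HierarchicalKMeansSketch, its Node type and the methods build, kmeans and nearest are hypothetical names, and plain Lloyd iterations with evenly spaced seeds stand in for FastFloatKMeansCluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the OpenIMAJ implementation.
class HierarchicalKMeansSketch {
    /** Tree node: one centroid per child; children[c] refines cluster c. */
    static class Node {
        float[][] centroids;
        Node[] children;
    }

    /** Cluster data into k groups, then recurse into each group until depth is used up. */
    static Node build(List<float[]> data, int k, int depth, int iters) {
        if (depth == 0 || data.size() < k) return null; // nothing left to refine
        Node node = new Node();
        node.centroids = kmeans(data, k, iters);
        node.children = new Node[k];
        List<List<float[]>> parts = new ArrayList<>();
        for (int c = 0; c < k; c++) parts.add(new ArrayList<>());
        for (float[] x : data) parts.get(nearest(node.centroids, x)).add(x);
        for (int c = 0; c < k; c++) node.children[c] = build(parts.get(c), k, depth - 1, iters);
        return node;
    }

    /** Plain Lloyd iterations with evenly spaced seeds (an assumption of this sketch). */
    static float[][] kmeans(List<float[]> data, int k, int iters) {
        int dims = data.get(0).length;
        float[][] centroids = new float[k][];
        for (int c = 0; c < k; c++) centroids[c] = data.get(c * data.size() / k).clone();
        for (int it = 0; it < iters; it++) {
            float[][] sums = new float[k][dims];
            int[] counts = new int[k];
            for (float[] x : data) {
                int c = nearest(centroids, x);
                counts[c]++;
                for (int d = 0; d < dims; d++) sums[c][d] += x[d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep the old centroid for an empty cluster
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return centroids;
    }

    /** Index of the centroid with the smallest squared Euclidean distance to x. */
    static int nearest(float[][] centroids, float[] x) {
        int best = 0;
        float bestDist = Float.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            float dist = 0;
            for (int d = 0; d < x.length; d++) {
                float diff = centroids[c][d] - x[d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }
}
```

In this sketch each level splits only the data assigned to its parent cluster, so a tree with K clusters per node and a given depth can address up to K^depth leaves.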


Nested Class Summary
static class HFloatKMeans.Node
          HFloatKMeans tree node. The number of children is at most the HFloatKMeans K parameter.
 
Field Summary
 
Fields inherited from interface org.openimaj.ml.clustering.ReadWriteableCluster
CLUSTER_HEADER
 
Constructor Summary
HFloatKMeans()
           
HFloatKMeans(HKMeansMethod method)
          New HFloatKMeans tree
 
Method Summary
 String asciiHeader()
          Header for ascii input.
 byte[] binaryHeader()
          Header for binary input.
 int countActiveLeafNodes()
          Count number of active leaf nodes.
 int countLeafs()
          Total number of leaves assuming leaves = K^depth
 boolean equals(Object o)
           
 float[] getClusterCentroid(int[] path)
          Given a path, get the cluster centroid associated with the cluster index of the path.
 float[][] getClusters()
          Utility function useful for testing.
 int getDepth()
          Get depth
 int getIndex(int[] path)
          Translates a path down the tree into a cluster index.
static int getIndex(int[] path, int depth, int K)
          Translates a path down the tree into a cluster index.
 int getK()
           
 int getMaxIterations()
          Number of iterations for the underlying FastKMeans implementation
 int getNDims()
          Get data dimensionality
 int getNumberClusters()
          Get the number of centers K
 int[] getPath(int index)
          Given an index, returns the path down the hierarchy that led to it.
static int[] getPath(int index, int depth, int K)
          Given an index, returns the path down the hierarchy that led to it.
 HFloatKMeans.Node getRoot()
          Get the root node of the tree
 void init(int M, int K, int depth)
          Initialize HFloatKMeans tree
 void init(int M, int K, int depth, int niters)
          Initialize HFloatKMeans tree
 void optimize(boolean exact)
          Prepare the cluster for pushing
 int[] push_one_path(float[] data)
          Project data down HFloatKMeans tree
 int push_one(float[] data)
          Project one datum to clusters
 int[] push_one(float[] data, int numNeighbours)
          Project one datum to clusters
 int[] push(float[][] data)
          Project data to clusters.
 int[][] push(float[][] data, int numNeighbours)
          Project data to clusters.
 int[][] pushPath(float[][] data)
          Project data down HFloatKMeans tree
 void readASCII(Scanner reader)
          Read internal state from the given reader.
 void readBinary(DataInput dis)
          Read internal state from the given input.
 void setMaxIterations(int maxIters)
          Number of iterations for the underlying FastKMeans implementation
 String toString()
           
 int train(DataSource<float[]> data)
          Train clusters with a data source; this can be more efficient
 int train(float[][] data)
          Train clusters
 void writeASCII(PrintWriter writer)
          Write the content of this as ascii to the given writer.
 void writeBinary(DataOutput dos)
          Write the content of this as binary to the given output.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HFloatKMeans

public HFloatKMeans(HKMeansMethod method)
New HFloatKMeans tree

Parameters:
method - clustering method.

HFloatKMeans

public HFloatKMeans()

Method Detail

init

public void init(int M,
                 int K,
                 int depth)
Initialize HFloatKMeans tree

Parameters:
M - Data dimensionality.
K - Number of clusters per node.
depth - Tree depth.

init

public void init(int M,
                 int K,
                 int depth,
                 int niters)
Initialize HFloatKMeans tree

Parameters:
M - Data dimensionality.
K - Number of clusters per node.
depth - Tree depth.
niters - Maximum number of iterations

train

public int train(float[][] data)
Description copied from interface: Cluster
Train clusters

Specified by:
train in interface Cluster<HFloatKMeans,float[]>
Parameters:
data - data.
Returns:
-1 if an overflow may have occurred.

train

public int train(DataSource<float[]> data)
Description copied from interface: Cluster
Train clusters with a data source; this can be more efficient

Specified by:
train in interface Cluster<HFloatKMeans,float[]>
Parameters:
data - data.
Returns:
-1 if an overflow may have occurred.

pushPath

public int[][] pushPath(float[][] data)
Project data down HFloatKMeans tree

Parameters:
data - Data to project.
Returns:
Paths down the tree, one per data item.

push_one_path

public int[] push_one_path(float[] data)
Project data down HFloatKMeans tree

Parameters:
data - Data to project.
Returns:
Path down the tree.
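
A minimal sketch of such a projection, assuming each node stores one centroid per child and the datum follows the nearest centroid at every level. The class PathDescentSketch, its Node type and the use of -1 for levels that are never reached are assumptions of this sketch, not details taken from HFloatKMeans.

```java
import java.util.Arrays;

// Hypothetical sketch: not the OpenIMAJ implementation.
class PathDescentSketch {
    /** Tree node: centroids[c] belongs to children[c]; children is null at the bottom. */
    static class Node {
        final float[][] centroids;
        final Node[] children;

        Node(float[][] centroids, Node[] children) {
            this.centroids = centroids;
            this.children = children;
        }
    }

    /** Walk from the root, recording the nearest child index at each level. */
    static int[] pushOnePath(Node root, float[] datum, int depth) {
        int[] path = new int[depth];
        Arrays.fill(path, -1); // -1 marks levels that were not reached (sketch convention)
        Node node = root;
        for (int level = 0; level < depth && node != null; level++) {
            int best = nearest(node.centroids, datum);
            path[level] = best;
            node = (node.children == null) ? null : node.children[best];
        }
        return path;
    }

    /** Index of the centroid with the smallest squared Euclidean distance to x. */
    static int nearest(float[][] centroids, float[] x) {
        int best = 0;
        float bestDist = Float.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            float dist = 0;
            for (int d = 0; d < x.length; d++) {
                float diff = centroids[c][d] - x[d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }
}
```

Each level costs K distance computations, so projecting one datum takes on the order of K * depth comparisons instead of one comparison per leaf.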

getNDims

public int getNDims()
Description copied from interface: Cluster
Get data dimensionality

Specified by:
getNDims in interface Cluster<HFloatKMeans,float[]>
Returns:
data dimensionality.

getK

public int getK()
Returns:
number of clusters per node

getDepth

public int getDepth()
Get depth

Returns:
depth.

getMaxIterations

public int getMaxIterations()
Number of iterations for the underlying FastKMeans implementation

Returns:
number of iterations

getRoot

public HFloatKMeans.Node getRoot()
Get the root node of the tree

Returns:
the root node.

setMaxIterations

public void setMaxIterations(int maxIters)
Number of iterations for the underlying FastKMeans implementation

Parameters:
maxIters - maximum number of iterations.

getIndex

public static int getIndex(int[] path,
                           int depth,
                           int K)
Translates a path down the tree into a cluster index. This variant takes K and the depth explicitly.

Parameters:
path - path down the tree, one child index per level.
depth - tree depth.
K - number of clusters per node.
Returns:
cluster index
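
One plausible encoding, assumed here rather than taken from the HFloatKMeans source, reads the path as the digits of a base-K number with the root level as the most significant digit. The class PathToIndexSketch is hypothetical and only illustrates that arithmetic.

```java
// Hypothetical sketch: the digit order is an assumption, not the documented behaviour.
class PathToIndexSketch {
    /** Interpret path[0..depth-1] as base-k digits, most significant first. */
    static int getIndex(int[] path, int depth, int k) {
        int index = 0;
        for (int level = 0; level < depth; level++) {
            index = index * k + path[level]; // shift previous digits up, append this one
        }
        return index;
    }

    public static void main(String[] args) {
        // With k = 3 and depth = 2, the path {1, 2} maps to 1 * 3 + 2.
        System.out.println(getIndex(new int[]{1, 2}, 2, 3));
    }
}
```

Under this assumption, indices run from 0 to K^depth - 1 and leaves that share a parent receive consecutive indices.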

getIndex

public int getIndex(int[] path)
Translates a path down the tree into a cluster index. Uses the depth and K of this instance.

Parameters:
path - path down the tree, one child index per level.
Returns:
cluster index

getPath

public static int[] getPath(int index,
                            int depth,
                            int K)
Given an index, returns the path down the hierarchy that led to it. This variant takes the depth and the number of clusters per node explicitly.

Parameters:
index - cluster index.
depth - tree depth.
K - number of clusters per node.
Returns:
a hierarchy path
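
If the cluster index is read as a base-K number with the root level as the most significant digit (an assumption of this sketch, not something the documentation above states), the path can be recovered by repeated division. The class IndexToPathSketch is a hypothetical illustration.

```java
// Hypothetical sketch: the digit order is an assumption, not the documented behaviour.
class IndexToPathSketch {
    /** Split index into depth base-k digits, most significant digit at path[0]. */
    static int[] getPath(int index, int depth, int k) {
        int[] path = new int[depth];
        for (int level = depth - 1; level >= 0; level--) {
            path[level] = index % k; // least significant digit is the deepest level
            index /= k;
        }
        return path;
    }

    public static void main(String[] args) {
        // With k = 3 and depth = 2, index 5 = 1 * 3 + 2 splits into the path {1, 2}.
        System.out.println(java.util.Arrays.toString(getPath(5, 2, 3)));
    }
}
```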

getPath

public int[] getPath(int index)
Given an index, returns the path down the hierarchy that led to it. Uses the depth and K of this instance.

Parameters:
index - cluster index.
Returns:
a hierarchy path

countActiveLeafNodes

public int countActiveLeafNodes()
Count number of active leaf nodes.

Returns:
number of active leaf nodes.

optimize

public void optimize(boolean exact)
Description copied from interface: Cluster
Prepare the cluster for pushing

Specified by:
optimize in interface Cluster<HFloatKMeans,float[]>
Parameters:
exact - TODO

push

public int[] push(float[][] data)
Description copied from interface: Cluster
Project data to clusters.

Specified by:
push in interface Cluster<HFloatKMeans,float[]>
Parameters:
data - data.
Returns:
The cluster indices to which the data was pushed

push_one

public int push_one(float[] data)
Description copied from interface: Cluster
Project one datum to clusters

Specified by:
push_one in interface Cluster<HFloatKMeans,float[]>
Parameters:
data - datum to project.
Returns:
the cluster index.

push

public int[][] push(float[][] data,
                    int numNeighbours)
Description copied from interface: Cluster
Project data to clusters.

Specified by:
push in interface Cluster<HFloatKMeans,float[]>
Parameters:
data - data.
numNeighbours - number of neighbouring clusters to return also. When set to 1 this is equivalent to Cluster#push(DATATYPE[])
Returns:
The centers and neighbours for the data

push_one

public int[] push_one(float[] data,
                      int numNeighbours)
Description copied from interface: Cluster
Project one datum to clusters

Specified by:
push_one in interface Cluster<HFloatKMeans,float[]>
Parameters:
data - datum to project.
numNeighbours - number of neighbouring clusters to return also. When set to 1 this is equivalent to Cluster.push_one(Object)
Returns:
the cluster index and the index of neighbours.

toString

public String toString()
Overrides:
toString in class Object

countLeafs

public int countLeafs()
Total number of leaves assuming leaves = K^depth

Returns:
number of leaves
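
The K^depth figure grows quickly, so a sketch of the arithmetic with an explicit range check may help. LeafCountSketch and its -1 return value for an out-of-range result are assumptions of this sketch, not a description of how countLeafs behaves.

```java
// Hypothetical sketch: not the OpenIMAJ implementation.
class LeafCountSketch {
    /** Returns k^depth, or -1 if the result does not fit in an int. Assumes k >= 1. */
    static int countLeaves(int k, int depth) {
        long leaves = 1;
        for (int i = 0; i < depth; i++) {
            leaves *= k; // one more level multiplies the leaf count by k
            if (leaves > Integer.MAX_VALUE) return -1;
        }
        return (int) leaves;
    }

    public static void main(String[] args) {
        System.out.println(countLeaves(10, 3));  // 10 * 10 * 10
        System.out.println(countLeaves(10, 10)); // exceeds the int range
    }
}
```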

getNumberClusters

public int getNumberClusters()
Description copied from interface: Cluster
Get the number of centers K

Specified by:
getNumberClusters in interface Cluster<HFloatKMeans,float[]>
Returns:
number of centers K.

equals

public boolean equals(Object o)
Overrides:
equals in class Object

getClusterCentroid

public float[] getClusterCentroid(int[] path)
Given a path, get the cluster centroid associated with the cluster index of the path.

Parameters:
path - path down the tree.
Returns:
the centroid of a given path

asciiHeader

public String asciiHeader()
Description copied from interface: ReadableASCII
Header for ascii input. Will be automatically read by IOUtils when using readASCII().

Specified by:
asciiHeader in interface ReadableASCII
Specified by:
asciiHeader in interface WriteableASCII
Returns:
header

binaryHeader

public byte[] binaryHeader()
Description copied from interface: ReadableBinary
Header for binary input. Will be automatically read by IOUtils when using readBinary().

Specified by:
binaryHeader in interface ReadableBinary
Specified by:
binaryHeader in interface WriteableBinary
Returns:
header

readASCII

public void readASCII(Scanner reader)
               throws IOException
Description copied from interface: ReadableASCII
Read internal state from the given reader.

Specified by:
readASCII in interface ReadableASCII
Parameters:
reader - source to read from.
Throws:
IOException - an error reading input

readBinary

public void readBinary(DataInput dis)
                throws IOException
Description copied from interface: ReadableBinary
Read internal state from the given input.

Specified by:
readBinary in interface ReadableBinary
Parameters:
dis - source to read from.
Throws:
IOException - an error reading input

writeASCII

public void writeASCII(PrintWriter writer)
                throws IOException
Description copied from interface: WriteableASCII
Write the content of this as ascii to the given writer.

Specified by:
writeASCII in interface WriteableASCII
Parameters:
writer - sink to write to
Throws:
IOException - an error writing to the writer

writeBinary

public void writeBinary(DataOutput dos)
                 throws IOException
Description copied from interface: WriteableBinary
Write the content of this as binary to the given output.

Specified by:
writeBinary in interface WriteableBinary
Parameters:
dos - sink to write to
Throws:
IOException - an error writing to the output

getClusters

public float[][] getClusters()
Description copied from interface: Cluster
Utility function useful for testing. The cluster must return something it considers to be its cluster centroids. Different types of cluster will return different data types here. The result may be (and often will be) null where centroids make no sense for the cluster type. It is a sign of a good cluster that it can produce a set of centroids for itself.

Specified by:
getClusters in interface Cluster<HFloatKMeans,float[]>
Returns:
The cluster's centroids.


Copyright © 2011 The University of Southampton. All Rights Reserved.