Affinity Propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points. Unlike clustering algorithms such as k-means or k-medoids, AP does not require the number of clusters to be determined or estimated before running the algorithm. AP finds "exemplars", members of the input set that are representative of clusters.
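The message-passing scheme can be sketched in a few dozen lines of plain Java. This is a minimal, unoptimized illustration of the standard responsibility/availability updates (with damping), not Apro's actual implementation; the class and method names are ours:

```java
import java.util.Arrays;

// Minimal affinity propagation sketch: iterates responsibility and
// availability messages over a similarity matrix `s` (preferences on
// the main diagonal), then reads off the exemplars.
public class ApSketch {

  public static int[] cluster(double[][] s, int iters, double damping) {
    int n = s.length;
    double[][] r = new double[n][n]; // responsibilities
    double[][] a = new double[n][n]; // availabilities

    for (int it = 0; it < iters; it++) {
      // r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
      for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
          double max = Double.NEGATIVE_INFINITY;
          for (int kk = 0; kk < n; kk++) {
            if (kk != k) max = Math.max(max, a[i][kk] + s[i][kk]);
          }
          r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - max);
        }
      }
      // a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
      // a(k,k) = sum_{i' != k} max(0, r(i',k))
      for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
          double sum = 0;
          for (int ii = 0; ii < n; ii++) {
            if (ii != i && ii != k) sum += Math.max(0, r[ii][k]);
          }
          double val = (i == k) ? sum : Math.min(0, r[k][k] + sum);
          a[i][k] = damping * a[i][k] + (1 - damping) * val;
        }
      }
    }

    // exemplar of i: argmax_k (a(i,k) + r(i,k))
    int[] exemplar = new int[n];
    for (int i = 0; i < n; i++) {
      int best = 0;
      for (int k = 1; k < n; k++) {
        if (a[i][k] + r[i][k] > a[i][best] + r[i][best]) best = k;
      }
      exemplar[i] = best;
    }
    return exemplar;
  }

  public static void main(String[] args) {
    // Two well-separated pairs of 1-D points; similarities are negative
    // squared distances, preferences (diagonal) set to -30.
    double[][] s = {
        {-30, -0.01, -25, -26.01},
        {-0.01, -30, -24.01, -25},
        {-25, -24.01, -30, -0.01},
        {-26.01, -25, -0.01, -30}};
    System.out.println(Arrays.toString(cluster(s, 200, 0.5)));
  }
}
```

On this toy input the two tight pairs end up sharing an exemplar, without the number of clusters ever being specified.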
More about Affinity Propagation:
Hierarchical AP (HiAP) solves large-scale clustering problems. It uses AP and Weighted AP (WAP) in a divide-and-conquer schema: it partitions the dataset, runs AP on each subset, and applies WAP to the collection of exemplars constructed from each subset. HiAP has been shown to significantly decrease the computational cost (from quadratic to quasi-linear) with minimal distortion.
WAP merges neighboring points while preserving the spatial structure between them and the other points. It guarantees that the clustering results of WAP on the merged points are equal to those of AP on the original, unmerged points.
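The first level of the divide-and-conquer schema can be sketched as follows. This is a hypothetical illustration, not the library's code; the medoid computation below is a simple stand-in for actually running AP on each subset:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of HiAP's first level: partition the dataset, find one exemplar
// per partition, and collect the exemplars for second-level clustering.
public class HiApSchemaSketch {

  // Medoid = point minimizing the sum of squared distances to the others;
  // a stand-in for running AP on the subset.
  static double exemplarOf(List<Double> subset) {
    double best = subset.get(0);
    double bestCost = Double.POSITIVE_INFINITY;
    for (double cand : subset) {
      double cost = 0;
      for (double p : subset) cost += (cand - p) * (cand - p);
      if (cost < bestCost) { bestCost = cost; best = cand; }
    }
    return best;
  }

  public static List<Double> firstLevelExemplars(double[] points, int splits) {
    List<Double> exemplars = new ArrayList<>();
    int per = (points.length + splits - 1) / splits; // partition size
    for (int start = 0; start < points.length; start += per) {
      List<Double> subset = new ArrayList<>();
      for (int i = start; i < Math.min(start + per, points.length); i++) {
        subset.add(points[i]);
      }
      exemplars.add(exemplarOf(subset)); // "AP" on each subset
    }
    // In HiAP, WAP is then applied to these exemplars, each weighted by
    // the size of the cluster it represents.
    return exemplars;
  }
}
```

Because each subset is clustered independently, the expensive quadratic step only runs on partitions, which is the source of the quasi-linear overall cost.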
Apro is a Java implementation of the Affinity Propagation clustering algorithm. It is efficiently parallelized for use on multicore processors and NUMA architectures (using the libnuma native library), and offers a simple API for easy use in your projects.
The easiest way to create an instance is through the builder classes (`AproBuilder` and `HiAPBuilder`). The comprehensive API of these classes allows you to set the parallelization and NUMA parameters yourself, or let the builder set them automatically. The builder's `.build()` method provides you with a ready-to-use `Apro` object.
Apro - Java Affinity Propagation Library // Parallelized
Copyright © 2014 Lovro Ilijašić, Cécile Germain and Xiangliang Zhang
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License LGPL-3.0 as published by the Free Software Foundation.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
Here are step-by-step instructions on how to run your first Apro example.
Include `apro.jar` and the other required (and provided) `.jar` files in your project.
Download the example dataset `FaceVideo.mat` (MATLAB `.mat` format). More datasets are available here.
Use `MATLABProvider` to load the similarity matrix and the preferences from the MATLAB file. This particular example contains a structure with the similarity matrix named `S`, but with a zero main diagonal (the preferences). Therefore, the preferences should be set to the value contained separately in the structure, named `pref`:

```java
File faceVideoFile = new File("/path/to/FaceVideo.mat");
MATLABProvider provider = new MATLABProvider(faceVideoFile, "S", "pref");
```
You can instantiate `fr.lri.tao.apro.ap.Apro` directly, using its constructors, or use the `AproBuilder` class from the same package as a helper:

```java
AproBuilder builder = new AproBuilder();
Apro apro = builder.build(provider);
```
By default, `AproBuilder` automatically sets the number of threads to the number of available processors, and does not use NUMA unless you specify so. You can find more advanced examples in the next section.
You can then query the exemplar of any node:

```java
int myExemplar = apro.getExemplar(42);
```
`int Apro.getExemplar(int node)` returns the 'representative' node (exemplar) of the node specified as a parameter. Nodes having the same exemplar are considered to be in the same cluster. You can also get the whole clustering result using the methods `Apro.getExemplars()`, which returns the array of exemplars for all nodes, or `Apro.getExemplarSet()`, which returns the `Set` of exemplars. These methods are actually members of `AbstractApro`, which provides common code that can be reused by other AP implementations if needed.
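Since nodes sharing an exemplar form a cluster, the exemplar array can be turned into explicit clusters with a few lines of plain Java. This is a hypothetical post-processing helper, not part of the library:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Group node indices by their exemplar: nodes with the same exemplar
// belong to the same cluster.
public class ClusterGrouper {
  public static Map<Integer, List<Integer>> group(int[] exemplars) {
    Map<Integer, List<Integer>> clusters = new HashMap<>();
    for (int node = 0; node < exemplars.length; node++) {
      clusters.computeIfAbsent(exemplars[node], k -> new ArrayList<>()).add(node);
    }
    return clusters;
  }
}
```

For example, the exemplar array `{0, 0, 3, 3, 0}` yields two clusters, keyed by exemplars 0 and 3.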
In the Preferences file, each line contains the preference of the corresponding node. The number of lines defines the number of nodes.
In the Similarities file, each line contains three delimiter-separated values: `node1, node2, similarity`.
Default delimiters are comma, semicolon, tab, and space, but you can provide your own custom string of delimiter characters in the `DSVProvider` constructor:

```java
File preferences = new File("/path/to/preferences.csv");
File similarities = new File("/path/to/similarities.csv");
DSVProvider provider = new DSVProvider(preferences, similarities);
```
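For reference, the snippet below writes a pair of such files for a hypothetical 3-node problem and reads them back, illustrating the expected format (plain Java, not library code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrates the DSV input format: one preference per line in the
// preferences file (its line count defines the number of nodes), and
// "node1,node2,similarity" triples in the similarities file.
public class DsvFormatDemo {

  public static Path writeExample() {
    try {
      Path dir = Files.createTempDirectory("apro-dsv");
      Files.write(dir.resolve("preferences.csv"), List.of("3", "3", "3"));
      Files.write(dir.resolve("similarities.csv"),
          List.of("0,1,20", "1,0,20", "0,2,7", "2,0,7", "1,2,15", "2,1,15"));
      return dir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Number of lines in the preferences file = number of nodes
  public static int countNodes(Path dir) {
    try {
      return Files.readAllLines(dir.resolve("preferences.csv")).size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static String firstSimilarityLine(Path dir) {
    try {
      return Files.readAllLines(dir.resolve("similarities.csv")).get(0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```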
If you want to use your own similarity matrix (with the preferences on the main diagonal), you can just wrap it using `MatrixProvider`:

```java
// Illustrative symmetric 3x3 similarity matrix,
// with the preferences (3) on the main diagonal
double[][] s = new double[3][3];
s[0][0] = s[1][1] = s[2][2] = 3;
s[0][1] = s[1][0] = 20;
s[0][2] = s[2][0] = 7;
s[1][2] = s[2][1] = 15;
DataProvider provider = new MatrixProvider(s);
```
By default, `AproBuilder` automatically detects the available number of processors (cores) and sets the `threadCount` to it. In certain cases, you may want to specify it yourself, using `AproBuilder.setThreads(int threadCount)`, or fall back to the automatically detected default value:
```java
AproBuilder builder = new AproBuilder();
builder.setThreads(1); // no parallelization
Apro apro = builder.build(provider);
```
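The auto-detected value corresponds to what the JVM reports; presumably the builder uses something along these lines:

```java
// Number of processors available to the JVM; an automatically configured
// builder would use a value like this for its thread count.
public class CoreCount {
  public static int availableCores() {
    return Runtime.getRuntime().availableProcessors();
  }
}
```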
By default, using the NUMA library for thread management is switched off. You can either:

- call `AproBuilder.setNuma(Integer numNodes, Integer coresPerNode, Integer startNode)`, where `numNodes` is the number of NUMA nodes to be used, `coresPerNode` the number of threads to run on each node, and `startNode` the NUMA node on which to start allocating threads. Specifying `null` for `coresPerNode` autodetects the maximum value; specifying `null` for `startNode` starts creating threads from the currently used node;
- call `AproBuilder.setNumaAuto()`, which is the same as calling `AproBuilder.setNuma(null, null, null)`.
Alternatively, `AproBuilder.setFullAuto()` decides both the NUMA parameters and the maximum number of threads automatically:

```java
AproBuilder builder = new AproBuilder();
builder.setFullAuto(); // use all the available processors
Apro apro = builder.build(provider);
```
HiAP requires that the similarity between any two nodes (points) can be computed.
The only `DataProvider` that HiAP currently supports is `PointsProvider`, which creates a `Point` object for each line in the specified file, reading its numerical features (delimiter-separated values). You can also specify a custom `SimilarityMeasure` between points (the default similarity is the negative squared Euclidean distance between points).
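For clarity, the stated default measure can be written out in a few lines (a hypothetical stand-alone version, not the library's `SimilarityMeasure` implementation):

```java
// The default similarity between two points: the negative squared
// Euclidean distance over their feature vectors.
public class NegSqEuclidean {
  public static double similarity(double[] p1, double[] p2) {
    double sum = 0;
    for (int i = 0; i < p1.length; i++) {
      double d = p1[i] - p2[i];
      sum += d * d; // squared difference per feature
    }
    return -sum; // higher (less negative) = more similar
  }
}
```

Identical points thus get similarity 0, and similarity decreases as points move apart, which is the orientation AP expects.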
```java
PointsProvider provider = new PointsProvider(new File("/path/to/points.csv"));
HiAPBuilder builder = new HiAPBuilder();
builder.setNumaAuto();
builder.setSplits(5);         // specify the number of data partitions
builder.setWorkerIters(100);  // set the number of iterations for the first-level AP
builder.setWAPIters(100);     // set the number of iterations for the second-level AP
HiAP hiap = builder.build(provider);
hiap.run();
Point exemplar = hiap.getExemplar(5); // get the exemplar of the point with id 5
```
This work has been partially funded by the Région Île-de-France in the frame of the French cooperative project TIMCO, Pôle de Compétitivité Systematic (FUI 13).