Apprentissage & Optimisation Team (TAO)
Laboratoire de Recherche en Informatique (LRI)
Paris, Saclay (France)

Apro
Java Affinity Propagation Library // Parallelized

Apro is a Java implementation of Affinity Propagation clustering algorithm. It is parallelized for easy and efficient use on multicore processors and NUMA architectures (using libnuma native library). Besides the basic (parallelized) algorithm, it also implements Hierarchical Affinity Propagation (HiAP), making it usable in practice even on huge datasets. Apro comes with a couple of ready to use datasource providers (for Delimiter Separated Values, MATLAB files, etc.).

  1. About Affinity Propagation in general
  2. Apro Java Library overview and documentation
  3. Getting Started the first example
  4. More Examples
  5. Downloads .jar and sources
  6. Contact and acknowledgements

1. About Affinity Propagation in General

Affinity Propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points. Unlike clustering algorithms such as k-means or k-medoids, AP does not require the number of clusters to be determined or estimated before running the algorithm. AP finds "exemplars", members of the input set that are representative of clusters.

More about Affinity Propagation:

Hierarchical Affinity Propagation

Hierarchical AP (HiAP) solves large-scale clustering problem. It uses AP and Weighted AP (WAP) in the Divide-and-Conquer schema. It partitions the dataset, runs AP on each subset, and applies WAP to the collection of exemplars constructed from each subset. HiAP was shown to significantly decrease the computational cost (from quadratic to quasi-linear), with minimal distortion.

WAP integrates the neighboring points together and keeps spatial structure between them and other points. It makes sure that the clustering results of WAP on integrated points are equal to that of AP on non-integrated points.

2. Apro Java Library

Apro is a Java implementation of Affinity Propagation clustering algorithm. It is efficiently parallelized for use on multicore processors and NUMA architectures (using libnuma native library), offering a simple API for easy use in your projects.

Main features of Apro

API documentation

JavaDoc is available here, as well as in the download package.

Planned features

Used libraries

License

Apro - Java Affinity Propagation Library // Parallelized
Copyright © 2014 Lovro Ilijašić, Cécile Germain and Xiangliang Zhang
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License LGPL-3.0 as published by the Free Software Foundation.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

Contact

3. Getting Started

Here are step-by-step instructions how to run your first Apro example.

  1. Download Apro package from the 'Downloads' section and include apro.jar and other required (and provided) .jars into your project.
  2. Download a sample dataset. You can start with FaceVideo dataset (9.5 MB, .mat format). More datasets are available here.
  3. Let's begin coding. First, we use MATLABProvider to load the similarity matrix and preferences from the MATLAB file. This particular example contains the structure with similarity matrix named S, but with zero main diagonal (preferences). Therefore, preferences should be set to the value that is contained separately in the structure, named pref:
    
          File faceVideoFile = new File("/path/to/FaceVideo.mat");
          MATLABProvider provider = new MATLABProvider(faceVideoFile, "S", "pref");
              
  4. Now that we have the input data, we need to create the object that will perform the Affinity Propagation algorithm. We can either construct an object of class fr.lri.tao.apro.ap.Apro directly, using its constructors, or use AproBuilder class from the same package as a helper class:
    
          AproBuilder builder = new AproBuilder();      
          Apro apro = builder.build(provider);
              
    By default, AproBuilder will automatically set the number of threads to the number of available processors, and will not use NUMA unless you specify so. You can find more advanced examples in the next section.
  5. Now we execute the desired number of iterations:
    
          apro.run(100);
              
  6. Finally, we get the clustering results from the Apro object:
    
          int myExemplar = apro.getExemplar(42);
              
    Method int Apro.getExemplar(int node) returns the 'representative' node (exemplar) of the node specified as a parameter. Nodes having the same exemplar are considered to be in the same cluster. You can also get the whole set of clustering results using methods Apro.getExemplars(), which returns the array of exemplars for all nodes, or Apro.getExemplarSet() to get the Set of exemplars. These methods are actually members of Apro's superclass AbstractApro, which provides some common code that can be used for other AP implementations if needed.

4. More Examples

Load Delimiter Separated Values

In the Preferences file, each line contains the preference of the corresponding node. The number of lines defines the number of nodes.

In the Similarities file, each line contains three delimiter separated values: node1, node2, similarity.

Default delimiters are comma, semicolon, tab and space, but you can provide your own custom string of delimiter characters in the DSVProvider constructor.

      
      File preferences = new File("/path/to/preferences.csv");
      File similarities = new File("/path/to/similarities.csv");
      DSVProvider provider = new DSVProvider(preferences, similarities);    
        

Provide custom similarity matrix

If you want to use your own similarity matrix (with preferences on the main diagonal), you can just wrap it up using MatrixProvider.


      double[][] s = new double[3][3];
      s[0][0] = s[0][2] = s[2][0] = s[2][1] = 3;
      s[0][1] = s[1][0] = 20;
      s[1][1] = 7;
      s[1][2] = s[2][2] = 15;
      
      DataProvider provider = new MatrixProvider(s);
        

Specify the number of threads

By default, AproBuilder automatically detects the available number of processors (cores), and sets the threadCount to it. In certain cases, you may want to specify it yourself, using AproBuilder.setThreads(int threadCount), or use the detected value, as by default (AproBuilder.setThreadsAuto()).

    
      AproBuilder builder = new AproBuilder();
      builder.setThreads(1); // no parallelization
      Apro apro = builder.build(provider);      
        

Specify NUMA parameters

By default, using NUMA library for thread management is switched off. You can either:


      AproBuilder builder = new AproBuilder();
      builder.setFullAuto(); // use all the available processors
      Apro apro = builder.build(provider); 
        

Hierarchical Affinity Propagation

HiAP requires that similarity between any two nodes (points) can be calculated. The only DataProvider that HiAP currently supports is PointsProvider which for each line in the specified file creates a Point object with read numerical features (delimiter separated values). You can also specify a custom SimilarityMeasure between points (the default similarity is the negative squared Euclidean distance between points).


      PointsProvider provider = new PointsProvider(new File("/path/to/points.csv"));

      HiAPBuilder builder = new HiAPBuilder();
      builder.setNumaAuto();
      builder.setSplits(5); // specify the number of data partitions
      builder.setWorkerIters(100); // set the number of iterations for the first-level AP
      builder.setWAPIters(100); // set the number of iterations for the second-level AP

      HiAP hiap = builder.build(provider);
      hiap.run();
      Point exemplar = hiap.getExemplar(5); // get the exemplar of the point with id 5
        

5. Downloads

The code is now hosted on GitHub

You can separately download and use NUMA library wrapper for Java that was written for and used in this project

6. Contact and Acknowledgements

Apro library was implemented by Apprentissage & Optimisation Team TAO at the Laboratoire de Recherche en Informatique LRI at Paris, Saclay (France).

This work has been partially funded by the Région Île-de-France in the frame of the French cooperative project TIMCO, Pôle de Compétitivité Systematic (FUI 13).

For any information, comment or error report, you can contact Lovro Ilijašić E-mail, who implemented this library in collaboration with Cécile Germain, Xiangliang Zhang and Joël Falcou.