PROXIMUS

PROXIMUS: Software for Summarization of Very High Dimensional Discrete-Valued Datasets

PROXIMUS is a software tool for error-bounded approximation of high-dimensional binary-attributed datasets, based on nonorthogonal decomposition of binary matrices. It can be used to analyze data from a variety of domains, ranging from commercial to scientific applications. Using a combination of innovative algorithms, novel data structures, and an efficient implementation, PROXIMUS computes a concise representation of very large binary matrices, providing insight into common patterns in the rows and columns of the matrix. PROXIMUS has found application in many areas, including association rule mining, DNA microarray analysis, and business analytics. The original release of PROXIMUS is implemented in C and is freely available below as open source. It has also been implemented in R, within the CBA (Clustering for Business Analytics) package by Christian Buchta and Michael Hahsler, and in Java by Jaan Ubi.
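To make the idea of approximating a binary matrix by a binary decomposition more concrete, the following is a minimal Python sketch of a rank-one approximation A ≈ x yᵀ, where x marks the rows that contain a pattern and y is the pattern itself. It is not the PROXIMUS source code: the function names (`rank_one_approx`, `hamming_error`), the densest-row initialization, and the tie-breaking rule are assumptions made for illustration. The alternating update follows from minimizing the number of mismatched entries: with y fixed, setting x_i = 1 pays off when row i matches at least half of the 1s in y, and symmetrically for y with x fixed.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def rank_one_approx(a: Matrix, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating heuristic for a binary rank-one approximation a ~ x y^T.

    x is a 0/1 vector over rows (which rows contain the pattern),
    y is a 0/1 vector over columns (the pattern itself).
    """
    n_rows, n_cols = len(a), len(a[0])
    # Assumption for this sketch: start from the densest row as the pattern.
    y = list(max(a, key=sum))
    x = [0] * n_rows
    for _ in range(max_iter):
        # Fix y, update x: row i is selected if it shares at least
        # half of the 1s in y (ties are resolved in favor of selection).
        y_ones = sum(y)
        new_x = [
            1 if y_ones > 0 and 2 * sum(row[j] & y[j] for j in range(n_cols)) >= y_ones else 0
            for row in a
        ]
        # Fix x, update y: column j joins the pattern if at least
        # half of the selected rows have a 1 in that column.
        x_ones = sum(new_x)
        new_y = [
            1 if x_ones > 0 and 2 * sum(a[i][j] for i in range(n_rows) if new_x[i]) >= x_ones else 0
            for j in range(n_cols)
        ]
        if new_x == x and new_y == y:
            break  # neither vector changed, so the iteration has converged
        x, y = new_x, new_y
    return x, y


def hamming_error(a: Matrix, x: List[int], y: List[int]) -> int:
    """Number of entries where a differs from the outer product x y^T."""
    return sum(
        a[i][j] != (x[i] & y[j])
        for i in range(len(a))
        for j in range(len(a[0]))
    )


if __name__ == "__main__":
    a = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
    ]
    x, y = rank_one_approx(a)
    print("presence vector x:", x)
    print("pattern vector  y:", y)
    print("mismatched entries:", hamming_error(a, x, y))
```

In this toy matrix the sketch selects the first three rows and the pattern over the first two columns. A full decomposition would then repeat the step on the rows that were and were not selected until each group is approximated within a chosen error bound; that recursion is omitted here.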

Download

PROXIMUS source code (implemented in C)

Publications

M. Koyuturk, A. Grama, and N. Ramakrishnan, Non-orthogonal decomposition of binary matrices for bounded-error data compression and analysis, ACM Transactions on Mathematical Software, 32(1), 33-69, 2006.

M. Koyuturk, A. Grama, and N. Ramakrishnan, Compression, clustering and pattern discovery in very high dimensional discrete-attribute datasets, IEEE Transactions on Knowledge and Data Engineering, 17(4), 447-461, 2005.

J. Chi, M. Koyuturk, and A. Grama, Conquest: A coarse-grained algorithm for constructing summaries of distributed discrete datasets, Algorithmica, 45(3), 377-401, 2006.

M. Koyuturk, W. Szpankowski, and A. Grama, Biclustering gene-feature matrices for statistically significant dense patterns, CSB'04, 2004.

J. Chi, M. Koyuturk, and A. Grama, CONQUEST: A distributed tool for constructing summaries of high-dimensional discrete-attributed datasets, SIAM DM'04, 154-165, 2004.

M. Koyuturk and A. Grama, PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets, KDD'03, 147-156, 2003.

M. Koyuturk, A. Grama, and W. Szpankowski, Algorithms for bounded-error correlation of high dimensional data in microarray experiments, CSB'03, 575-580, 2003.

M. Koyuturk, A. Grama, and N. Ramakrishnan, Algebraic techniques for analysis of large discrete-valued datasets, PKDD'02, 311-324, 2002.