PROXIMUS: Software for Summarization of Very High Dimensional Discrete-Valued Datasets
PROXIMUS is a software tool for error-bounded approximation of high-dimensional binary-attributed
datasets, based on nonorthogonal decomposition of binary matrices. It can be used to analyze
data arising in a variety of domains, ranging from commercial to scientific applications. Using a
combination of innovative algorithms, novel data structures, and an efficient implementation, PROXIMUS
computes a concise representation of very large binary matrices, providing insight into common
patterns in the rows and columns of the matrix.
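To make the idea of an error-bounded summary concrete, the following is a minimal Python sketch, not the PROXIMUS source code. It assumes a simplified scheme: a binary rank-one approximation x * y^T is found by alternating updates, the rows are split by the presence vector x, and the recursion stops once every row in a group lies within a chosen Hamming radius of the group's pattern vector y. The names `rank_one`, `summarize`, and `max_radius`, the choice of the first row as the initial pattern, and the fallback used when a split makes no progress are all assumptions made for illustration.

```python
# Illustrative sketch only; names and fallback rules are hypothetical,
# not taken from the PROXIMUS implementation.

def hamming(u, v):
    """Number of positions in which two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def rank_one(rows, max_iter=50):
    """Alternating heuristic for a binary rank-one approximation x * y^T.

    x marks which rows contain the pattern, y is the pattern itself.
    Each update minimizes the number of mismatches with the other vector fixed.
    """
    n_cols = len(rows[0])
    x = [1] * len(rows)
    y = list(rows[0])  # assumption: seed the pattern with the first row
    for _ in range(max_iter):
        # Fix y: row i is present if it covers at least half of y's ones.
        weight = sum(y)
        new_x = [1 if 2 * sum(a & b for a, b in zip(r, y)) >= weight else 0
                 for r in rows]
        # Fix x: column j joins the pattern if at least half of the
        # present rows have a one there (majority vote).
        size = sum(new_x)
        counts = [sum(r[j] for r, xi in zip(rows, new_x) if xi)
                  for j in range(n_cols)]
        new_y = [1 if size > 0 and 2 * c >= size else 0 for c in counts]
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y

def summarize(rows, max_radius):
    """Return (pattern, members) pairs; each member is within max_radius of its pattern."""
    if not rows:
        return []
    x, y = rank_one(rows)
    inside = [r for r, xi in zip(rows, x) if xi]
    outside = [r for r, xi in zip(rows, x) if not xi]
    if inside and outside:
        # The presence vector splits the rows; summarize each part separately.
        return summarize(inside, max_radius) + summarize(outside, max_radius)
    # No split by presence: separate rows by their distance to the pattern.
    near = [r for r in rows if hamming(r, y) <= max_radius]
    far = [r for r in rows if hamming(r, y) > max_radius]
    if not far:
        return [(y, near)]
    if not near:
        # Guarantee progress by giving one row its own pattern.
        return [(list(far[0]), [far[0]])] + summarize(far[1:], max_radius)
    return [(y, near)] + summarize(far, max_radius)

if __name__ == "__main__":
    A = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 1, 1],
    ]
    for pattern, members in summarize(A, max_radius=1):
        print(pattern, len(members))
```

On this six-row example with `max_radius=1`, the sketch returns two patterns, `[1, 1, 1, 0, 0, 0]` and `[0, 0, 0, 1, 1, 1]`, each representing three rows. Tightening the bound to `max_radius=0` forces the two slightly different rows into groups of their own, which illustrates the trade-off between summary size and approximation error.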
PROXIMUS has found application in many areas, including association rule mining,
DNA microarray analysis, and business analytics.
The original release of PROXIMUS is implemented in C and is freely available as open
source below. It has also been implemented in R, within the CBA (Clustering for Business
Analytics) package by Christian Buchta and Michael Hahsler, and in Java by Jaan Ubi.
Download
Publications
- M. Koyuturk, A. Grama, and N. Ramakrishnan, Non-orthogonal decomposition of binary matrices for bounded-error data compression and analysis, ACM Transactions on Mathematical Software, 32(1), 33-69, 2006.
- M. Koyuturk, A. Grama, and N. Ramakrishnan, Compression, clustering and pattern discovery in very high dimensional discrete-attribute datasets, IEEE Transactions on Knowledge and Data Engineering, 17(4), 447-461, 2005.
- J. Chi, M. Koyuturk, and A. Grama, Conquest: A coarse-grained algorithm for constructing summaries of distributed discrete datasets, Algorithmica, 45(3), 377-401, 2006.
- M. Koyuturk, W. Szpankowski, and A. Grama, Biclustering gene-feature matrices for statistically significant dense patterns, CSB'04, 2004.
- J. Chi, M. Koyuturk, and A. Grama, CONQUEST: A distributed tool for constructing summaries of high-dimensional discrete-attributed datasets, SIAM DM'04, 154-165, 2004.
- M. Koyuturk and A. Grama, PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets, KDD'03, 147-156, 2003.
- M. Koyuturk, A. Grama, and W. Szpankowski, Algorithms for bounded-error correlation of high dimensional data in microarray experiments, CSB'03, 575-580, 2003.
- M. Koyuturk, A. Grama, and N. Ramakrishnan, Algebraic techniques for analysis of large discrete-valued datasets, PKDD'02, 311-324, 2002.