Details

Title:Clustering Systems with Kolmogorov Complexity and MapReduce
Authors:Troisi, Louis R
Serial Number:2011-01
Publication Date:6-2-2011
Abstract:In the field of value management, an important problem is quantifying the processes and capabilities of an organization's network and the machines within it. When the organization is large, ever-changing, and responding to new demands, it is difficult to know at any given time exactly what is being run on the machines. As the machines become employed for various tasks, one could lose track of approved software or, worse, of unapproved or even malicious software. Moreover, the level of utilization of the machines may affect the maintenance and upkeep of the network. Our goal is to develop a tool that autonomously clusters the machines on a network in a meaningful way, using different attributes or features, within an efficient and scalable system. At its core, the solution developed implements a streaming algorithm that takes meaningful operating data from a network in real time, compresses it, and sends it to a MapReduce clustering algorithm. The clustering algorithm uses a normalized compression distance to measure the similarity of two machines. The goal for this project was to implement the solution and measure the overall effectiveness of the clusters. The implementation was successful in creating a software tool that can compress the data, determine the normalized compression distance, and cluster the machines. More work, however, needs to be done in using our system to extract more quantitative meaning from the clusters generated.
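
As a rough illustration of the normalized compression distance mentioned in the abstract, the sketch below uses the commonly cited definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. The choice of zlib as the compressor, the function names, and the sample process listings are assumptions made for illustration only; the report itself may use a different compressor and different machine features.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical operating data from two machines (illustrative only).
    machine_a = b"httpd sshd cron mysqld\n" * 40
    machine_b = b"httpd sshd cron postgres\n" * 40
    print(ncd(machine_a, machine_b))
```

A pairwise matrix of such distances could then be handed to a clustering step. How that step is distributed with MapReduce is not specified in this record, so it is not sketched here.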