writeup/conclusion.tex

   1 \section{Conclusion}
   2 We have explored applying machine learning techniques learned in class to the problem of clustering program traces.  On the implementation side, we have developed a method for extracting java util method traces from any Java JAR executable, and we wrote an implementation of the $k$ Markov chains mixture model in MATLAB.
   3
   4 In terms of theory, we developed the $k$ Markov chain model on our own, based heavily off of the GMM mixture model and variable length HMM model developed in class.  We also explored the problem of determining the number of underlying mixtures to use by running cross-validation for different values of $k$ and examining the BIC score.  As expected, after a certain point, increasing $k$ does not yield better results, and both cross-validation and the BIC score agree on this tipping point.
   5
   6 In terms of results, we saw that individual programs did not appear to have their own, specific ways of interacting with the libraries we examined.  In retrospect, this is sensible since we expect there to be a fixed number of ways to interact with a library, so it is implausible for each program to be recovered just by looking at its traces.  Moreover, our model appeared to assign most traces to a single cluster, indicating that there may be a dominant mode of interaction.  Balancing this observation, however, is the observation that a single Markov chain can actually account for multiple modes of interaction, especially if those modes consist of disjoint sets of function calls.  This may cause our model to include more than one mode of interaction in a single cluster, and we may need some post-processing to break this into more useful clusters.  While the model may suffer from this, we were still able to gain insights on different types of interaction by examining both the contents of the clusters and the actual parameters of the underlying Markov chains.
   7
   8 The results of this work has many applications.  Given that we know that there are common use cases of an API, we could generate a useful set of example use cases for the programmer.  We could also use this information in code synthesis: knowing about different usage modes, and perhaps about the common cases, could greatly optimize the search space.  This information is also useful in test generation: we might be able to predict interactions with the code we write and make sure our code is robust to those cases.
   9
  10 If there had been no time constraints, we would have loved to look closely at other libraries, constructed traces from more programs, and used variations in classification such as collapsing repeated states into a single states or a set of states and handling variable length clusters in a more clever way.