Recent Midshipman Research Projects
Fast, Distributed Algorithms for Deep Networks: Ryan Burmeister, Trident Scholar
Neural networks are undergoing a resurgence in AI and machine learning: they recognize handwritten digits on envelopes for the post office, identify image subjects for Google Image Search, play games, and even write emails for you. However, neural nets often require a huge amount of time (frequently, weeks) and terabytes of data in order to perform these tasks accurately. With this much data and this much time, work on neural nets is time-prohibitive for many researchers and memory-prohibitive for most computers.
For his Trident Scholar project, Ryan is writing algorithms for supercomputers that can help neural nets achieve better performance in less time. These highly parallelized algorithms allow tens of thousands of computing cores to each process a portion of the data while collaborating to determine a single correct answer.
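The general idea of "each core processes a portion of the data while collaborating on a single answer" can be sketched in a few lines. This is a toy illustration of data-parallel training with gradient averaging, not Ryan's actual algorithms; the 1-D model, learning rate, and shard layout are all assumptions for the example.

```python
# Toy data-parallel training sketch: each simulated "core" computes a
# gradient on its own shard of the data, and the per-shard gradients are
# averaged into one synchronized update of the shared model.

def local_gradient(w, shard):
    """Gradient of mean squared error for the 1-D model y = w*x on one shard."""
    g = 0.0
    for x, y in shard:
        g += 2 * x * (w * x - y)
    return g / len(shard)

def parallel_step(w, shards, lr=0.01):
    """One synchronized update: average the per-shard gradients, then step."""
    grads = [local_gradient(w, s) for s in shards]  # run in parallel in practice
    avg = sum(grads) / len(grads)
    return w - lr * avg

# Data drawn from y = 2x, split across 4 simulated cores.
data = [(x, 2.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = parallel_step(w, shards)
print(round(w, 2))  # converges toward the true slope, 2.0
```

On a real supercomputer the per-shard gradients would be computed on separate cores and combined with a collective communication step (e.g. an all-reduce), but the arithmetic is the same.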
His work is partially based on previously published results authored by a USNA professor and two recently-graduated USNA midshipmen.
Advisor: Gavin Taylor
Identifying Network Security Attacks with Social Media: Stephen da Cruz, Ben Fry, Peter Goutzounis, James McMasters
As networks become more complex and federated, deploying network-based detection systems grows ever more challenging, since detection requires coordination across administrative domains. While it may not be in the best interest of public or private network operators to share details of an ongoing attack, the users of these services may reveal that things are not working as they normally do. This research investigates a broad-scale detection system for nationwide network attacks that leverages social media. Advances in the fields of natural language processing and machine learning will be used to interpret informal comments about network trouble from millions of online users, and to map those interpretations to (1) specific public and private networks and (2) the type of network attack. The research will generate a near-real-time map of the nation's critical network infrastructure, and provide alerts and reports of new and developing network attacks.
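The mapping step described above can be illustrated with a deliberately simple sketch. This is not the project's actual pipeline: a real system would use trained language models and machine-learned classifiers, not the hand-written keyword lists assumed here.

```python
# Toy illustration: interpret informal user complaints and map each one to
# (a service, a suspected attack type). The keyword tables below are
# placeholders for what NLP and machine learning would do in a real system.

SERVICES = {"bank": "banking network", "email": "mail provider"}
SYMPTOMS = {
    "down": "denial of service",
    "slow": "denial of service",
    "password": "credential theft",
}

def interpret(post):
    """Return the (service, suspected attack) pairs hinted at by one post."""
    words = post.lower().split()
    return {(SERVICES[w], SYMPTOMS[s])
            for w in words if w in SERVICES
            for s in words if s in SYMPTOMS}

posts = [
    "my bank site is down again??",
    "email password reset page so slow today",
]
alerts = sorted(set().union(*(interpret(p) for p in posts)))
print(alerts)
```

Aggregating such per-post signals over millions of users, and over time, is what would turn noisy individual complaints into a map of developing attacks.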
Advisor: Nate Chambers
Sparse Polynomial Computation: Whitman Groves
In symbolic math programs like Mathematica, in cryptography applications, and elsewhere, polynomials are the most commonly used objects for computation. Operations such as adding polynomials, multiplying them, taking derivatives, and evaluating them need to be very fast in order to solve big problems, or to solve small problems quickly. There is a popular open-source software library called Flint that is used within other software programs and by researchers around the world to do fast computations with polynomials.
Whitman is developing a new data structure for polynomials that accounts for the number of coefficients that are zero. For the kinds of polynomials that people commonly use in Mathematica, for example, this new data structure can use much less memory. Combined with new algorithms for computing with polynomials stored this way, it can be much faster than the existing data structure for many of the computations that people want to do with polynomials. Whitman is writing an add-on package which we hope will be incorporated into Flint so that this data structure can be used by anyone around the world, for free.
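The core idea of a sparse representation can be shown in a few lines. This is a minimal sketch of one common sparse encoding (exponent-to-coefficient pairs), not Whitman's or Flint's actual data structure:

```python
# Sparse polynomial sketch: store only the nonzero terms as a mapping from
# exponent to coefficient, so x^1000000 + 3x + 1 takes three entries
# instead of a million-entry coefficient array.

def sparse_add(p, q):
    """Add two sparse polynomials, dropping terms that cancel to zero."""
    r = dict(p)
    for e, c in q.items():
        r[e] = r.get(e, 0) + c
        if r[e] == 0:
            del r[e]
    return r

def sparse_mul(p, q):
    """Multiply term by term: the cost depends only on the number of
    nonzero terms, not on the degree."""
    r = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = e1 + e2
            r[e] = r.get(e, 0) + c1 * c2
            if r[e] == 0:
                del r[e]
    return r

p = {1000000: 1, 1: 3, 0: 1}   # x^1000000 + 3x + 1
q = {1: 1, 0: -1}              # x - 1
print(sparse_mul(p, q))        # x^1000001 - x^1000000 + 3x^2 - 2x - 1
```

A dense array for the same product would need over a million slots, nearly all of them zero, which is exactly the waste a sparse structure avoids.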
The underlying technology is based on a research paper from last summer by a USNA professor.
Advisor: Dan Roche
Methods and Demographics in Collecting Android's Graphical Pattern Unlock Passwords: Justin Maguire
Justin is investigating methodological and demographic differences in the selection of Android's graphical unlock passwords. A comprehensive comparison is being made between two major groups: individuals self-reporting their graphical password online, and individuals generating passwords with pen and paper. While there are strong consistencies between these groups, there are minor but interesting statistically significant differences that should be accounted for, particularly in gender, handedness, and the expected length and shape of the graphical passwords reported.
Advisor: Adam Aviv
Oblivious Filesystem for the Cloud: Blair Mason
Right now, if you store stuff in Dropbox, Dropbox gets to read all your files. You can encrypt the files individually, but then Dropbox still gets to know how big your files are and when you change each one. Even that "metadata" can reveal significant private information about what you're doing, either to Dropbox or to a hacker who has infiltrated Dropbox's servers.
Blair is developing and implementing a new filesystem that can work on top of any synchronized folder, whether on Dropbox, Google Drive, Ubuntu One, or any similar "cloud" service. With this new filesystem, although everything looks just like regular files from the user's viewpoint, behind the scenes the files are all encrypted and broken up in such a way that the synchronization service can't learn anything about what you're doing. That is, Dropbox has no idea which files you're accessing, or when, or how big they are.
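One piece of that "encrypted and broken up" idea can be sketched simply: pad every file into fixed-size blocks and store each block under a random, unlinkable name, so the provider sees only identical-looking ciphertext. This toy sketch is not the Oblivious RAM protocol itself (a real ORAM also hides *which* blocks are read and written over time), and its XOR "encryption" is a stand-in, not a real cipher:

```python
# Toy metadata-hiding storage sketch: files are padded to whole fixed-size
# blocks, each "encrypted" and uploaded under a random name. The filename-
# to-blocks index stays on the client, so the provider learns neither file
# names nor file sizes.

import os

BLOCK_SIZE = 16

def toy_encrypt(block, key):
    # Placeholder XOR masking; NOT real encryption.
    return bytes(b ^ k for b, k in zip(block, key))

def store(filename, data, cloud, index, key):
    """Pad data to whole blocks and upload each under a random ID."""
    pad = BLOCK_SIZE - len(data) % BLOCK_SIZE
    data += bytes([pad]) * pad
    ids = []
    for i in range(0, len(data), BLOCK_SIZE):
        block_id = os.urandom(8).hex()          # random, unlinkable name
        cloud[block_id] = toy_encrypt(data[i:i + BLOCK_SIZE], key)
        ids.append(block_id)
    index[filename] = ids                       # kept client-side only

def load(filename, cloud, index, key):
    data = b"".join(toy_encrypt(cloud[i], key) for i in index[filename])
    return data[:-data[-1]]                     # strip the padding

cloud, index = {}, {}
key = os.urandom(BLOCK_SIZE)
store("notes.txt", b"meet at the dock at noon", cloud, index, key)
# The provider sees only uniform ciphertext blocks with random names:
assert all(len(b) == BLOCK_SIZE for b in cloud.values())
print(load("notes.txt", cloud, index, key))
```

An Oblivious RAM layered on top of such a block store additionally shuffles and re-encrypts blocks on every access, which is what prevents the provider from correlating accesses over time.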
The underlying technology is based on two recent research papers on a technology called Oblivious RAMs, written by USNA professors and published within the last year.
Calibration Methods for Improved Classification in Sparsely-Labeled Networks: Joshua King
Data about the world does not exist in a vacuum. Often, information is known not only about particular entities (people, places, documents), but about how those entities relate (or link) to one another via social contacts, transactions, hyperlinks, etc. Analyzing this data is the task of researchers in the recent field of Statistical Relational Learning (SRL), a subfield of Machine Learning and of Artificial Intelligence. For instance, consider the following problems:
- What topic is a certain new web page most likely to be concerned with?
- Given communication patterns, what mobile phones have probably been recently stolen?
- What people in a social network are likely to become friends in the next year?
Josh has been studying how such questions can be answered through the use of "collective classification," or more generally "link-based classification" (LBC). LBC's goal is to automatically assign labels to a set of inter-linked objects (such as documents or people). The key challenge, however, is that the labels of the entire set of objects must be inferred simultaneously, since the most likely label for each object depends upon the estimated labels of the objects to which it links. While this simultaneous inference can often significantly improve accuracy, a few incorrect predictions can also lead to a cascade of errors, which we call the "flooding problem." Josh is particularly studying how novel calibration methods, based on randomization and/or predictions of the "reasonableness" of a final result, can be applied to LBC methods to make them more resilient to such flooding, leading to consistently better accuracy.
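The "labels inferred simultaneously" idea can be sketched with a plain majority-vote relaxation. This is a generic illustration of collective classification, not Josh's calibrated methods; the graph, labels, and iteration count are made up for the example:

```python
# Minimal collective-classification sketch: nodes without labels repeatedly
# take the most common label among their neighbors, so every prediction
# depends on the other predictions being made at the same time.

from collections import Counter

edges = {  # a tiny citation-style graph
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
labels = {"A": "ml", "E": "crypto"}      # the sparse known labels
unknown = [n for n in edges if n not in labels]

guess = dict(labels)
for n in unknown:
    guess[n] = "ml"                      # arbitrary initial guess

for _ in range(10):                      # iterate toward a fixed point
    for n in unknown:
        votes = Counter(guess[m] for m in edges[n])
        guess[n] = votes.most_common(1)[0][0]

print(guess)
```

Note that the lone "crypto" seed at E is outvoted at its neighbor D, a small-scale version of the flooding behavior described above: calibration methods aim to keep a few wrong (or overwhelmed) predictions from sweeping across the graph.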
Advisor: Luke McDowell