Project 3: Other Data Structures! Due 12/3 or 12/5
We've looked at a lot of data structures. Each data structure did some things well, and other things poorly. Of course, we've barely scratched the surface of things we could have covered in this class. There are data structures to optimize for things other than operations, like memory, or cache hits, or network activity. There are data structures specifically for points in large-dimensional spaces, or Strings, or other things. In this project, let's pick up the slack.
This is a project to be performed in a partnership. You will research a new data structure or concept, become an expert on it, and perform an oral report. In addition, you will answer some questions textually, and submit your answers to me. There is a good chance you will know a lot more about your data structure than I will!
Below are some suggestions for your project. If you know of one you want to present, run it by me; it's probably fine. Each section may only have one group work on a given data structure or concept.
Linked-Lists
- Skip Lists - linked lists with \(O(\log(n))\) lookup (may not be presented by a midshipman registered for Randomized Algorithms)
Balanced Trees
- Red-Black Trees - equivalent to 2-3-4 Trees
- Splay Trees - faster lookup for frequently-accessed elements
- Treap - combination of tree and heap (may not be presented by a midshipman registered for Randomized Algorithms)
- B-Tree - minimizes hard disk accesses when data is too large to be held in RAM
Hash Tables
- Incremental resizing - allows true \(O(1)\) inserts for situations where amoritizing the \(O(n)\) resize is unattractive
- Cuckoo hashing - collision handling that allows for \(O(1)\) worst-case lookup
- Coalesced hashing - combination of separate chaining and open addressing to avoid speed problems of open addressing without wasted memory of separate chaining
- Distributed Hash Tables - used to store the data of a hash table across multiple machines on a network; all data accessible from any of the machines
Heaps
- Binomial heap - insert in constant amortized time, merging of heaps in \(O(\log(n))\) time
- Fibonacci heap - insert, decrease key (like for Dijkstra's algorithm), and merge in constant amortized time
- Beap - insert, remove, and find in \(O(\sqrt{n})\)
Strings
- PATRICIA Tries - space-optimized string storage
- Rope - used for storing and modifying very long strings, like the entirety of a text document
Other Crazy Stuff
- Multidimensional range trees - used for storing large-dimensional point clouds
- Bloom filter - quickly returns a probably-correct answer to the question "Does this piece of data appear in this set?"
- Disjoint-set forests - used to handle multiple sets of data. Quickly merges sets and quickly answers "which set does this piece of data belong to?"
- Merkle tree - used to verify that data being received is uncorrupted and unchanged. Applications in security or unreliable data transmission
What you'll give me
You'll prepare both a report and a presentation, which will be e-mailed to me by 0700 the morning of your presentation. So that we can find the appropriate files and keep things moving, the filename of your slides and report should contain your last names.
The Report
Your report should be IN PDF FORMAT. Word and OpenOffice have lots of crazy versioning and formatting issues that make them awful formats for document dispersal. Computer people have given up on convincing non-computer people of this, but at least you can know better.
Each question merits a meaty answer of several paragraphs. I'm not going to give you a recommended length. If it's clear, answers all the questions, and is short, that's fine. If it's long, that's fine too. Diagrams and pictures are great and likely very helpful; they will be looked upon with favor. Ideas should be clearly communicated, and diagrams should be original in content (ie, don't just reproduce a diagram on wikipedia). The end of the document should include a references section to show me where you learned this material. After reading the report, I need to understand these concepts as well as you do.
- Introduce your data structure or concept. What problem does it try to solve? Is it related to something we already know? How is it built, and what methods does it perform?
- What applications is this data structure well suited for? Give me a few original examples of real-life problems that can be solved, in part, with your data structure.
- How do each of its methods work? What is their runtime? If your data structure isn't focused at optimizing runtime, what is it trying to optimize? If it is space, talk about how it optimizes space. If it's something else, talk about how this data structure optimizes the thing it is trying to optimize.
- What does this data structure do well? What does it do poorly? Under what circumstances would you use it, and under what circumstances would you use something else?
The Presentation
You will give a five-minute presentation, plus two minutes for questions for your classmates. This is an absurdly short amount of time, which allows for only an introduction. To do this well, you will have to be streamlined and practiced. Both midshipmen should speak. You will be cut off when you run out of time, and your presentation will be graded based on what was presented.
To ensure no procrastination, a rough draft of slides should be e-mailed to me by 2359, November 30.
Upon being asked how long it took to prepare a speech: "It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if an hour, I am ready now." -- Woodrow Wilson (likely apocryphal)
Grading
The report will be graded using my normal letter system, based on clarity, completeness, and correctness.
The presentation will be graded using this rubric.
In addition, I will set up a way for you to submit comments to the presenters. The usefulness and depth of your comments will be graded on a 1-5 scale.
60% of your project grade will come from the report. 35% will come from your grade on the presentation. The remaining 5% will come from your comments. Both partners will receive the same grade on the report and presentation.