Project 3: Research a Data Structure

This is the archived website of IC 312 from the Fall 2014 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

1 Overview

We've looked at a lot of data structures. Each data structure did some things well, and other things poorly. Of course, we've barely scratched the surface of things we could have covered in this class. There are data structures to optimize for things other than operations, like memory, or cache hits, or network activity. There are data structures specifically for points in large-dimensional spaces, or Strings, or other things. There are researchers who spend their whole careers thinking about how new data structures can be used to solve problems better, faster, simpler, or easier. In this project, let's learn about some of these!

This is a project to be performed in groups of two or three. You will research a new data structure or concept, become an expert on it, write up what you know, maybe write some code, and perform an oral report. There is a good chance you will know more about your data structure than I will!

2 Timeline

Because there are multiple parts to this project, there are multiple deadlines:

  • Wednesday, November 12: Groups and topics must be decided before your class starts on this date.
  • Monday, November 24: Your written paper, presentation slides, and implementation (if necessary) must be submitted electronically by the usual time on this date.
  • Last two weeks of class: Project presentations in class, according to the schedule you decided on.

3 Suggested Topics

Below are some suggestions for your project. If you know about some other data structure and want to work on that instead, then you should run it by Dr. Roche. (It'll probably be fine.) Each class section may only have one group work on a given data structure or concept.

(The topic list is randomly shuffled every time you refresh this page... because why not.)

  • Coalesced hashing: combination of separate chaining and open addressing to avoid speed problems of open addressing without wasted memory of separate chaining
  • Fibonacci heap: insert, decrease key (like for Dijkstra's algorithm), and merge in constant amortized time
  • Skip Lists: linked lists with expected \(O(\log(n))\) lookup due to randomization.
  • Red-Black Trees: equivalent to 2-3-4 Trees
  • R-tree: automatically determines bounding rectangles to store nearby points on a map in a logical way.
  • van Emde Boas tree: created by the Dutch to take over the world. By which I mean, do the operations of a Map as fast as \(O(\log\log n)\), but only if your keys are all integers
  • B-Tree: minimizes hard disk accesses when data is too large to be held in RAM
  • VLists: linked arrays for faster expected access time than a standard linked list.
  • Treap: combination of tree and heap
  • Splay Trees: faster lookup for frequently-accessed elements
  • Rope: like a string, but can do modifications much easier, especially if the string is very long.
  • Cuckoo hashing: collision handling that allows for \(O(1)\) worst-case lookup
  • K-D Trees: used for storing points in k dimensions
  • Bloom filter: quickly returns a probably-correct answer to the question "Does this piece of data appear in this set?"
  • Binomial heap: insert in constant amortized time, merging of heaps in \(O(\log n)\) time
  • Merkle tree: used to verify that data being received is uncorrupted and unchanged. Applications in security or unreliable data transmission
  • Suffix tree: a search tree that lets you quickly search for whether a string occurs anywhere in a large text document
  • Beap: insert, remove, and find in \(O(\sqrt{n})\)
  • PATRICIA Tries: space-optimized string storage
  • Multidimensional range trees: a map where the keys are points in multi-dimensional space
  • Self-organizing list: a list that reorders itself according to how its elements are used. You should focus on the Move-To-Front (MTF) ordering method.
  • 4 The Report

    Your report must be IN PDF FORMAT. Word and OpenOffice have lots of crazy versioning and formatting issues that make them awful formats for document dispersal. Computer people have given up on convincing non-computer people of this, but at least you can know better.

    Your report should address the questions below. Some questions might not make sense or might not have a good answer for your data structure; then don't answer them. When I say "should address the questions below", I mean that through the course of reading your report I should understand that question. It does not mean that I want a list of questions/answers; you need to write paragraphs with a nice flow and good organization and all those good things.

    I'm not going to give you a recommended length. If it's clear, answers all the questions, and is short, that's fine. If it's long, that's fine too. Diagrams and pictures are great and likely very helpful; they will be looked upon with favor. Ideas should be clearly communicated, and diagrams should be original in content (ie, don't just reproduce a diagram on wikipedia). The end of the document should include a references section to show me where you learned this material. After reading the report, I need to understand these concepts as well as you do.

    1. Where did this data structure come from? Who thought it up, when, and why?
    2. Is this related to any other data structures we learned about? How does it compare?
    3. How is the data structure constructed, and what methods does it perform?
    4. What applications is this data structure well suited for? Give me a few original examples of real-life problems that can be solved, in part, with your data structure.
    5. How do each of its methods work? What is their runtime? If your data structure isn't focused at optimizing runtime, what is it trying to optimize? If it is space, talk about how it optimizes space. If it's something else, talk about how this data structure optimizes the thing it is trying to optimize.
    6. What does this data structure do well? What does it do poorly? Under what circumstances would you use it, and under what circumstances would you use something else?
    7. If I wanted to use this data structure, is it easy to find an implementation of it? Is it in Java's standard libraries, or those of other programming languages? How hard does it seem to be to implement?

    5 The Presentation

    Your time slot will be 5 minutes per person, meaning 2-person groups get 10 minutes and 3-person groups get 15. But you must allow at least 2 minutes for questions from your classmates, and a minute or so to setup the next talk, so really should prepare 7-minute presentations for 2-person groups and 13-minute presentations for 3-person groups.

    This is an absurdly short amount of time, which allows for only an introduction. To do this well, you will have to be streamlined and practiced. All group members should speak. You will be cut off when you run out of time, and your presentation will be graded based on what was presented.

    You need to be sure to sign up for a date when all group members will be present in class. No matter what date you have signed up for, your visual aids are due to me by the same deadline as a report, to ensure that you've got everything prepared before Thanksgiving.

    Upon being asked how long it took to prepare a speech: "It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if an hour, I am ready now." -- Woodrow Wilson (likely apocryphal, but the sentiment is true)

    6 The Implementation

    3-person groups are required to submit a Java implementation using your data structure. 2-person groups may choose to do this as well, which will mean the written report counts for a smaller portion of your grade (although the expectations will be the same).

    There is a great deal of flexibility on how your implementation can work. What I really need is some kind of demonstration that I can run, that compiles in Java with no special requirements. You need to include a README.txt file that explains which file contains your main method that I'm going to run, as well as any other info I need to know about what your demonstration is doing and any resources you used to work on it.

    Some data structures are harder to implement than others, so there is flexibility in how good your implementation needs to be. In some extreme cases, it might be okay to find an open-source implementation online and just make a demonstration program that uses that. It would be a good idea to email Dr. Roche and clear the idea of what you're going to do to make sure it's sufficient.

    If your demonstration program does not work, you should not expect to earn a good grade for this part.

    7 Grading

    The report will be given a letter grade based on clarity, completeness, and correctness. Aside from "being right" and "sufficiently answering the questions," grammar/spelling/style absolutely counts. Even computer people have to write.

    The presentation will be graded using this rubric.

    The implementation will earn a letter grade according to whether it works, how well it works, and how interesting/impressive is the demonstration.

    In addition, I will set up a way for you to submit comments to the presenters. The usefulness and depth of your comments will be graded on a 1-5 scale.

    • 60%: Written report (if you didn't do an implementation)
    • 40%: Written report (if you did an implementation)
    • 20%: Implementation (only required for 3-person groups)
    • 35%: Presentation
    • 5%: Quality of your comments

    All group members will receive the same grade on the written report, presentation, and implementation (if any).