Project 5: Mystery Data Structures
Due: Wednesday, 31 Aug 2011, at 23:59:59 PST
Overview
In contrast to P1-P4, in which you implemented a data structure, in P5 you will be
examining data structures (using computational methods) whose types you do not know. You must perform experiments, in the
form of measurements of the time cost of different operations performed on the mystery data structures,
to deduce the data structures' types.
Each student has been assigned a set of 5
data structures whose types must be determined. To "fetch" each of these
data structures, you should call MysteryDataStructure.getMysteryDataStructure(cs12UserID, i, new Integer(0)).
The first parameter must be your CSE 12 user ID -- this is crucial because each student receives a different set of
data structures to analyze! The second parameter i specifies which of the mystery data structures (0-4) to "fetch".
The third parameter can be any object of type T Comparable<? super T> -- reasonable examples include
Integer and String. The purpose of this third parameter is to let you specify the type of element
you want your mystery data structure to store. Integer should work fine and there's no compelling reason (that I see)
to change it, but you have that flexibility -- just instantiate a new Whatever() and pass it in as the third
parameter to getMysteryDataStructure. The MysteryDataStructure.getMysteryDataStructure
returns an instance of (interface) type Collection12.
The DeduceTheMysteryDataStructure.java file should get you started.
Also see the Javadoc for more information
on the mystery data structure and related classes.
Once you've "fetched" each of the 5 data structures, you will need to perform some experiments on it to
deduce its type. There are four possible data structures that each mystery data structure might be, and each has
associated time cost properties:
- Doubly-linked list. O(1) time cost for add, O(n) for remove and contains in worst-case.
- Heap. O(log n) time cost for add, O(n) for remove and contains in worst-case.
- Binary search tree (self-balancing). O(log n) time cost for add, remove, and contains in worst-case.
- Hash table. O(1) time cost for add and contains in average-case.
Your 5 assigned mystery data structures will be chosen from the pool above. There is no guarantee that you will receive
one of each -- each of the 5 mystery data structures is chosen independently and uniformly at random from the above set of 4.
In addition, you may receive multiple data structures of the same type that have slightly different performance characteristics -- their
asymptotic complexities would have to be the same, of course, but the constant factors associated with the time costs could vary
substantially.
You may (and should) rely on the time cost properties to deduce the identity of each mystery data structure. In particular,
your measurement code should measure the time cost of various operations (e.g., add, remove, contains) for various numbers of
elements N stored in the container. Of course, to populate the container with N elements, you'll need to
call the add method on the mystery data structure. To measure the time of various operations, you should use
the ArtificialClock class, in particular the getNumTicks method, which returns the "current time" of the
simulated clock. An example is shown in the boilerplate DeduceTheMysteryDataStructure.java file.
Compiling and running
You will need to have the P5Stuff.jar file in the current working directory; this file is necessary both for compilation
and execution: To compile: javac -cp .:P5Stuff.jar DeduceTheMysteryDataStructure.java. To run:
java -cp .:P5Stuff.jar DeduceTheMysteryDataStructure.
Requirements
Note that you may not use Java reflection or related means to "hack" your way into the identity of each data structure. You must
run experiments based on the time costs of various Collection12 operations. Submission requirements:
- You need to create a single PDF file Analysis.pdf containing:
- The type (linked list, heap, binary search tree, or hash table) of each of your mystery data structures, indexed 0-4.
- Empirical evidence that led you to infer the type you specified. This evidence should take the form of graphs, showing
the time cost of various Collection12 methods as a function of N, the number of data stored in the
container. The purpose of each graph is to reveal the asymptotic time cost of the various operations. Given the
inferred asymptotic complexities of various Collection12 operations, you can deduce the identity of
mystery data structure. In addition to the graphs, you should also give a short written justification for how you decided
the identity of each structure. For example, you might write: "Graph 1 shows the time cost of the add
method for Mystery Data Structure 0. The slope of the curve grows smaller as N grows larger, and the curve is clearly O(log n).
Based on that plus other evidence (etc. etc.) we conclude that the mystery data structure must be a binary search tree."
- In addition, you need to submit your DeduceTheMysteryDataStructure.java file containing your code that runs the experiments
(time measurements on each mystery data structure).
Sample graphs
Both graphs plot the time cost of the contains(o) of one of my own mystery data structures.
Hints
- It will be important to average experimental results to reduce the noise in your graphs. The simulated running times
(as measured by the ArtificialClock) will vary somewhat from trial to trial.
- To create graphs showing the time costs of various operations, your DeduceTheMysteryDataStructure.java experimental code
should output data that you can then plot in whatever graph program you prefer (e.g., gnuplot, Matlab, Excel, whatever).
Extra Credit
For extra credit you can implement the AlgorithmClassifier interface -- see the Javadoc
for more information. The job of the classifier is to take as input a mystery data structure and then output what kind
of data structure it is automatically. Your classifier will be tested on a large number of data structures, and guessing will
be penalized during evaluation. You should implement your classifier in a class called AlgorithmClassifierImpl.
Extra Extra Credit
For extra extra credit, you can implement the AlgorithmClassifier interface in a class called AlgorithmClassifierImpl
without using the ArtificialClock. Instead, you must use the System clock -- e.g.,
System.currentTimeMillis() or System.nanoTime() -- to perform all the experiments. You will have to "fight"
against the garbage collector, which will run unexpectedly and skew your measurements considerably. In order to prevent the
garbage collector (and any other large sources of noise) from distorting your results in a systematic way, it may be useful to
randomize the order of your experiments -- don't run all your trials for a particular value of N all at once; instead,
randomly pick an N value and then run one trial for it.
If you complete the extra extra credit, you will receive one (1) Reese's peanut butter cup, presented to you during the last
day of lecture (Thursday, 1 September 2011), along with my congratulations.
Grading
The maximum score on P5 is 10 points. The extra credit is worth 5 points maximum. The extra extra credit is worth
1 Reese's peanut butter cup.
Submission
Submit your work using the
bundleP5 script in the directory containing your DeduceTheMysteryDataStructure.java and
Analysis.pdf files. If you choose to do the extra credit, then you should also submit your
AlgorithmClassifierImpl.java file (otherwise, don't worry about this file).
Make sure you are logged on to ieng6.ucsd.edu using your CSE 12-specific account.
As always, make sure to submit before the
deadline -- the script will show you (or anyone else) no mercy!