Project 4: Mystery Data Structures
Due: Thursday, 2 Aug 2012, at 23:59:59 PDT
Overview
In contrast to P1-P3, in which you implemented a data structure, in P4 you will be
examining data structures (using computational methods) whose types you do not know. You must perform experiments, in the
form of measurements of the time cost of different operations performed on the mystery data structures,
to deduce the data structures' types.
Each student has been assigned a set of 5
data structures whose types must be determined. To "fetch" each of these
data structures, you should call MysteryDataStructure.getMysteryDataStructure(cs12UserID, i, new Integer(0)).
The first parameter must be your CSE 12 user ID -- this is crucial because each student receives a different set of
data structures to analyze! The second parameter i specifies which of the mystery data structures (0-4) to "fetch".
The third parameter can be any object of type T Comparable<? super T> -- reasonable examples include
Integer and String. The purpose of this third parameter is to let you specify the type of element
you want your mystery data structure to store. Integer should work fine and there's no compelling reason (that I see)
to change it, but you have that flexibility -- just instantiate a new Whatever() and pass it in as the third
parameter to getMysteryDataStructure. The MysteryDataStructure.getMysteryDataStructure
returns an instance of (interface) type Collection12.
The DeduceTheMysteryDataStructure.java file should get you started.
Also see the Javadoc for more information
on the mystery data structure and related classes.
Once you've "fetched" each of the 5 data structures, you will need to perform some experiments on it to
deduce its type. There are four possible data structures that each mystery data structure might be, and each has
associated time cost properties:
- Doubly-linked list. O(1) time cost for add, O(n) for remove and contains in worst-case.
- Heap. O(log n) time cost for add, O(n) for remove and contains in worst-case.
- Binary search tree (self-balancing). O(log n) time cost for add, remove, and contains in worst-case.
- Hash table. O(1) time cost for add and contains in average-case.
Your 5 assigned mystery data structures will be chosen from the pool above. There is no guarantee that you will receive
one of each -- each of the 5 mystery data structures is chosen independently and uniformly at random from the above set of 4.
In addition, you may receive multiple data structures of the same type that have slightly different performance characteristics -- their
asymptotic complexities would have to be the same, of course, but the constant factors associated with the time costs could vary
substantially.
You may (and should) rely on the time cost properties to deduce the identity of each mystery data structure. In particular,
your measurement code should measure the time cost of various operations (e.g., add, remove, contains) for various numbers of
elements N stored in the container. Of course, to populate the container with N elements, you'll need to
call the add method on the mystery data structure. To measure the time of various operations, you should use
the ArtificialClock class, in particular the getNumTicks method, which returns the "current time" of the
simulated clock. An example is shown in the boilerplate DeduceTheMysteryDataStructure.java file.
Compiling and running
You will need to have the P4Stuff.jar file in the current working directory; this file is necessary both for compilation
and execution: To compile: javac -cp .:P4Stuff.jar DeduceTheMysteryDataStructure.java. To run:
java -cp .:P4Stuff.jar DeduceTheMysteryDataStructure.
Requirements
Note that you may not use Java reflection or related means to "hack" your way into the identity of each data structure. You must
run experiments based on the time costs of various Collection12 operations. Submission requirements:
- You need to create a single PDF file Analysis.pdf containing:
- The type (linked list, heap, binary search tree, or hash table) of each of your mystery data structures, indexed 0-4.
- Empirical evidence that led you to infer the type you specified. This evidence should take the form of graphs, showing
the time cost of various Collection12 methods as a function of N, the number of data stored in the
container. The purpose of each graph is to reveal the asymptotic time cost of the various operations. Given the
inferred asymptotic complexities of various Collection12 operations, you can deduce the identity of
mystery data structure. In addition to the graphs, you should also give a short written justification for how you decided
the identity of each structure. For example, you might write: "Graph 1 shows the time cost of the add
method for Mystery Data Structure 0. The slope of the curve grows smaller as N grows larger, and the curve is clearly O(log n).
Based on that plus other evidence (etc. etc.) we conclude that the mystery data structure must be a binary search tree."
- In addition, you need to submit your DeduceTheMysteryDataStructure.java file containing your code that runs the experiments
(time measurements on each mystery data structure).
Sample graphs
Both graphs plot the time cost of the contains(o) of one of my own mystery data structures.
How big should n be?
For this assignment, n=10000 should be completely sufficient. In my own solution, I use the following different values for n:
final int[] Ns = { 1, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000 };
If you try to add a huge amount of data to your mystery data structure, the Java virtual machine's
(JVM) memory may be exhausted. In this case, if you try to call add(o) to any of the P4 data structures, the JVM will simply terminate
due to lack of memory.
Hints
- It will be important to average experimental results to reduce the noise in your graphs. The simulated running times
(as measured by the ArtificialClock) will vary somewhat from trial to trial.
- To create graphs showing the time costs of various operations, your DeduceTheMysteryDataStructure.java experimental code
should output data that you can then plot in whatever graph program you prefer (e.g., gnuplot, Matlab, Excel, whatever).
Extra Credit
For extra credit you can implement the AlgorithmClassifier interface -- see the Javadoc
for more information. The job of the classifier is to take as input a mystery data structure and then output what kind
of data structure it is automatically. Your classifier will be tested on a large number of data structures, and guessing will
be penalized during evaluation. You should implement your classifier in a class called AlgorithmClassifierImpl.
Grading
The maximum score on P4 is 9 points. The extra credit is worth 5 points maximum.
Submission
Submit your work using the
bundleP4 script in the directory containing your DeduceTheMysteryDataStructure.java and
Analysis.pdf files. If you choose to do the extra credit, then you should also submit your
AlgorithmClassifierImpl.java file (otherwise, don't worry about this file).
Make sure you are logged on to ieng6.ucsd.edu using your CSE 12-specific account.
As always, make sure to submit before the deadline -- the script will show you (or anyone else) no mercy!