Executive Summary: Service Selection at the OMA

The year is 1984 and the world is ruled by just a few super-states. One of them, Oceania, is dominant and controls both North and South America, sourthern Africa, and Australia. It is engaged in a perpetual, unwinnable war with the other super-powers for the purpose of subjugating its own citizens.

To support the perpetual war, the Oceania Military Academy (OMA) churns out an amazingly high number of graduates every year (in the millions), and the process of selecting which part of the military service they will each enter is a daunting task. This being Oceania, the students themselves (called Proles) have no say in their own service selection. Instead, they are all ranked by each of the different communities, who then take turns in picking which Proles they will get.

You will design an algorithm and write a program to do service section for Oceania, which puts a premium on efficiency!


Your task is to design an algorithm and write a computer program that reads in the names of the k different communities, as well as the n Proles and their rankings in each community, and prints the results of the service selection. This works similarly to a draft: the k communities, in the order they were read in, take turns picking, one at a time, until there are no more Proles left. For each pick at each iteration, the community chooses the highest-ranking Prole (according to its own ranking) that has not been picked yet.

Specifically, the input, which should be read from a file whose name is the program's sole argument, contains in order

The output, which should be written to standard out, should contain n lines, the first being the first community's choice, written as

ProleName ComunityName
then the second community's choice, and so on, wrapping around after the last community, over and over until all Proles have been assigned.
sample input in0.txt sample run commentary
Tank Drivers
Bill 4 5 1 
Suzy 1 1 4 
John 2 4 5 
Kate 3 2 3 
Mike 5 3 2
> ./selection in0.txt
Suzy Tank Drivers
Kate Infantry 
Bill Paratroopers 
John Tank Drivers
Mike Infantry

The Tank Drivers community will choose Suzy first. Then the Infantry would like to choose Suzy but can't, so they pick Kate instead. Next the Paratroopers pick Bill, and so on.

Provided Code

The following programs are available to help you:

Note: You can get all these files at once by downloading the file proj1.tar.gz and running tar xzvf proj1.tar.gz

Dr. Roche has written a simple algorithm to solve the service selection problem and implemented it in Python, C++, and Java. Those are the selection.X files above. Start by reading over one or all of these to see how my algorithm works. Note: this is a really slow way to solve the problem! Your job will be to improve it, of course.

The other three files are to help you generate test cases. The text files jobs.txt (Marine MOS codes from the Vietnam era) and names.txt (All first names with frequency at least 5 in the United States in 1984) contain a whole bunch of "communities" and names. The gencases.py program will read from these text files and write (to standard out) a sample output with n and k as specified on the command line. For example:

> python3 gencases.py 5 3
Basic Intelligence Marine
Air Support Control Officer
Aircraft Structures Mechanic A-6/EA-6
Rebeca 2 2 2
Kecia 1 4 1
Kereem 3 3 5
Reno 4 5 4
Lashauna 5 1 3

You can use this program to test your code. For example, here is a bash session showing me creating a sample input in in.txt, then testing all three programs, and finally using the diff utility to make sure the outputs all match.

> python3 gencases.py 100 10 > in.txt
> python3 selection.py in.txt > out1.txt
> make -f Makefile-cpp
g++ -O6 selection.cpp -o selection
> ./selection in.txt > out2.txt
> make -f Makefile-java
javac selection.java
> java selection in.txt > out3.txt
> diff -q out1.txt out2.txt  ← Note: when diff outputs nothing, the two files match!
> diff -q out1.txt out3.txt  ← Note: when diff outputs nothing, the two files match!

Once you are convinced your program is correct, you will probably want to know how fast it is. The Linux time program is a great resource for this. Since we don't care about the output when doing timings, you could do the timing in one line. For example, to time the Python version of the algorithm with n=500 and k=15, you could do:

> python3 gencases.py 500 15 > tmp.txt
> /usr/bin/time -v python3 selection.py tmp.txt > /dev/null
        Command being timed: "python3 selection.py tmp.txt"
        User time (seconds): 4.11
        System time (seconds): 0.01
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 27184
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1855
        Voluntary context switches: 2
        Involuntary context switches: 205
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

This tells us that it took about 4.22 seconds and 27MB to complete the selection process.

What to turn in, when and how

This project is due by COB on 6 Feb. It will be considered to have been turned in by COB 6 February if the paper part has been slid under my door by the time I arrive the next morning, and the electronic part has been submitted by that time as well. No late work will be accepted.
  1. Analysis of the "Dr. Roche" algorithm: Analyze Dr. Roche's "simple service selection algorithm", as provided in the starter code. The algorithm and running time is exactly identical between the three programming languages; I suggest you read over all three to help you understand what is going on. I want a big-O analysis of the worst-case cost, in terms of k and n. Try to make your analysis as tight as possible. The piece of paper with your analysis should clearly present your analysis and its justification.
  2. Description and analysis of your own new-and-improved algorithm: Describe an improved algorithm for the same problem in pseudocode, and give a worst-case big-O analysis, in terms of $k$ (the number of communities) and $n$ (the number of Proles). Try to give a tight analysis, and try to design your algorithm to be as efficient as possible. If necessary, you may assume that $k$ is much smaller than $n$. So for example, $\Theta(kn)$ time is much better than $\Theta(n^2)$ time. The piece of paper with your new algorithm and its analysis should clearly present the algorithm (i.e. I should be able to write a program from your description) and clearly present its analysis (i.e. justify it).
  3. Implement your improved algorithm: Implement your improved algorithm, along with any other tweaks you can think of, so that your code is as fast as possible. I recommend that you code in either C++, Java, or Python, but you may use any programming language that you wish, as long as it runs in the Linux environment in the labs. Again: remember that the algorithm implemented by your program and analyzed in your written submission must be the same! One may be pseudo-code, and the other real code in whichever language, but the underlying algorithm must be the same.

    You will submit your project code electronically using LCDR Kenney's submit system: submit.cs.usna.edu You absolutely need to submit every file necessary to compile and run your program, along with a file named Makefile that can be used with make to build your program, determine your usename (e.g. m189999), determine the command required to run your program. To that end, it must define "targets" build, user and command.
    Important: use one of Makefile-cpp / Makefile-java / Makefile-python as a template to base your "Makefile" on.

    Example C++Example JavaExample Python
    $ make build
    g++ -O6 selection.cpp -o selection
    $ make user
    $ make command
    $ make build
    javac selection.java
    $ make user
    $ make command
    java selection
    $ make build
    do nothing
    $ make user
    $ make command
    python3 selection.py

    How to submit: You will submit with LCDR Kenney's submit system. If you do not have the current (AY2018) version of his submit script, download it by logging in to submit.cs.usna.edu, choosing "Download Submit Script" from the menu with your username at the top left, saving that script to your ~/bin directory, and using chmod to make it executable. With that script in hand you would submit it with a command like the following one (my solution only consists of the two files drbrown.cpp and Makefile):

    ~/bin/submit -c=si335 -p=proj01 drbrown.cpp Makefile