Project Title: Data Processing Task Scheduler using Priority Queues

The assignment is to implement a task scheduler that manages the execution of data processing jobs. Each task is assigned a priority based on its deadline, computational cost, and importance. The scheduler uses a priority queue to determine the order in which tasks are executed. Your program will simulate how the jobs are scheduled and run.

Project Requirements

  1. Priority Queue Implementation
    Use a binary heap to implement a priority queue. Each task will be inserted into the queue with a priority value. The highest-priority task (based on runtime cost, deadline, and importance) should be executed first.
  2. Task Definition
    Each task will have the following attributes:
  3. Task Scheduler
    The scheduler is called at the start of the program and at the completion of each task. It uses priorities computed as follows:

    If the computational cost is \(c\), the time until the deadline is \(d\), and the importance \(i\) is 1 if low, 2 if medium, and 3 if high, the priority \(p\) should be computed as: \[ p = \frac{w_1}{c} + \frac{w_2}{d} + w_3 i \] where \(w_1, w_2, w_3\) are weights input at the start of the program.
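As a quick check of the formula, here is a minimal sketch in Python. The function name and parameter names are assumptions made for illustration. The sample values are also hypothetical: the cost of 30 matches Task 1's run time in the example run, while the time to deadline (120) and importance (3) are assumed.

```python
def compute_priority(cost, time_to_deadline, importance, w1, w2, w3):
    """Return p = w1/c + w2/d + w3*i for a single task."""
    # The late-job rule says to use 1 in place of any d <= 0,
    # which avoids negative terms and division by zero.
    d = time_to_deadline if time_to_deadline > 0 else 1
    return w1 / cost + w2 / d + w3 * importance


# Hypothetical task: c = 30, d = 120, i = 3 (high), all weights equal to 1.
p = compute_priority(cost=30, time_to_deadline=120, importance=3, w1=1, w2=1, w3=1)
print(p)  # 1/30 + 1/120 + 3, a little above 3.04
```

Because \(c\) and \(d\) appear in denominators, cheaper tasks and tasks closer to their deadline score higher, while the importance term grows linearly with \(w_3\).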

    Each time it is run, the scheduler must first recompute the priorities of all the jobs, because the time until each deadline has changed. It should then use heapify to rebuild the priority queue. Next, the scheduler takes any tasks that have arrived since it was last run, computes their priorities, and adds them to the priority queue.
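The three steps above (recompute, heapify, insert new arrivals) can be sketched as follows. This sketch uses Python's standard `heapq` module as a stand-in for the binary heap you are asked to implement, and it stores negated priorities because `heapq` is a min-heap. The task fields (`id`, `cost`, `deadline`, `importance`), the use of an absolute deadline, and the function names are all assumptions, since the attribute list is not reproduced here.

```python
import heapq


def priority(task, now, w1, w2, w3):
    """p = w1/c + w2/d + w3*i, with d clamped to 1 once the deadline has passed."""
    d = task["deadline"] - now
    if d <= 0:
        d = 1
    return w1 / task["cost"] + w2 / d + w3 * task["importance"]


def refresh_queue(queue, new_arrivals, now, weights):
    """Recompute queued priorities, rebuild the heap, then add new arrivals."""
    w1, w2, w3 = weights
    # Step 1: every stored priority is stale because time has advanced.
    # Entries are (-priority, id, task); unique ids break ties before the
    # comparison can ever reach the task dictionaries.
    queue[:] = [(-priority(t, now, w1, w2, w3), t["id"], t) for _, _, t in queue]
    # Step 2: restore the heap property in one pass.
    heapq.heapify(queue)
    # Step 3: insert each task that arrived since the last scheduler call.
    for t in new_arrivals:
        heapq.heappush(queue, (-priority(t, now, w1, w2, w3), t["id"], t))


# Hypothetical tasks, for illustration only.
queue = []
tasks = [
    {"id": 1, "cost": 30, "deadline": 120, "importance": 3},
    {"id": 2, "cost": 90, "deadline": 200, "importance": 2},
]
refresh_queue(queue, tasks, now=0, weights=(1, 1, 1))
_, _, next_task = heapq.heappop(queue)
print(next_task["id"])  # task 1 has the larger priority, so it is popped first
```

Rebuilding with heapify costs O(n) per scheduler call, which is cheaper than removing and reinserting every task one at a time.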

    Once all new arrivals have been added to the queue, the next job to run is removed from the queue and given to the simulator.

    If the selected job cannot be completed on time, it should still be selected and run. However, if its deadline has already passed and its priority is not low, its priority should be set to low and the job added back to the queue; a different task is then selected to run. If a task's deadline has passed and its priority is already low, just run it. Finally, if the time to deadline \(d\) is <= 0, use 1 instead to prevent negative values and division by zero.
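One possible reading of the late-job rule is sketched below. It assumes that "priority is low" refers to the importance level (1 = low), that a deadline counts as passed when \(d \le 0\), and that a demoted job gets a freshly computed priority before it is pushed back. It again uses `heapq` with negated priorities in place of a hand-written heap, and the field and function names are hypothetical.

```python
import heapq

LOW = 1  # importance levels: 1 = low, 2 = medium, 3 = high


def priority(task, now, weights):
    w1, w2, w3 = weights
    d = task["deadline"] - now
    if d <= 0:  # clamp to 1 to avoid negative values and division by zero
        d = 1
    return w1 / task["cost"] + w2 / d + w3 * task["importance"]


def select_next(queue, now, weights):
    """Pop the task to run next, demoting overdue tasks that are not yet low."""
    while queue:
        _, _, task = heapq.heappop(queue)
        overdue = task["deadline"] - now <= 0
        if overdue and task["importance"] != LOW:
            # Demote, re-queue with the new priority, and pick again.
            task["importance"] = LOW
            heapq.heappush(queue, (-priority(task, now, weights), task["id"], task))
            continue
        # On time, merely unable to finish in time, or overdue but already low: run it.
        return task
    return None  # nothing left to run


# Hypothetical tasks at time 20: task 1 is overdue, task 2 is not.
weights = (1, 1, 1)
now = 20
a = {"id": 1, "cost": 5, "deadline": 10, "importance": 3}
b = {"id": 2, "cost": 5, "deadline": 100, "importance": 2}
queue = [(-priority(t, now, weights), t["id"], t) for t in (a, b)]
heapq.heapify(queue)

first = select_next(queue, now, weights)
second = select_next(queue, now, weights)
print(first["id"], second["id"], second["importance"])
```

The loop always terminates because each demotion turns one overdue task into a low-importance one, and a low-importance task is never pushed back.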

  4. Simulation and Metrics: Simulate the execution of tasks over time. When a task is executed, the simulator updates the passage of time based on the task’s computational cost. It should print out information on the task in the following format:
    Executing Task 1: Clean Data for Model A (Start Time: 0, End Time: 30) [Priority: 3.0416]
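A minimal sketch of this simulator step, assuming each task is a dictionary with hypothetical `id`, `name`, and `cost` fields and that the clock is a plain number:

```python
def execute(task, clock, priority):
    """Print the execution line and return the clock after the task finishes."""
    start = clock
    end = clock + task["cost"]  # time advances by the computational cost
    print(
        f"Executing Task {task['id']}: {task['name']} "
        f"(Start Time: {start}, End Time: {end}) [Priority: {priority}]"
    )
    return end


# Hypothetical task mirroring the format line above.
task = {"id": 1, "name": "Clean Data for Model A", "cost": 30}
clock = execute(task, clock=0, priority=3.0416)
```

The returned value becomes the current time for the next scheduler call.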

    After all the tasks have completed, the simulator should report:

    System Metrics:
    - Tasks completed before deadline: 3/3
    - Average waiting time: 10 
    - System idle time: 0 
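The metrics are not defined precisely here, so the following sketch rests on stated assumptions: waiting time is start time minus arrival time, a task counts as on time when it ends at or before its deadline, and idle time is any gap during which no task is running. The record fields, the function name, and the sample data are hypothetical.

```python
def system_metrics(runs):
    """Summarise completed tasks; assumes at least one task was run."""
    on_time = sum(1 for r in runs if r["end"] <= r["deadline"])
    avg_wait = sum(r["start"] - r["arrival"] for r in runs) / len(runs)
    # Idle time: gaps before the first task and between consecutive tasks.
    idle = 0
    clock = 0
    for r in sorted(runs, key=lambda r: r["start"]):
        idle += max(0, r["start"] - clock)
        clock = r["end"]
    return on_time, len(runs), avg_wait, idle


# Hypothetical records for three tasks that all arrive at time 0.
runs = [
    {"arrival": 0, "start": 0, "end": 10, "deadline": 50},
    {"arrival": 0, "start": 10, "end": 20, "deadline": 50},
    {"arrival": 0, "start": 20, "end": 30, "deadline": 50},
]
on_time, total, avg_wait, idle = system_metrics(runs)
print("System Metrics:")
print(f"- Tasks completed before deadline: {on_time}/{total}")
print(f"- Average waiting time: {avg_wait}")
print(f"- System idle time: {idle}")
```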
      

    Example run

    A small task file found here would generate the following output:
    (base) crabbe@lnx1065134govt:~/www/SD311/$ python main.py 
    Task file name: tasks.csv
    Weight 1 (cost): 1
    Weight 2 (time left): 1
    Weight 3 (importance):1
    Executing Task 1: Clean Data for Model A (Start Time: 0, End Time: 30) [Priority: 3.0416666666666665]
    Executing Task 5: Optimize Hyperparameters (Start Time: 30, End Time: 150) [Priority: 3.012037037037037]
    Executing Task 4: Collect New Data (Start Time: 150, End Time: 165) [Priority: 2.0666666666666664]
    Executing Task 3: Generate Data Report (Start Time: 165, End Time: 185) [Priority: 2.05]
    Executing Task 2: Train Model 5 (Start Time: 185, End Time: 275) [Priority: 2.077777777777778]
    System Metrics:
    - Tasks completed before deadline: 3/5
    - Average waiting time: 93.4
    - System idle time: 0
        
    Note that the weights are input one at a time by the user. The task file name is also input by the user, but the file itself is a CSV file.

    Restrictions

    If you examine the course policy, you will see it actually has very strict restrictions on sources you may consult, consistent with the CS major.  I have decided that this is not helpful in the context of the Data Science major and that I will relax that constraint.  You may use any Internet resources you wish, with the exception of AI/large language models.