Threads

  1. Two things make a process a process: its resources (address space, etc.) and the fact that it gets scheduled. Every process differs from every other in that it has its own address space and gets scheduled at its own times.
  2. These are orthogonal properties; we could remove one without the other.
  3. If we create an entity that is scheduled separately but shares resources with other such entities, we have what are known as threads.
    aside: Dragons hate threads. [Picture: a dragon burning threads.] Always be careful letting a dragon near your operating system.
  4. The management of these new "lightweight processes" (they lack the baggage of their own address space) is usually done by the OS.
  5. Basic Multithreading
    1. for a multi-threaded process, we still have a single address space and we have a PCB.
    2. But if we have multiple threads, we will need a Thread Control Block for each thread, and we will need a separate call stack too.
    3. This way threads will be at different places in the same code, modifying the same data.
    4. Benefits: a thread is faster to create than a process, faster to exit, and faster to switch to, and threads communicate easily because they share memory.
    5. Examples: a GUI thread that stays responsive while the rest of the program runs; network servers; running threads on different processors; etc.
    6. All this is added complexity for the OS. It dispatches at the thread level, but if a process is blocked or suspended, then all threads are stopped.
    7. Thread functionality:
      1. Spawn. When a process is spawned, a thread for that process is spawned. That thread can spawn other threads.
      2. Block. A single thread can block for, say, I/O.
      3. Unblock, obviously.
      4. Finish.
    8. Synchronization? Next week.
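As a rough illustration of the points above (one address space, one call stack per thread, and the spawn/block/finish life cycle), here is a minimal sketch in Python. The names `worker` and `run_threads` are made up for this example, and the lock is only there to keep the shared list tidy; synchronization proper is a later topic.

```python
import threading

def worker(name, steps, shared_log, log_lock):
    # `count` is a local variable, so it lives on this thread's own call stack.
    count = 0
    for _ in range(steps):
        count += 1
    # `shared_log` is one object in the process's single address space.
    with log_lock:
        shared_log.append((name, count))

def run_threads():
    shared_log = []
    log_lock = threading.Lock()
    # Spawn: the main thread creates two more threads in the same process.
    threads = [
        threading.Thread(target=worker, args=("a", 3, shared_log, log_lock)),
        threading.Thread(target=worker, args=("b", 5, shared_log, log_lock)),
    ]
    for t in threads:
        t.start()
    # Block: the main thread waits here until each worker has finished.
    for t in threads:
        t.join()
    return sorted(shared_log)
```

Both workers run the same code but keep separate `count` values, while their results land in the same list.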
  6. Types of threads
    1. Two kinds of threads: user-level and kernel-level.
    2. With user-level threads, the application itself manages the threads. Essentially, the thread library linked into the application contains a little dispatcher, and while the process is running that dispatcher schedules the threads.
    3. Since this dispatcher isn't in the kernel, it can't be invoked with an interrupt, so the only way to switch threads is for a thread to call the user-level dispatcher directly.
    4. Advantages:
      1. thread switches are fast since we don't have to jump to the kernel.
      2. Can create application specific scheduling.
      3. Can run on any OS.
    5. Disadvantages:
      1. When a thread blocks (say, in a system call), the whole process blocks, because the kernel sees only the process.
      2. Cannot run multiple threads from the same process on multiple cores.
    6. In kernel-level threads, management is done in the kernel, just like processes. Basically, the kernel schedules threads, not processes.
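The user-level dispatcher idea can be sketched with Python generators standing in for threads. This is only a toy model under that assumption, not how any real thread library is built: `user_thread`, `dispatch`, and `demo` are hypothetical names, and each `yield` plays the role of a thread calling the dispatcher directly.

```python
from collections import deque

def user_thread(name, steps, trace):
    # The generator's local state plays the role of the thread's own stack.
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield  # explicit call back into the dispatcher; no interrupt forces this

def dispatch(threads):
    # Round-robin dispatcher that lives in the application, not in the kernel.
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the thread until it yields
            ready.append(current)  # still runnable: back of the ready queue
        except StopIteration:
            pass                   # the thread has finished

def demo():
    trace = []
    dispatch([user_thread("a", 2, trace), user_thread("b", 3, trace)])
    return trace
```

Calling `demo()` returns `['a0', 'b0', 'a1', 'b1', 'b2']`. Switching is cheap because it is just a function call, but if any one of these threads made a blocking call, the dispatcher and every other thread would stall with it.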
  7. Multicore
    1. Amdahl's law: the speedup S of parallelizing a process that takes time T across N processors, where f is the fraction of the time that is parallelizable, is the serial time (ST) over the parallel time (PT):
      \[\begin{aligned} S & = \frac{\text{ST}}{\text{PT}} \\ & = \frac{T}{\frac{fT}{N}+(1-f)T}\\ & = \frac{1}{(1-f)+\frac{f}{N}} \end{aligned} \]
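    2. This looks good, but note that even a small amount of serial code hurts: with 10% serial code (f = 0.9) on N = 10 processors, the speedup is about 5.3, nearly half the ideal 10, and it can never exceed 1/(1-f) = 10 no matter how large N gets.
    3. In "embarrassingly parallel" problems, we often do get close to perfect speedup.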
    4. [Diagram: the threads in Valve's Engine software.]
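To get a feel for the numbers, Amdahl's formula can be evaluated directly. The function names `amdahl_speedup` and `speedup_limit` are just labels chosen for this sketch.

```python
def amdahl_speedup(f, n):
    """Speedup on n processors when a fraction f of the run time is parallelizable."""
    return 1.0 / ((1.0 - f) + f / n)

def speedup_limit(f):
    """Upper bound on the speedup as n grows without bound (requires f < 1)."""
    return 1.0 / (1.0 - f)
```

For example, `amdahl_speedup(0.9, 10)` is roughly 5.26, and `speedup_limit(0.9)` is roughly 10, so with 10% serial code no number of processors gets past a tenfold speedup. With `f = 1.0` (embarrassingly parallel), `amdahl_speedup(1.0, 8)` is 8.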
  8. Semantics of the fork() and exec() system calls: does fork() duplicate only the calling thread or all threads? It is system dependent. Many versions of UNIX have multiple fork calls to handle this. Linux is weird.
  9. Signal handling: which thread receives a signal sent to a multithreaded process? The options:
    1. All threads.
    2. The thread that wants it.
    3. A handler thread.
  10. Thread pools
  11. Java
    1. Threads are implemented by the JVM.
    2. Typically NOT user-level threads, though originally they were.
    3. The JVM uses kernel-level threads.
    4. No global variables, so a thread must be passed any object it shares.
  12. Windows
    1. Implemented pretty much as described.
    2. Uses an object-based design, so a process is an object.
    3. In XP, each thread contains:
      1. A thread id
      2. Register set
      3. Separate user and kernel stacks
      4. Private data storage area
      5. In the kernel there is an executive thread block.
    4. Windows 7 also allows user-level threads, which it calls fibers.
    5. "In general, fibers do not provide advantages over a well-designed multithreaded application. However, using fibers can make it easier to port applications that were designed to schedule their own threads."
  13. Linux
    1. Linux does not do threads as described above. It uses the more generic term "tasks".
    2. The traditional fork() system call completely duplicates a process (task).
    3. clone() creates a new task that shares resources with its parent.
    4. One of the arguments is a set of flags that selects what is shared:
      1. CLONE_FS File-system information is shared
      2. CLONE_VM The same memory space is shared
      3. CLONE_SIGHAND Signal handlers are shared
      4. CLONE_FILES The set of open files is shared
    5. All of them together make for a traditional thread model.
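The difference between duplicating a task and sharing its memory can be observed from user space. The sketch below assumes a POSIX system where `os.fork` is available; `fork_vs_thread` is a hypothetical name. It does not call clone() itself: it relies on the assumption that the platform's thread library creates threads that share the memory space, which is the usual arrangement on Linux but is not something the code checks.

```python
import os
import threading

def fork_vs_thread():
    data = [0]

    # fork(): the child gets its own copy of the address space.
    pid = os.fork()
    if pid == 0:
        data[0] = 1   # changes only the child's copy
        os._exit(0)   # leave the child immediately, skipping cleanup handlers
    os.waitpid(pid, 0)
    after_fork = data[0]

    # Thread: runs in the parent's memory space, so its write is visible here.
    def set_two():
        data[0] = 2
    t = threading.Thread(target=set_two)
    t.start()
    t.join()
    after_thread = data[0]

    return after_fork, after_thread
```

`fork_vs_thread()` returns `(0, 2)`: the forked child's write never reaches the parent, while the thread's write does.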