1. The "Big Picture" of OS
1-1. Role / Feature of OS
- An Operating System (OS) is a special type of program. Widely known ones are Windows, macOS, and Linux (for desktops), and Android and iOS (for mobile smartphones).
- Although detailed features may vary, the "key features" of an OS remain the same across all of them.
- The part that runs these "key features" is called the Kernel.
- It has two main jobs: resource allocation/management and process/thread management.
- CPU Scheduling - CPU's fair allocation and order of using CPU.
- Basic Concepts: Priority, Scheduling Queue, Preemption/Non-preemption
- CPU Scheduling Algorithm
- Linux CPU Scheduling
- Memory Management: Virtual Memory - Efficient memory management.
- Physical memory, Logical memory
- Memory allocation
- Paging, Page Replacement Algorithm
- File/Directory Management: File System - utilizing disks, etc.
- File/Directory
- File System
- Process and Thread Management: how to allocate resources for various processes and threads.
- Process and Thread
- Synchronization and Deadlock state
So, in brief, the map looks something like this:
- OS
- Big Picture of OS
- Kernel
- System call
- Process / Thread Management
- Process and Thread
- Synchronization and Deadlock
- Resource Allocation / Management
- CPU Scheduling
- Basic Concepts: Priority, Scheduling Queue, Preemption/Non-preemption
- CPU Scheduling Algorithm
- Linux CPU Scheduling
- Memory Management: Virtual Memory
- Physical memory, Logical memory
- Memory allocation
- Paging, Page Replacement Algorithm
- File/Directory Management: File System
- File/Directory
- File System
1-2. System Call, Dual Mode
The OS is also a kind of program! So it must be stored in memory.
The OS is stored in the "kernel space" of memory, separated from "user space", where user application programs are stored.
Now, the most important part: if we want to use the features of the OS, we must run the OS code stored in kernel space.
Generally speaking, a user application program can't directly access resources like the CPU, memory, etc. Instead, the OS does that work on the program's behalf. The program requests it through a "System Call": the interface for reaching the features of the OS.
There are many different system calls. For example:
A process can create another process with the fork() system call. The child process can create its own children as well.
Hence, processes are managed in a hierarchical way.
The process that created the other is the "Parent Process", and the created one is the "Child Process".
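Here's a minimal C sketch of fork() (POSIX), just to make the parent/child relationship concrete; the printed PIDs will of course vary per run:
```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();          // create a child process
    if (pid < 0) {
        perror("fork");          // fork failed
        return 1;
    } else if (pid == 0) {
        // fork() returns 0 in the child
        printf("child:  pid=%d, parent=%d\n", getpid(), getppid());
    } else {
        // fork() returns the child's PID in the parent
        printf("parent: pid=%d, child=%d\n", getpid(), pid);
        wait(NULL);              // reap the child so it doesn't linger as a zombie
    }
    return 0;
}
```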
If a system call is invoked, the following steps happen:
In the instruction set, there are specific instructions that trigger a software interrupt.
- I/O instructions (accesses to resources) are one example; the interrupts caused by such instructions are called "software interrupts".
- A system call is a kind of software interrupt. So:
- When a system call is invoked, the CPU backs up the current program state
- the software-interrupt handling code (which is in kernel space) runs
- then the user-space code continues running.
So the CPU separates its mode between running user application programs and running kernel-space code.
- When the CPU is running a user-space program: user mode
- can't use OS features; can't run I/O requests and such.
- When the CPU is running a kernel-space program: kernel mode
- can use OS features; can run every instruction, including access to the resources.
- You can tell which mode the CPU is in by looking at the supervisor flag in the flag register.
System calls are invoked very frequently!
Even running a simple program that prints "Hello world" triggers over 600 system calls.
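As a hedged, Linux-specific illustration, here's a tiny C program that invokes the write system call directly via syscall(); on Linux you can watch (and count) a program's system calls with the strace tool:
```c
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    const char msg[] = "Hello world\n";
    // Ask the kernel (running in kernel mode) to write to stdout (fd 1)
    // on our behalf; user-mode code can't touch the device directly.
    syscall(SYS_write, 1, msg, sizeof(msg) - 1);
    return 0;
}
```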
2. Process and Thread
Process
Types of Process:
- Foreground process: interacts with user
- Background process: can't be seen by user
- Daemon (called a "service" in Windows): can't be seen, no user interaction: just does its job.
Even though there are various types of processes, the information in memory that makes up a process doesn't vary much.
In kernel space, a PCB (Process Control Block) is stored, and in user space there are the code, data, stack, and heap segments (as we know from Computer Architecture class).
User space segment
- Code segment (a.k.a. text segment) << Static allocation segment
- Instructions are stored
- read-only
- Data segment << Static allocation segment
- Data that doesn't change when the program runs
- e.g. static variables, global variables
- Sometimes we separate what so called "BSS segment"
- if there's an initial value (like static or global variable) --> Data segment
- if there's no such an initial value --> BSS segment
- Heap segment << Dynamic allocation segment
- Dynamically allocated values are stored (e.g. malloc() in C)
- Need to deallocate after using
- otherwise, memory leak can happen
- In some programming language, they provide garbage collection feature where it automatically deallocates heap memory that is not used.
- Stack segment << Dynamic allocation segment
- Temporary values are stored
- local variable, return address, parameter, etc.
- function calling information can be stored as stack trace format
- Stack Trace is...
- function call info that was stored in stack area in specific time
- useful for debugging
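A small C sketch tying each segment to actual code (the printed addresses will vary by system):
```c
#include <stdio.h>
#include <stdlib.h>

int initialized = 42;   // data segment (global with an initial value)
int uninitialized;      // BSS segment (global with no initial value)

int main(void) {
    int local = 7;                          // stack segment (local variable)
    int *dynamic = malloc(sizeof(int));     // heap segment (dynamic allocation)

    printf("code  : %p\n", (void *)main);           // code (text) segment
    printf("data  : %p\n", (void *)&initialized);
    printf("bss   : %p\n", (void *)&uninitialized);
    printf("heap  : %p\n", (void *)dynamic);
    printf("stack : %p\n", (void *)&local);

    free(dynamic);   // deallocate, or we'd leak this memory
    return 0;
}
```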
PCB and Context Switching
PCB
- For the OS to manage a bunch of processes in memory, each process needs information in kernel space so the OS can tell them apart.
- This information is called the Process Control Block (PCB).
- The PCB is a struct that includes various pieces of process-related information.
- a struct just like in C: a composite data type that combines various data types into one.
- When a new process is loaded into memory (i.e., created), its PCB is created; it's destroyed when the process completes.
- PCB contains...
- Process ID (PID)
- Register values in use
- Process state
- CPU scheduling (priority information)
- memory information
- I/O information
- PCBs are usually managed by process table (a bunch of PCBs)
- When a new process appears, a new PCB is added to the process table and the required resources are allocated.
- When a process terminates, its resources are deallocated and its PCB is deleted from the process table.
- when a process has terminated but wasn't properly cleaned up, so its PCB still remains in the process table, we call it a "zombie process".
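As a purely hypothetical sketch (real kernels, e.g. Linux with its task_struct, are far more complex; every field name below is made up for illustration), a PCB might look like:
```c
// Hypothetical PCB sketch -- field names invented for illustration only.
typedef enum { NEW, READY, RUNNING, BLOCKED, TERMINATED } proc_state_t;

struct pcb {
    int           pid;            // process ID
    proc_state_t  state;          // current process state
    unsigned long registers[16];  // backed-up register values
    unsigned long pc;             // backed-up program counter
    int           priority;       // CPU scheduling information
    void         *page_table;     // memory management information
    int           open_fds[64];   // I/O / open file information
};
```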
Context Switching
- A process is running == the OS has allocated it CPU resources
- Various processes take turns running for a limited time == the CPU is allocated to each process in turn, for a limited time
- A process's CPU usage time is limited by a timer interrupt (also called a timeout interrupt)
- When the timer interrupt fires and another process is to run, the running process's state (program counter, register values, memory information, open files, I/O, etc.), i.e. its context, must be saved into its PCB. Then the next process's context is loaded.
- This is called context switching.
- Too-frequent context switching raises the cache-miss rate --> process info gets fetched from PCBs (memory) more often --> hence bigger overhead.
- There are various states for process:
- New state
- the process is currently being created
- stored in memory, allocated PCB
- waits for CPU allocation
- Ready state
- can immediately run, but waits for the CPU allocation turn
- turning ready state to running state is called: dispatch
- Running state
- it's currently running with CPU allocation
- can use CPU for a limited time
- if timer interrupt happens, it reverts to ready state
- if I/O was requested, it goes to blocked state.
- Blocked state
- Can't be immediately run
- I/O request, asking for a resource that's not immediately ready (disk, etc.)
- after the request is done, it goes to ready state.
- Terminated state
- Process has been terminated.
- OS deletes PCB from process table and deallocates the memory
(This describes blocking I/O. There's also non-blocking I/O, where the process requests I/O and, instead of going into the blocked state, just keeps running the following instructions.
Make sure to check out the book for this!)
2-1. Multi-Process && Multi-Thread
Multi-Process
- What if we want to run the code of a single program simultaneously, multiple times at once?
- same program, different processes --> running simultaneously
- various processes running simultaneously
- Like this, we can have multiple processes running at once.
- They do not share resources; they run individually. Even when they run the same task, their PID values are different and resources are allocated individually --> one process barely (if at all) affects the others.
Multi-Thread
- We could also use multiple threads to do this.
- Multiple threads running within one process are called "multi-threads".
- Each thread has its own thread ID, program counter, register values, stack, etc.
- Therefore each thread keeps track of its own next address to run, alongside its temporary values during computation.
Difference between multi-process and multi-thread?
- whether they share the process' resources or not.
- Multi-Process: they don't.
- each process is independent of the others --> if one process causes a problem, other processes most likely won't be affected.
- Multi-Thread: they do.
- they share the same code, data, and heap sections --> they share the same process's resources and open files
- advantage: threads can easily interact with each other.
- disadvantage: if one thread causes a problem --> the whole process can get in trouble.
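A minimal POSIX-threads sketch in C (compile with -pthread): each thread gets its own stack and registers, yet both see the same global variable.
```c
#include <pthread.h>
#include <stdio.h>

int shared = 0;   // data segment: visible to every thread in the process

void *worker(void *arg) {
    (void)arg;
    // Each thread has its own stack/registers/PC, but `shared` is common.
    shared += 1;  // NOTE: unsynchronized -- a race condition (see section 3)
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);  // usually 2, but not guaranteed!
    return 0;
}
```
The unsynchronized increment is exactly the race-condition territory section 3 deals with.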
2-2. Communication between Processes
IPC (Inter-Process Communication)
- Even though processes don't share resources by default, there are still ways for them to share resources and exchange (send and receive) data. This is called IPC.
- Two main types:
- Shared Memory
- Allocates a public memory space (public in terms of processes: multiple processes can access the same memory region)
- Each process communicates as if reading and writing its own memory space.
- Most of the time, the kernel does not intervene.
- the data being exchanged generally doesn't pass through kernel space.
- It's fast (because it's just like reading and writing your own memory), but consistency may not be preserved.
- This is the "race condition" problem. We'll get to it in Synchronization and Deadlock.
- Message Passing
- data exchange is done in the form of messages
- messages go through the kernel
- The system calls for sending and receiving messages are clearly defined:
- send()
- recv()
- Each process can call these system calls to exchange messages
- Pretty much no need to worry about race conditions or synchronization problems
- But it's much slower than shared memory (because everything goes through the kernel)
- There are various ways to send messages:
- Pipe
- one-way process communication
- if you want both ways, you generally get two pipes
- these are generally unnamed pipes (no two-way communication, and only between parent and child); see the C sketch after this list
- a named pipe (or FIFO) is a slight improvement over the unnamed pipe: it supports two-way communication and can be used not only between parent and child but between arbitrary processes
- Signal
- An asynchronous notification telling a process that a certain event has happened
- it's not limited to IPC (more like: you can use signals to implement IPC)
- Types of signals are listed in the table at the bottom.
- When a signal arrives, the process pauses its task, runs a "signal handler" to handle the signal, then resumes the task.
- A process can raise a signal directly, and can (re)define some of the signal handlers.
- Socket
- Will be continued in Ch. 3. Network because it requires the background knowledge of network.
- RPC (Remote Procedure Call)
- runs remote code.
- If running code inside one's own process is a local procedure call, running another process's remote code is an RPC.
- Through RPC, we can exchange messages while minimizing the performance loss
- Generally used in environments that handle huge-scale traffic
- especially in communication between servers.
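Here's the unnamed-pipe sketch promised above: a minimal C example (POSIX) where a parent writes and its child reads through a one-way pipe:
```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                 // fd[0]: read end, fd[1]: write end
    pipe(fd);                  // one-way channel through the kernel

    if (fork() == 0) {         // child: reads from the pipe
        char buf[32] = {0};
        close(fd[1]);          // close the unused write end
        read(fd[0], buf, sizeof(buf) - 1);
        printf("child got: %s\n", buf);
        return 0;
    }
    close(fd[0]);              // parent: close the unused read end
    write(fd[1], "hello", strlen("hello"));
    close(fd[1]);
    wait(NULL);                // reap the child
    return 0;
}
```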
Type of Signals:
3. Synchronization and Deadlock
3-1. Synchronization
Shared Resource
- Resources that processes or threads share
- could be memory or a file, a global variable or an I/O device.
- The section of code that accesses a shared resource and causes trouble when executed simultaneously is called the "Critical Section".
- If simultaneously running processes or threads enter the critical section together and execute it, problems arise. --> This is called a "Race Condition".
- Resource consistency can be violated. So one process has to wait until the other one finishes.
Synchronization
- Synchronization is required to prevent race conditions by managing critical sections. It involves two things:
- Execution order control: making sure processes and threads run in the correct order
- Mutual Exclusion: making sure only one process or thread accesses the critical section at a time
Mutex Lock
- A tool for ensuring mutual exclusion.
- If a process or a thread wants to enter a critical section, it must acquire the lock
- When its task in the critical section is complete, it must release the lock
- Typical mutex lock is designed like this:
- a common shared variable: lock
- 2 functions: acquire(), release()
- acquire() is for acquiring the lock; only one holder at a time per lock
- release() is for releasing the acquired lock
- To enter the critical section, lock.acquire() is called;
- if one process holds the lock and another process calls lock.acquire(), the second one can't actually acquire it; it has to wait.
- To exit the critical section, lock.release() is called;
- if there's any waiting process/thread, it is finally able to successfully complete lock.acquire()
- Programming languages like Python and C/C++ support mutex locks; users don't need to implement acquire() or release() themselves. Java doesn't use the term "mutex lock", but it still provides the locking feature.
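For instance, a hedged C sketch using the POSIX pthread mutex, whose lock/unlock calls play the roles of acquire()/release() (compile with -pthread):
```c
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;   // shared resource

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    // acquire(): wait if another thread holds it
        counter++;                    // critical section: one thread at a time
        pthread_mutex_unlock(&lock);  // release(): let a waiting thread in
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  // always 200000 thanks to the lock
    return 0;
}
```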
Semaphore
- A mutex lock is a tool that assumes only one shared resource.
- For multiple shared resources, we can use a semaphore. It's a generalization of the mutex lock.
- One variable and two functions:
- Variable S: the number of shared resources available (== the number of processes that can enter the critical section)
- wait(): a function called before entering the critical section
```c
wait() {
    S--;              // decrease S by 1
    if (S < 0) {      // S negative: no shared resource was available
        sleep();      // the calling process goes into blocked state;
                      // it can't enter the critical section yet
    }
}
```
- signal(): a function called after leaving the critical section
```c
signal() {
    S++;              // increase S by 1
    if (S <= 0) {     // S <= 0 means a blocked process was waiting to
                      // enter the critical section (positive S means
                      // one or more resources are simply free)
        wakeup(p);    // turn one blocked process into ready state
    }
}
```
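POSIX exposes exactly this counting behavior through <semaphore.h>; a sketch assuming Linux, where sem_wait()/sem_post() correspond to wait()/signal() above (compile with -pthread):
```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

sem_t sem;   // counting semaphore guarding N identical resources

void *worker(void *arg) {
    sem_wait(&sem);               // wait(): S--; block if none available
    printf("thread %ld using a resource\n", (long)arg);
    sleep(1);                     // pretend to use the shared resource
    sem_post(&sem);               // signal(): S++; wake one waiter
    return NULL;
}

int main(void) {
    pthread_t t[4];
    sem_init(&sem, 0, 2);         // S = 2: two resources available
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&sem);
    return 0;
}
```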
Monitor
Condition Variable
- a synchronization tool used to manage execution order
- you can call wait() and signal() on a condition variable
- if a specific process's running condition is not met: wait() --> pause.
- if a specific process's running condition is met: signal() --> resume.
So what is a monitor?
- a synchronization tool that consists of shared resources and an interface to them
- can do both mutual exclusion and execution order management
- processes and threads must access the monitor through its interface (the functions/operations on the shared resource), and only one process/thread may be inside the monitor at a time. The others must wait in a queue (blocked state).
- order is managed by calling wait() and signal()
- Example: Java's synchronized keyword
```java
public synchronized void example(int val) {
    this.count += val;
}
```
Thread Safety
- In a multi-threaded environment, if a certain variable, function, or object can be accessed by multiple threads simultaneously without any problem --> it is thread safe.
- race condition --> not thread safe.
- For example, in Java, the add method of the Vector class is guaranteed thread-safe (it's implemented with the synchronized keyword).
- Meanwhile, the add method of the ArrayList class is NOT guaranteed thread-safe (there's no synchronized keyword in its implementation).
3-2. Deadlock
What's a deadlock state?
- A process stops while waiting for an event that will never happen
- If process A holds resource X and waits for resource Y to be released by process B, while process B holds resource Y and waits for resource X to be released by process A --> deadlock state
Four conditions of Deadlock state
- Mutual Exclusion
- A resource in use by one process can't be used by other processes
- Hold and wait
- A process holds an allocated resource while waiting for another resource to be allocated
- Non-preemption
- A resource can't be forcibly taken away; it becomes available only once the process using it is done with it
- Circular wait
- The processes' resource-wait graph forms a cycle: each process waits for a resource the next one holds
How to resolve Deadlock state?
- Deadlock state prevention
- Making sure at least one of the four conditions of deadlock is never met.
- Let one process have all its resources allocated first, then the next one, etc. (prevents hold-and-wait)
- Assign an order (priority) to all allocatable resources and allocate them according to that order (prevents the cycle)
- Deadlock state evasion
- considers deadlock a result of injudicious allocation of limited resources
- therefore allocates resources carefully, only up to the point where deadlock cannot happen
- if a limited resource is allocated hastily to many processes, deadlock can occur.
- Deadlock state detection & resolve
- Basically like a follow-up fix (fixing the deadlock state after it happened)
- OS allocates resources whenever process requires to do so; regularly checks the deadlock state.
- If deadlock state is detected:
- Resource preemption: forcefully take resources from other processes until the deadlock is resolved.
- or terminate the deadlocked process(es).
4. CPU Scheduling
Priority
- The OS sets a priority for each process and records it in the PCB
- A high-priority process gets CPU resources allocated more, and sooner.
- Users can manually raise a process's priority.
- In Unix-based OSes (Unix, Linux, macOS, etc.), you can use the ps command to check each process's priority.
Which process will get higher priority?
- The goal is to maximize CPU utilization
- CPU Utilization: the fraction of total CPU time spent actually doing work
- Most processes use both the CPU and I/O devices: they go back and forth between the blocked state and the running state.
- The period a process spends using the CPU: CPU burst
- Processes with more CPU bursts: CPU bound processes
- complex math calculation, graphics tasks, etc.
- The period a process spends waiting for I/O devices: I/O burst
- Processes with more I/O bursts: I/O bound processes
- video playback, disk back-up, etc.
- CPU bound processes stay mostly in the running state, while I/O bound processes stay mostly in the blocked state
- So to maximize CPU utilization, I/O bound processes should run as soon as possible so their I/O requests get issued early --> then allocate more CPU resources to CPU bound processes.
Scheduling Queue
- The OS requires processes (that want to use the CPU, use a certain I/O device, be loaded into memory, etc.) to line up for resource allocation; this line is implemented as a Scheduling Queue.
- Ready Queue: the queue of PCBs of processes that want to use the CPU
- Waiting Queue: the queue of PCBs of blocked-state processes
- after an I/O request, a process waits in the waiting queue in the blocked state for the I/O-completion interrupt
- once that interrupt arrives, it moves to the back of the ready queue, ready to execute
- On a timer interrupt, the running process goes back to the ready queue (and if it's requesting I/O, it goes to the waiting queue instead)
- The CPU runs queued processes mostly in order, but adjusted by their priorities.
Pre-emptive and Non-preemptive Scheduling
- Scheduling happens when a process's execution finishes. However, there are two main scenarios where scheduling happens even though the process hasn't terminated:
- A process requests I/O and hence goes to the waiting queue (turned into blocked state)
- A timer interrupt fires
- A scheduling type that acts in both situations: Preemptive Scheduling
- forcefully takes the CPU away from a process and allocates it to another
- A scheduling type that waits until the process using the CPU terminates (or voluntarily blocks): Non-preemptive Scheduling
- Each scheduling type has their own pros and cons
- Preemptive scheduling prevents a single process from monopolizing the CPU
- but incurs context-switching overhead
- Non-preemptive scheduling has fewer context switches and hence less overhead
- but everything must wait until the process using the CPU finishes; there's no other way
4-1. CPU Scheduling Algorithm
1. FIFO / FCFS (First In First Out / First Come First Served)
- Literally: the first process to arrive --> gets its job done first
- But waiting times can be really long
- If the first arrival takes significantly longer than the other processes, so the entire system is delayed: this is called the Convoy Effect
2. SJF (Shortest Job First)
- Shortest Job gets done first
- Turnaround time decreases
- but response time increases
3. RR (Round Robin)
- FIFO + Time slicing
- basically FIFO, but each process has a time limit per turn (a time slice); after the time slice expires, a context switch happens
- Response time decreases
- Turnaround time increases
4. SRT (Shortest Remaining Time)
- SJF + RR
- Each process uses the CPU for a time slice, and the process with the shortest remaining time runs next.
5. Priority Scheduling
- Each process gets a priority, and execution starts from the highest-priority process.
- However, a process that arrived first but has low priority --> can keep getting delayed
- this is called starvation
- to avoid starvation, we have something called aging
- the longer a process waits, the higher its priority gets.
6. Multi-Level Queue
- an advanced form of priority scheduling
- there are multiple queues with different priorities, and processes in the highest-priority queue run first
- However, the downside is that processes can't move between the queues; starvation can occur
7. Multi-level Feedback Queue
- Similar to the multi-level queue, but processes can move between queues, so starvation is prevented.
- The longer a process uses the CPU, the lower its priority gets.
- CPU bound processes' priority drops; I/O bound processes' priority rises.
- Aging can also be implemented
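To make the round-robin idea concrete, here's a toy C simulation (burst times and the slice length are made-up numbers, not from any real scheduler):
```c
#include <stdio.h>

// Toy round-robin simulation: three processes with given burst times
// share the CPU in fixed time slices.
int main(void) {
    int remaining[] = {5, 8, 3};   // remaining burst time per process
    const int n = 3, slice = 2;
    int done = 0, clock = 0;

    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] <= 0) continue;           // already finished
            int run = remaining[i] < slice ? remaining[i] : slice;
            remaining[i] -= run;
            clock += run;
            printf("t=%2d: P%d ran %d unit(s)%s\n", clock, i, run,
                   remaining[i] == 0 ? " -> finished" : "");
            if (remaining[i] == 0) done++;
            // here a real OS would context switch on the timer interrupt
        }
    }
    return 0;
}
```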
4-2. Linux CPU Scheduling
- In Linux, various scheduling algorithms can be used.
- you can see this in Linux's scheduling policies:
| Scheduling Policy | In what situation? |
| --- | --- |
| SCHED_FIFO | Policy for real-time processes (provides higher priority) |
| SCHED_RR | Policy for real-time processes (provides higher priority) |
| SCHED_NORMAL | Policy for normal processes |
| SCHED_BATCH | Policy for batch tasks, preempted less aggressively than normal processes |
| SCHED_IDLE | Policy for very low priority processes (provides lower priority) |
- Generally speaking, FIFO, RR, NORMAL ones are commonly mentioned policies.
- FIFO and RR scheduling are handled by the real-time scheduler (RT)
- NORMAL is handled by CFS
- Completely Fair Scheduler: aims to schedule in a completely fair way
- in Linux, each process has a value called Virtual Runtime (vruntime): CFS schedules the process with the smallest vruntime first.
- vruntime is NOT the actual runtime; it's a virtual runtime (weighted by each process's weight)
- CFS's time slice gets bigger as the process's weight gets bigger.
- The kernel must be able to pick the process with the smallest vruntime out of a tremendous number of processes; hence it uses a Red-Black Tree data structure.
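As a hedged, Linux-specific sketch, a process can request one of these policies through the sched_setscheduler() system call (actually succeeding usually requires root privileges):
```c
#include <sched.h>
#include <stdio.h>

int main(void) {
    struct sched_param param = { .sched_priority = 10 };

    // Request SCHED_FIFO for the calling process (pid 0 = self).
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        perror("sched_setscheduler");   // likely EPERM without privileges
        return 1;
    }
    printf("now scheduled with SCHED_FIFO\n");
    return 0;
}
```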
5. Virtual Memory
How does the CPU recognize and manage all the processes' addresses in memory?
5-1. Physical Address and Logical Address
- The CPU and processes do not use the memory's Physical Address (the actual hardware address)
- They use Logical Addresses instead
- each process's logical address space starts from address 0 (0x0)
- So processes can never share the same physical address, but they can have the same logical address.
- Also, conversion between logical and physical addresses is necessary in order to interact with the memory hardware (otherwise it'd be really hard to do so, wouldn't it?)
- That's what the MMU (Memory Management Unit) is for.
- It sits between the CPU and memory
- Converts the CPU's logical addresses (what the CPU understands) into the memory's physical addresses (what the memory understands)
Then how do processes get allocated memory under this address system?
5-2. Swapping and Contiguous Memory Allocation
Swapping
- Among the processes stored in memory, some may not be currently running.
- processes in the blocked state waiting on I/O requests
- processes that haven't been used for a long time
- Moving these processes out to swap space (a part of the disk) and storing other processes in the freed memory
- is called Swapping.
- Moving from memory to swap space: Swap Out
- Moving from swap space to memory: Swap In
- When a swapped-out process is swapped back in, it can be stored at a different memory address.
- Then how do we store processes in memory efficiently when there are multiple free spaces available?
Contiguous Memory Allocation and External Fragmentation
- Until now, we have assumed that each process is stored contiguously in memory. This technique is called Contiguous Memory Allocation.
- It seems like the obvious thing to do, but it's not the most efficient way, because it suffers from the problem of External Fragmentation.
- Empty spaces appear scattered through the middle of memory, and they aren't contiguous; even if there is 50MB available in total, it may be split into a 30MB piece and a 20MB piece.
- These empty spaces are created as processes start and finish. They're technically free space, but it's hard to fit a new process into them; so they're basically wasted memory. This is called External Fragmentation.
5-3. Virtual Memory Management by Paging
Another problem from contiguous memory allocation?
- Another problem with contiguous memory allocation, besides external fragmentation, is that we can't run a process bigger than the actual size of memory.
- a computer with 4GB RAM can't run a program that's more than 4GB.
- To overcome this problem, we use virtual memory.
Virtual Memory
- Stores only part of a program in memory, so that we can run a process bigger than the memory size.
- Utilizing part of the disk as if it were memory --> gives the process the illusion that memory is bigger than its actual size.
- There are two main virtual memory management techniques: Paging and Segmentation.
- We're going to focus on paging, as it's more widely used.
Segmentation
- Instead of dividing a process into fixed-size pages, we divide it into segments, whose sizes can vary.
- The division isn't into uniform chunks, but into meaningful logical units.
- One segment could be a code segment (or part of it), or a data segment (or part of it).
- However, since segment sizes are not uniform, external fragmentation can still happen.
Paging
- Paging divides a process's logical address space into fixed-size units called pages, and the physical address space into frames of the same size as a page. Then we map pages onto frames.
- Pay attention to the fact that the pages can be placed non-contiguously in physical memory.
- Make sure to look at the book to help visualize it! (pg. 211)
- By this, we can prevent the external fragmentation.
- We can still use swapping in paging method.
- In systems that use paging, instead of an entire process being swapped out/in, it's swapped out/in per page.
- Swap out/swap in in a paging system is called Page Out/Page In.
- Among the pages making up one process, some may be paged in and some paged out. That means parts of the process are in memory while other parts are on disk.
- Therefore, in order to run a process, not an entire process is required to be in the memory
- From perspective of CPU, logical memory's size got bigger than an actual memory's size
- Through paging, we can run processes that are bigger than memory's size.
- we said pages can be placed non-contiguously in physical memory. That raises a new problem: it's hard for the CPU to find the next page to run.
- it's hard for the CPU to know which frame holds which page of a given process.
- To overcome this challenge, we use something called [[#Page Table]].
Page Table
- A page table maps each page number to the frame number where that page is actually stored.
- Using the page table, the CPU can locate which page is in which frame.
- Beyond that, the page table also includes other crucial information.
- The rows of a page table are called Page Table Entries (PTEs).
- Although it varies by OS, a PTE might contain...
- Page number, frame number, valid bit, protection bit, reference bit, modified bit, etc.
Valid bit
- shows whether the page is currently accessible
- indicates whether the page is stored in memory or on disk
- if the page is in memory --> 1; if not --> 0
- The CPU can't access the disk directly; an extra step is needed: a page stored on disk must first be brought into memory
- If the CPU tries to access a page whose valid bit is 0, it causes a page fault exception.
- (In Windows' Resource Monitor, the "Hard Faults/sec" column shows page faults per second.)
- Then how does the CPU handle a page fault exception?
- Backs up the current task state
- The page fault handling routine runs: it brings the desired page into memory and sets the valid bit to 1.
- After the routine, execution continues with the page now in memory.
- Types of page fault:
- major page fault
- I/O required; the page must be read from disk into memory (the kind we've been talking about)
- minor page fault
- no I/O required; the page the CPU asked for is already in physical memory but not yet recorded in the page table.
- generally affects performance less than a major page fault.
Protection bit
- For page protection
- limits read/write/execute using a combination of bits r, w, and x, representing read, write, and execute respectively.
- If the protection bits are 100, then r=1, w=0, x=0 --> the page is read-only.
- If the protection bits are 111, then r, w, x = 1 --> the page can be read, written, and executed.
Reference bit
- shows whether the CPU has accessed this page before
- if the CPU has read or written a page after it was stored in memory, its reference bit becomes 1
- if the CPU has neither read nor written the page since it was stored in memory, its reference bit is 0.
Modified bit
- Shows whether the CPU has written to the page. Also known as the dirty bit.
- if the modified bit is 1, the page has been written/modified at least once.
- if such a page needs to be evicted, its modifications must be reflected back to the disk; so an extra disk-write step is needed.
- if the modified bit is 0, the page hasn't been modified (only read, or not accessed at all by the CPU)
- there's nothing to write back, so the page can be dropped from memory immediately.
Internal Fragmentation
- Although paging resolves external fragmentation, it can cause internal fragmentation.
- Paging divides a process into fixed-size units, but not every process divides evenly by the page size. In particular, the last page usually holds the remainder.
- For instance, say the page size is 10KB and the whole process is 107KB. It will be divided into 11 pages, the last holding only 7KB. So 3KB of the last page's frame is wasted memory. This is called Internal Fragmentation.
Page Table Base Register (PTBR)
- Each process's page table is stored in memory. Hence, we need to know the memory location of a process's page table to be able to execute that process.
- A special register holds the running process's page table location in memory: the PTBR.
- Since each process has its own page table base, this value is recorded in its PCB and restored during context switching.
- However, keeping whole page tables purely in memory is inefficient, and we try to mitigate it for two reasons:
- Problem 1: higher memory access frequency
- The CPU accesses the page table, then accesses the actual frame. So every paged memory reference costs two memory accesses.
- Therefore, if page tables lived only in memory, memory access time would roughly double.
- To overcome this, we use a cache for the page table:
- Translation Look-aside Buffer (TLB)
- The TLB is a page table cache; based on locality of reference, it stores the parts of the page table that are likely to be accessed often.
- If the page number of the logical address the CPU is accessing is in the TLB, the TLB tells the CPU the frame number that holds the page.
- This is called a TLB hit.
- If not, the CPU has to walk the page table itself; this is called a TLB miss.
- requires an additional memory access.
- Hence, we want to raise the TLB hit rate.
- Problem 2: page tables take up a lot of memory space
- To overcome this, hierarchical paging was introduced: it pages the page table itself.
- also called a multi-level page table
- By paging the page table and keeping an outer page table that points to the inner ones, we only need the outer page table resident in memory for full access.
Paging Address System
- A single page contains many addresses, so a paging system's logical address follows this structure:
- <page number, offset>
- Page number: which page are we accessing?
- Offset: how far is the desired address from the start of the page (frame)?
- Hence, <page number, offset> is converted into <frame number, offset> via the page table.
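A tiny C sketch of this conversion, with a made-up 4-entry page table and a 4096-byte page size:
```c
#include <stdio.h>

// Sketch of the <page number, offset> -> <frame number, offset> conversion.
#define PAGE_SIZE 4096

int page_table[4] = {7, 2, 9, 0};   // page i is stored in frame page_table[i]

unsigned long translate(unsigned long vaddr) {
    unsigned long page   = vaddr / PAGE_SIZE;   // page number
    unsigned long offset = vaddr % PAGE_SIZE;   // distance from page start
    unsigned long frame  = page_table[page];    // page table lookup
    return frame * PAGE_SIZE + offset;          // physical address
}

int main(void) {
    unsigned long vaddr = 1 * PAGE_SIZE + 123;  // page 1, offset 123
    printf("logical 0x%lx -> physical 0x%lx\n", vaddr, translate(vaddr));
    return 0;
}
```
For page 1 with offset 123, the lookup yields frame 2, so the physical address is 2 * 4096 + 123.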
5-4. Page Replacement Algorithm
Demand Paging
- We don't have to store all the pages that make up a process in memory; we can store only the required ones --> this is called demand paging. It works like this:
1. The CPU accesses a certain page while executing an instruction
2. If the page's valid bit is 1, the CPU accesses the frame directly
3. Otherwise, a page fault exception occurs
4. The page fault handling routine brings the page into memory and sets its valid bit to 1
5. Go back to step 1.
- It's even possible to start running a process with no pages in memory at all; this is called pure demand paging.
- page faults fire constantly at the start
- once the required pages accumulate in memory, page faults occur less often.
What is a page replacement algorithm?
- At some point memory fills up; then we need to page out some pages. How do we select which pages to evict?
- That choice is made by the page replacement algorithm.
- This is directly tied to the whole computer's performance.
- A good page replacement algorithm triggers fewer page faults.
- A bad one triggers more page faults --> more pages read from disk into memory, which means lower performance.
- There's also a condition called "Thrashing", where the system spends more time on paging than on actually running processes --> which crushes performance.
- There are mainly three page replacement algorithms:
First In First Out (FIFO) Page Replacement Algorithm
- evicts the page that was brought into memory first
- straightforward, but it could evict an important page that's still in use; then it'll soon cause another page fault.
Optimal Page Replacement Algorithm
- evicts the page that won't be used for the longest time in the future
- Guarantees the fewest page faults, but is practically impossible to implement: you can't predict future page accesses.
Least Recently Used (LRU) Page Replacement Algorithm
- evicts the page that hasn't been used for the longest time (least recently used)
- the original form of the most commonly used replacement algorithms; many derived algorithms exist.
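A toy LRU simulation in C (the reference string and frame count are made up) that counts page faults:
```c
#include <stdio.h>

// Toy LRU simulation: 3 frames, a made-up page reference string.
// On each reference we either hit, fill a free frame, or evict the
// frame whose page was used least recently.
#define FRAMES 3

int main(void) {
    int refs[] = {1, 2, 3, 1, 4, 2, 5};
    int n = sizeof(refs) / sizeof(refs[0]);
    int frame[FRAMES], last_used[FRAMES];
    int faults = 0;

    for (int i = 0; i < FRAMES; i++) frame[i] = -1;  // all frames empty

    for (int t = 0; t < n; t++) {
        int victim = 0, hit = 0;
        for (int i = 0; i < FRAMES; i++) {
            if (frame[i] == refs[t]) { hit = 1; last_used[i] = t; break; }
        }
        if (!hit) {
            faults++;
            // pick an empty frame, else the least recently used one
            for (int i = 1; i < FRAMES; i++) {
                if (frame[victim] == -1) break;
                if (frame[i] == -1 || last_used[i] < last_used[victim])
                    victim = i;
            }
            frame[victim] = refs[t];
            last_used[victim] = t;
        }
        printf("ref %d -> %s\n", refs[t], hit ? "hit" : "page fault");
    }
    printf("total page faults: %d\n", faults);
    return 0;
}
```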
6. File System
What is a file system?
- The OS's internal program that stores and manages the disk's information in the form of files and directories (folders)
- With it, we can handle the tremendous amount of information on disk through files and directories.
- A single OS can use multiple file systems; with a different file system, the way the disk's information is handled also differs.
6-1. File and Directory
File
- A file consists of...
- the file name, the information required to execute the file, and additional information about the file
- the additional info is called "attributes" or "metadata"; it's the kind of thing you see in a file's "properties".
- All tasks dealing with files go through the OS. A user application program can't allocate, read, write, or execute a file on its own; it must use the file-related system calls.
- If a process has 10 different files allocated, it needs to distinguish between them. For this, the process uses a piece of information called a file descriptor (a file handle in Windows).
- an integer of 0 or greater
- used for low-level file identification and operations
- open() returns the opened file's file descriptor
- write() and close() take the file descriptor as a parameter
- it's not just for regular files: I/O devices, pipes and sockets (IPC tools) also use file descriptors.
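A minimal C sketch of the descriptor lifecycle (POSIX; assumes a file named example.txt exists):
```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[64] = {0};

    // open() is a system call; it returns a small non-negative integer,
    // the file descriptor, which identifies this open file from now on.
    int fd = open("example.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("file descriptor: %d\n", fd);      // typically 3 (0/1/2 are taken)

    read(fd, buf, sizeof(buf) - 1);           // read() takes the fd as parameter
    printf("read: %s\n", buf);

    close(fd);                                // close() releases the descriptor
    return 0;
}
```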
Directory
- so called "folder" in windows, OS uses directory to manage various files.
- modern directory forms hierarchical structure. (which can be also called as "tree")
- so modern directory is managed as tree-structured directory with multiple hierarchical levels.
- Root directory: expressed as slash (/), paths are location of a certain file from root
- in windows, root directory is C:₩ and uses backslash(\) as path delimiter
- A directory is treated as a special kind of file
- more specifically, a file containing information about the entries under that directory.
- the info about the entries is laid out as a table.
- Each row is called a "directory entry".
- contains the file name and information used to locate the file on disk
- may also contain creation time, modified time, size, etc.
File allocation
- The OS reads and writes the disk in units called 'blocks'. Each block is usually 4096 bytes.
- When a file is stored on disk, it gets one or more blocks allocated to it.
- Some file systems keep the next block's address inside each block, so all of a file's blocks form a linked chain. This is called "Linked allocation".
- In this case, the directory entry tells you where each file starts.
- directory entry: file name, first block address, and length (in blocks)
- Other file systems keep all of a file's block addresses in a special block called the "index block"; this is called "Indexed allocation".
- If you know the index block, you can reach the desired file data.
6-2. File System
- There are various file systems.
- Each OS supports different file systems, and one computer can use various file systems even under a single OS.
- Windows: NT file system (NTFS), Resilient file system (ReFS), etc.
- Linux: EXT, EXT2, EXT3, EXT4, XFS, ZFS, etc.
- For these file systems, file allocation is done based on the index block called [[#i-node (index-node)]]
- MacOS: APFS (Apple File System), etc.
Partitioning
- Various types of file systems can be used within one disk.
- To apply multiple file systems to a single disk, you have to divide the disk into areas and decide which file system each area will use.
- Dividing the disk's area like this is called "Partitioning", and each area created by partitioning is called a "Partition".
- A disk can be divided into multiple partitions, and each can use a different file system.
Formatting
- The choice among the various file systems is made when we format the disk.
- Formatting is setting up a file system:
- deciding how files will be stored and managed
- a task that makes the disk ready for new data to be written
i-node (index-node)
- In an i-node based file system, each file has its own i-node, and each i-node has a number assigned to it.
- you can see the number in Linux or macOS with the ls command's -i option.
- For instance, let's consider the EXT4 file system (refer to the book for a visualization and clearer understanding):
- it consists of multiple block groups
- the first block doesn't store actual file data; it contains information for booting and partition management.
- Each block group consists of...
| Name | Description |
| --- | --- |
| Superblock | stores overall file system information: number of i-nodes, total number of blocks, block size, etc. |
| Block Group Descriptors | stores metadata about the block groups |
| Block Bitmap | tracks how data blocks are allocated in the current block group |
| i-node Bitmap | tracks how i-nodes are allocated in the current block group |
| i-node Table | stores each file's i-node information |
| Data Blocks | store each file's data |
- As you can see, the i-nodes all live in a specific area of the partition.
- other i-node based file systems work like this too
- Hence, in an i-node based file system, the OS can't create a new file once the i-node area is full and no more i-nodes can be allocated: even if there's empty space in the data area.
Hard Link and Symbolic Link
- Hard Link file: a file that shares its i-node with the original file
- shares the same file data
- if the hard link file's data changes, the original file's data changes as well
- even if the original file is deleted or moved, you can still access the file data
- used when you want to reference the same file under different names
- Symbolic Link file: points at the original file
- doesn't share the file data; it only stores the location (path) of the original file
- if the original file gets moved or deleted, you can't use the file data anymore.
- used when you simply want to refer to a file (e.g. a shortcut to a file buried in a complex path)
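A hedged C sketch using the POSIX calls underlying `ln` and `ln -s` (assumes a file named original.txt exists in the current directory):
```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Hard link: hard.txt shares original.txt's i-node and file data.
    if (link("original.txt", "hard.txt") == -1) perror("link");

    // Symbolic link: soft.txt only stores the path "original.txt".
    if (symlink("original.txt", "soft.txt") == -1) perror("symlink");

    return 0;
}
```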
Mount
- Incorporating one storage device's file system into another device's file system.
- e.g. plugging a USB drive into a PC, etc.
- when you plug in the USB drive, you can access its file system through the PC's file system.
- conventionally connected under the path "/mnt"
- You can also mount via commands (mostly on Unix or Linux):
- mount -t ext4 -o ro /dev/sda /mnt/test
- meaning:
- mount the EXT4 file system on the device /dev/sda onto the /mnt/test directory, read-only.
7. Additional Notes
Booting
- Loading the kernel into memory and starting the computer.
- When the power comes on, information is read from non-volatile memory (ROM) into volatile memory (RAM):
- The CPU reads from a specific, pre-set address
- the BIOS (Basic Input/Output System) lives at that address
- the BIOS checks the hardware for problems (POST, Power-On Self-Test)
- If there's no problem, it reads the special information needed for booting from the disk's master boot record (MBR, located at the very start of the disk: the first sector, in the case of a hard disk)
- The MBR holds a special program called the bootstrap (boot loader)
- a program that finds the kernel's location and loads it into memory
- Recently the trend is to use UEFI (Unified Extensible Firmware Interface) instead of BIOS
- can boot from large disks
- fast and secure booting
- supports a GUI
Virtual Machine and Container
Virtual Machine
- A virtual computer implemented in software
- The software that creates and runs virtual machines: the hypervisor
- e.g. VirtualBox
- lets you use another OS
- separates resource allocation and provides virtualization / resource isolation
- the VMs work as if they were different computers
- but can incur big overhead and hence slower performance
Container
- Another virtualization / resource isolation technique
- e.g. Docker, LXC
- A virtual machine provides hardware-level virtualization
- A container provides OS-level virtualization
- uses the same kernel; isolates only what the specific process needs
- far lighter, and doesn't depend on the platform
- A tremendous number of containers get created (over 4 billion per week)
- managing this large number of containers: container orchestration
- e.g. Kubernetes