1. The "Big Picture" of OS
1-1. Role / Feature of OS
- An Operating System (OS) is a special type of program. Widely known ones are Windows, macOS, and Linux (for desktops), and Android and iOS (for mobile smartphones).
- Although detailed features may vary, the "key features" of an OS remain the same across all of them.
- The part that runs these "key features" is called the Kernel.
- It has two main jobs: resource allocation/management and process/thread management.
- CPU Scheduling - CPU's fair allocation and order of using CPU.
- Basic Concepts: Priority, Scheduling Queue, Preemption/Non-preemption
- CPU Scheduling Algorithm
- Linux CPU Scheduling
- Memory Management: Virtual Memory - Efficient memory management.
- Physical memory, Logical memory
- Memory allocation
- Paging, Page Replacement Algorithm
- File/Directory Management: File System - utilizing disks, etc.
- File/Directory
- File System
- Process and Thread Management: how to allocate resources for various processes and threads.
- Process and Thread
- Synchronization and Deadlock state
So, in brief, the map looks something like this:
- OS
- Big Picture of OS
- Kernel
- System call
- Process / Thread Management
- Process and Thread
- Synchronization and Deadlock
- Resource Allocation / Management
- CPU Scheduling
- Basic Concepts: Priority, Scheduling Queue, Preemption/Non-preemption
- CPU Scheduling Algorithm
- Linux CPU Scheduling
- Memory Management: Virtual Memory
- Physical memory, Logical memory
- Memory allocation
- Paging, Page Replacement Algorithm
- File/Directory Management: File System
- File/Directory
- File System
1-2. System Call, Dual Mode
The OS is also a kind of program! So it must be stored in memory.
The OS is stored in the "kernel space" of memory, separated from "user space", where user application programs are stored.
Now, the most important part: if we want to use the features of the OS, we must run the OS code stored in kernel space.
Generally speaking, a user application program can't directly access resources like the CPU, memory, etc. Instead, the OS does that work on the program's behalf. The program requests it through a "System Call": the interface for reaching the features of the OS.
There are many different system calls. For example:
A process can create another process with the fork() system call. The child process can create its own children as well.
Hence, processes are managed in a hierarchical way.
The process that created the other is the "Parent Process", and the created one is the "Child Process".
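Here's a minimal C sketch of fork() (POSIX), just to make the parent/child relationship concrete; the printed PIDs will of course vary per run:
```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();          // create a child process
    if (pid < 0) {
        perror("fork");          // fork failed
        return 1;
    } else if (pid == 0) {
        // fork() returns 0 in the child
        printf("child:  pid=%d, parent=%d\n", getpid(), getppid());
    } else {
        // fork() returns the child's PID in the parent
        printf("parent: pid=%d, child=%d\n", getpid(), pid);
        wait(NULL);              // reap the child so it doesn't linger as a zombie
    }
    return 0;
}
```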
If a system call is invoked, the following steps happen:
In the instruction set, there are specific instructions that trigger a software interrupt.
- I/O instructions (accesses to resources) are one example; the interrupts caused by such instructions are called "software interrupts".
- A system call is a kind of software interrupt. So:
- When a system call is invoked, the CPU backs up the current program state
- the software-interrupt handling code (which is in kernel space) runs
- then the user-space code continues running.
So the CPU separates its mode between running user application programs and running kernel-space code.
- When the CPU is running a user-space program: user mode
- can't use OS features; can't run I/O requests and such.
- When the CPU is running a kernel-space program: kernel mode
- can use OS features; can run every instruction, including access to the resources.
- You can tell which mode the CPU is in by looking at the supervisor flag in the flag register.
System calls are invoked very frequently!
Even running a simple program that prints "Hello world" triggers over 600 system calls.
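As a hedged, Linux-specific illustration, here's a tiny C program that invokes the write system call directly via syscall(); on Linux you can watch (and count) a program's system calls with the strace tool:
```c
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    const char msg[] = "Hello world\n";
    // Ask the kernel (running in kernel mode) to write to stdout (fd 1)
    // on our behalf; user-mode code can't touch the device directly.
    syscall(SYS_write, 1, msg, sizeof(msg) - 1);
    return 0;
}
```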
2. Process and Thread
Process
Types of Process:
- Foreground process: interacts with user
- Background process: can't be seen by user
- Daemon (called a "service" in Windows): can't be seen, no user interaction: just does its job.
Even though there are various types of processes, the information in memory that makes up a process doesn't vary much.
In kernel space, a PCB (Process Control Block) is stored, and in user space there are the code, data, stack, and heap segments (as we know from Computer Architecture class).
User space segment
- Code segment (a.k.a. text segment) << Static allocation segment
- Instructions are stored
- read-only
- Data segment << Static allocation segment
- Data that doesn't change when the program runs
- e.g. static variables, global variables
- Sometimes we separate what so called "BSS segment"
- if there's an initial value (like static or global variable) --> Data segment
- if there's no such an initial value --> BSS segment
- Heap segment << Dynamic allocation segment
- Dynamically allocated values are stored (e.g. malloc() in C)
- Need to deallocate after using
- otherwise, memory leak can happen
- In some programming language, they provide garbage collection feature where it automatically deallocates heap memory that is not used.
- Stack segment << Dynamic allocation segment
- Temporary values are stored
- local variable, return address, parameter, etc.
- function calling information can be stored as stack trace format
- Stack Trace is...
- function call info that was stored in stack area in specific time
- useful for debugging
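A small C sketch tying each segment to actual code (the printed addresses will vary by system):
```c
#include <stdio.h>
#include <stdlib.h>

int initialized = 42;   // data segment (global with an initial value)
int uninitialized;      // BSS segment (global with no initial value)

int main(void) {
    int local = 7;                          // stack segment (local variable)
    int *dynamic = malloc(sizeof(int));     // heap segment (dynamic allocation)

    printf("code  : %p\n", (void *)main);           // code (text) segment
    printf("data  : %p\n", (void *)&initialized);
    printf("bss   : %p\n", (void *)&uninitialized);
    printf("heap  : %p\n", (void *)dynamic);
    printf("stack : %p\n", (void *)&local);

    free(dynamic);   // deallocate, or we'd leak this memory
    return 0;
}
```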
PCB and Context Switching
PCB
- For the OS to manage a bunch of processes in memory, each process needs information in kernel space so the OS can tell them apart.
- This information is called the Process Control Block (PCB).
- The PCB is a struct that includes various pieces of process-related information.
- a struct just like in C: a composite data type that combines various data types into one.
- When a new process is loaded into memory (i.e., created), its PCB is created; it's destroyed when the process completes.
- PCB contains...
- Process ID (PID)
- Register values in use
- Process state
- CPU scheduling (priority information)
- memory information
- I/O information
- PCBs are usually managed by process table (a bunch of PCBs)
- When a new process appears, a new PCB is added to the process table and the required resources are allocated.
- When a process terminates, its resources are deallocated and its PCB is deleted from the process table.
- when a process has terminated but wasn't properly cleaned up, so its PCB still remains in the process table, we call it a "zombie process".
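As a purely hypothetical sketch (real kernels, e.g. Linux with its task_struct, are far more complex; every field name below is made up for illustration), a PCB might look like:
```c
// Hypothetical PCB sketch -- field names invented for illustration only.
typedef enum { NEW, READY, RUNNING, BLOCKED, TERMINATED } proc_state_t;

struct pcb {
    int           pid;            // process ID
    proc_state_t  state;          // current process state
    unsigned long registers[16];  // backed-up register values
    unsigned long pc;             // backed-up program counter
    int           priority;       // CPU scheduling information
    void         *page_table;     // memory management information
    int           open_fds[64];   // I/O / open file information
};
```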
Context Switching
- A process is running == the OS has allocated it CPU resources
- Various processes take turns running for a limited time == the CPU is allocated to each process in turn, for a limited time
- A process's CPU usage time is limited by a timer interrupt (also called a timeout interrupt)
- When the timer interrupt fires and another process is to run, the running process's state (program counter, register values, memory information, open files, I/O, etc.), i.e. its context, must be saved into its PCB. Then the next process's context is loaded.
- This is called context switching.
- Too-frequent context switching raises the cache-miss rate --> process info gets fetched from PCBs (memory) more often --> hence bigger overhead.
- There are various states for process:
- New state
- the process is currently being created
- stored in memory, allocated PCB
- waits for CPU allocation
- Ready state
- can immediately run, but waits for the CPU allocation turn
- turning ready state to running state is called: dispatch
- Running state
- it's currently running with CPU allocation
- can use CPU for a limited time
- if timer interrupt happens, it reverts to ready state
- if I/O was requested, it goes to blocked state.
- Blocked state
- Can't be immediately run
- I/O request, asking for a resource that's not immediately ready (disk, etc.)
- after the request is done, it goes to ready state.
- Terminated state
- Process has been terminated.
- OS deletes PCB from process table and deallocates the memory
(This describes blocking I/O. There's also non-blocking I/O, where the process requests I/O and, instead of going into the blocked state, just keeps running the following instructions.
Make sure to check out the book for this!)
2-1. Multi-Process && Multi-Thread
Multi-Process
- What if we want to run the code of a single program simultaneously, multiple times at once?
- same program, different processes --> running simultaneously
- various processes running simultaneously
- Like this, we can have multiple processes running at once.
- They do not share resources; they run individually. Even when they run the same task, their PID values are different and resources are allocated individually --> one process barely (if at all) affects the others.
Multi-Thread
- We could also use multiple threads to do this.
- Multiple threads running within one process are called "multi-threads".
- Each thread has its own thread ID, program counter, register values, stack, etc.
- Therefore each thread keeps track of its own next address to run, alongside its temporary values during computation.
Difference between multi-process and multi-thread?
- whether they share the process' resources or not.
- Multi-Process: they don't.
- each process is independent of the others --> if one process causes a problem, other processes most likely won't be affected.
- Multi-Thread: they do.
- they share the same code, data, and heap sections --> they share the same process's resources and open files
- advantage: threads can easily interact with each other.
- disadvantage: if one thread causes a problem --> the whole process can get in trouble.
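A minimal POSIX-threads sketch in C (compile with -pthread): each thread gets its own stack and registers, yet both see the same global variable.
```c
#include <pthread.h>
#include <stdio.h>

int shared = 0;   // data segment: visible to every thread in the process

void *worker(void *arg) {
    (void)arg;
    // Each thread has its own stack/registers/PC, but `shared` is common.
    shared += 1;  // NOTE: unsynchronized -- a race condition (see section 3)
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);  // usually 2, but not guaranteed!
    return 0;
}
```
The unsynchronized increment is exactly the race-condition territory section 3 deals with.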
2-2. Communication between Processes
IPC (Inter-Process Communication)
- Even though processes don't share resources by default, there are still ways for them to share resources and exchange (send and receive) data. This is called IPC.
- Two main types:
- Shared Memory
- Allocates a public memory space (public in terms of processes: multiple processes can access the same memory region)
- Each process communicates as if reading and writing its own memory space.
- Most of the time, the kernel does not intervene.
- the data being exchanged generally doesn't pass through kernel space.
- It's fast (because it's just like reading and writing your own memory), but consistency may not be preserved.
- This is the "race condition" problem. We'll get to it in Synchronization and Deadlock.
- Message Passing
- data exchange is done in the form of messages
- messages go through the kernel
- The system calls for sending and receiving messages are clearly defined:
- send()
- recv()
- Each process can call these system calls to exchange messages
- Pretty much no need to worry about race conditions or synchronization problems
- But it's much slower than shared memory (because everything goes through the kernel)
- There are various ways to send messages:
- Pipe
- one-way process communication
- if you want both ways, you generally get two pipes
- these are generally unnamed pipes (no two-way communication, and only between parent and child); see the C sketch after this list
- a named pipe (or FIFO) is a slight improvement over the unnamed pipe: it supports two-way communication and can be used not only between parent and child but between arbitrary processes
- Signal
- An asynchronous notification telling a process that a certain event has happened
- it's not limited to IPC (more like: you can use signals to implement IPC)
- Types of signals are listed in the table at the bottom.
- When a signal arrives, the process pauses its task, runs a "signal handler" to handle the signal, then resumes the task.
- A process can raise a signal directly, and can (re)define some of the signal handlers.
- Socket
- Will be continued in Ch. 3. Network because it requires the background knowledge of network.
- RPC (Remote Procedure Call)
- runs remote code.
- If running code inside one's own process is a local procedure call, running another process's remote code is an RPC.
- Through RPC, we can exchange messages while minimizing the performance loss
- Generally used in environments that handle huge-scale traffic
- especially in communication between servers.
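Here's the unnamed-pipe sketch promised above: a minimal C example (POSIX) where a parent writes and its child reads through a one-way pipe:
```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                 // fd[0]: read end, fd[1]: write end
    pipe(fd);                  // one-way channel through the kernel

    if (fork() == 0) {         // child: reads from the pipe
        char buf[32] = {0};
        close(fd[1]);          // close the unused write end
        read(fd[0], buf, sizeof(buf) - 1);
        printf("child got: %s\n", buf);
        return 0;
    }
    close(fd[0]);              // parent: close the unused read end
    write(fd[1], "hello", strlen("hello"));
    close(fd[1]);
    wait(NULL);                // reap the child
    return 0;
}
```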
Type of Signals:
3. Synchronization and Deadlock
3-1. Synchronization
Shared Resource
- Resources that processes or threads share
- could be memory or a file, a global variable or an I/O device.
- The section of code that accesses a shared resource and causes trouble when executed simultaneously is called the "Critical Section".
- If simultaneously running processes or threads enter the critical section together and execute it, problems arise. --> This is called a "Race Condition".
- Resource consistency can be violated. So one process has to wait until the other one finishes.
Synchronization
- Synchronization is required to prevent race conditions by managing critical sections. It involves two things:
- Execution order control: making sure processes and threads run in the correct order
- Mutual Exclusion: making sure only one process or thread accesses the critical section at a time
Mutex Lock
- A tool for ensuring mutual exclusion.
- If a process or a thread wants to enter a critical section, it must acquire the lock
- When its task in the critical section is complete, it must release the lock
- Typical mutex lock is designed like this:
- a common shared variable: lock
- 2 functions: acquire(), release()
- acquire() is for acquiring the lock; only one holder at a time per lock
- release() is for releasing the acquired lock
- To enter the critical section, lock.acquire() is called;
- if one process holds the lock and another process calls lock.acquire(), the second one can't actually acquire it; it has to wait.
- To exit the critical section, lock.release() is called;
- if there's any waiting process/thread, it is finally able to successfully complete lock.acquire()
- Programming languages like Python and C/C++ support mutex locks; users don't need to implement acquire() or release() themselves. Java doesn't use the term "mutex lock", but it still provides the locking feature.
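For instance, a hedged C sketch using the POSIX pthread mutex, whose lock/unlock calls play the roles of acquire()/release() (compile with -pthread):
```c
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;   // shared resource

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    // acquire(): wait if another thread holds it
        counter++;                    // critical section: one thread at a time
        pthread_mutex_unlock(&lock);  // release(): let a waiting thread in
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  // always 200000 thanks to the lock
    return 0;
}
```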
Semaphore
- A mutex lock is a tool that assumes only one shared resource.
- For multiple shared resources, we can use a semaphore. It's a generalization of the mutex lock.
- One variable and two functions:
- Variable S: the number of shared resources available (== the number of processes that can enter the critical section)
- wait(): a function called before entering the critical section
```c
wait() {
    S--;              // decrease S by 1
    if (S < 0) {      // S negative: no shared resource was available
        sleep();      // the calling process goes into blocked state;
                      // it can't enter the critical section yet
    }
}
```
- signal(): a function called after leaving the critical section
```c
signal() {
    S++;              // increase S by 1
    if (S <= 0) {     // S <= 0 means a blocked process was waiting to
                      // enter the critical section (positive S means
                      // one or more resources are simply free)
        wakeup(p);    // turn one blocked process into ready state
    }
}
```
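POSIX exposes exactly this counting behavior through <semaphore.h>; a sketch assuming Linux, where sem_wait()/sem_post() correspond to wait()/signal() above (compile with -pthread):
```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

sem_t sem;   // counting semaphore guarding N identical resources

void *worker(void *arg) {
    sem_wait(&sem);               // wait(): S--; block if none available
    printf("thread %ld using a resource\n", (long)arg);
    sleep(1);                     // pretend to use the shared resource
    sem_post(&sem);               // signal(): S++; wake one waiter
    return NULL;
}

int main(void) {
    pthread_t t[4];
    sem_init(&sem, 0, 2);         // S = 2: two resources available
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&sem);
    return 0;
}
```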
Monitor
Condition Variable
- a synchronization tool used to manage execution order
- you can call wait() and signal() on a condition variable
- if a specific process's running condition is not met: wait() --> pause.
- if a specific process's running condition is met: signal() --> resume.
So what is a monitor?
- a synchronization tool that consists of shared resources and an interface to them
- can do both mutual exclusion and execution order management
- processes and threads must access the monitor through its interface (the functions/operations on the shared resource), and only one process/thread may be inside the monitor at a time. The others must wait in a queue (blocked state).
- order is managed by calling wait() and signal()
- Example: Java's synchronized keyword
```java
public synchronized void example(int val) {
    this.count += val;
}
```
Thread Safety
- In a multi-threaded environment, if a certain variable, function, or object can be accessed by multiple threads simultaneously without any problem --> it is thread safe.
- race condition --> not thread safe.
- For example, in Java, the add method of the Vector class is guaranteed thread-safe (it's implemented with the synchronized keyword).
- Meanwhile, the add method of the ArrayList class is NOT guaranteed thread-safe (there's no synchronized keyword in its implementation).
3-2. Deadlock
What's a deadlock state?
- A process stops while waiting for an event that will never happen
- If process A holds resource X and waits for resource Y to be released by process B, while process B holds resource Y and waits for resource X to be released by process A --> deadlock state
Four conditions of Deadlock state
- Mutual Exclusion
- A resource in use by one process can't be used by other processes
- Hold and wait
- A process holds an allocated resource while waiting for another resource to be allocated
- Non-preemption
- A resource can't be forcibly taken away; it becomes available only once the process using it is done with it
- Circular wait
- The processes' resource-wait graph forms a cycle: each process waits for a resource the next one holds
How to resolve Deadlock state?
- Deadlock state prevention
- Making sure at least one of the four conditions of deadlock is never met.
- Let one process have all its resources allocated first, then the next one, etc. (prevents hold-and-wait)
- Assign an order (priority) to all allocatable resources and allocate them according to that order (prevents the cycle)
- Deadlock state evasion
- considers deadlock a result of injudicious allocation of limited resources
- therefore allocates resources carefully, only up to the point where deadlock cannot happen
- if a limited resource is allocated hastily to many processes, deadlock can occur.
- Deadlock state detection & resolve
- Basically like a follow-up fix (fixing the deadlock state after it happened)
- OS allocates resources whenever process requires to do so; regularly checks the deadlock state.
- If deadlock state is detected:
- Resource preemption: forcefully take resources from other processes until the deadlock is resolved.
- or terminate the deadlocked process(es).
4. CPU Scheduling
Priority
- The OS sets a priority for each process and records it in the PCB
- A high-priority process gets CPU resources allocated more, and sooner.
- Users can manually raise a process's priority.
- In Unix-based OSes (Unix, Linux, macOS, etc.), you can use the ps command to check each process's priority.
Which process will get higher priority?
- The goal is to maximize CPU utilization
- CPU Utilization: the fraction of total CPU time spent actually doing work
- Most processes use both the CPU and I/O devices: they go back and forth between the blocked state and the running state.
- The period a process spends using the CPU: CPU burst
- Processes with more CPU bursts: CPU bound processes
- complex math calculation, graphics tasks, etc.
- The period a process spends waiting for I/O devices: I/O burst
- Processes with more I/O bursts: I/O bound processes
- video playback, disk back-up, etc.
- CPU bound processes stay mostly in the running state, while I/O bound processes stay mostly in the blocked state
- So to maximize CPU utilization, I/O bound processes should run as soon as possible so their I/O requests get issued early --> then allocate more CPU resources to CPU bound processes.
Scheduling Queue
- The OS requires processes (that want to use the CPU, use a certain I/O device, be loaded into memory, etc.) to line up for resource allocation; this line is implemented as a Scheduling Queue.
- Ready Queue: the queue of PCBs of processes that want to use the CPU
- Waiting Queue: the queue of PCBs of blocked-state processes
- after an I/O request, a process waits in the waiting queue in the blocked state for the I/O-completion interrupt
- once that interrupt arrives, it moves to the back of the ready queue, ready to execute
- On a timer interrupt, the running process goes back to the ready queue (and if it's requesting I/O, it goes to the waiting queue instead)
- The CPU runs queued processes mostly in order, but adjusted by their priorities.
Pre-emptive and Non-preemptive Scheduling
- Scheduling happens when a process's execution finishes. However, there are two main scenarios where scheduling happens even though the process hasn't terminated:
- A process requests I/O and hence goes to the waiting queue (turned into blocked state)
- A timer interrupt fires
- A scheduling type that acts in both situations: Preemptive Scheduling
- forcefully takes the CPU away from a process and allocates it to another
- A scheduling type that waits until the process using the CPU terminates (or voluntarily blocks): Non-preemptive Scheduling
- Each scheduling type has their own pros and cons
- Preemptive scheduling prevents a single process from monopolizing the CPU
- but incurs context-switching overhead
- Non-preemptive scheduling has fewer context switches and hence less overhead
- but everything must wait until the process using the CPU finishes; there's no other way
4-1. CPU Scheduling Algorithm
1. FIFO / FCFS (First In First Out / First Come First Served)
- Literally: the first process to arrive --> gets its job done first
- But waiting times can be really long
- If the first arrival takes significantly longer than the other processes, so the entire system is delayed: this is called the Convoy Effect
2. SJF (Shortest Job First)
- Shortest Job gets done first
- Turnaround time decreases
- but response time increases
3. RR (Round Robin)
- FIFO + Time slicing
- basically FIFO, but each process has a time limit per turn (a time slice); after the time slice expires, a context switch happens
- Response time decreases
- Turnaround time increases
4. SRT (Shortest Remaining Time)
- SJF + RR
- Each process uses the CPU for a time slice, and the process with the shortest remaining time runs next.
5. Priority Scheduling
- Each process gets a priority, and execution starts from the highest-priority process.
- However, a process that arrived first but has low priority --> can keep getting delayed
- this is called starvation
- to avoid starvation, we have something called aging
- the longer a process waits, the higher its priority gets.
6. Multi-Level Queue
- an advanced form of priority scheduling
- there are multiple queues with different priorities, and processes in the highest-priority queue run first
- However, the downside is that processes can't move between the queues; starvation can occur
7. Multi-level Feedback Queue
- Similar to the multi-level queue, but processes can move between queues, so starvation is prevented.
- The longer a process uses the CPU, the lower its priority gets.
- CPU bound processes' priority drops; I/O bound processes' priority rises.
- Aging can also be implemented
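To make the round-robin idea concrete, here's a toy C simulation (burst times and the slice length are made-up numbers, not from any real scheduler):
```c
#include <stdio.h>

// Toy round-robin simulation: three processes with given burst times
// share the CPU in fixed time slices.
int main(void) {
    int remaining[] = {5, 8, 3};   // remaining burst time per process
    const int n = 3, slice = 2;
    int done = 0, clock = 0;

    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] <= 0) continue;           // already finished
            int run = remaining[i] < slice ? remaining[i] : slice;
            remaining[i] -= run;
            clock += run;
            printf("t=%2d: P%d ran %d unit(s)%s\n", clock, i, run,
                   remaining[i] == 0 ? " -> finished" : "");
            if (remaining[i] == 0) done++;
            // here a real OS would context switch on the timer interrupt
        }
    }
    return 0;
}
```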
4-2. Linux CPU Scheduling
- In Linux, various scheduling algorithms can be used.
- you can see this in Linux's scheduling policies:
| Scheduling Policy | In what situation? |
| --- | --- |
| SCHED_FIFO | Policy for real-time processes (provides higher priority) |
| SCHED_RR | Policy for real-time processes (provides higher priority) |
| SCHED_NORMAL | Policy for normal processes |
| SCHED_BATCH | Policy for batch tasks, preempted less aggressively than normal processes |
| SCHED_IDLE | Policy for very low priority processes (provides lower priority) |
- Generally speaking, FIFO, RR, NORMAL ones are commonly mentioned policies.
- FIFO and RR scheduling are handled by the real-time scheduler (RT)
- NORMAL is handled by CFS
- Completely Fair Scheduler: aims to schedule in a completely fair way
- in Linux, each process has a value called Virtual Runtime (vruntime): CFS schedules the process with the smallest vruntime first.
- vruntime is NOT the actual runtime; it's a virtual runtime (weighted by each process's weight)
- CFS's time slice gets bigger as the process's weight gets bigger.
- The kernel must be able to pick the process with the smallest vruntime out of a tremendous number of processes; hence it uses a Red-Black Tree data structure.
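As a hedged, Linux-specific sketch, a process can request one of these policies through the sched_setscheduler() system call (actually succeeding usually requires root privileges):
```c
#include <sched.h>
#include <stdio.h>

int main(void) {
    struct sched_param param = { .sched_priority = 10 };

    // Request SCHED_FIFO for the calling process (pid 0 = self).
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        perror("sched_setscheduler");   // likely EPERM without privileges
        return 1;
    }
    printf("now scheduled with SCHED_FIFO\n");
    return 0;
}
```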
5. Virtual Memory
How does the CPU recognize and manage all the processes' addresses in memory?
5-1. Physical Address and Logical Address
- The CPU and processes do not use the memory's Physical Address (the actual hardware address)
- They use Logical Addresses instead
- each process's logical address space starts from address 0 (0x0)
- So processes can never share the same physical address, but they can have the same logical address.
- Also, conversion between logical and physical addresses is necessary in order to interact with the memory hardware (otherwise it'd be really hard to do so, wouldn't it?)
- That's what the MMU (Memory Management Unit) is for.
- It sits between the CPU and memory
- Converts the CPU's logical addresses (what the CPU understands) into the memory's physical addresses (what the memory understands)
Then how do processes get allocated memory under this address system?
5-2. Swapping and Contiguous Memory Allocation
Swapping
- Among the processes stored in memory, some may not be currently running.
- processes in the blocked state waiting on I/O requests
- processes that haven't been used for a long time
- Moving these processes out to swap space (a part of the disk) and storing other processes in the freed memory
- is called Swapping.
- Moving from memory to swap space: Swap Out
- Moving from swap space to memory: Swap In
- When a swapped-out process is swapped back in, it can be stored at a different memory address.
- Then how do we store processes in memory efficiently when there are multiple free spaces available?
Contiguous Memory Allocation and External Fragmentation
- Until now, we have assumed that each process is stored contiguously in memory. This technique is called Contiguous Memory Allocation.
- It seems like the obvious thing to do, but it's not the most efficient way, because it suffers from the problem of External Fragmentation.
- Empty spaces appear scattered through the middle of memory, and they aren't contiguous; even if there is 50MB available in total, it may be split into a 30MB piece and a 20MB piece.
- These empty spaces are created as processes start and finish. They're technically free space, but it's hard to fit a new process into them; so they're basically wasted memory. This is called External Fragmentation.
5-3. Virtual Memory Management by Paging
Another problem from contiguous memory allocation?
- Another problem with contiguous memory allocation, besides external fragmentation, is that we can't run a process bigger than the actual size of memory.
- a computer with 4GB RAM can't run a program that's more than 4GB.
- To overcome this problem, we use virtual memory.
Virtual Memory
- Stores only part of a program in memory, so that we can run a process bigger than the memory size.
- Utilizing part of the disk as if it were memory --> gives the process the illusion that memory is bigger than its actual size.
- There are two main virtual memory management techniques: Paging and Segmentation.
- We're going to focus on paging, as it's more widely used.
Segmentation
- Instead of dividing a process into fixed-size pages, we divide it into segments, whose sizes can vary.
- The division isn't into uniform chunks, but into meaningful logical units.
- One segment could be a code segment (or part of it), or a data segment (or part of it).
- However, since segment sizes are not uniform, external fragmentation can still happen.
Paging
- Paging divides a process's logical address space into fixed-size units called pages, and the physical address space into frames of the same size as a page. Then we map pages onto frames.
- Pay attention to the fact that the pages can be placed non-contiguously in physical memory.
- Make sure to look at the book to help visualize it! (pg. 211)
- By this, we can prevent the external fragmentation.
- We can still use swapping in paging method.
- In systems that use paging, instead of an entire process being swapped out/in, it's swapped out/in per page.
- Swap out/swap in in a paging system is called Page Out/Page In.
- Among the pages making up one process, some may be paged in and some paged out. That means parts of the process are in memory while other parts are on disk.
- Therefore, in order to run a process, not an entire process is required to be in the memory
- From perspective of CPU, logical memory's size got bigger than an actual memory's size
- Through paging, we can run processes that are bigger than memory's size.
- we said pages can be placed non-contiguously in physical memory. That raises a new problem: it's hard for the CPU to find the next page to run.
- it's hard for the CPU to know which frame holds which page of a given process.
- To overcome this challenge, we use something called [[#Page Table]].
Page Table
- A page table maps each page number to the frame number where that page is actually stored.
- Using the page table, the CPU can locate which page is in which frame.
- Beyond that, the page table also includes other crucial information.
- The rows of a page table are called Page Table Entries (PTEs).
- Although it varies by OS, a PTE might contain...
- Page number, frame number, valid bit, protection bit, reference bit, modified bit, etc.
Valid bit
- shows whether the page is currently accessible
- indicates whether the page is stored in memory or on disk
- if the page is in memory --> 1; if not --> 0
- The CPU can't access the disk directly; an extra step is needed: a page stored on disk must first be brought into memory
- If the CPU tries to access a page whose valid bit is 0, it causes a page fault exception.
- (In Windows' Resource Monitor, the "Hard Faults/sec" column shows page faults per second.)
- Then how does the CPU handle a page fault exception?
- Backs up the current task state
- The page fault handling routine runs: it brings the desired page into memory and sets the valid bit to 1.
- After the routine, execution continues with the page now in memory.
- Types of page fault:
- major page fault
- I/O required; the page must be read from disk into memory (the kind we've been talking about)
- minor page fault
- no I/O required; the page the CPU asked for is already in physical memory but not yet recorded in the page table.
- generally affects performance less than a major page fault.
Protection bit
- For page protection
- limits read/write/execute using a combination of bits r, w, and x, representing read, write, and execute respectively.
- If the protection bits are 100, then r=1, w=0, x=0 --> the page is read-only.
- If the protection bits are 111, then r, w, x = 1 --> the page can be read, written, and executed.
Reference bit
- shows whether the CPU has accessed this page before
- if the CPU has read or written a page after it was stored in memory, its reference bit becomes 1
- if the CPU has neither read nor written the page since it was stored in memory, its reference bit is 0.
Modified bit
- Shows whether the CPU has written to the page. Also known as the dirty bit.
- if the modified bit is 1, the page has been written/modified at least once.
- if such a page needs to be evicted, its modifications must be reflected back to the disk; so an extra disk-write step is needed.
- if the modified bit is 0, the page hasn't been modified (only read, or not accessed at all by the CPU)
- there's nothing to write back, so the page can be dropped from memory immediately.
Internal Fragmentation
- Although paging resolves external fragmentation, it can cause internal fragmentation.
- Paging divides a process into fixed-size units, but not every process divides evenly by the page size. In particular, the last page usually holds the remainder.
- For instance, say the page size is 10KB and the whole process is 107KB. It will be divided into 11 pages, the last holding only 7KB. So 3KB of the last page's frame is wasted memory. This is called Internal Fragmentation.
Page Table Base Register (PTBR)
- Each process's page table is stored in memory. Hence, we need to know the memory location of a process's page table to be able to execute that process.
- A special register holds the running process's page table location in memory: the PTBR.
- Since each process has its own page table base, this value is recorded in its PCB and restored during context switching.
- However, keeping whole page tables purely in memory is inefficient, and we try to mitigate it for two reasons:
- Problem 1: higher memory access frequency
- The CPU accesses the page table, then accesses the actual frame. So every paged memory reference costs two memory accesses.
- Therefore, if page tables lived only in memory, memory access time would roughly double.
- To overcome this, we use a cache for the page table:
- Translation Look-aside Buffer (TLB)
- The TLB is a page table cache; based on locality of reference, it stores the parts of the page table that are likely to be accessed often.
- If the page number of the logical address the CPU is accessing is in the TLB, the TLB tells the CPU the frame number that holds the page.
- This is called a TLB hit.
- If not, the CPU has to walk the page table itself; this is called a TLB miss.
- requires an additional memory access.
- Hence, we want to raise the TLB hit rate.
- Problem 2: page tables take up a lot of memory space
- To overcome this, hierarchical paging was introduced: it pages the page table itself.
- also called a multi-level page table
- By paging the page table and keeping an outer page table that points to the inner ones, we only need the outer page table resident in memory for full access.
Paging Address System
- A single page contains many addresses, so a paging system's logical address follows this structure:
- <page number, offset>
- Page number: which page are we accessing?
- Offset: how far is the desired address from the start of the page (frame)?
- Hence, <page number, offset> is converted into <frame number, offset> via the page table.
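A tiny C sketch of this conversion, with a made-up 4-entry page table and a 4096-byte page size:
```c
#include <stdio.h>

// Sketch of the <page number, offset> -> <frame number, offset> conversion.
#define PAGE_SIZE 4096

int page_table[4] = {7, 2, 9, 0};   // page i is stored in frame page_table[i]

unsigned long translate(unsigned long vaddr) {
    unsigned long page   = vaddr / PAGE_SIZE;   // page number
    unsigned long offset = vaddr % PAGE_SIZE;   // distance from page start
    unsigned long frame  = page_table[page];    // page table lookup
    return frame * PAGE_SIZE + offset;          // physical address
}

int main(void) {
    unsigned long vaddr = 1 * PAGE_SIZE + 123;  // page 1, offset 123
    printf("logical 0x%lx -> physical 0x%lx\n", vaddr, translate(vaddr));
    return 0;
}
```
For page 1 with offset 123, the lookup yields frame 2, so the physical address is 2 * 4096 + 123.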
5-4. Page Replacement Algorithm
Demand Paging
- We don't have to store all the pages that make up a process in memory; we can store only the required ones --> this is called demand paging. It works like this:
1. The CPU accesses a certain page while executing an instruction
2. If the page's valid bit is 1, the CPU accesses the frame directly
3. Otherwise, a page fault exception occurs
4. The page fault handling routine brings the page into memory and sets its valid bit to 1
5. Go back to step 1.
- It's even possible to start running a process with no pages in memory at all; this is called pure demand paging.
- page faults fire constantly at the start
- once the required pages accumulate in memory, page faults occur less often.
What is a page replacement algorithm?
- At some point memory fills up; then we need to page out some pages. How do we select which pages to evict?
- That choice is made by the page replacement algorithm.
- This is directly tied to the whole computer's performance.
- A good page replacement algorithm triggers fewer page faults.
- A bad one triggers more page faults --> more pages read from disk into memory, which means lower performance.
- There's also a condition called "Thrashing", where the system spends more time on paging than on actually running processes --> which crushes performance.
- There are mainly three page replacement algorithms:
First In First Out (FIFO) Page Replacement Algorithm
- evicts the page that was brought into memory first
- straightforward, but it could evict an important page that's still in use; then it'll soon cause another page fault.
Optimal Page Replacement Algorithm
- evicts the page that won't be used for the longest time in the future
- Guarantees the fewest page faults, but is practically impossible to implement: you can't predict future page accesses.
Least Recently Used (LRU) Page Replacement Algorithm
- evicts the page that hasn't been used for the longest time (least recently used)
- the original form of the most commonly used replacement algorithms; many derived algorithms exist.
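A toy LRU simulation in C (the reference string and frame count are made up) that counts page faults:
```c
#include <stdio.h>

// Toy LRU simulation: 3 frames, a made-up page reference string.
// On each reference we either hit, fill a free frame, or evict the
// frame whose page was used least recently.
#define FRAMES 3

int main(void) {
    int refs[] = {1, 2, 3, 1, 4, 2, 5};
    int n = sizeof(refs) / sizeof(refs[0]);
    int frame[FRAMES], last_used[FRAMES];
    int faults = 0;

    for (int i = 0; i < FRAMES; i++) frame[i] = -1;  // all frames empty

    for (int t = 0; t < n; t++) {
        int victim = 0, hit = 0;
        for (int i = 0; i < FRAMES; i++) {
            if (frame[i] == refs[t]) { hit = 1; last_used[i] = t; break; }
        }
        if (!hit) {
            faults++;
            // pick an empty frame, else the least recently used one
            for (int i = 1; i < FRAMES; i++) {
                if (frame[victim] == -1) break;
                if (frame[i] == -1 || last_used[i] < last_used[victim])
                    victim = i;
            }
            frame[victim] = refs[t];
            last_used[victim] = t;
        }
        printf("ref %d -> %s\n", refs[t], hit ? "hit" : "page fault");
    }
    printf("total page faults: %d\n", faults);
    return 0;
}
```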
6. File System
What is a file system?
- The OS's internal program that stores and manages the disk's information in the form of files and directories (folders)
- With it, we can handle the tremendous amount of information on disk through files and directories.
- A single OS can use multiple file systems; with a different file system, the way the disk's information is handled also differs.
6-1. File and Directory
File
- A file consists of...
- the file name, the information required to execute the file, and additional information about the file
- the additional info is called "attributes" or "metadata"; it's the kind of thing you see in a file's "properties".
- All tasks dealing with files go through the OS. A user application program can't allocate, read, write, or execute a file on its own; it must use the file-related system calls.
- If a process has 10 different files allocated, it needs to distinguish between them. For this, the process uses a piece of information called a file descriptor (a file handle in Windows).
- an integer of 0 or greater
- used for low-level file identification and operations
- open() returns the opened file's file descriptor
- write() and close() take the file descriptor as a parameter
- it's not just for regular files: I/O devices, pipes and sockets (IPC tools) also use file descriptors.
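A minimal C sketch of the descriptor lifecycle (POSIX; assumes a file named example.txt exists):
```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[64] = {0};

    // open() is a system call; it returns a small non-negative integer,
    // the file descriptor, which identifies this open file from now on.
    int fd = open("example.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("file descriptor: %d\n", fd);      // typically 3 (0/1/2 are taken)

    read(fd, buf, sizeof(buf) - 1);           // read() takes the fd as parameter
    printf("read: %s\n", buf);

    close(fd);                                // close() releases the descriptor
    return 0;
}
```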
Directory
- so called "folder" in windows, OS uses directory to manage various files.
- modern directory forms hierarchical structure. (which can be also called as "tree")
- so modern directory is managed as tree-structured directory with multiple hierarchical levels.
- Root directory: expressed as slash (/), paths are location of a certain file from root
- in windows, root directory is C:₩ and uses backslash(\) as path delimiter
- A directory is treated as a special kind of file
- more specifically, a file containing information about the entries under that directory.
- the info about the entries is laid out as a table.
- Each row is called a "directory entry".
- contains the file name and information used to locate the file on disk
- may also contain creation time, modified time, size, etc.
File allocation
- The OS reads and writes the disk in units called 'blocks'. Each block is usually 4096 bytes.
- When a file is stored on disk, it gets one or more blocks allocated to it.
- Some file systems keep the next block's address inside each block, so all of a file's blocks form a linked chain. This is called "Linked allocation".
- In this case, the directory entry tells you where each file starts.
- directory entry: file name, first block address, and length (in blocks)
- Other file systems keep all of a file's block addresses in a special block called the "index block"; this is called "Indexed allocation".
- If you know the index block, you can reach the desired file data.
6-2. File System
- There are various file systems.
- Each OS supports different file systems, and one computer can use various file systems even under a single OS.
- Windows: NT file system (NTFS), Resilient file system (ReFS), etc.
- Linux: EXT, EXT2, EXT3, EXT4, XFS, ZFS, etc.
- For these file systems, file allocation is done based on the index block called [[#i-node (index-node)]]
- MacOS: APFS (Apple File System), etc.
Partitioning
- Various types of file systems can be used within one disk.
- To apply multiple file systems to a single disk, you have to divide the disk into areas and decide which file system each area will use.
- Dividing the disk's area like this is called "Partitioning", and each area created by partitioning is called a "Partition".
- A disk can be divided into multiple partitions, and each can use a different file system.
Formatting
- The choice among the various file systems is made when we format the disk.
- Formatting is setting up a file system:
- deciding how files will be stored and managed
- a task that makes the disk ready for new data to be written
i-node (index-node)
- In an i-node based file system, each file has its own i-node, and each i-node has a number assigned to it.
- you can see the number in Linux or macOS with the ls command's -i option.
- For instance, let's consider the EXT4 file system (refer to the book for a visualization and clearer understanding):
- it consists of multiple block groups
- the first block doesn't store actual file data; it contains information for booting and partition management.
- Each block group consists of...
| Name | Description |
| --- | --- |
| Superblock | stores overall file system information: number of i-nodes, total number of blocks, block size, etc. |
| Block Group Descriptors | stores metadata about the block groups |
| Block Bitmap | tracks how data blocks are allocated in the current block group |
| i-node Bitmap | tracks how i-nodes are allocated in the current block group |
| i-node Table | stores each file's i-node information |
| Data Blocks | store each file's data |
- As you can see, the i-nodes all live in a specific area of the partition.
- other i-node based file systems work like this too
- Hence, in an i-node based file system, the OS can't create a new file once the i-node area is full and no more i-nodes can be allocated: even if there's empty space in the data area.
Hard Link and Symbolic Link
- Hard Link file: a file that shares its i-node with the original file
- shares the same file data
- if the hard link file's data changes, the original file's data changes as well
- even if the original file is deleted or moved, you can still access the file data
- used when you want to reference the same file under different names
- Symbolic Link file: points at the original file
- doesn't share the file data; it only stores the location (path) of the original file
- if the original file gets moved or deleted, you can't use the file data anymore.
- used when you simply want to refer to a file (e.g. a shortcut to a file buried in a complex path)
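A hedged C sketch using the POSIX calls underlying `ln` and `ln -s` (assumes a file named original.txt exists in the current directory):
```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Hard link: hard.txt shares original.txt's i-node and file data.
    if (link("original.txt", "hard.txt") == -1) perror("link");

    // Symbolic link: soft.txt only stores the path "original.txt".
    if (symlink("original.txt", "soft.txt") == -1) perror("symlink");

    return 0;
}
```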
Mount
- Incorporating one storage device's file system into another device's file system.
- e.g. plugging a USB drive into a PC, etc.
- when you plug in the USB drive, you can access its file system through the PC's file system.
- conventionally connected under the path "/mnt"
- You can also mount via commands (mostly on Unix or Linux):
- mount -t ext4 -o ro /dev/sda /mnt/test
- meaning:
- mount the EXT4 file system on the device /dev/sda onto the /mnt/test directory, read-only.
7. Additional Notes
Booting
- Loading the kernel into memory and starting the computer.
- When the power comes on, information is read from non-volatile memory (ROM) into volatile memory (RAM):
- The CPU reads from a specific, pre-set address
- the BIOS (Basic Input/Output System) lives at that address
- the BIOS checks the hardware for problems (POST, Power-On Self-Test)
- If there's no problem, it reads the special information needed for booting from the disk's master boot record (MBR, located at the very start of the disk: the first sector, in the case of a hard disk)
- The MBR holds a special program called the bootstrap (boot loader)
- a program that finds the kernel's location and loads it into memory
- Recently the trend is to use UEFI (Unified Extensible Firmware Interface) instead of BIOS
- can boot from large disks
- fast and secure booting
- supports a GUI
Virtual Machine and Container
Virtual Machine
- A virtual computer implemented in software
- The software that creates and runs virtual machines: the hypervisor
- e.g. VirtualBox
- lets you use another OS
- separates resource allocation and provides virtualization / resource isolation
- the VMs work as if they were different computers
- but can incur big overhead and hence slower performance
Container
- Another virtualization / resource isolation technique
- e.g. Docker, LXC
- A virtual machine provides hardware-level virtualization
- A container provides OS-level virtualization
- uses the same kernel; isolates only what the specific process needs
- far lighter, and doesn't depend on the platform
- A tremendous number of containers get created (over 4 billion per week)
- managing this large number of containers: container orchestration
- e.g. Kubernetes