Can We Design a Scalable, Reliable and Secure Operating System?

According to the popular reading of Moore’s law, computer processing power doubles roughly every 18 months. Today a typical machine has 3 to 10 processor cores, plenty of memory, and thousands of different devices it may have to work with. If this trend continues, do we need a different OS design that can scale?

Designing a scalable operating system is harder than designing the software that runs on it. During design we have to care about the OS’s compatibility, scalability, and security. In this paper I explore OS scalability and security issues and the trade-offs we make as design decisions. A lot of work has been done on designing scalable operating systems, for example Barrelfish, Corey, Fos and Tessellation. One principle from this line of work is that a scalable OS should be based on an event-driven model rather than a threaded one, because performance decreases as the number of threads grows, perhaps to a couple of thousand on a large enterprise-level system.

Security in the OS is vital: an insecure OS means insecure applications. If the OS allows malicious software to be installed, that software can steal information the user enters into other applications. For example, a keylogger can capture the user name and password of an online bank account, and an intruder can then transfer all the money out of it. Designing a secure OS is challenging; I will explore the key challenges and their solutions.

A reliable OS is one that never crashes and stays up at least 99% of the time. Efforts to make operating systems reliable include reducing the kernel to fewer than 1,500 lines of code and running device drivers in user mode; Minix 3.0 is one example of this approach. In soft and hard real-time systems reliability is especially important, because the OS must keep running all the time, without a reboot even after a security update. I will discuss the reincarnation mechanism and how it helps achieve this reliability.
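To put the 99% figure in perspective, here is a small worked calculation. It is only a sketch: the function name and the two stricter availability levels in the demo are my own illustrative choices, not figures from any particular system. At 99% uptime a system may still be down for about 87.6 hours (roughly 3.65 days) per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Return the hours per year a system may be down at a given availability.

    `availability` is a fraction between 0 and 1, e.g. 0.99 for 99% uptime.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # 0.99 is the figure from the text; the other two levels are illustrative.
    for level in (0.99, 0.999, 0.99999):
        hours = downtime_hours_per_year(level)
        print(f"{level:.3%} uptime allows {hours:.2f} hours of downtime per year")
```

Seen this way, 99% is a fairly weak target for a control system that must never stop.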

Introduction

In general terms, a system is reliable when one can say: “I have never experienced a failure of this system, and I do not know anybody who has.” This is a very optimistic standard; in technical terms it means a mean time to failure of more than 50 years, let’s say. Operating systems are the opposite: they crash many times, and we reboot them after updates and after installing or uninstalling software. This shows that our current operating systems are not reliable. Now think about an embedded OS, the control system of a nuclear reactor, an air traffic control system, or an autopilot: if that is not reliable, the result can be catastrophic.

Contemporary society depends on software, so software should be as reliable as a TV set: we buy it, plug it into the power socket, and it runs perfectly for 20 years.

Operating system reliability can be achieved by:

  1. MicroKernel
  2. Reincarnation Mechanism
  3. Secure role based access control

An operating system is scalable if it can be extended to larger hardware without redoing the design or the code. The current monolithic OS structure does not scale with Moore’s law, under which hardware power doubles roughly every 18 months, and we now have multicore CPUs.

To achieve concurrency and parallelism, the OS scheduler has to change. In the early 80s uniprocessors were the norm, and scheduling was simple: fetch a job and execute it, with no multicore or multiprocessing involved. Nowadays multicore CPUs provide parallelism, and jobs are distributed across the CPU’s cores. The cores use L1, L2, and L3 cache and share main memory. Global data that several processes need to share is kept in main memory, and locking mechanisms are used to avoid the readers/writers problem. As the number of cores grows, so does the potential for deadlock, so scaling becomes a problem here. Message passing was proposed as an alternative and is discussed later.
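The locking idea can be sketched in a few lines. This is a minimal illustration using Python threads as stand-ins for jobs running on different cores; the names `run_workers`, `num_workers` and `increments` are made up for the example. Every update of the shared counter takes the same lock, which is exactly the point where workers start to queue up as their number grows.

```python
import threading


def run_workers(num_workers: int, increments: int) -> int:
    """Let several threads update one shared counter, guarded by a lock."""
    counter = 0              # the global data kept in shared memory
    lock = threading.Lock()  # guards every read-modify-write of the counter

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:       # only one thread may update the counter at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(run_workers(num_workers=4, increments=10_000))
```

The result is always correct, but only because every worker waits its turn on the single lock; the more workers there are, the more time is spent waiting rather than working.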

To achieve parallelism, both Windows and Linux use threads. There are user-level and kernel-level threads, and user-level threads are mapped onto kernel-level threads. When the kernel has to kill a thread because of an exception or an error condition, it terminates only a small number of user threads, not the whole process. This is a good mechanism for concurrency and parallelism, but the system gives peak performance only while the number of threads is limited, for example around 200 threads on Windows. Now think about a large enterprise-level system with a hundred thousand threads in a client-server architecture: there are many connection requests, and the system has to maintain a thread for each. With too many threads, performance decreases, and the server may be unable to process further requests. To overcome this problem the research community proposed the Flash server, which follows the event-driven model and is therefore scalable: we can register as many events as we need.
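As a rough sketch of the event-driven idea (not Flash's actual code), the example below serves several connections from a single thread using Python's standard `selectors` module. The function name `serve_with_events` is hypothetical, local socket pairs stand in for real network connections, and the "service" is simply upper-casing the request.

```python
import selectors
import socket
from typing import List


def serve_with_events(num_clients: int) -> List[bytes]:
    """Answer every client from one thread by reacting to readiness events."""
    selector = selectors.DefaultSelector()
    client_ends = []
    server_ends = []

    for i in range(num_clients):
        # A connected socket pair stands in for one client connection.
        client_end, server_end = socket.socketpair()
        # Register interest in "data has arrived"; no thread is created.
        selector.register(server_end, selectors.EVENT_READ)
        client_end.sendall(f"request {i}".encode())
        client_ends.append(client_end)
        server_ends.append(server_end)

    remaining = num_clients
    while remaining:
        # Block until at least one registered connection is readable.
        for key, _mask in selector.select():
            connection = key.fileobj
            request = connection.recv(1024)
            connection.sendall(request.upper())  # the "service" is upper-casing
            selector.unregister(connection)
            remaining -= 1

    replies = [client_end.recv(1024) for client_end in client_ends]
    for sock in client_ends + server_ends:
        sock.close()
    selector.close()
    return replies


if __name__ == "__main__":
    print(serve_with_events(3))
```

Adding a connection here costs one registration rather than one thread, which is why this style copes better with very large numbers of requests.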

Operating system scalability can be achieved by:

  1. Event-driven model
  2. Message passing between processes

An OS is secure if it protects the user from all kinds of attacks, keeps data consistent, and lets only authorized people access it. Windows uses a role-based access mechanism.
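A role-based check can be sketched as a simple lookup. The roles, users and permissions below are invented for illustration and do not describe how Windows actually stores them; the point is only that access is decided by a user's roles, never by the user name directly.

```python
from typing import Dict, Set

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "administrator": {"read", "write", "install_software"},
    "standard_user": {"read", "write"},
    "guest": {"read"},
}

# Hypothetical users and the roles assigned to them.
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"administrator"},
    "bob": {"standard_user"},
}


def is_allowed(user: str, action: str) -> bool:
    """Return True if any role held by `user` grants `action`."""
    roles = USER_ROLES.get(user, set())  # unknown users hold no roles
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_allowed("alice", "install_software"))
    print(is_allowed("bob", "install_software"))
```

Under this scheme a standard user cannot install software at all, which is one way to keep a careless click from putting a keylogger on the machine.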

To protect the operating system, the following mechanisms are most commonly used:

  1. Antivirus Software
  2. Anti-Spyware Software
  3. Firewall
  4. Password authentication

Installing all of this software still does not guarantee foolproof security. A user may visit a website and click on a pop-up, and as a result software is installed on the computer. If that software is a keylogger, it logs the user name and password of your bank account and sends them to a server, and a hacker can then access your account with your credentials. End-to-end encryption is useless in this case, because the credentials are captured before they are encrypted.

OS design is complex, so the trade-offs are difficult too. Investigating scalability, portability, security and reliability issues can open the door to better solutions.

Solutions to scalability and security:
  1. MicroKernel
  2. Reincarnation Mechanism
  3. Event-based mechanism
  4. Message passing between processes.

Details of the proposed solutions are given below.

MicroKernel.

Drivers and servers run in user mode, and the kernel handles only interrupts, processes, scheduling, and interprocess communication. A user program sends a request to a server, for example the file server; the file server passes the request to a driver, in our case the disk driver; and the driver talks to the kernel to fulfil the request.
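The request path described above can be sketched as a toy model. All class and message names here (`Kernel`, `FileServer`, `DiskDriver`, the `"op"` field) are hypothetical, the kernel does nothing but route messages between registered components, and a dictionary stands in for the disk hardware.

```python
from typing import Callable, Dict, List, Tuple


class Kernel:
    """Toy kernel: it only routes messages between registered components."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Callable[[dict], dict]] = {}
        self.log: List[Tuple[str, str, str]] = []  # (sender, receiver, operation)

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        self._endpoints[name] = handler

    def send(self, sender: str, receiver: str, message: dict) -> dict:
        self.log.append((sender, receiver, message["op"]))
        return self._endpoints[receiver](message)


class DiskDriver:
    """User-mode driver; the `blocks` dict stands in for the disk hardware."""

    def __init__(self, kernel: Kernel, blocks: Dict[int, bytes]) -> None:
        self._blocks = blocks
        kernel.register("disk_driver", self.handle)

    def handle(self, message: dict) -> dict:
        return {"data": self._blocks[message["block"]]}


class FileServer:
    """User-mode server that maps file paths to disk blocks."""

    def __init__(self, kernel: Kernel, file_table: Dict[str, int]) -> None:
        self._kernel = kernel
        self._file_table = file_table
        kernel.register("file_server", self.handle)

    def handle(self, message: dict) -> dict:
        block = self._file_table[message["path"]]
        request = {"op": "read_block", "block": block}
        return self._kernel.send("file_server", "disk_driver", request)


def read_file(kernel: Kernel, path: str) -> bytes:
    """What a user program does: ask the file server, via the kernel."""
    reply = kernel.send("user_program", "file_server", {"op": "read", "path": path})
    return reply["data"]


if __name__ == "__main__":
    kernel = Kernel()
    DiskDriver(kernel, {7: b"hello"})
    FileServer(kernel, {"/etc/motd": 7})
    print(read_file(kernel, "/etc/motd"))
    print(kernel.log)
```

Because the file server and the driver are ordinary components outside the kernel, a bug in either of them does not run with kernel privileges.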

Reincarnation Mechanism

Explanation:

In this mechanism a reincarnation server checks whether the other servers and drivers are working properly. If a fault is detected, the faulty component is replaced automatically, without any user interaction, so the system becomes self-healing and can achieve high reliability. The system can also be updated live, without rebooting.
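A minimal sketch of that monitoring loop is shown below. It is not Minix code: the `Driver` class, its `ping` method and the `alive` flag are assumptions made for the example, and a real system would detect failures through heartbeats or exit notifications rather than a boolean.

```python
from typing import Callable, Dict, List


class Driver:
    """Stand-in for a user-mode driver or server that can fail."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True  # set to False to simulate a crash

    def ping(self) -> bool:
        return self.alive


class ReincarnationServer:
    """Keeps components running by replacing any that stop responding."""

    def __init__(self, factory: Callable[[str], Driver]) -> None:
        self._factory = factory
        self._components: Dict[str, Driver] = {}
        self.restarts: Dict[str, int] = {}

    def start(self, name: str) -> None:
        self._components[name] = self._factory(name)
        self.restarts.setdefault(name, 0)

    def get(self, name: str) -> Driver:
        return self._components[name]

    def check_all(self) -> List[str]:
        """Ping every component and replace the ones that do not answer."""
        replaced = []
        for name, component in list(self._components.items()):
            if not component.ping():
                self._components[name] = self._factory(name)  # fresh copy
                self.restarts[name] += 1
                replaced.append(name)
        return replaced


if __name__ == "__main__":
    server = ReincarnationServer(Driver)
    server.start("disk")
    server.start("net")
    server.get("net").alive = False  # simulate a crashed network driver
    print(server.check_all())
    print(server.get("net").ping())
```

The same replace step is what makes live updates possible: instead of a fresh copy of the same driver, the factory could hand back a newer version.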

Event-based mechanism

Explanation:

Requests from the disk and the network are queued in a buffer. The scheduler picks jobs from the buffer one by one and processes them, using either a FIFO or a round-robin policy.

In this mechanism an increase in the number of requests does not degrade OS performance the way an increase in the number of threads does, so it is the better choice when there is a large number of requests to process.
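The two scheduling policies can be compared with a short sketch. The function names and request labels are invented for the example, and "processing" a request is reduced to recording the order in which it is picked.

```python
from collections import deque
from typing import Deque, Dict, List


def fifo_order(buffer: List[str]) -> List[str]:
    """Process one shared buffer strictly in arrival order."""
    pending: Deque[str] = deque(buffer)
    order = []
    while pending:
        order.append(pending.popleft())  # oldest request first
    return order


def round_robin_order(sources: Dict[str, List[str]]) -> List[str]:
    """Take one request from each source in turn until all are empty."""
    pending = {name: deque(requests) for name, requests in sources.items()}
    order = []
    while any(pending.values()):         # an empty deque counts as False
        for queue in pending.values():
            if queue:
                order.append(queue.popleft())
    return order


if __name__ == "__main__":
    print(fifo_order(["d1", "d2", "d3", "n1"]))
    print(round_robin_order({"disk": ["d1", "d2", "d3"], "net": ["n1"]}))
```

With FIFO the single network request waits behind three disk requests; with round robin it is served second, so one busy source cannot starve the other.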

Message passing between processes.

Explanation:

The client is the process that requests a service, and the server is the process that handles the request. Requests and responses are exchanged through message passing, so in this case there is no need to keep the data in global shared memory.
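Here is a small sketch of the request/response pattern. For portability it uses Python threads and `queue.Queue` mailboxes as stand-ins for separate processes and kernel message channels; the function names and the squaring "service" are illustrative assumptions. The client and server never touch a common variable, they only exchange messages.

```python
import queue
import threading
from typing import List


def server(requests: queue.Queue) -> None:
    """Serve requests until a None message tells the server to stop."""
    while True:
        message = requests.get()
        if message is None:
            break
        reply_box, number = message
        reply_box.put(number * number)  # the "service" is squaring a number


def client(requests: queue.Queue, numbers: List[int]) -> List[int]:
    """Send each number as a request message and wait for the reply message."""
    reply_box: queue.Queue = queue.Queue()  # the client's private mailbox
    results = []
    for number in numbers:
        requests.put((reply_box, number))
        results.append(reply_box.get())
    return results


def demo(numbers: List[int]) -> List[int]:
    requests: queue.Queue = queue.Queue()
    worker = threading.Thread(target=server, args=(requests,))
    worker.start()
    results = client(requests, numbers)
    requests.put(None)  # shutdown message
    worker.join()
    return results


if __name__ == "__main__":
    print(demo([1, 2, 3]))
```

No lock appears anywhere in the client or server code, because there is no shared data for them to protect.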

Analysis and Conclusion

If we apply the solutions discussed, we can achieve scalability; research systems such as Minix, Barrelfish, Flash and Fos have shown high performance. Future operating systems will be very different from the ones we have now: scheduling algorithms, memory management, and data passing will all be done in a different way.

Android is built on Linux, and Windows also has a monolithic design. Layered operating systems were not successful, because the OS needs to access memory and layering restricts that access: a lower layer can only serve read requests and cannot modify anything in the layer above it.

A monolithic OS has many bugs because its kernel is huge: the more code there is in the kernel, the more bugs it contains, and some of those bugs cause security problems. A microkernel is the better solution, because it limits the kernel to fewer than 1,500 lines of code and puts device drivers in user space. Microkernel implementations are already used in embedded devices. I expect the ideas discussed here to be adopted in mainstream operating systems such as Windows in the near future, to cope with hardware growth.
