Annotation of prex-old/doc/html/doc/kernel.html, Revision 1.1.1.1.2.1
1.1 nbrk 1: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2: <html>
3: <head>
4: <title>Prex Kernel Internals</title>
5: <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
6: <meta name="keywords" content="Prex, embedded, real-time, operating system, RTOS, open source, free">
7: <meta name="author" content="Kohsuke Ohtani">
8: <link rel="stylesheet" type="text/css" href="../default.css" media="screen">
9: <link rel="stylesheet" type="text/css" href="../print.css" media="print">
10: </head>
11: <body>
12: <div id="top">
13: </div>
14: <div id="middle">
15:
16: <table id="content" cellpadding="0" cellspacing="0">
17: <tbody>
18:
19: <tr>
20: <td id="header" colspan="2" valign="top">
21: <table width="100%" border="0" cellspacing="0" cellpadding="0">
22: <tr>
23: <td id="logo">
24: <a href="http://prex.sourceforge.net/">
25: <img alt="Prex logo" src="../img/logo.gif" border="0"
1.1.1.1.2.1! nbrk 26: style="width: 250px; height: 54px;"></a>
1.1 nbrk 27: </td>
28: <td id="brief" align="right" valign="bottom">
29: An Open Source, Royalty-free,<br>
30: Real-time Operating System
31: </td>
32: </tr>
33: </table>
34: </td>
35: </tr>
36:
37: <tr>
38: <td id="directory" style="vertical-align: top;">
42: </tr>
43: <tr><td class="pad" colspan="2" style="vertical-align: top;"></td></tr>
44:
45: <tr>
1.1.1.1.2.1! nbrk 46: <td id="doc" style="vertical-align: top;">
! 47:
1.1 nbrk 48: <h1>Prex Kernel Internals</h1>
49:
50: <i>Version 1.4, 2005/12/31</i>
51:
52: <h3>Table of Contents</h3>
53: <ul>
54: <li><a href="#over">Kernel Overview</a></li>
55: <li><a href="#design">Design Policy</a></li>
56: <li><a href="#thread">Thread</a></li>
57: <li><a href="#task">Task</a></li>
58: <li><a href="#sched">Scheduler</a></li>
59: <li><a href="#memory">Memory Management</a></li>
60: <li><a href="#ipc">IPC</a></li>
61: <li><a href="#except">Exception Handling</a></li>
62: <li><a href="#int">Interrupt Framework</a></li>
63: <li><a href="#timer">Timer</a></li>
64: <li><a href="#device">Device I/O Service</a></li>
65: <li><a href="#mutex">Mutex</a></li>
66: <li><a href="#debug">Debug</a></li>
67: </ul>
68: <br>
69:
70: <h2 id="over">Kernel Overview</h2>
71:
72: <h3>Kernel Structure</h3>
73: <p>
74: The following figure illustrates the Prex kernel structure.
75: </p>
76: <p>
77: <img alt="Kernel Components" src="img/kernel.gif" border="1"
78: style="width: 602px; height: 414px;"><br>
79:
80: <i><b>Figure 1. Prex kernel Structure</b></i>
81: </p>
82: <p>
Each kernel object belongs to one of the following groups.
84: </p>
85: <ul>
86: <li><b>kern</b>: kernel core components</li>
87: <li><b>mem</b>: memory managers</li>
<li><b>ipc</b>: inter-process communication (*)</li>
<li><b>sync</b>: synchronization objects</li>
90: <li><b>arch</b>: architecture dependent components</li>
91: </ul>
92: <p>
<i>*) Since all messages in Prex are transferred between threads, the name
"IPC" is not strictly accurate.
However, Prex still uses "IPC" as the general term for message transfer
via the kernel.</i>
97: </p>
98:
99: <h3>Naming Convention</h3>
100: <p>
Each "group/object" name in Figure 1 maps to a "directory/file" path in the
Prex source tree. For example, the thread related functions are located
in "kern/thread.c", and the semaphore functions are located
in "sync/sem.c".
105: </p>
106: <p>
In addition, kernel routines follow a standard naming
convention. The method named <i>bar</i> of the object named <i>foo</i>
is named "foo_bar". For example, the routine that creates a new
thread is named "thread_create", and the routine that locks a mutex is
"mutex_lock". This rule does not apply to the names of local functions.
112: </p>
113:
114:
115: <h2 id="design">Design Policy</h2>
116: <p>
The design of the Prex kernel focuses on the following points.
118: </p>
119: <ul>
120: <li>Portability</li>
121: <li>Scalability</li>
122: <li>Reliability</li>
123: <li>Interoperability</li>
124: <li>Maintainability</li>
125: </ul>
126:
127: <h3>Portability</h3>
128: <p>
The Prex kernel is divided into two layers:
a common kernel layer and an architecture dependent layer.

No routine in the common kernel layer may access the H/W directly.
Instead, it must make the request through the architecture dependent layer.
The interface to the architecture dependent layer is strictly defined
by the Prex kernel. This interface is carefully designed to support
different architectures with minimum code change,
which makes it easy to port the Prex kernel to a new architecture.
138: </p>
139: <p>
140: The following functions must be provided by the architecture dependent layer.
141: </p>
142: <ul>
143: <li><b>CPU</b>: initializes processor registers before kernel boot</li>
144: <li><b>Context</b>: abstracts processor and hardware context</li>
145: <li><b>MMU</b>: abstracts memory management unit (*)</li>
146: <li><b>Trap</b>: abstracts processor trap</li>
147: <li><b>Interrupt</b>: abstracts the interrupt control unit</li>
148: <li><b>Clock</b>: abstracts clock timer unit</li>
149: <li><b>Misc.</b>: abstracts system reset, idle power state</li>
150: </ul>
151: <p>
<i>*) On a system without an MMU, the MMU related routines are
defined as no-operation routines.
So, the common kernel layer cannot assume that an MMU is always available.</i>
155: </p>
156:
157: <h3>Scalability</h3>
158: <p>
To obtain higher scalability, the kernel does not limit the maximum
number of kernel objects that can be created.
Instead, the resources for all kernel objects are allocated dynamically
after system boot.
This keeps the memory requirement smaller than static
resource allocation would.
It also means that the kernel can create tasks, threads, objects, devices,
events, mutexes, and timers for as long as usable memory remains.
167: </p>
168: <p>
The kernel supports both MMU and no-MMU systems. So, most of the kernel
components and the VM sub-system are carefully designed to work without an MMU.
171: </p>
172:
173: <h3>Reliability</h3>
174: <p>
What should an OS do when the remaining memory is exhausted?
If the system could simply stop with panic() at that point, many of the
error checks in the kernel would be unnecessary.
But obviously, this is not acceptable for a reliable system.
Even if the memory is exhausted,
the kernel must continue processing.
So, all kernel code checks the error status returned
by the memory allocation routine.
183: </p>
184: <p>
In addition, the kernel must never crash, even if an invalid parameter
is passed via the kernel API. Basically, the Prex kernel code is written
with the "garbage in, error out" principle.
The Prex kernel is designed to keep running even if a malicious
program is loaded.
190: </p>
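<p>
The two rules above (check every allocation, and answer bad input with an
error code instead of stopping) can be sketched in plain C as follows.
This is only an illustration, not Prex source: <i>struct buffer</i>,
<i>buffer_create()</i> and <i>buffer_destroy()</i> are hypothetical names,
and malloc() stands in for the kernel memory allocator.
</p>

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object, used only for this illustration. */
struct buffer {
    size_t size;
    unsigned char *data;
};

/*
 * Hypothetical creation routine: returns 0 on success or a POSIX-style
 * error code. It never stops the system on bad input or on allocation
 * failure.
 */
int buffer_create(size_t size, struct buffer **out)
{
    struct buffer *buf;

    if (out == NULL || size == 0)
        return EINVAL;          /* garbage in, error out */

    buf = malloc(sizeof(*buf));
    if (buf == NULL)
        return ENOMEM;          /* memory exhausted: report it */

    buf->data = malloc(size);
    if (buf->data == NULL) {
        free(buf);              /* undo the partial allocation */
        return ENOMEM;
    }
    buf->size = size;
    *out = buf;
    return 0;
}

/* Release a buffer created by buffer_create(). NULL is ignored. */
void buffer_destroy(struct buffer *buf)
{
    if (buf != NULL) {
        free(buf->data);
        free(buf);
    }
}
```

<p>
The caller always gets a defined result: EINVAL for a bad argument, ENOMEM
when memory runs out, and 0 with a valid object otherwise.
</p>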
191:
192: <h3>Interoperability</h3>
193: <p>
Although the Prex kernel was written from scratch, its applications
are expected to be brought over from other operating systems like BSD. So,
the system call interface is designed to support general OS
APIs like POSIX or the APIs of generic RTOSes.
198: </p>
199: <p>
The error codes for Prex system calls have the same names
as in POSIX. For example, EINVAL for "Invalid argument", or ENOMEM for
"Out of memory". So, people who already have POSIX programming skills
do not have to study new error codes.
This matters both for writing applications and for reading
the kernel code, because learning a new error scheme is painful
for developers.

In addition, it simplifies the POSIX emulation library because the library
does not have to remap the error codes.
210: </p>
211:
212: <h3>Maintainability</h3>
213: <p>
All kernel code is kept clean and simple for ease of maintenance.
The code is well-commented and consistent.
It is easy to add a system call to the kernel or to remove one.
The kernel has debugging facilities such as a function trace and
a dump of the kernel objects.
219: </p>
220:
221:
222: <h2 id="thread">Thread</h2>
223:
224: <h3>Thread Control Block</h3>
225: <p>
The thread control block includes data for the
owner task, scheduler, timer, IPC, exception, mutex, and context.
The following thread structure is the most important definition
in the kernel code.
230: </p>
231: <pre>
struct thread {
        int             magic;          /* magic number */
        task_t          task;           /* pointer to owner task */
        struct list     task_link;      /* link for threads in same task */
        struct queue    link;           /* linkage on scheduling queue */
        int             state;          /* thread state */
        int             policy;         /* scheduling policy */
        int             prio;           /* current priority */
        int             baseprio;       /* base priority */
        int             timeleft;       /* remaining ticks to run */
        u_int           time;           /* total running time */
        int             resched;        /* true if rescheduling is needed */
        int             locks;          /* schedule lock counter */
        int             suscnt;         /* suspend counter */
        struct event    *slpevt;        /* sleep event */
        int             slpret;         /* sleep result code */
        struct timer    timeout;        /* thread timer */
        struct timer    *periodic;      /* pointer to periodic timer */
        uint32_t        excbits;        /* bitmap of pending exceptions */
        struct queue    ipc_link;       /* linkage on IPC queue */
        void            *msgaddr;       /* kernel address of IPC message */
        size_t          msgsize;        /* size of IPC message */
        thread_t        sender;         /* thread that sends IPC message */
        thread_t        receiver;       /* thread that receives IPC message */
        object_t        sendobj;        /* IPC object sending to */
        object_t        recvobj;        /* IPC object receiving from */
        struct list     mutexes;        /* mutexes locked by this thread */
        struct mutex    *wait_mutex;    /* mutex pointer currently waiting */
        void            *kstack;        /* base address of kernel stack */
        struct context  ctx;            /* machine specific context */
};</pre>
263:
264: <h3>Thread Creation</h3>
<p>
A new thread is created by thread_create().
The initial state of a newly created thread is as follows:
</p>
269:
270: <i><b>Table 1. Initial thread state</b></i>
271: <table border="1" width="60%" cellspacing="0">
272: <tbody>
273: <tr>
274: <th>Data type</th>
275: <th>Initial state</th>
276: </tr>
277: <tr>
278: <td>Owner Task</td>
279: <td>Inherit from parent thread</td>
280: </tr>
281: <tr>
282: <td>Thread state</td>
283: <td>Suspended</td>
284: </tr>
285: <tr>
286: <td>Suspend count</td>
287: <td>Task suspend count + 1</td>
288: </tr>
289: <tr>
290: <td>Scheduling policy</td>
291: <td>Round Robin</td>
292: </tr>
293: <tr>
294: <td>Scheduling Priority</td>
295: <td>Default (= 200)</td>
296: </tr>
297: <tr>
298: <td>Time quantum</td>
299: <td>Default (= 50 msec)</td>
300: </tr>
301:
302: <tr>
303: <td>Processor registers</td>
304: <td>Default value</td>
305: </tr>
306:
307: </tbody>
308: </table>
309:
310: <p>
Since a new thread starts in the suspended state, thread_resume()
must be called to start it.
313: </p>
314:
315: <p>
Creating a thread and loading its register state are done by
two separate routines, thread_create() and thread_load(). The POSIX
emulation library uses them for fork(), exec(), and pthread_create()
as shown in Table 2.
319: </p>
320:
321: <i><b>Table 2. Usage of thread_create()/thread_load()</b></i>
322: <table border="1" width="80%" cellspacing="0">
323: <tbody>
324: <tr>
325: <th>Library routine</th>
326: <th>thread_create()</th>
327: <th>thread_load()</th>
328: </tr>
<tr>
<td>fork()</td>
<td align="center">Yes</td>
<td align="center">No</td>
</tr>
<tr>
<td>exec()</td>
<td align="center">No</td>
<td align="center">Yes</td>
</tr>
<tr>
<td>pthread_create()</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
</tr>
344: </tbody>
345: </table>
346:
347: <h3>Thread Termination</h3>
348: <p>
The kernel normally releases all resources owned by a terminated thread.
However, releasing some of them takes extra care.
For example, a priority adjustment may be required if the thread has
inherited its priority.
353: </p>
354: <p>
If a thread is terminated while it holds a mutex, all threads waiting for
that mutex would sleep forever. So, each mutex held by the terminated thread
must be unlocked, or its ownership must be passed to a waiting thread if
there is one.
358:
359: </p>
360: <p>
Thread termination has a well-known problem.
If the termination target is the current thread, the kernel cannot release
the context of that thread, because
thread switching always requires the current context.
There are three possible solutions for this.
366: </p>
367: <ol>
<li>Create a "clean up thread" to terminate the thread</li>
<li>Add a condition check to the thread switching code</li>
<li>Defer the termination until the next termination request</li>
371: </ol>
<p>
The Prex kernel uses solution 3.
</p>
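<p>
A minimal sketch of the deferred approach (solution 3) is shown below. It is
not taken from the Prex source: <i>struct zthread</i>, the <i>zombie</i>
variable and all function names are hypothetical, and free() stands in for
releasing the real thread resources. A thread that terminates itself is only
remembered, and its record is released when the next termination request
arrives.
</p>

```c
#include <stdlib.h>

/* Hypothetical, reduced thread record for this sketch. */
struct zthread {
    int id;
};

static struct zthread *zombie;  /* self-terminated thread not yet released */
static int released;            /* number of thread records released so far */

/*
 * Terminate 'target'. 'current' is the thread making the request.
 * A thread that terminates itself is only remembered as the zombie;
 * its record is released by the next termination request.
 */
void zthread_terminate(struct zthread *target, struct zthread *current)
{
    if (zombie != NULL) {
        free(zombie);           /* release deferred by the previous request */
        zombie = NULL;
        released++;
    }
    if (target == current) {
        zombie = target;        /* context is still in use: defer release */
    } else {
        free(target);
        released++;
    }
}

/* Accessors used to observe the sketch from outside. */
int zthread_released(void) { return released; }
int zthread_pending(void)  { return zombie != NULL; }
```

<p>
The cost of this choice is that one terminated thread may stay allocated
until some other thread is terminated.
</p>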
373:
374: <h3>Thread Suspension</h3>
375: <p>
Each thread can be set to the suspended state by using the thread_suspend()
interface.
A thread can be suspended any number of times, but
it does not run again until it has been resumed the same
number of times.
381: </p>
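<p>
The counting behavior can be sketched as follows. The names
<i>struct thr</i>, <i>thr_suspend()</i>, <i>thr_resume()</i> and
<i>thr_runnable()</i> are hypothetical, and the record keeps only the suspend
counter (the <i>suscnt</i> field of the thread structure); the return value
for resuming a thread that is not suspended is also an assumption.
</p>

```c
#include <stdbool.h>

/* Hypothetical, reduced thread record: only the suspend counter. */
struct thr {
    int suscnt;                 /* suspend counter, 0 means not suspended */
};

/* Each suspend request adds one level of suspension. */
void thr_suspend(struct thr *t)
{
    t->suscnt++;
}

/*
 * Each resume request removes one level of suspension.
 * Returns 0 on success, or -1 if the thread is not suspended.
 */
int thr_resume(struct thr *t)
{
    if (t->suscnt == 0)
        return -1;
    t->suscnt--;
    return 0;
}

/* The thread may run only when every suspend has been matched. */
bool thr_runnable(const struct thr *t)
{
    return t->suscnt == 0;
}
```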
382:
383: <h3>Kernel Thread</h3>
384: <p>
A kernel thread always runs in kernel mode, and it does not have a user
mode context.
Its scheduling policy is set to SCHED_FIFO by default.
388: </p>
389: <p>
390: Currently, the following kernel threads are running in kernel mode.
391: </p>
392: <ul>
393: <li>Interrupt Service Threads</li>
394: <li>Timer Thread</li>
395: <li>Idle Thread</li>
396: <li>DPC Thread</li>
397: </ul>
398:
399: <h3>Idle Thread</h3>
400: <p>
The idle thread runs when no other thread is active.
Its role is to reduce the power consumption of the system.
The idle thread has the FIFO scheduling policy, and it does not
have a time quantum.
The lowest scheduling priority (=255) is reserved for the idle thread.
406: </p>
407:
408: <h2 id="task">Task</h2>
409:
410: <h3>Task Creation</h3>
411: <p>
A task is created by using task_create().
The new child task has the same memory image as the parent task.
In particular, the text region and read-only regions are physically
shared between them.
The parent task receives the task ID of the new child task from task_create(),
but the child task receives 0 as the task ID.
418: </p>
419: <p>
The initial state of a task is as follows:
421: </p>
422:
423: <i><b>Table 3. Initial task state</b></i>
424: <table border="1" width="60%" cellspacing="0">
425: <tbody>
426: <tr>
427: <th>Data type</th>
428: <th>Inherit from parent task?</th>
429: </tr>
430:
431: <tr>
432: <td>Object List</td>
433: <td align="center">No</td>
434: </tr>
435:
436: <tr>
437: <td>Threads</td>
438: <td align="center">No</td>
439: </tr>
440:
441: <tr>
442: <td>Memory Map</td>
443: <td align="center">Yes</td>
444: </tr>
445:
446: <tr>
447: <td>Suspend Count</td>
448: <td align="center">No</td>
449: </tr>
450:
451: <tr>
452: <td>Exception Handler</td>
453: <td align="center">Yes</td>
454: </tr>
455:
456: </tbody>
457: </table>
458:
459: <p>
If the parent task is specified as NULL for task_create(),
all state of the child task is initialized to the defaults.
This is used in exec() emulation.
463: </p>
464:
465: <h3>Task Suspension</h3>
466: <p>
When a task is suspended, the thread suspend count of every thread
in the task is also incremented.
A thread can run only when both its thread suspend count
and the task suspend count are 0.
471: </p>
472:
473: <h3>Kernel Task</h3>
474: <p>
475: The kernel task is a special task that has only an idle thread
476: and interrupt threads. It does not have any user mode memory.
477: </p>
478:
479: <h2 id="sched">Scheduler</h2>
480: <h3>Thread Priority</h3>
481: <p>
The Prex scheduler is based on the algorithm known as a priority based
multilevel queue. Each thread is assigned a priority between
0 and 255. A lower number means a higher priority, as in BSD UNIX.
The scheduler maintains 256 run queues, one for each priority level.
The lowest priority (=255) is used only for the idle thread.
487: </p>
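<p>
The following is a minimal sketch of such a multilevel run queue, assuming
one FIFO list per priority level and a simple linear scan from priority 0.
The type and function names (<i>struct sthread</i>, <i>struct runq</i>,
<i>runq_enqueue()</i>, <i>runq_dequeue()</i>) are hypothetical, and the real
scheduler may locate the highest non-empty queue differently.
</p>

```c
#include <stddef.h>

#define NPRIO 256   /* priority 0 (highest) .. 255 (lowest, idle thread) */

/* Hypothetical, reduced thread record for this sketch. */
struct sthread {
    int prio;                   /* must be within 0 .. NPRIO - 1 */
    struct sthread *next;       /* link within one run queue */
};

/* One FIFO list per priority level. */
struct runq {
    struct sthread *head[NPRIO];
    struct sthread *tail[NPRIO];
};

void runq_init(struct runq *rq)
{
    for (int p = 0; p < NPRIO; p++) {
        rq->head[p] = NULL;
        rq->tail[p] = NULL;
    }
}

/* Append the thread to the tail of the queue for its priority. */
void runq_enqueue(struct runq *rq, struct sthread *t)
{
    t->next = NULL;
    if (rq->tail[t->prio] == NULL)
        rq->head[t->prio] = t;
    else
        rq->tail[t->prio]->next = t;
    rq->tail[t->prio] = t;
}

/* Remove and return the head of the highest-priority non-empty queue. */
struct sthread *runq_dequeue(struct runq *rq)
{
    for (int p = 0; p < NPRIO; p++) {
        struct sthread *t = rq->head[p];

        if (t != NULL) {
            rq->head[p] = t->next;
            if (rq->head[p] == NULL)
                rq->tail[p] = NULL;
            t->next = NULL;
            return t;
        }
    }
    return NULL;                /* no runnable thread at all */
}
```

<p>
Threads of equal priority leave the queue in the order they entered it, and
a thread at priority 255 is picked only when every other queue is empty.
</p>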
488: <p>
489: A thread has two different types of priority:
490: </p>
491: <ul>
<li><b>Base priority:</b>
This is a static priority that can be changed only by a user mode program.</li>
<li><b>Current priority:</b>
The actual scheduling priority.
The kernel may adjust this priority dynamically if needed.</li>
497: </ul>
498: <p>
Although the base priority and the current priority have the same value in
most conditions,
the kernel sometimes changes the current priority to avoid
"priority inversion".
503: </p>
504:
505: <h3>Thread State</h3>
506: <p>
507: Each thread has one of the following states.
508: </p>
1.1.1.1.2.1! nbrk 509: <p>
<img alt="Thread States" src="img/thread.gif" border="1"
style="width: 430px; height: 314px;"><br>
! 512:
! 513: <i><b>Figure 2. Thread States</b></i>
! 514: </p>
1.1 nbrk 515:
516: <ul>
<li><b>RUN</b>: Running or ready to run</li>
<li><b>SLEEP</b>: Sleeping on some event</li>
<li><b>SUSPEND</b>: Suspend count is not 0</li>
<li><b>EXIT</b>: Terminated</li>
521: </ul>
522: <p>
A thread is always preemptible, even in kernel mode.
The following four events cause a thread switch:
525: </p>
526:
527: <i><b>Table 4. Events to switch thread</b></i>
528: <table border="1" width="90%" cellspacing="0">
529: <tbody>
530: <tr>
531: <th>Event</th>
532: <th>Condition</th>
533: <th>Run queue position</th>
534: </tr>
535: <tr>
536: <td><b>Block</b></td>
537: <td>Thread sleep or suspend</td>
538: <td>Move to the tail of runq</td>
539: </tr>
540: <tr>
541: <td><b>Preemption</b></td>
542: <td>Higher priority thread becomes runnable</td>
543: <td>Keep the head of runq</td>
544: </tr>
545: <tr>
546: <td><b>Quantum Expiration</b></td>
547: <td>The thread consumes its time quantum</td>
548: <td>Move to the tail of runq</td>
549: </tr>
550: <tr>
551: <td><b>Yield</b></td>
552: <td>The thread releases CPU by itself</td>
553: <td>Move to the tail of runq</td>
554: </tr>
555:
556: </tbody>
557: </table>
558:
1.1.1.1.2.1! nbrk 559:
1.1 nbrk 560: <h3>Scheduling Policy</h3>
561: <p>
There are three types of scheduling policy.
563: </p>
564: <ul>
<li><b>SCHED_FIFO</b>: First-in First-out</li>
<li><b>SCHED_RR</b>: Round Robin (SCHED_FIFO + timeslice)</li>
<li><b>SCHED_OTHER</b>: Not supported</li>
568: </ul>
569: <p>
In the early phase of Prex development, SCHED_OTHER was implemented as a
traditional BSD scheduler. Since that scheduler changes the thread priority
dynamically, it is unpredictable and does not fit a real-time system.
The SCHED_OTHER policy was later dropped from Prex to
focus on real-time platforms.
575: </p>
576:
577: <h3>Scheduling Parameter</h3>
578: <p>
579: An application program can change the following scheduling parameters via
580: kernel API.
581: </p>
582: <ul>
583: <li>Thread Priority</li>
584: <li>Scheduling Policy</li>
585: <li>Time Quantum (only for SCHED_RR)</li>
586: </ul>
587:
588: <h3>Scheduling Lock</h3>
589: <p>
Thread scheduling can be disabled by locking the scheduler.
This is used to synchronize thread execution and to protect
access to global resources.
Since interrupt handlers still run while the scheduler is locked,
the lock does not affect the interrupt latency.
595: </p>
596:
597: <h3>Named Event</h3>
598: <p>
A thread can sleep on a specific event and be woken up by it. The event
works as the queue of the sleeping threads. Since each event has a name,
it is easy to find out which event the debuggee is waiting for.
602: </p>
603:
604:
605: <h2 id="memory">Memory Management</h2>
606:
607: <h3>Physical Page Allocator</h3>
<p>The physical page allocator provides services for
page allocation, deallocation, and reservation.
It is the bottom layer on which the other memory managers are built.
611: </p>
612: <p>
613: <img alt="Memory Structure" src="img/memory.gif" border="1"
614: style="width: 448px; height: 308px;"><br>
615:
1.1.1.1.2.1! nbrk 616: <i><b>Figure 3. Prex Memory Structure</b></i>
1.1 nbrk 617: </p>
618: <p>
The key point is that the Prex kernel does not page out to
any external disk device. This is an important design decision for
real-time performance and system simplicity.
622:
623: </p>
624:
625: <h3>Kernel Memory Allocator</h3>
626: <p>
The kernel memory allocator is optimized for systems with a small
memory footprint.
629: </p>
<p>
To allocate kernel memory, one page must be divided into
two or more blocks.
The following three linked lists manage the used and free blocks of kernel
memory.
</p>
635: <ol>
636: <li>All pages allocated for the kernel memory are linked.</li>
637: <li>All blocks divided in the same page are linked.</li>
638: <li>All free blocks of the same size are linked.</li>
639: </ol>
640: <p>
Currently, it cannot handle a request larger than one page.
Instead, a driver can use page_alloc() to allocate a large memory area.
<br>
If kernel code illegally writes data into non-allocated memory,
the system can easily crash. The kmem routines are called not only
from kernel code but also from various drivers. To detect
memory overruns, each free block has a tag with a magic ID.
648: </p>
649: <p>
The kernel maintains an array of block headers for the free blocks.
The array index is determined by the block size.
The size of every block is a multiple of 16 bytes.
653: </p>
<pre>free_blks[0] = list for 16 byte block
free_blks[1] = list for 32 byte block
free_blks[2] = list for 48 byte block
    .
    .
free_blks[255] = list for 4096 byte block
</pre>
661: <p>
In a generic design, a first-fit algorithm searches only one list
for a free block.
However, the Prex kernel memory allocator uses multiple lists,
one for each block size.
A search starts from the list of the requested size, so
the lists of smaller blocks are never searched needlessly.
668: </p>
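<p>
The mapping from a request size to a list index, and the search that starts
at that index, can be sketched as follows. This is an illustration under
simplifying assumptions: block header overhead is ignored, a per-list counter
<i>nfree[]</i> stands in for the real linked lists, and the names
<i>blk_round()</i>, <i>blk_index()</i> and <i>blk_find()</i> are hypothetical.
</p>

```c
#include <stddef.h>

#define BLK_ALIGN 16    /* every block size is a multiple of 16 bytes */
#define NR_LISTS  256   /* lists for 16, 32, ..., 4096 byte blocks */

/* Round a request up to the next multiple of BLK_ALIGN. */
size_t blk_round(size_t size)
{
    return (size + BLK_ALIGN - 1) & ~(size_t)(BLK_ALIGN - 1);
}

/*
 * Map a request size to its list index: 16 -> 0, 32 -> 1, 4096 -> 255.
 * Returns -1 for a zero size or a size larger than the largest block.
 */
int blk_index(size_t size)
{
    if (size == 0 || size > (size_t)BLK_ALIGN * NR_LISTS)
        return -1;
    return (int)(blk_round(size) / BLK_ALIGN) - 1;
}

/*
 * Find the first list, at or above the requested size, that holds a
 * free block. nfree[i] is the number of free blocks in list i.
 * Lists for smaller blocks are never examined.
 */
int blk_find(const unsigned nfree[NR_LISTS], size_t size)
{
    int i = blk_index(size);

    if (i < 0)
        return -1;
    for (; i < NR_LISTS; i++) {
        if (nfree[i] > 0)
            return i;
    }
    return -1;                  /* no free block is large enough */
}
```

<p>
For example, a 20 byte request is rounded up to 32 bytes, so the search
begins at index 1 and skips the list of 16 byte blocks entirely.
</p>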
669: <p>
Most "buddy" based memory allocators
use <b>2^n</b> bytes as the block size.
But this wastes a lot of memory when
the requested size does not fit the block size well. So, it is not suitable
for the embedded systems that Prex targets.
675: </p>
676:
677: <h3>Virtual Memory Manager</h3>
678: <p>
A task owns its private virtual address space. All threads
in the same task share one memory space.
When a new task is created, the address map of the parent task is
automatically copied.
At that time, the read-only regions are not copied but are shared with the
parent's map.
684: </p>
685: <p>
The kernel provides the following functions for VM:
687: </p>
688: <ul>
689: <li>Allocate/deallocate memory region</li>
690: <li>Change memory attribute (read/write/exec)</li>
691: <li>Map another task's memory to current task</li>
692: </ul>
693: <p>
694: The VM allocator is using the traditional list-based algorithm.
695: </p>
696: <p>
The kernel task is a special task that holds the virtual memory mapping
for the kernel. All user mode tasks have the same kernel memory
image mapped from the kernel task. So, kernel threads can work in the
context of any user mode task without switching the memory map.
701: </p>
702: <p>
703: <img alt="Memory Mapping" src="img/memmap.gif" border="1"
704: style="width: 504px; height: 271px;"><br>
705:
1.1.1.1.2.1! nbrk 706: <i><b>Figure 4. Kernel Memory Mapping</b></i>
1.1 nbrk 707: </p>
708:
709: <p>
Since the Prex kernel does not page out to external storage,
allocated memory is guaranteed to remain present in physical memory
at all times. This allows the kernel and drivers to be constructed
very simply.
714: </p>
715: <p>
<i>Note: The Prex kernel previously supported a "copy-on-write" feature.
It was dropped to improve real-time performance.</i>
718: </p>
719:
720:
721: <h2 id="ipc">IPC</h2>
722: <p>
The message passing model of Prex is very simple compared with other
modern microkernels. A Prex message is sent from thread to thread
via an "object".
An "object" in Prex is similar to the concept called a "port"
in other microkernels.
728: </p>
729:
730: <h3>Object</h3>
731: <p>
An object represents a service, a state, a policy, and so on.
For object manipulation, the kernel provides three functions:
object_create(), object_delete(), and object_lookup().
A Prex task creates an object to publish its interface to other tasks.
For example, server tasks create objects like "proc", "fs", and "exec" to
allow clients to access their services.
Client tasks then send request messages to these objects.
739: </p>
740: <p>
The actual object is stored in kernel space, and it is protected
from user mode code.
Objects are managed in a hash table keyed by their name strings.
Usually, an object has a unique name within the system. To send a
message to a specific object, a thread must first obtain the target
object ID by looking it up by name.
747: </p>
748: <p>
An object can also be created without a name. Such objects can be used as
private objects by the threads in the same task.
751: </p>
752:
753: <h3>Message</h3>
754: <p>
Each IPC message must contain the pre-defined message header.
The kernel stores the sender task's ID in the message header.
This ensures that the receiver task gets the exact task ID
of the sender task. Therefore, the receiver task can check the sender
task's capability for various secure services.
760: </p>
761: <p>
The sender and the receiver must agree on the message format
in advance.
764: </p>
765: <p>
Messages are sent to a specific object using msg_send().
The transmission of a message is always synchronous. This means that
the thread that sent the message is blocked until it receives
a response from another thread. msg_receive() performs reception
of a message. msg_receive() also blocks when no message has
arrived at the target object. The receiver thread must answer the
message using msg_reply() after it finishes processing.
773: </p>
774: <p>
The receiver thread cannot receive another message until it
replies to the sender. In short, a thread can receive only one
message at a time. Once a thread has received a message, it can send
another message to a different object. This mechanism allows threads
to redirect the sender's request to another thread.
780: </p>
781: <p>
A thread can receive a message from a specific object that was
created by itself or by a thread in the same task.
If no message has arrived, it
blocks until a message comes in. The following figure shows the IPC transmit
sequence of Prex.
787: </p>
788:
789: <img alt="ipc queue" src="img/msg.gif" border="1"
790: style="width: 505px; height: 347px;"><br>
791:
1.1.1.1.2.1! nbrk 792: <i><b>Figure 5. IPC Transmit Sequence</b></i>
1.1 nbrk 793:
794: <h3>Message Transfer</h3>
795: <p>
The message is copied directly from task to task without kernel
buffering. The memory region of the sent message is
automatically mapped into the receiver's memory within the kernel.
This mechanism reduces the number of copies during message
transfer.
Since Prex never pages out memory, the kernel can
copy the message data via physical memory at any time.
803: </p>
804: <img alt="Message transfer" src="img/ipcmap.gif" border="1"
805: style="width: 459px; height: 321px;"><br>
806:
1.1.1.1.2.1! nbrk 807: <i><b>Figure 6. IPC message transfer</b></i>
1.1 nbrk 808:
809: <h2 id="except">Exception Handling</h2>
810: <p>
811: A user mode task can specify its own exception handler with exception_setup().
There are two types of exception.
813: </p>
814: <ul>
<li><b>H/W exception</b>:
This type of exception is caused by a H/W trap or fault. The exception
is sent to the thread that caused the trap.
If the task has not specified an exception handler, the task is
terminated by the kernel.</li>
820:
<li><b>S/W exception</b>:
A user mode task can send a S/W exception to another task with exception_raise().
The exception
is sent to a thread that is sleeping in exception_wait().
If no thread is waiting for the exception, it is sent
to the first thread in the target task.</li>
827: </ul>
828: <p>
The kernel supports 32 types of exception.
The following pre-defined exceptions are raised by the kernel itself.
831: </p>
832:
833: <i><b>Table 5. Kernel exceptions</b></i>
834: <table border="1" width="80%" cellspacing="0">
835: <tbody>
836: <tr>
837: <th>Exception</th>
838: <th>Type</th>
839: <th>Reason</th>
840: </tr>
841: <tr>
842: <td>SIGILL</td>
843: <td align="center">H/W</td>
844: <td>Illegal instruction</td>
845: </tr>
846: <tr>
847: <td>SIGTRAP</td>
848: <td align="center">H/W</td>
849: <td>Break point</td>
850: </tr>
851: <tr>
852: <td>SIGFPE</td>
853: <td align="center">H/W</td>
854: <td>Math error</td>
855: </tr>
856: <tr>
857: <td>SIGSEGV</td>
858: <td align="center">H/W</td>
859: <td>Invalid memory access</td>
860: </tr>
861: <tr>
862: <td>SIGALRM</td>
863: <td align="center">S/W</td>
864: <td>Alarm event</td>
865: </tr>
866: </tbody>
867: </table>
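<p>
With 32 exception types, the pending exceptions of a thread fit in one 32-bit
word, which matches the <i>excbits</i> field of the thread structure. The
sketch below shows one way such a bitmap could be handled. The function names
<i>exc_post()</i> and <i>exc_take()</i> are hypothetical, and taking the
lowest-numbered pending exception first is an assumption, not a documented
Prex rule.
</p>

```c
#include <stdint.h>

#define NEXC 32     /* number of exception types */

/*
 * Mark exception number 'exc' as pending in the bitmap.
 * Returns 0 on success, or -1 if the number is out of range.
 */
int exc_post(uint32_t *excbits, int exc)
{
    if (exc < 0 || exc >= NEXC)
        return -1;
    *excbits |= (uint32_t)1 << exc;
    return 0;
}

/*
 * Take the lowest-numbered pending exception and clear its bit.
 * Returns the exception number, or -1 if nothing is pending.
 */
int exc_take(uint32_t *excbits)
{
    for (int exc = 0; exc < NEXC; exc++) {
        uint32_t bit = (uint32_t)1 << exc;

        if (*excbits & bit) {
            *excbits &= ~bit;
            return exc;
        }
    }
    return -1;
}
```

<p>
Posting the same exception twice before it is taken leaves a single bit set,
so a bitmap of this kind cannot count repeated exceptions.
</p>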
868:
869: <p>
The POSIX emulation library sets up its own exception handler to convert
Prex exceptions into UNIX signals. It maintains its own signal mask
and transfers control to the actual POSIX signal handler
defined by the user mode process.
874:
875: </p>
876:
877: <h2 id="int"> Interrupt Framework</h2>
878: <p>
879: Prex defines two different types of interrupt service to
880: optimize the response time of real-time operation.
881: </p>
882:
883: <h3>Interrupt Service Routine (ISR)</h3>
884: <p>
An ISR is started by an actual hardware interrupt. While it runs, the
associated interrupt is disabled in the ICU and the CPU interrupt is
enabled. If the ISR determines that its device generated
the interrupt, it must program the device to stop the interrupt.
Then, the ISR should do the minimum I/O operation and return control
as quickly as possible.
An ISR runs in the context of the thread that was running at
interrupt time. So, only a few kernel services are available within
an ISR. The IRQ_ASSERT() macro can be used to detect an invalid
function call from an ISR.
895: </p>
896:
897: <h3>Interrupt Service Thread (IST)</h3>
898: <p>
An IST is automatically activated if the ISR returns INT_CONTINUE to the
kernel. It is called when the system has entered a safer condition than in
the ISR.
Any interrupt driven I/O operation should be done in the IST, not the ISR.
Since the ISR for the same IRQ may run during the IST, shared data,
resources, and device registers must be synchronized by using
irq_lock().
An IST does not have to be reentrant, since it is never interrupted
by the same IST.
907: </p>
908:
<h3>Interrupt Nesting &amp; Priority</h3>
910: <p>
Each ISR has a logical priority level, with 0 being the highest
priority. While an ISR is running, all lower priority interrupts
are masked off.
This interrupt nesting mechanism avoids delaying
high priority interrupt events.
916: </p>
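<p>
The masking rule can be illustrated with a short C sketch. The priority table
and the function name <code>irq_mask_while_running</code> are hypothetical.
The text does not say how lines of equal priority are treated, so this model
leaves them unmasked.
</p>

```c
#include <assert.h>

#define NIRQ 8

/* Logical priority per IRQ line; 0 is the highest, as in the text. */
static int irq_prio[NIRQ] = { 0, 3, 3, 1, 5, 2, 7, 4 };

/*
 * Return a bitmask of the IRQ lines to mask while the ISR for line
 * `running` executes: the line itself plus every line whose priority
 * is lower (numerically larger). Equal-priority lines stay unmasked
 * here, which is an assumption of this sketch.
 */
static unsigned int irq_mask_while_running(int running)
{
    unsigned int mask;
    int i;

    assert(running >= 0 && running < NIRQ);
    mask = 1u << running;
    for (i = 0; i < NIRQ; i++) {
        if (irq_prio[i] > irq_prio[running])
            mask |= 1u << i;
    }
    return mask;
}
```

<p>
With this table, an ISR on line 5 (priority 2) leaves lines 0 and 3 unmasked,
so those higher priority interrupts can still nest on top of it.
</p>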
<p>
An IST is executed as a normal thread dispatched by the scheduler, so
the interrupt thread with the higher priority is executed first.
The driver writer can specify the thread priority of the IST when
it is attached to a specific interrupt line.
The important point is that even a user mode task can run ahead of
an interrupt thread if it has a higher priority.
</p>
<p>
The following figure shows an example of Prex interrupt processing.
</p>
<p>
<img alt="Interrupt Processing" src="img/irq.gif" border="1"><br>
<i><b>Figure 7. Prex Interrupt Processing</b></i>
</p>

<h3>Interrupt Locking</h3>
<p>
irq_lock() and irq_unlock() are used to disable all interrupts in
order to synchronize access to kernel or H/W resources. Since irq_lock()
increments a lock counter, irq_unlock() automatically restores
the original interrupt state when the lock count reaches 0,
so the caller does not have to save the previous interrupt state.
</p>
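<p>
The counting behavior can be modeled in a few lines of C. This is a
user-space sketch of the idea, not the Prex implementation: the variables
<code>irq_enabled</code>, <code>irq_nesting</code>, and
<code>saved_state</code> are assumptions that stand in for the CPU interrupt
flag and the kernel's internal bookkeeping.
</p>

```c
#include <assert.h>

static int irq_enabled = 1;  /* stand-in for the CPU interrupt enable flag */
static int irq_nesting;      /* lock counter described in the text */
static int saved_state;      /* interrupt state seen by the outermost lock */

/* Model of irq_lock(): disable interrupts and count the nesting level. */
static void irq_lock(void)
{
    int prev = irq_enabled;

    irq_enabled = 0;
    if (irq_nesting++ == 0)
        saved_state = prev;  /* only the outermost call records the state */
}

/* Model of irq_unlock(): restore the state when the count drops to 0. */
static void irq_unlock(void)
{
    assert(irq_nesting > 0); /* unbalanced unlock is a caller bug */
    if (--irq_nesting == 0)
        irq_enabled = saved_state;
}
```

<p>
Because the state is saved once at the outermost level, nested callers can
pair irq_lock() and irq_unlock() freely without passing a saved flag around.
</p>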

<h3>Interrupt Stack</h3>
<p>
If each ISR used the kernel stack of the currently running thread, that
stack could overflow when interrupts occur continuously in the
same thread. The kernel therefore switches to a dedicated stack
while an ISR is running.
</p>


<h2 id="timer">Timer</h2>

<h3>Kernel Timers</h3>

<p>
The kernel timer provides the following features.
</p>
<ul>
<li><b>Sleep timer</b>: Puts the calling thread to sleep for the specified time.</li>
<li><b>Callback timer</b>: Calls a routine after the specified time has passed.</li>
<li><b>Periodic timer</b>: Calls a routine at the specified interval.</li>
</ul>

<h3>Timer Jitter</h3>

<p>
The periodic timer is designed to minimize the deviation between the desired and
actual expiration times.
</p>
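<p>
The text does not say how Prex achieves this. One common technique, shown
here purely as an assumption, is to compute the next expiration from the
previous scheduled expiration rather than from the time the handler actually
ran, so that late handling does not accumulate as drift. The names
<code>struct ptimer</code> and <code>ptimer_rearm</code> are hypothetical.
</p>

```c
#include <assert.h>

/* Hypothetical periodic timer state, in system ticks. */
struct ptimer {
    unsigned long expire;    /* tick of the next scheduled expiration */
    unsigned long interval;  /* period in ticks */
};

/*
 * Re-arm a periodic timer after it fired. `now` may be later than
 * t->expire if the handler ran late. Advancing from the scheduled
 * time keeps expirations on the original grid; periods that were
 * missed entirely are skipped. The signed cast handles tick counter
 * wrap-around on common platforms.
 */
static void ptimer_rearm(struct ptimer *t, unsigned long now)
{
    assert(t->interval > 0);
    do {
        t->expire += t->interval;
    } while ((long)(t->expire - now) <= 0);
}
```

<p>
For example, a timer scheduled for tick 100 with a period of 10 that is
serviced at tick 105 is re-armed for tick 110, not tick 115.
</p>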

<h2 id="device">Device I/O Service</h2>

<p>
The Prex device driver module is separate from the kernel and is linked
with the kernel at boot time.
The kernel provides only a minimal set of simple services to support
communication between applications and drivers.
</p>

<h3>Device Object</h3>
<p>
Since the Prex kernel does not contain a file system, it
provides a device object service as the I/O interface.
A device object is created by a device driver to communicate with
applications. Usually, the driver creates a device object for an existing
physical device, but a device object can also be used to handle logical or
virtual devices.
</p>

<h3>Driver Interface</h3>
<p>
The interface between the kernel and drivers is clearly defined as the
"Driver Kernel Interface". The kernel provides the following services
for device drivers.
</p>
<ul>
<li>Device object service</li>
<li>Kernel memory allocation</li>
<li>Physical page allocation</li>
<li>Interrupt handling service</li>
<li>Scheduler service</li>
<li>Timer service</li>
<li>Debug service</li>
</ul>

<h3>Application Interface</h3>
<p>
The kernel device I/O interface provides access to a specific device
object handled by a driver.
The Prex kernel provides the following five functions for applications.
</p>
<ul>
<li>Open a device</li>
<li>Close a device</li>
<li>Read from a device</li>
<li>Write to a device</li>
<li>Device I/O control</li>
</ul>
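<p>
The relationship between a device object and the five operations listed above
can be sketched in C as a table of driver entry points. This is an
illustrative model only: <code>struct devio</code>, <code>struct device</code>,
<code>device_lookup</code>, the "ram0" driver, and <code>RAM_IOC_CLEAR</code>
are hypothetical names and do not reproduce the actual Prex signatures.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of driver entry points, one per application call. */
struct devio {
    int (*open)(void);
    int (*close)(void);
    int (*read)(char *buf, size_t *nbyte);
    int (*write)(const char *buf, size_t *nbyte);
    int (*ioctl)(int cmd, void *arg);
};

/* Hypothetical device object: a name plus the driver's entry points. */
struct device {
    const char *name;
    const struct devio *io;
};

/* A virtual device backed by a small buffer. */
#define RAM_IOC_CLEAR 1
static char ram_buf[16];
static size_t ram_len;
static int ram_open_count;

static int ram_open(void)  { ram_open_count++; return 0; }
static int ram_close(void) { ram_open_count--; return 0; }

static int ram_write(const char *buf, size_t *nbyte)
{
    if (*nbyte > sizeof(ram_buf))
        *nbyte = sizeof(ram_buf);   /* report the truncated length */
    memcpy(ram_buf, buf, *nbyte);
    ram_len = *nbyte;
    return 0;
}

static int ram_read(char *buf, size_t *nbyte)
{
    if (*nbyte > ram_len)
        *nbyte = ram_len;           /* report how much was available */
    memcpy(buf, ram_buf, *nbyte);
    return 0;
}

static int ram_ioctl(int cmd, void *arg)
{
    (void)arg;
    if (cmd != RAM_IOC_CLEAR)
        return -1;                  /* unknown command */
    ram_len = 0;
    return 0;
}

static const struct devio ram_io = {
    ram_open, ram_close, ram_read, ram_write, ram_ioctl
};
static const struct device device_table[] = { { "ram0", &ram_io } };

/* Find a device object by name, or return NULL if none is registered. */
static const struct device *device_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(device_table) / sizeof(device_table[0]); i++) {
        if (strcmp(device_table[i].name, name) == 0)
            return &device_table[i];
    }
    return NULL;
}
```

<p>
In this model the kernel only has to look up the device object and forward
each call to the driver's table, which matches the minimal role described for
the kernel in this section.
</p>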

<h2 id="mutex">Mutex</h2>
<h3>Priority Inheritance</h3>
<p>
The thread priority is automatically changed under any of the following conditions.
</p>
<ol>
<li>
When the current thread fails to lock a mutex and the mutex
owner has a lower priority than the current thread, the priority
of the mutex owner is boosted to the priority of the current thread.
If this mutex owner is itself waiting for another mutex, the related
mutexes are processed in the same way.
</li>
<li>
When the current thread unlocks a mutex and its priority
has already been boosted, the kernel recomputes the current priority.
In this case, the priority is set to the highest
priority among the threads waiting for the mutexes locked by the
current thread.
</li>
<li>
When a priority is changed by user request, the priority of the
related (inherited) thread is also changed.
</li>
</ol>
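<p>
The first two rules above can be sketched in C as follows. This is a
simplified model under stated assumptions, not the Prex source: the
structures and the functions <code>inherit_priority</code> and
<code>recompute_priority</code> are hypothetical, a smaller number is assumed
to mean a higher priority, and the recomputed priority is assumed never to
fall below the thread's base priority.
</p>

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define PRIO_NONE INT_MAX   /* marker: the mutex has no waiters */

struct mutex;

/* Hypothetical thread record; a smaller value is a higher priority. */
struct thread {
    int base_prio;              /* priority assigned by the user */
    int prio;                   /* current, possibly boosted, priority */
    struct mutex *wait_mutex;   /* mutex this thread is blocked on, or NULL */
};

/* Hypothetical mutex record. */
struct mutex {
    struct thread *owner;       /* current owner, or NULL if unlocked */
    int top_waiter_prio;        /* highest priority among its waiters */
};

/*
 * Rule 1: `waiter` failed to lock `m`. Boost the owner if it has a
 * lower priority, then follow the chain of mutexes that each boosted
 * owner is itself waiting for.
 */
static void inherit_priority(struct thread *waiter, struct mutex *m)
{
    int prio = waiter->prio;

    waiter->wait_mutex = m;
    while (m != NULL && m->owner != NULL) {
        struct thread *owner = m->owner;

        if (prio < m->top_waiter_prio)
            m->top_waiter_prio = prio;
        if (owner->prio <= prio)
            break;              /* owner already runs at least this high */
        owner->prio = prio;
        m = owner->wait_mutex;  /* continue along the chain */
    }
}

/*
 * Rule 2: after an unlock, set the priority to the highest priority
 * among the waiters of the mutexes the thread still holds.
 */
static void recompute_priority(struct thread *t, struct mutex **held, int nheld)
{
    int prio = t->base_prio;
    int i;

    for (i = 0; i < nheld; i++) {
        if (held[i]->top_waiter_prio < prio)
            prio = held[i]->top_waiter_prio;
    }
    t->prio = prio;
}
```

<p>
The loop in <code>inherit_priority</code> stops as soon as it reaches an owner
that already runs at the boosted priority or higher, which also ends the walk
if the chain happens to form a cycle.
</p>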
<p>
Priority inheritance with Prex mutexes has the following limitations.
</p>
<ol>
<li>
If the priority is changed by user request, the priority
recomputation is done only when the new priority is higher
than the old priority. The inherited priority is reset to the base
priority when the mutex is unlocked.
</li>
<li>
If a thread is killed while waiting for a mutex, the related
priority is not adjusted.
</li>
</ol>
<h2 id="debug">Debug</h2>
<p>
The following debugging support functions are available:
</p>
<ul>
<li>printf(): Displays a debug message in the kernel.</li>
<li>panic(): Dumps the processor registers and stops the system.</li>
<li>ASSERT(): If the expression is false (zero), stops the system and displays information.</li>
<li>trace_on(): If the kernel trace is enabled, every function entry and exit
is logged.</li>
</ul>
</tr>
<tr>
<td id="footer" colspan="2" style="vertical-align: top;">
<a href="http://sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=132028&type=1"
alt="SourceForge.net Logo" border="0" height="31" width="88"></a><br>
Copyright© 2005-2007 Kohsuke Ohtani
</td>
</tr>

</tbody>
</table>

</div>
<div id="bottom"></div>

</body>
</html>