[BACK]Return to kernel.html CVS log [TXT][DIR] Up to [local] / prex-old / doc / html / doc

Annotation of prex-old/doc/html/doc/kernel.html, Revision 1.1.1.1

1.1       nbrk        1: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
                      2: <html>
                      3: <head>
                      4:   <title>Prex Kernel Internals</title>
                      5:   <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
                      6:   <meta name="keywords" content="Prex, embedded, real-time, operating system, RTOS, open source, free">
                      7:   <meta name="author" content="Kohsuke Ohtani">
                      8:   <link rel="stylesheet" type="text/css" href="../default.css" media="screen">
                      9:   <link rel="stylesheet" type="text/css" href="../print.css" media="print">
                     10: </head>
                     11: <body>
                     12: <div id="top">
                     13: </div>
                     14: <div id="middle">
                     15: 
                     16: <table id="content" cellpadding="0" cellspacing="0">
                     17:   <tbody>
                     18: 
                     19:     <tr>
                     20:       <td id="header" colspan="2" valign="top">
                     21:         <table width="100%" border="0" cellspacing="0" cellpadding="0">
                     22:         <tr>
                     23:           <td id="logo">
                     24:             <a href="http://prex.sourceforge.net/">
                     25:             <img alt="Prex logo" src="../img/logo.gif" border="0"
                     26:             style="width: 281px; height: 56px;"></a>
                     27:           </td>
                     28:           <td id="brief" align="right" valign="bottom">
                     29:             An Open Source, Royalty-free,<br>
                     30:            Real-time Operating System
                     31:           </td>
                     32:         </tr>
                     33:         </table>
                     34:       </td>
                     35:     </tr>
                     36: 
                     37:     <tr>
                     38:       <td id="directory" style="vertical-align: top;">
                     39:       <a href="http://prex.sourceforge.net/">Prex Home</a> >
                     40:       <a href="index.html">Document Index</a> >
                     41:       Kernel Internals
                     42:     </tr>
                     43:     <tr><td class="pad" colspan="2" style="vertical-align: top;"></td></tr>
                     44: 
                     45:     <tr>
                     46:       <td id="main" style="vertical-align: top;">
                     47:       <h1>Prex Kernel Internals</h1>
                     48: 
                     49: <i>Version 1.4, 2005/12/31</i>
                     50: 
                     51: <h3>Table of Contents</h3>
                     52: <ul>
                     53:   <li><a href="#over">Kernel Overview</a></li>
                     54:   <li><a href="#design">Design Policy</a></li>
                     55:   <li><a href="#thread">Thread</a></li>
                     56:   <li><a href="#task">Task</a></li>
                     57:   <li><a href="#sched">Scheduler</a></li>
                     58:   <li><a href="#memory">Memory Management</a></li>
                     59:   <li><a href="#ipc">IPC</a></li>
                     60:   <li><a href="#except">Exception Handling</a></li>
                     61:   <li><a href="#int">Interrupt Framework</a></li>
                     62:   <li><a href="#timer">Timer</a></li>
                     63:   <li><a href="#device">Device I/O Service</a></li>
                     64:   <li><a href="#mutex">Mutex</a></li>
                     65:   <li><a href="#debug">Debug</a></li>
                     66: </ul>
                     67: <br>
                     68: 
                     69: <h2 id="over">Kernel Overview</h2>
                     70: 
                     71: <h3>Kernel Structure</h3>
                     72: <p>
                     73: The following figure illustrates the Prex kernel structure.
                     74: </p>
                     75: <p>
                     76: <img alt="Kernel Components" src="img/kernel.gif" border="1"
                     77: style="width: 602px; height: 414px;"><br>
                     78: 
                     79: <i><b>Figure 1. Prex kernel Structure</b></i>
                     80: </p>
                     81: <p>
                     82: A kernel object belongs in one of the following groups.
                     83: </p>
                     84: <ul>
                     85:   <li><b>kern</b>: kernel core components</li>
                     86:   <li><b>mem</b>: memory managers</li>
                     87:   <li><b>ipc</b>: inter process communication (*)</li>
                     88:   <li><b>sync</b>: synchronize objects</li>
                     89:   <li><b>arch</b>: architecture dependent components</li>
                     90: </ul>
                     91: <p>
                     92: <i>*) Since all messages in Prex are transferred among threads, the name of
                     93: "IPC" is not appropriate.
                     94: However, "IPC" is still used as a general term of the message transfer
                     95: via the kernel, in Prex.</i>
                     96: </p>
                     97: 
                     98: <h3>Naming Convention</h3>
                     99: <p>
                    100: The name of "group/object" in figure 1 is mapped to "directory/file" in the
                    101: Prex source tree. For example, the thread related functions are located
                    102: in "kern/thread.c", and the functions for semaphore are placed
                    103: in "sync/sem.c".
                    104: </p>
                    105: <p>
                    106: In addition, there is a standard naming convention about kernel
                    107: routines. The method named <i>bar</i> for the object named <i>foo</i>
                    108: should be named "foo_bar". For example, the routine to create a new 
                    109: thread is named "thread_create", and locking mutex will be "mutex_lock".
                    110: This rule is not applied to the name of the local function.
                    111: </p>
                    112: 
                    113: 
                    114: <h2 id="design">Design Policy</h2>
                    115: <p>
                    116: The Prex kernel focuses the following points to be designed.
                    117: </p>
                    118: <ul>
                    119:   <li>Portability</li>
                    120:   <li>Scalability</li>
                    121:   <li>Reliability</li>
                    122:   <li>Interoperability</li>
                    123:   <li>Maintainability</li>
                    124: </ul>
                    125: 
                    126: <h3>Portability</h3>
                    127: <p>
                    128: The Prex kernel is divided into two different layers -
                    129: a common kernel layer and an architecture dependent layer.
                    130: 
                    131: Any routine in the common kernel layer must not access to the H/W by itself.
                    132: Instead, it must request it to the architecture dependent layer.
                    133: The interface to the architecture dependent layer is strictly defined
                    134: by the Prex kernel. This interface is designed carefully to support various
                    135: different architecture with minimum code change.
                    136: So, it is easy to port the Prex kernel to different architecture.
                    137: </p>
                    138: <p>
                    139: The following functions must be provided by the architecture dependent layer.
                    140: </p>
                    141: <ul>
                    142:   <li><b>CPU</b>: initializes processor registers before kernel boot</li>
                    143:   <li><b>Context</b>: abstracts processor and hardware context</li>
                    144:   <li><b>MMU</b>: abstracts memory management unit (*)</li>
                    145:   <li><b>Trap</b>: abstracts processor trap</li>
                    146:   <li><b>Interrupt</b>: abstracts the interrupt control unit</li>
                    147:   <li><b>Clock</b>: abstracts clock timer unit</li>
                    148:   <li><b>Misc.</b>: abstracts system reset, idle power state</li>
                    149: </ul>
                    150: <p>
                    151: <i>*) In case of no-MMU system, MMU related routines will be
                    152: defined as no-operation routine.
                    153: So, the kernel common layer can not assume MMU is always available.</i>
                    154: </p>
                    155: 
                    156: <h3>Scalability</h3>
                    157: <p>
                    158: In order to obtain higher scalability, the kernel does not limit the maximum
                    159: number of the kernel objects to create.
                    160: So, the resource for all kernel objects are allocated dynamically after
                    161: system boot.
                    162: This can keep the memory prerequisite smaller than the static
                    163: resource allocation.
                    164: This means that the kernel can create task, thread, object, device,
                    165: event, mutex, timer as many as usable memory remains.
                    166: </p>
                    167: <p>
                    168: The kernel supports both of MMU and No-MMU systems. So, most of the kernel
                    169: components and VM sub-system are designed carefully to work without MMU.
                    170: </p>
                    171: 
                    172: <h3>Reliability</h3>
                    173: <p>
                    174: When the remaining memory is exhausted, what should OS do?
                    175: If the system can stop with panic() there, the error checks of many
                    176: portions in the kernel are not necessary.
                    177: But obviously, this is not allowed with the reliable system.
                    178: Even if the memory is exhausted,
                    179: a kernel must continue processing.
                    180: So, all kernel code is checking the error status returned
                    181: by the memory allocation routine.
                    182: </p>
                    183: <p>
                    184: In addition, the kernel must not crush anytime even if any invalid parameter
                    185: is passed via kernel API. Basically, the Prex kernel code is written with
                    186: "garbage in, error out" principle.
                    187: The Prex kernel never stops even if any malicious
                    188: program is loaded.
                    189: </p>
                    190: 
                    191: <h3>Interoperability</h3>
                    192: <p>
                    193: Although the Prex kernel was written from scratch, its applications
                    194: will be brought from the other operating systems like BSD. So, the system 
                    195: call interface is designed with consideration to support general OS
                    196: API like POSIX or APIs for generic RTOS.
                    197: </p>
                    198: <p>
                    199: The error code for the Prex system call is defined as the same name 
                    200: with POSIX. For example, EINVAL for "Invalid argument", or ENOMEM for
                    201: "Out of memory". So, peoples do not have to study new error codes if
                    202: they already have skills about POSIX programming.
                    203: This is important point to write applications and to read
                    204: the kernel code because study of a new error scheme will cause pain
                    205: for developers.
                    206: 
                    207: In addition, it simplify the POSIX emulation library because it
                    208: does not have to remap the error code.
                    209: </p>
                    210: 
                    211: <h3>Maintainability</h3>
                    212: <p>
                    213: All kernel codes are kept clean and simple for the maintenance.
                    214: All codes are well-commented and consistent.
                    215: It is easy to add or remove new system call into the kernel.
                    216: The kernel has the debugging facility like the function trace or 
                    217: the dump of the kernel objects.
                    218: </p>
                    219: 
                    220: 
                    221: <h2 id="thread">Thread</h2>
                    222: 
                    223: <h3>Thread Control Block</h3>
                    224: <p>
                    225: The thread control block includes data for
                    226: owner task, scheduler, timer, IPC, exception, mutex, and context.
                    227: The following thread structure is most important definition
                    228: in the kernel codes.
                    229: </p>
                    230: <pre>
                    231: struct thread {
                    232:         int             magic;          /* magic number */
                    233:         task_t          task;           /* pointer to owner task */
                    234:         struct list     task_link;      /* link for threads in same task */
                    235:         struct queue    link;           /* linkage on scheduling queue */
                    236:         int             state;          /* thread state */
                    237:         int             policy;         /* scheduling policy */
                    238:         int             prio;           /* current priority */
                    239:         int             base_prio;      /* base priority */
                    240:         int             ticks_left;     /* remaining ticks to run */
                    241:         u_int           total_ticks;    /* total running ticks */
                    242:         int             need_resched;   /* true if rescheduling is needed */
                    243:         int             lock_count;     /* schedule lock counter */
                    244:         int             suspend_count;  /* suspend counter */
                    245:         struct event    *sleep_event;   /* sleep event */
                    246:         int             sleep_result;   /* sleep result */
                    247:         struct timer    timeout;        /* thread timer */
                    248:         struct timer    *periodic;      /* pointer to periodic timer */
                    249:         struct queue    ipc_link;       /* linkage on IPC queue */
                    250:         void            *msg_addr;      /* kernel address of IPC message */
                    251:         size_t          msg_size;       /* size of IPC message */
                    252:         struct thread   *sender;        /* thread that sends IPC message */
                    253:         struct thread   *receiver;      /* thread that receives IPC message */
                    254:         object_t        send_obj;       /* IPC object sending to */
                    255:         object_t        recv_obj;       /* IPC object receiving from */
                    256:         uint32_t        exc_bitmap;     /* bitmap of pending exceptions */
                    257:         struct list     mutexes;        /* mutexes locked by this thread */
                    258:         struct mutex    *wait_mutex;    /* mutex pointer currently waiting */
                    259:         void            *kstack;        /* base address of kernel stack */
                    260:         struct context  context;        /* machine specific context */
                    261: };</pre>
                    262: 
                    263: <h3>Thread Creation</h3>
                    264: New thread can be created by thread_create().
                    265: The initial states of newly created thread are as follow:
                    266: <p>
                    267: </p>
                    268: 
                    269: <i><b>Table 1. Initial thread state</b></i>
                    270: <table border="1" width="60%" cellspacing="0">
                    271: <tbody>
                    272: <tr>
                    273:   <th>Data type</th>
                    274:   <th>Initial state</th>
                    275: </tr>
                    276: <tr>
                    277:   <td>Owner Task</td>
                    278:   <td>Inherit from parent thread</td>
                    279: </tr>
                    280: <tr>
                    281:   <td>Thread state</td>
                    282:   <td>Suspended</td>
                    283: </tr>
                    284: <tr>
                    285:   <td>Suspend count</td>
                    286:   <td>Task suspend count + 1</td>
                    287: </tr>
                    288: <tr>
                    289:   <td>Scheduling policy</td>
                    290:   <td>Round Robin</td>
                    291: </tr>
                    292: <tr>
                    293:   <td>Scheduling Priority</td>
                    294:   <td>Default (= 200)</td>
                    295: </tr>
                    296: <tr>
                    297:   <td>Time quantum</td>
                    298:   <td>Default (= 50 msec)</td>
                    299: </tr>
                    300: 
                    301: <tr>
                    302:   <td>Processor registers</td>
                    303:   <td>Default value</td>
                    304: </tr>
                    305: 
                    306: </tbody>
                    307: </table>
                    308: 
                    309: <p>
                    310: Since new thread is initially set to the suspended state, thread_resume()
                    311: must be called to start it.
                    312: </p>
                    313: 
                    314: <p>
                    315: Creating a thread and loading its register state are isolated
                    316: in different routines. These two routines are used by fork(), exec(),
                    317: and pthread_create() in the POSIX emulation library.
                    318: </p>
                    319: 
                    320: <i><b>Table 2. Usage of thread_create()/thread_load()</b></i>
                    321: <table border="1" width="80%" cellspacing="0">
                    322: <tbody>
                    323: <tr>
                    324:   <th>Library routine</th>
                    325:   <th>thread_create()</th>
                    326:   <th>thread_load()</th>
                    327: </tr>
                    328: <tr>
                    329:   <td>fork()</td>
                    330:   <td align="center">O</td>
                    331:   <td align="center">X</td>
                    332: </tr>
                    333: <tr>
                    334:   <td>exec()</td>
                    335:   <td align="center">X</td>
                    336:   <td align="center">O</td>
                    337: </tr>
                    338: <tr>
                    339:   <td>pthread_create()</td>
                    340:   <td align="center">O</td>
                    341:   <td align="center">O</td>
                    342: </tr>
                    343: </tbody>
                    344: </table>
                    345: 
                    346: <h3>Thread Termination</h3>
                    347: <p>
                    348: The kernel will usually release all resources owned by the terminated thread.
                    349: But, there are some complicated process to release the resources.
                    350: The priority adjustment may be required if the thread inherits its
                    351: priority.
                    352: </p>
                    353: <p>
                    354: If the thread is terminated with mutex locking, all threads waiting for
                    355: that mutex does sleep forever. So, the mutex held by the terminated thread
                    356: must be unlocked, or change its mutex owner if some thread is waiting for.
                    357: 
                    358: </p>
                    359: <p>
                    360: In general, there is a known issue about the thread termination.
                    361: If the termination target is current thread, the kernel can not release
                    362: the context of the current thread because the
                    363: thread switching always requires the current context.
                    364: There are the following 3 solutions for this.
                    365: </p>
                    366: <ol>
                    367:  <li>Create "clean up thread" to terminate thread</li>
                    368:  <li>Add condition check in thread switching code</li>
                    369:  <li>Defer termination in next termination request</li>
                    370: </ol>
                    371: The Prex kernel is using #3.
                    372: 
                    373: <h3>Thread Suspension</h3>
                    374: <p>
                    375: Each thread can be set to the suspended state by using thread_suspend()
                    376: interface.
                    377: Although a thread can be suspended any number of times,
                    378: it does not start to run unless it is resumed by the same
                    379: number of suspend.
                    380: </p>
                    381: 
                    382: <h3>Kernel Thread</h3>
                    383: <p>
                    384: A kernel thread is always executed in kernel mode, and it does not have user
                    385: mode context.
                    386: The scheduling policy is set to SCHED_FIFO by default.
                    387: </p>
                    388: <p>
                    389: Currently, the following kernel threads are running in kernel mode.
                    390: </p>
                    391: <ul>
                    392:   <li>Interrupt Service Threads</li>
                    393:   <li>Timer Thread</li>
                    394:   <li>Idle Thread</li>
                    395:   <li>DPC Thread</li>
                    396: </ul>
                    397: 
                    398: <h3>Idle Thread</h3>
                    399: <p>
                    400: An idle thread runs when no other thread is active.
                    401: It has the role of cutting down the power consumption of a system.
                    402: An idle thread has FIFO scheduling policy, and it does not
                    403: have time quantum.
                    404: The lowest scheduling priority (=255) is reserved for an idle thread.
                    405: </p>
                    406: 
                    407: <h2 id="task">Task</h2>
                    408: 
                    409: <h3>Task Creation</h3>
                    410: <p>
                    411: The task can be created by using task_create().
                    412: New child task will have the same memory image with the parent task.
                    413: Especially text region and read-only region are physically
                    414: shared among them.
                    415: The parent task receives the new task ID of child task from task_create(), but
                    416: child task will receive 0 as task ID.
                    417: </p>
                    418: <p>
                    419: The initial task states are as follow:
                    420: </p>
                    421: 
                    422: <i><b>Table 3. Initial task state</b></i>
                    423: <table border="1" width="60%" cellspacing="0">
                    424: <tbody>
                    425: <tr>
                    426:   <th>Data type</th>
                    427:   <th>Inherit from parent task?</th>
                    428: </tr>
                    429: 
                    430: <tr>
                    431:   <td>Object List</td>
                    432:   <td align="center">No</td>
                    433: </tr>
                    434: 
                    435: <tr>
                    436:   <td>Threads</td>
                    437:   <td align="center">No</td>
                    438: </tr>
                    439: 
                    440: <tr>
                    441:   <td>Memory Map</td>
                    442:   <td align="center">Yes</td>
                    443: </tr>
                    444: 
                    445: <tr>
                    446:   <td>Suspend Count</td>
                    447:   <td align="center">No</td>
                    448: </tr>
                    449: 
                    450: <tr>
                    451:   <td>Exception Handler</td>
                    452:   <td align="center">Yes</td>
                    453: </tr>
                    454: 
                    455: </tbody>
                    456: </table>
                    457: 
                    458: <p>
                    459: If the parent task is specified as NULL for task_create(),
                    460: all child state are initialized to default.
                    461: This is used in exec() emulation.
                    462: </p>
                    463: 
                    464: <h3>Task Suspension</h3>
                    465: <p>
                    466: When the task is set to suspend state, the thread suspend count of all threads
                    467: in the task is also incremented.
                    468: A thread can start to run only when both of the thread suspend count
                    469: and the task suspend count becomes 0.
                    470: </p>
                    471: 
                    472: <h3>Kernel Task</h3>
                    473: <p>
                    474: The kernel task is a special task that has only an idle thread
                    475: and interrupt threads. It does not have any user mode memory.
                    476: </p>
                    477: 
                    478: <h2 id="sched">Scheduler</h2>
                    479: <h3>Thread Priority</h3>
                    480: <p>
                    481: The Prex scheduler is based on the algorithm known as priority based
                    482: multi level queue. Each thread is assigned the priority between
                    483: 0 and 255. The lower number means higher priority like BSD unix.
                    484: It maintains 256 level run queues mapped to each priority.
                    485: The lowest priority (=255) is used only for an idle thread.
                    486: </p>
                    487: <p>
                    488: A thread has two different types of priority:
                    489: </p>
                    490: <ul>
                    491:   <li><b>Base priority:</b>
                    492:   This is a static priority which can be changed only by user mode program.
                    493:   <li><b>Current Priority:</b>
                    494:   An actual scheduling priority.
                    495:   A kernel may adjust this priority dynamically if it's needed.
                    496: </ul>
                    497: <p>
                    498: Although the base priority and the current priority are same value in almost
                    499: conditions,
                    500: kernel will sometimes change the current priority to avoid
                    501: "priority inversion".
                    502: </p>
                    503: 
                    504: <h3>Thread State</h3>
                    505: <p>
                    506: Each thread has one of the following states.
                    507: </p>
                    508: 
                    509: <ul>
                    510:   <li><b>RUN</b>     :Running or ready to run</li>
                    511:   <li><b>SLEEP</b>   :Sleep for some event</li>
                    512:   <li><b>SUSPEND</b> :Suspend count is not 0</li>
                    513:   <li><b>EXIT</b>    :Terminated</li>
                    514: </ul>
                    515: <p>
                    516: The thread is always preemptive even in kernel mode.
                    517: There are following 4 events to switch thread:
                    518: </p>
                    519: 
                    520: <i><b>Table 4. Events to switch thread</b></i>
                    521: <table border="1" width="90%" cellspacing="0">
                    522: <tbody>
                    523: <tr>
                    524:   <th>Event</th>
                    525:   <th>Condition</th>
                    526:   <th>Run queue position</th>
                    527: </tr>
                    528: <tr>
                    529:   <td><b>Block</b></td>
                    530:   <td>Thread sleep or suspend</td>
                    531:   <td>Move to the tail of runq</td>
                    532: </tr>
                    533: <tr>
                    534:   <td><b>Preemption</b></td>
                    535:   <td>Higher priority thread becomes runnable</td>
                    536:   <td>Keep the head of runq</td>
                    537: </tr>
                    538: <tr>
                    539:   <td><b>Quantum Expiration</b></td>
                    540:   <td>The thread consumes its time quantum</td>
                    541:   <td>Move to the tail of runq</td>
                    542: </tr>
                    543: <tr>
                    544:   <td><b>Yield</b></td>
                    545:   <td>The thread releases CPU by itself</td>
                    546:   <td>Move to the tail of runq</td>
                    547: </tr>
                    548: 
                    549: </tbody>
                    550: </table>
                    551: 
                    552: <h3>Scheduling Policy</h3>
                    553: <p>
                    554: There are following three types of scheduling policy.
                    555: </p>
                    556: <ul>
                    557:   <li><b>SCHED_FIFO</b>: First-in First-out
                    558:   <li><b>SCHED_RR</b>: Round Robin (SCHED_FIFO + timeslice)
                    559:   <li><b>SCHED_OTHER</b>: Not supported
                    560: </ul>
                    561: <p>
                    562: In early Prex development phase, SCHED_OTHER was implemented as a traditional
                    563: BSD scheduler. Since this scheduler changes the thread priority dynamically,
                    564: it is unpredictable and does not fit the real-time system.
                    565: Recently, SCHED_OTHER policy was dropped from Prex to
                    566: focus on real-time platform.
                    567: </p>
                    568: 
                    569: <h3>Scheduling Parameter</h3>
                    570: <p>
                    571: An application program can change the following scheduling parameters via
                    572: kernel API.
                    573: </p>
                    574: <ul>
                    575:   <li>Thread Priority</li>
                    576:   <li>Scheduling Policy</li>
                    577:   <li>Time Quantum (only for SCHED_RR)</li>
                    578: </ul>
                    579: 
                    580: <h3>Scheduling Lock</h3>
                    581: <p>
                    582: The thread scheduling can be disabled by locking the scheduler.
                    583: This is used to synchronize the thread execution to protect the
                    584: access to the global resources.
                    585: Since any interrupt handler can work while scheduling lock state,
                    586: it does not affect to the interrupt latency.
                    587: </p>
                    588: 
                    589: <h3>Named Event</h3>
                    590: <p>
                    591: The thread can sleep/wakeup for the specific event. The event works as
                    592: the queue of the sleeping threads. Since each event has its name,
                    593: it is easy to know which event the debugee is waiting for.
                    594: </p>
                    595: 
                    596: 
                    597: <h2 id="memory">Memory Management</h2>
                    598: 
                    599: <h3>Physical Page Allocator</h3>
                    600: <p>The physical page allocator provides the service for
                    601: page allocation/deallocation/reservation.
                    602: It works on the bottom layer for other memory managers.
                    603: </p>
                    604: <p>
                    605: <img alt="Memory Structure" src="img/memory.gif" border="1"
                    606: style="width: 448px; height: 308px;"><br>
                    607: 
                    608: <i><b>Figure 2. Prex Memory Structure</b></i>
                    609: </p>
                    610: <p>
                    611: The key point is that Prex kernel does not page out to
                    612: any external disk devices. This is an important design point to get
                    613: real-time performance and system simplicity.
                    614: 
                    615: </p>
                    616: 
                    617: <h3>Kernel Memory Allocator</h3>
                    618: <p>
                    619: The kernel memory allocator is optimized for the small
                    620: memory foot print system.
                    621: </p>
                    622: <p>
                    623: To allocate kernel memory, it is necessary to divide one page into
                    624: two or more blocks.
                    625: There are following 3 linked lists to manage used/free blocks for kernel
                    626: memory.
                    627: <ol>
                    628:   <li>All pages allocated for the kernel memory are linked.</li>
                    629:   <li>All blocks divided in the same page are linked.</li>
                    630:   <li>All free blocks of the same size are linked.</li>
                    631: </ol>
                    632: <p>
                    633: Currently, it can not handle the memory size exceeding one page.
                    634: Instead, a driver can use page_alloc() to allocate large memory.
                    635: <br>
                    636: When the kernel code illegally writes data into non-allocated memory,
                    637: the system will crash easily. The kmem modules are called from
                    638: not only kernel code but from various drivers. In order to
                    639: check the memory over run, each free block has a tag with magic ID.
                    640: </p>
                    641: <p>
                    642: The kernel maintains the array of the block headers for the free blocks.
                    643: The index of an array is decided by the size of each block.
                    644: All block has the size of the multiple of 16.
                    645: </p>
                    646: <pre>free_blks[0] = list for 16 byte block
                    647: free_blks[1] = list for 32 byte block
                    648: free_blks[2] = list for 48 byte block
                    649:      .
                    650:      .
                    651: free_blks[255] = list for 4096 byte block
                    652: </pre>
                    653: <p>
                    654: In generic design, only one list is used to search the free block
                    655: for a first fit algorithm.
                    656: However, the Prex kernel memory allocator is using multiple lists
                    657: corresponding to each block size.
                    658: A search is started from the list of the requested size. So,
                    659: it is not necessary to search smaller block's list wastefully.
                    660: </p>
                    661: <p>
                    662: In most of the "buddy" based memory allocators, their algorithm are
                    663: using <b>2^n</b> bytes as block size.
                    664: But, this logic will throw away much memory in case
                    665: the block size is not fit. So, this is not suitable for the
                    666: embedded systems that Prex aims to.
                    667: </p>
                    668: 
                    669: <h3>Virtual Memory Manager</h3>
                    670: <p>
                    671: A task owns its private virtual address space. All threads
                    672: in a same task share one memory space.
                    673: When new task is made, the address map of the parent task will be
                    674: automatically copied.
                    675: In this time, the read-only space is not copied and is shared with old map.
                    676: </p>
                    677: <p>
                    678: The kernel provide the following functions for VM:
                    679: </p>
                    680: <ul>
                    681:   <li>Allocate/deallocate memory region</li>
                    682:   <li>Change memory attribute (read/write/exec)</li>
                    683:   <li>Map another task's memory to current task</li>
                    684: </ul>
                    685: <p>
                    686: The VM allocator is using the traditional list-based algorithm.
                    687: </p>
                    688: <p>
                    689: The kernel task is a special task which has the virtual memory mapping
                    690: for kernel. All other user mode tasks will have the same kernel memory
                    691: image mapped from the kernel task. So, kernel threads can work with the
                    692: all user mode task context without switching memory map.
                    693: </p>
                    694: <p>
                    695: <img alt="Memory Mapping" src="img/memmap.gif" border="1"
                    696: style="width: 504px; height: 271px;"><br>
                    697: 
                    698: <i><b>Figure 3. Kernel Memory Mapping</b></i>
                    699: </p>
                    700: 
                    701: <p>
                    702: Since the Prex kernel does not do page out to an external storage,
                    703: it is guaranteed that the allocated memory is always continuing
                    704: and existing. Thereby, a kernel and drivers can be constructed
                    705: very simply.
                    706: </p>
                    707: <p>
                    708: <i>Note: "Copy-on-write" feature was supported with the Prex kernel before.
                    709: But, it was dropped to increase the real-time performance.</i>
                    710: </p>
                    711: 
                    712: 
                    713: <h2 id="ipc">IPC</h2>
                    714: <p>
                    715: The message passing model of Prex is very simple compared with other
                    716: modern microkernels. The Prex message is sent to the "object" from thread
                    717: to thread.
                    718: The "object" in Prex is similar concept that is called as "port"
                    719: in other microkernel.
                    720: </p>
                    721: 
                    722: <h3>Object</h3>
                    723: <p>
                    724: An object represents service, state, or policies etc.
                    725: For object manipulation, kernel provide 3 functions:
                    726: object_create(), object_delete(), object_lookup().
                    727: Prex task will create an object to publish its interface to other tasks.
                    728: For example, server tasks will create objects like "proc", "fs", "exec" to
                    729: allow clients to access their service.
                    730: And then, client tasks will send a request message to these objects
                    731: </p>
                    732: <p>
                    733: An actual object is stored in kernel space, and it is protected
                    734: from user mode code.
                    735: Each object data is managed with the hash table by using its name string.
                    736: Usually, an object has a unique name within a system. To send a
                    737: message to the specific object, it must obtain the target object ID
                    738: by looking up by the name.
                    739: </p>
                    740: <p>
                    741: An object can be created without its name. These objects can be used as
                    742: private objects for threads in same task.
                    743: </p>
                    744: 
                    745: <h3>Message</h3>
                    746: <p>
                    747: Each IPC message must have the pre-defined message header in it.
                    748: The kernel will store the sender task's ID into the message header.
                    749: This mechanism ensures the receiver task can get the exact task ID
                    750: of the sender task. Therefore, receiver task can check the sender
                    751: task's capability for various secure services.
                    752: </p>
                    753: <p>
                    754: It is necessary to recognize the pre-defined message format between
                    755: sender and receiver.
                    756: </p>
                    757: <p>
                    758: Messages are sent to the specific object using msg_send().
                    759: The transmission of a message is always synchronous. This means that
                    760: the thread which sent the message is blocked until it receives
                    761: a response from another threads. msg_receive() performs reception
                    762: of a message. msg_receive() is also blocked when no message is
                    763: reached to the target object. The receiver thread must answer the
                    764: message using msg_reply() after it finishes processing.
                    765: </p>
                    766: <p>
                    767: The receiver thread can not receive another message until it
                    768: replies to the sender. In short, a thread can receive only one
                    769: message at once. Once the thread receives message, it can send
                    770: another message to different object. This mechanism allows threads
                    771: to redirect the sender's request to another thread.
                    772: </p>
                    773: <p>
                    774: A thread can receive a message from the specific object which is
                    775: created by itself or thread in same task.
                    776: If the message has not arrived, it
                    777: blocks until any message comes in. The following figure shows the IPC transmit
                    778: sequence of Prex.
                    779: </p>
                    780: 
                    781: <img alt="ipc queue" src="img/msg.gif" border="1"
                    782: style="width: 505px; height: 347px;"><br>
                    783: 
                    784: <i><b>Figure 4. IPC Transmit Sequence</b></i>
                    785: 
                    786: <h3>Message Transfer</h3>
                    787: <p>
                    788: The message is copied to task to task directly without kernel
                    789: buffering. The memory region of sent message is
                    790: automatically mapped to the receiver's memory within kernel.
                    791: This mechanism allows to reduce the number of copy time while message
                    792: translation.
                    793: Since there is no page out of memory in Prex, we can
                    794: copy the message data via physical memory at anytime.
                    795: </p>
                    796: <img alt="Message transfer" src="img/ipcmap.gif" border="1"
                    797: style="width: 459px; height: 321px;"><br>
                    798: 
                    799: <i><b>Figure 5. IPC message transfer</b></i>
                    800: 
                    801: <h2 id="except">Exception Handling</h2>
                    802: <p>
                    803: A user mode task can specify its own exception handler with exception_setup().
                    804: There are two different types of exception.
                    805: </p>
                    806: <ul>
                    807:   <li><b>H/W exception</b>:
                    808:   This type of exception is caused by H/W trap &amp fault. The exception
                    809:   will be sent to the thread which caused the trap.
                    810:   If no exception handler is specified by the task, it will be
                    811:   terminated by kernel.</li>
                    812: 
                    813:   <li><b>S/W exception</b>:
                    814:   The user mode task can send S/W exception to another task by exception_raise().
                    815:   The exception
                    816:   will be sent to the thread that is sleeping with exception_wait().
                    817:   If no thread is waiting for the exception, the exception is sent
                    818:   to the first thread in the target task.</li>
                    819: </ul>
                    820: <p>
                    821: Kernel supports 32 types of exception.
                    822: The following pre-defined exceptions are raised by kernel itself.
                    823: </p>
                    824: 
                    825: <i><b>Table 5. Kernel exceptions</b></i>
                    826: <table border="1" width="80%" cellspacing="0">
                    827: <tbody>
                    828: <tr>
                    829:   <th>Exception</th>
                    830:   <th>Type</th>
                    831:   <th>Reason</th>
                    832: </tr>
                    833: <tr>
                    834:   <td>SIGILL</td>
                    835:   <td align="center">H/W</td>
                    836:   <td>Illegal instruction</td>
                    837: </tr>
                    838: <tr>
                    839:   <td>SIGTRAP</td>
                    840:   <td align="center">H/W</td>
                    841:   <td>Break point</td>
                    842: </tr>
                    843: <tr>
                    844:   <td>SIGFPE</td>
                    845:   <td align="center">H/W</td>
                    846:   <td>Math error</td>
                    847: </tr>
                    848: <tr>
                    849:   <td>SIGSEGV</td>
                    850:   <td align="center">H/W</td>
                    851:   <td>Invalid memory access</td>
                    852: </tr>
                    853: <tr>
                    854:   <td>SIGALRM</td>
                    855:   <td align="center">S/W</td>
                    856:   <td>Alarm event</td>
                    857: </tr>
                    858: </tbody>
                    859: </table>
                    860: 
                    861: <p>
                    862: POSIX emulation library will setup its own exception handler to convert
                    863: the Prex exceptions into UNIX signals. It will maintain its own signal mask.
                    864: And, it transfer control to the actual POSIX signal handler that is
                    865: defined by the user mode process.
                    866: 
                    867: </p>
                    868: 
                    869: <h2 id="int"> Interrupt Framework</h2>
                    870: <p>
                    871: Prex defines two different types of interrupt service to
                    872: optimize the response time of real-time operation.
                    873: </p>
                    874: 
                    875: <h3>Interrupt Service Routine (ISR)</h3>
                    876: <p>
                    877: ISR is started by an actual hardware interrupt. The associated
                    878: interrupt is disabled in ICU and CPU interrupt is enabled
                    879: while it runs. If ISR determines that its device generates
                    880: the interrupt, ISR must program the device to stop the interrupt.
                    881: Then, ISR should do minimum I/O operation and return control
                    882: as quickly as possible.
                    883: ISR will run within the context of current running thread at
                    884: interrupt time. So, only few kernel services are available within
                    885: ISR. IRQ_ASSERT() macro can be used to detect the invalid
                    886: function call from ISR.
                    887: </p>
                    888: 
                    889: <h3>Interrupt Service Thread (IST)</h3>
                    890: <p>
                    891: IST is automatically activated if ISR returns INT_CONTINUE to kernel. It
                    892: will be called when the system enters safer condition than ISR.
                    893: Any interrupt driven I/O operation should be done in IST not ISR.
                    894: Since ISR for same IRQ may be run during IST, the shared data,
                    895: resources, and device registers must be synchronized by using
                    896: irq_lock().
                    897: IST does not have to be reentrant, since it is not interrupted
                    898: by same IST itself.
                    899: </p>
                    900: 
                    901: <h3>Interrupt Nesting &amp Priority</h3>
                    902: <p>
                    903: Each ISR has its logical priority level, with 0 being the highest
                    904: priority. While one ISR is running, all lower priority interrupts
                    905: are masked off.
                    906: This interrupt nesting mechanism avoids delaying of
                    907: high priority interrupt events.
                    908: </p>
                    909: <p>
                    910: IST is executed as a normal thread dispatched by the scheduler. So,
                    911: the interrupt thread which has higher priority is executed first.
                    912: The driver writer can specify the thread priority of IST when
                    913: IST is attached to the specific interrupt line.
                    914: The important point is that even a user mode task can be performed
                    915: prior to an interrupt thread.
                    916: </p>
                    917: <p>
                    918: The following figure is the sample of the Prex interrupt processing.
                    919: </p>
                    920: <p>
                    921: <img alt="Interrupt Processing" src="img/irq.gif" border="1"><br>
                    922: <i><b>Figure 6. Prex Interrupt Processing</b></i>
                    923: </p>
                    924: <p>
                    925: </p>
                    926: 
                    927: <h3>Interrupt Locking</h3>
                    928: <p>
                    929: irq_lock() &amp irq_unlock() are used to disable all interrupts in
                    930: order to synchronize the access to the kernel or H/W resource. Since irq_lock()
                    931: increments a lock counter, irq_unlock() will automatically restore
                    932: to the original interrupt state when locking count becomes 0.
                    933: So the caller does not have to save the previous interrupt state.
                    934: </p>
                    935: 
                    936: <h3>Interrupt Stack</h3>
                    937: <p>
                    938: If each ISR uses the kernel stack of the current running thread, the
                    939: stack area may be over-flow when continuous interrupts are occurred at
                    940: one same thread. So the kernel stack will be switched to the dedicated stack
                    941: while ISR is running.
                    942: </p>
                    943: 
                    944: 
                    945: <h2 id="timer">Timer</h2>
                    946: 
                    947: <h3>Kernel Timers</h3>
                    948: 
                    949: <p>
                    950: The kernel timer provides the following feature.
                    951: </p>
                    952: <ul>
                    953:   <li><b>Sleep timer</b>:      Put caller thread to sleep in the specified time.</li>
                    954:   <li><b>Call back timer</b>:  Call the routine after specified time passes.</li>
                    955:   <li><b>Periodic timer</b>:   Call the routine at the specified interval.</li>
                    956: </ul>
                    957: 
                    958: <h3>Timer Jitter</h3>
                    959: 
                    960: <p>
                    961: The periodic timer is designed to minimize the deviation between desired and
                    962: actual expiration.
                    963: </p>
                    964: 
                    965: <h2 id="device">Device I/O Service</h2>
                    966: 
                    967: The Prex device driver module is separated from the kernel, and this module
                    968: is linked with the kernel at the boot time.
                    969: The kernel provides only simple and minimum services to help the 
                    970: communication between applications and drivers.
                    971: 
                    972: <h3>Device Object</h3>
                    973: <p>
                    974: Since the Prex kernel does not have the file system in it, the kernel
                    975: provides a device object service for I/O interface.
                    976: The device object is created by the device driver to communicate to the
                    977: application. Usually, the driver creates a device object for an existing
                    978: physical device. But, it can be used to handle logical or virtual devices.
                    979: </p>
                    980: 
                    981: <h3>Driver Interface</h3>
                    982: <p>
                    983: The interface between kernel and drivers are defined clearly as
                    984: "Driver Kernel Interface". The kernel provides the following services
                    985: for device drivers.
                    986: </p>
                    987: <ul>
                    988:   <li>Device object service</li>
                    989:   <li>Kernel memory allocation</li>
                    990:   <li>Physical page allocation</li>
                    991:   <li>Interrupt handling service</li>
                    992:   <li>Scheduler service</li>
                    993:   <li>Timer service</li>
                    994:   <li>Debug service</li>
                    995: </ul>
                    996: 
                    997: <h3>Application Interface</h3>
                    998: <p>
                    999: The kernel device I/O interface are provided to access the specific device
                   1000: object which is handled by a driver. 
                   1001: The Prex kernel provides the following 5 functions for applications.
                   1002: </p>
                   1003: <ul>
                   1004:  <li>Open a device</li>
                   1005:  <li>Close a device</li>
                   1006:  <li>Read from a device</li>
                   1007:  <li>Write to a device</li>
                   1008:  <li>Device I/O control</li>
                   1009: </ul>
                   1010: 
                   1011: <h2 id="mutex">Mutex</h2>
                   1012: <h3>Priority Inheritance</h3>
                   1013: <p>
                   1014: The thread priority is automatically changed at one of the following conditions.
                   1015: </p>
                   1016: <ol>
                   1017:   <li>
                   1018:   When the current thread fails to lock the mutex and the mutex
                   1019:   owner has lower priority than current thread, the priority
                   1020:   of mutex owner is boosted to the current priority.
                   1021:   If this mutex owner is waiting for another mutex, such related
                   1022:   mutexes are also processed.
                   1023:   </li>
                   1024:   <li>
                   1025:   When the current thread unlocks the mutex and its priority
                   1026:   has already been boosted, kernel recomputes the current priority.
                   1027:   In this case, the priority is set to the highest
                   1028:   priority among the threads waiting for the mutexes locked by the
                   1029:   current thread.
                   1030:   </li>
                   1031:   <li>
                   1032:    When the priority is changed by the user request, the related(inherited)
                   1033:    thread's priority is also changed.
                   1034:   </li>
                   1035: </ol>
                   1036: <p>
                   1037: There are following limitations about priority inheritance
                   1038: with Prex mutex.
                   1039: </p>
                   1040: <ol>
                   1041:   <li>
                   1042:   If the priority is changed by the user request, the priority
                   1043:   recomputation is done only when the new priority is higher
                   1044:   than old priority. The inherited priority is reset to base
                   1045:   priority when the mutex is unlocked.
                   1046:   </li>
                   1047:   <li>
                   1048:   Even if thread is killed with mutex waiting, the related
                   1049:   priority is not adjusted.
                   1050:   </li>
                   1051: </ol>
                   1052: <h2 id="debug">Debug</h2>
                   1053: There are following debugging support functions:
                   1054: <ul>
                   1055:   <li>printk(): Display the debug message in kernel.</li>
                   1056:   <li>panic(): Dump processor registers and stop system.</li>
                   1057:   <li>ASSERT(): If expression is false (zero), stop system and display information.</li>
                   1058:   <li>trace_on(): If the kernel trace is enabled, all entry/exit of functions
                   1059:   are logged.
                   1060: </ul>
                   1061:     </tr>
                   1062:     <tr>
                   1063:       <td id="footer" colspan="2" style="vertical-align: top;">
                   1064:         <a href="http://sourceforge.net">
                   1065:         <img src="http://sourceforge.net/sflogo.php?group_id=132028&amp;type=1"
                   1066:         alt="SourceForge.net Logo" border="0" height="31" width="88"></a><br>
                   1067:         Copyright&copy; 2005-2007 Kohsuke Ohtani
                   1068:       </td>
                   1069:     </tr>
                   1070: 
                   1071:   </tbody>
                   1072: </table>
                   1073: 
                   1074: </div>
                   1075: <div id="bottom"></div>
                   1076: 
                   1077: </body>
                   1078: </html>

CVSweb