sys/arch/mvme88k/stand/sboot/oc_cksum.S - annotate

Annotation of sys/arch/mvme88k/stand/sboot/oc_cksum.S, Revision 1.1

1.1     ! nbrk        1: |      $OpenBSD: oc_cksum.S,v 1.3 2006/05/16 22:52:26 miod Exp $
        !             2:
        !             3: | Copyright (c) 1988 Regents of the University of California.
        !             4: | All rights reserved.
        !             5: |
        !             6: | Redistribution and use in source and binary forms, with or without
        !             7: | modification, are permitted provided that the following conditions
        !             8: | are met:
        !             9: | 1. Redistributions of source code must retain the above copyright
        !            10: |    notice, this list of conditions and the following disclaimer.
        !            11: | 2. Redistributions in binary form must reproduce the above copyright
        !            12: |    notice, this list of conditions and the following disclaimer in the
        !            13: |    documentation and/or other materials provided with the distribution.
        !            14: | 3. All advertising materials mentioning features or use of this software
        !            15: |    must display the following acknowledgement:
        !            16: |      This product includes software developed by the University of
        !            17: |      California, Berkeley and its contributors.
        !            18: | 4. Neither the name of the University nor the names of its contributors
        !            19: |    may be used to endorse or promote products derived from this software
        !            20: |    without specific prior written permission.
        !            21: |
        !            22: | THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
        !            23: | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        !            24: | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
        !            25: | ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
        !            26: | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        !            27: | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
        !            28: | OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
        !            29: | HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
        !            30: | LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
        !            31: | OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
        !            32: | SUCH DAMAGE.
        !            33: |
        !            34: |      @(#)oc_cksum.s  7.2 (Berkeley) 11/3/90
        !            35: |
        !            36: |
        !            37: | oc_cksum: ones complement 16 bit checksum for MC68020.
        !            38: |
        !            39: | oc_cksum (buffer, count, strtval)
        !            40: |
        !            41: | Do a 16 bit ones complement sum of 'count' bytes from 'buffer'.
        !            42: | 'strtval' is the starting value of the sum (usually zero).
        !            43: |
        !            44: | It simplifies life in in_cksum if strtval can be >= 2^16.
        !            45: | This routine will work as long as strtval is < 2^31.
        !            46: |
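As a point of reference, the sum described above can be modelled in portable C. This is only a sketch of the arithmetic, not a transcription of the assembly: the name oc_cksum_model and the 64-bit accumulator are assumptions made for illustration, and the buffer is read as big-endian 16-bit words, as it would be on a 68020.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable model of oc_cksum; not the routine's actual code. */
static uint16_t oc_cksum_model(const uint8_t *buf, size_t count, uint32_t strtval)
{
    uint64_t sum = strtval;     /* wide accumulator: carries collect in the high bits */
    size_t i;

    for (i = 0; i + 1 < count; i += 2)      /* big-endian 16-bit words */
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < count)                          /* odd trailing byte is the high half of a word */
        sum += (uint32_t)buf[i] << 8;

    while (sum >> 16)                       /* end-around carry: fold until 16 bits remain */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

Like the routine it models, this returns the sum itself; taking the complement is left to the caller.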
        !            47: | Performance
        !            48: | -----------
        !            49: | This routine is intended for MC 68020s but should also work
        !            50: | for 68030s.  It (deliberately) does not worry about the alignment
        !            51: | of the buffer so will only work on a 68010 if the buffer is
        !            52: | aligned on an even address.  (Also, a routine written to use
        !            53: | 68010 "loop mode" would almost certainly be faster than this
        !            54: | code on a 68010).
        !            55: |
        !            56: | We do not worry about alignment because this routine is frequently
        !            57: | called with small counts: 20 bytes for IP header checksums and 40
        !            58: | bytes for TCP ack checksums.  For these small counts, testing for
        !            59: | bad alignment adds ~10% to the per-call cost.  Since, by the nature
        !            60: | of the kernel allocator, the data we are called with is almost
        !            61: | always longword aligned, there is no benefit to this added cost
        !            62: | and we are better off letting the loop take a big performance hit
        !            63: | in the rare cases where we are handed an unaligned buffer.
        !            64: |
        !            65: | Loop unrolling constants of 2, 4, 8, 16, 32 and 64 times were
        !            66: | tested on random data on four different types of processors (see
        !            67: | list below -- 64 was the largest unrolling because anything more
        !            68: | overflows the 68020 Icache).  On all the processors, the
        !            69: | throughput asymptote was located between 8 and 16 (closer to 8).
        !            70: | However, 16 was substantially better than 8 for small counts.
        !            71: | (It is clear why this happens for a count of 40: unroll-8 pays a
        !            72: | loop branch cost and unroll-16 does not.  But the tests also showed
        !            73: | that 16 was better than 8 for a count of 20.  It is not obvious to
        !            74: | me why.)  So, since 16 was good for both large and small counts,
        !            75: | the loop below is unrolled 16 times.
        !            76: |
        !            77: | The processors tested and their average time to checksum 1024 bytes
        !            78: | of random data were:
        !            79: |      Sun 3/50 (15MHz)        190 us/KB
        !            80: |      Sun 3/180 (16.6MHz)     175 us/KB
        !            81: |      Sun 3/60 (20MHz)        134 us/KB
        !            82: |      Sun 3/280 (25MHz)        95 us/KB
        !            83: |
        !            84: | The cost of calling this routine was typically 10% of the per-
        !            85: | kilobyte cost.  E.g., checksumming zero bytes on a 3/60 cost 9us
        !            86: | and each additional byte cost 125ns.  With the high fixed cost,
        !            87: | it would clearly be a gain to "inline" this routine -- the
        !            88: | subroutine call adds 400% overhead to an IP header checksum.
        !            89: | However, in absolute terms, inlining would only gain 10us per
        !            90: | packet -- a 1% effect for a 1ms ethernet packet.  This is not
        !            91: | enough gain to be worth the effort.
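The 3/60 figures quoted above (9us per call, 125ns per byte) can be turned into a small cost estimate. The sketch below assumes the cost is simply linear in the byte count, which the comment does not state outright; the function names are hypothetical.

```c
/* Figures quoted in the comment for a Sun 3/60; the linear model is an assumption. */
enum { CALL_COST_NS = 9000, PER_BYTE_NS = 125 };

/* Estimated time in nanoseconds to checksum nbytes in one call. */
static unsigned long est_cksum_ns(unsigned long nbytes)
{
    return CALL_COST_NS + PER_BYTE_NS * nbytes;
}

/* Fixed call cost as a percentage of the per-byte work, for nbytes > 0. */
static unsigned long call_overhead_pct(unsigned long nbytes)
{
    return 100UL * CALL_COST_NS / (PER_BYTE_NS * nbytes);
}
```

Under this model a 20 byte IP header costs 2.5us of summing plus 9us of call overhead, i.e. 360%, which the comment rounds to 400%. A 1024 byte buffer comes out at 137us, close to the 134 us/KB measured for the 3/60.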
        !            92:
        !            93: #include <machine/asm.h>
        !            94:
        !            95:        .text
        !            96:
        !            97:        .text; .even; .globl _oc_cksum; _oc_cksum:
        !            98:        movl    sp@(4),a0       | get buffer ptr
        !            99:        movl    sp@(8),d1       | get byte count
        !           100:        movl    sp@(12),d0      | get starting value
        !           101:        movl    d2,sp@-         | free a reg
        !           102:
        !           103:        | test for possible 1, 2 or 3 bytes of excess at end
        !           104:        | of buffer.  The usual case is no excess (the usual
        !           105:        | case is header checksums) so we give that the faster
        !           106:        | 'not taken' leg of the compare.  (We do the excess
        !           107:        | first because we are about to trash the low order
        !           108:        | bits of the count in d1.)
        !           109:
        !           110:        btst    #0,d1
        !           111:        jne     L5              | if one or three bytes excess
        !           112:        btst    #1,d1
        !           113:        jne     L7              | if two bytes excess
        !           114: L1:
        !           115:        movl    d1,d2
        !           116:        lsrl    #6,d1           | make cnt into # of 64 byte chunks
        !           117:        andl    #0x3c,d2        | then find fractions of a chunk
        !           118:        negl    d2
        !           119:        andb    #0xf,cc         | clear X
        !           120:        jmp     pc@(L3-.-2:b,d2)
        !           121: L2:
        !           122:        movl    a0@+,d2
        !           123:        addxl   d2,d0
        !           124:        movl    a0@+,d2
        !           125:        addxl   d2,d0
        !           126:        movl    a0@+,d2
        !           127:        addxl   d2,d0
        !           128:        movl    a0@+,d2
        !           129:        addxl   d2,d0
        !           130:        movl    a0@+,d2
        !           131:        addxl   d2,d0
        !           132:        movl    a0@+,d2
        !           133:        addxl   d2,d0
        !           134:        movl    a0@+,d2
        !           135:        addxl   d2,d0
        !           136:        movl    a0@+,d2
        !           137:        addxl   d2,d0
        !           138:        movl    a0@+,d2
        !           139:        addxl   d2,d0
        !           140:        movl    a0@+,d2
        !           141:        addxl   d2,d0
        !           142:        movl    a0@+,d2
        !           143:        addxl   d2,d0
        !           144:        movl    a0@+,d2
        !           145:        addxl   d2,d0
        !           146:        movl    a0@+,d2
        !           147:        addxl   d2,d0
        !           148:        movl    a0@+,d2
        !           149:        addxl   d2,d0
        !           150:        movl    a0@+,d2
        !           151:        addxl   d2,d0
        !           152:        movl    a0@+,d2
        !           153:        addxl   d2,d0
        !           154: L3:
        !           155:        dbra    d1,L2           | (NB- dbra does not affect X)
        !           156:
        !           157:        movl    d0,d1           | fold 32 bit sum to 16 bits
        !           158:        swap    d1              | (NB- swap does not affect X)
        !           159:        addxw   d1,d0
        !           160:        jcc     L4
        !           161:        addw    #1,d0
        !           162: L4:
        !           163:        andl    #0xffff,d0
        !           164:        movl    sp@+,d2
        !           165:        rts
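The main path above (L1 through L4) can be modelled in C as follows. This is a sketch under stated assumptions: addx32, load_be32 and sum_longs_model are hypothetical names, the X flag is represented by an ordinary variable, and the computed jump is replaced by a short loop over the partial chunk. Each unrolled movl/addxl pair occupies 4 bytes of code and consumes 4 bytes of data, which is why the assembly can use the negated byte remainder directly as the jump offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian longword, as movl a0@+,d2 would on a 68020. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Add with carry-in; the carry-out goes back into *x (models addxl and X). */
static uint32_t addx32(uint32_t a, uint32_t b, unsigned *x)
{
    uint64_t t = (uint64_t)a + b + *x;
    *x = (unsigned)(t >> 32);
    return (uint32_t)t;
}

/* Hypothetical model: sum nlongs longwords into sum, then fold to 16 bits. */
static uint16_t sum_longs_model(const uint8_t *p, size_t nlongs, uint32_t sum)
{
    unsigned x = 0;                 /* andb #0xf,cc: X starts clear */
    size_t part = nlongs % 16;      /* longwords in the partial chunk */
    size_t chunks = nlongs / 16;    /* full 64-byte chunks (lsrl #6) */
    size_t i;

    /* The jmp enters the unrolled body 'part' steps before L3 ... */
    for (i = 0; i < part; i++, p += 4)
        sum = addx32(sum, load_be32(p), &x);
    /* ... and dbra then runs the whole body once per full chunk. */
    while (chunks-- > 0)
        for (i = 0; i < 16; i++, p += 4)
            sum = addx32(sum, load_be32(p), &x);

    /* Fold: low half + high half + pending X, then one end-around carry. */
    uint32_t t = (sum & 0xffff) + (sum >> 16) + x;
    if (t >> 16)
        t += 1;
    return (uint16_t)(t & 0xffff);
}
```

Because each carry is fed into the next add rather than handled on the spot, the only cleanup needed is the single fold at the end.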
        !           166:
        !           167: L5:    | deal with 1 or 3 excess bytes at the end of the buffer.
        !           168:        btst    #1,d1
        !           169:        jeq     L6              | if 1 excess
        !           170:
        !           171:        | 3 bytes excess
        !           172:        clrl    d2
        !           173:        movw    a0@(-3,d1:l),d2 | add in last full word then drop
        !           174:        addl    d2,d0           |  through to pick up last byte
        !           175:
        !           176: L6:    | 1 byte excess
        !           177:        clrl    d2
        !           178:        movb    a0@(-1,d1:l),d2
        !           179:        lsll    #8,d2
        !           180:        addl    d2,d0
        !           181:        jra     L1
        !           182:
        !           183: L7:    | 2 bytes excess
        !           184:        clrl    d2
        !           185:        movw    a0@(-2,d1:l),d2
        !           186:        addl    d2,d0
        !           187:        jra     L1
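The three excess-byte cases (L5, L6, L7) amount to adding the trailing 1 to 3 bytes into the running sum before the longword loop runs. A minimal C sketch, with the hypothetical name add_excess_model and big-endian byte order assumed:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add the 1-3 bytes past the last full longword to sum. */
static uint32_t add_excess_model(const uint8_t *buf, size_t count, uint32_t sum)
{
    size_t excess = count & 3;
    const uint8_t *tail = buf + (count - excess);

    if (excess >= 2)            /* L7, or the first half of the 3-byte case */
        sum += ((uint32_t)tail[0] << 8) | tail[1];
    if (excess & 1)             /* L6: a lone byte is the high half of a word */
        sum += (uint32_t)tail[excess - 1] << 8;
    return sum;
}
```

These are plain adds with no carry handling, as in the assembly (addl rather than addxl). At most 0xffff + 0xff00 is added, so the sum cannot overflow 32 bits when the starting value is below 2^31, which appears to be the reason for that limit on strtval.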