.net - switch statement performance depends on code size of un-entered inner cases -


my c# code generator spits out nested switch statements method in class compile dynamically @ runtime, load , instantiate, , execute. execution time of 100x faster when compared generic, non-compiled version has use hash tables (as hash table keys, turn switch cases in compiled version, known @ runtime).

as switch statements bigger, performance stays pretty same, if number of "switch hops" executed, not change, i.e. adding code in case statements not executed not affect performance.

however, works until code size, , performance drops factor of 7 (when running in 32 bit mode) or 12 (running in native 64 bit mode).

i had @ jitted code, , in fact change parts of code not changed, code grows. (not being familiar assembly , instruction sets,) assume there "short jump" , "long jump", former being limited amount of bytes can jump. could elucidate high-level programmer why generated machine code has be, or is, different?

n.b. i'm aware i'm testing code doing nothing, smallest differences in machine code naturally have huge impact on relative performance. point of generate code close nothing possible, called hundreds of thousands of times per second.

here 2 different versions of switch statement head when overall code size relatively small , performance good, copied visual studio using jit optimized release build, running in 32-bit mode:

        switch (a) 00000000  push        ebp  00000001  mov         ebp,esp  00000003  dec         edx  00000004  cmp         edx,3bh  00000007  jae         0000021d  0000000d  jmp         dword ptr [edx*4+00773ad8h]          {             case 1: return 1; 

and, more code in un-entered case blocks - still fast:

        switch (a) 00000000  push        ebp  00000001  mov         ebp,esp  00000003  lea         eax,[edx-1]  00000006  cmp         eax,3bh  00000009  jae         00001c51  0000000f  jmp         dword ptr [eax*4+00a35830h]          {             case 1:                 { 

and version bigger code, turns out 7 times slower.

        switch (a) 00000000  push        ebp  00000001  mov         ebp,esp  00000003  push        edi  00000004  push        esi  00000005  sub         esp,0fch  0000000b  mov         esi,ecx  0000000d  lea         edi,[ebp+fffffefch]  00000013  mov         ecx,3eh  00000018  xor         eax,eax  0000001a  rep stos    dword ptr es:[edi]  0000001c  mov         ecx,esi  0000001e  mov         dword ptr [ebp-0ch],edx  00000021  mov         eax,dword ptr [ebp-0ch]  00000024  mov         dword ptr [ebp-10h],eax  00000027  mov         eax,dword ptr [ebp-10h]  0000002a  dec         eax  0000002b  cmp         eax,3bh  0000002e  jae         00000037  00000030  jmp         dword ptr [eax*4+0077c488h]  00000037  jmp         0000888f          {             case 1:                 { 

n.b. i'm posting head of switch statement, thing gets executed in test, because call method in question value in no case statement (and there's no default case), fall through , (i hope) not execute code inside switch.

it looks main difference between first 2 examples , last is, jester pointed out, last example allocates 252 bytes on stack , zeros it. wouldn't because code in switch statement bigger, because code in switch statement uses local variables , temporaries previous 2 examples don't. first 2 examples either don't use local variables or temporaries, or jit optimizer managed allocate them in registers.

the other notable issue last example mov instructions @ addresses 0000001e - 00000027. these instructions store switch value a in 2 different locations stack , reload value stack each time. guess that values stored on stack never used again, making code unnecessary. if used later in code there's no need reload value stack. either way optimizer has failed. if i'm right , these stack locations unused optimizer may have failed eliminate other unnecessary temporaries resulting in more stack space being used necessary.

i should point out difference between first , second example shows how optimizer can similar case right. code different in first 2 examples because optimizer preserves value of a in second example, presumably because a used later in code. in examples assembly code normalizes range of switch statement 1 - 60 0 - 59. saves both entry in jump table , couple of instructions. in first example value of a lost when this, in later 2 examples value of a preserved. second example leaves value of in register passed function with. third example preserves in original register , saves 2 additional copies in separate locations on stack.

if overwhelmingly usual case no case in switch statement executed possible solution put check see if if switch value in range in own function. function call function containing switch statement if necessary. otherwise can try moving less used and/or high stack usage cases out of switch statement own functions.

(i'm not familiar microsoft's jit optimizer, may need use noinlining attribute prevent inlining separated functions together.)


Comments