my c# code generator spits out nested switch statements method in class compile dynamically @ runtime, load , instantiate, , execute. execution time of 100x faster when compared generic, non-compiled version has use hash tables (as hash table keys, turn switch cases in compiled version, known @ runtime).
as switch statements bigger, performance stays pretty same, if number of "switch hops" executed, not change, i.e. adding code in case statements not executed not affect performance.
however, works until code size, , performance drops factor of 7 (when running in 32 bit mode) or 12 (running in native 64 bit mode).
i had @ jitted code, , in fact change parts of code not changed, code grows. (not being familiar assembly , instruction sets,) assume there "short jump" , "long jump", former being limited amount of bytes can jump. could elucidate high-level programmer why generated machine code has be, or is, different?
n.b. i'm aware i'm testing code doing nothing, smallest differences in machine code naturally have huge impact on relative performance. point of generate code close nothing possible, called hundreds of thousands of times per second.
here 2 different versions of switch statement head when overall code size relatively small , performance good, copied visual studio using jit optimized release build, running in 32-bit mode:
switch (a) 00000000 push ebp 00000001 mov ebp,esp 00000003 dec edx 00000004 cmp edx,3bh 00000007 jae 0000021d 0000000d jmp dword ptr [edx*4+00773ad8h] { case 1: return 1; and, more code in un-entered case blocks - still fast:
switch (a) 00000000 push ebp 00000001 mov ebp,esp 00000003 lea eax,[edx-1] 00000006 cmp eax,3bh 00000009 jae 00001c51 0000000f jmp dword ptr [eax*4+00a35830h] { case 1: { and version bigger code, turns out 7 times slower.
switch (a) 00000000 push ebp 00000001 mov ebp,esp 00000003 push edi 00000004 push esi 00000005 sub esp,0fch 0000000b mov esi,ecx 0000000d lea edi,[ebp+fffffefch] 00000013 mov ecx,3eh 00000018 xor eax,eax 0000001a rep stos dword ptr es:[edi] 0000001c mov ecx,esi 0000001e mov dword ptr [ebp-0ch],edx 00000021 mov eax,dword ptr [ebp-0ch] 00000024 mov dword ptr [ebp-10h],eax 00000027 mov eax,dword ptr [ebp-10h] 0000002a dec eax 0000002b cmp eax,3bh 0000002e jae 00000037 00000030 jmp dword ptr [eax*4+0077c488h] 00000037 jmp 0000888f { case 1: { n.b. i'm posting head of switch statement, thing gets executed in test, because call method in question value in no case statement (and there's no default case), fall through , (i hope) not execute code inside switch.
it looks main difference between first 2 examples , last is, jester pointed out, last example allocates 252 bytes on stack , zeros it. wouldn't because code in switch statement bigger, because code in switch statement uses local variables , temporaries previous 2 examples don't. first 2 examples either don't use local variables or temporaries, or jit optimizer managed allocate them in registers.
the other notable issue last example mov instructions @ addresses 0000001e - 00000027. these instructions store switch value a in 2 different locations stack , reload value stack each time. guess that values stored on stack never used again, making code unnecessary. if used later in code there's no need reload value stack. either way optimizer has failed. if i'm right , these stack locations unused optimizer may have failed eliminate other unnecessary temporaries resulting in more stack space being used necessary.
i should point out difference between first , second example shows how optimizer can similar case right. code different in first 2 examples because optimizer preserves value of a in second example, presumably because a used later in code. in examples assembly code normalizes range of switch statement 1 - 60 0 - 59. saves both entry in jump table , couple of instructions. in first example value of a lost when this, in later 2 examples value of a preserved. second example leaves value of in register passed function with. third example preserves in original register , saves 2 additional copies in separate locations on stack.
if overwhelmingly usual case no case in switch statement executed possible solution put check see if if switch value in range in own function. function call function containing switch statement if necessary. otherwise can try moving less used and/or high stack usage cases out of switch statement own functions.
(i'm not familiar microsoft's jit optimizer, may need use noinlining attribute prevent inlining separated functions together.)
Comments
Post a Comment