I have a function that takes 3 arguments, dest, src0 and src1, each a pointer to data of size 12. I made 2 versions of it: one is written in C and optimized by the compiler, the other one is written entirely in _asm. So yeah, 3 arguments? I naturally do something like:
    mov ecx, [src0]
    mov edx, [src1]
    mov eax, [dest]

I am a bit confused by the compiler, as it saw fit to add the following:
    _src0$ = -8 ; size = 4
    _dest$ = -4 ; size = 4
    _src1$ = 8 ; size = 4
    ?vm_vec_add_scalar_asm@@yaxpauvec3d@@pbu1@1@z proc ; vm_vec_add_scalar_asm
    ; _dest$ = ecx
    ; _src0$ = edx
    ; 20 : {
        sub esp, 8
        mov dword ptr _src0$[esp+8], edx
        mov dword ptr _dest$[esp+8], ecx
    ; 21 : _asm
    ; 22 : {
    ; 23 : mov ecx, [src0]
        mov ecx, dword ptr _src0$[esp+8]
    ; 24 : mov edx, [src1]
        mov edx, dword ptr _src1$[esp+4]
    ; 25 : mov eax, [dest]
        mov eax, dword ptr _dest$[esp+8]
        function body etc.
        add esp, 8
        ret 0

What does _src0$[esp+8] etc. mean? Why does it do all this stuff before my code? Why does it try to [apparently] use the stack so badly?
In comparison, the C++ version has only the following before its body, which is otherwise pretty similar:
    _src1$ = 8 ; size = 4
    ?vm_vec_add@@yaxpauvec3d@@pbu1@1@z proc ; vm_vec_add
    ; _dest$ = ecx
    ; _src0$ = edx
        mov eax, dword ptr _src1$[esp-4]

Why is this little sufficient?
The answer of Mats Petersson explained __fastcall. I guess that is not quite what you're asking, though ...
Actually, _src0$[esp+8] means [_src0$ + esp + 8], and _src0$ is defined above:

    _src0$ = -8 ; size = 4

So the whole expression _src0$[esp+8] is nothing but [esp] ...
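The arithmetic behind that notation can be checked mechanically. The sketch below is only an illustration in C++, not something the compiler emits: the helper name `effective_offset` and the constant names are made up for this example, and the symbol values are the ones from the listings in the question.

```cpp
// MASM writes `sym[esp+disp]` for the address esp + sym + disp.
// This hypothetical helper returns the resulting offset from esp.
constexpr int effective_offset(int sym, int disp) {
    return sym + disp;
}

// Symbol values copied from the listing: _src0$ = -8, _dest$ = -4, _src1$ = 8.
constexpr int SRC0_SYM = -8;
constexpr int DEST_SYM = -4;
constexpr int SRC1_SYM = 8;

// _asm version (8 bytes reserved by `sub esp, 8`):
static_assert(effective_offset(SRC0_SYM, 8) == 0, "_src0$[esp+8] is [esp]");
static_assert(effective_offset(DEST_SYM, 8) == 4, "_dest$[esp+8] is [esp+4]");
static_assert(effective_offset(SRC1_SYM, 4) == 12, "_src1$[esp+4] is [esp+12]");

// C++ version (nothing reserved):
static_assert(effective_offset(SRC1_SYM, -4) == 4, "_src1$[esp-4] is [esp+4]");
```

If any of the offsets were wrong, the file would simply fail to compile.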
To see why it does all this stuff, you should first understand what Mats Petersson said in his post about __fastcall, or more generally, what a calling convention is. See the link in his post for detailed information.
Assuming you have understood __fastcall, let's see what happens to your code. The compiler is using __fastcall. Your callee is a function f(dst, src0, src1), which requires 3 parameters. According to the calling convention, when a caller calls f, it does the following:
- Move dst to ecx and src0 to edx
- Push src1 onto the stack
- Push the 4-byte return address onto the stack
- Go to the starting address of the function f
And the callee f, when its code begins, knows where the parameters are: dst and src0 are in the registers ecx and edx, respectively; esp is pointing to the 4-byte return address, and the 4 bytes right after it (i.e. dword ptr [esp+4]) hold src1.
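The state at function entry can be mimicked with a toy model. This is a rough sketch under stated assumptions, not real calling-convention code: the stack is just a byte array, and the names `ToyStack`, `EntryState` and `call_f` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// A toy 32-bit stack: a byte array plus an esp offset (hypothetical model).
struct ToyStack {
    std::uint8_t mem[64];
    std::uint32_t esp;  // offset into mem; the stack grows toward lower offsets

    ToyStack() : esp(sizeof(mem)) { std::memset(mem, 0, sizeof(mem)); }

    // push: lower esp by 4, then store the value at [esp]
    void push32(std::uint32_t v) {
        esp -= 4;
        std::memcpy(&mem[esp], &v, 4);
    }
    // read the dword at [esp + offset_from_esp]
    std::uint32_t load32(std::uint32_t offset_from_esp) const {
        std::uint32_t v;
        std::memcpy(&v, &mem[esp + offset_from_esp], 4);
        return v;
    }
};

// What the callee sees when its first instruction runs.
struct EntryState {
    std::uint32_t ecx, edx;
    ToyStack stack;
};

// Models the four caller-side steps for f(dst, src0, src1).
EntryState call_f(std::uint32_t dst, std::uint32_t src0, std::uint32_t src1,
                  std::uint32_t return_address) {
    EntryState s;
    s.ecx = dst;                     // first argument goes to ecx
    s.edx = src0;                    // second argument goes to edx
    s.stack.push32(src1);            // third argument goes onto the stack
    s.stack.push32(return_address);  // pushed by the call instruction
    return s;
}
```

With `EntryState s = call_f(dst, src0, src1, ret)`, `s.stack.load32(0)` gives back the return address and `s.stack.load32(4)` gives back src1, which is the [esp+4] the callee reads.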
So, in your "C++ version", the function f does exactly what it should do:

    mov eax, dword ptr _src1$[esp-4]

Here _src1$ = 8, so _src1$[esp-4] is [esp+4]. You see, it just retrieves the parameter src1 and stores it in eax.
There is a tricky point here. In the code of f, if we want to use the parameter src1 multiple times, we can certainly do that, because it's always stored on the stack, right after the return address. But what if we want to use dst and src0 multiple times? They are in registers, and can be destroyed at any time.
So in that case, the compiler should do the following: right after entering the function f, it should remember the current values of ecx and edx (by storing them on the stack). These 8 bytes are the so-called "shadow space". This is not done in the "C++ version", probably because the compiler knows for sure that these 2 parameters are not used multiple times, or that it can handle them in some other way.
Now, what happens to your _asm version? The problem here is that you are using inline assembly. The compiler loses its control over the registers, and it cannot assume that the registers ecx and edx are safe in your _asm block (they actually are not, since you used them in the _asm block). So it is forced to save them at the beginning of the function.
The saving goes as follows: it first reserves 8 bytes on the stack by lowering esp (sub esp, 8), then moves edx and ecx to [esp] and [esp+4] respectively.
And now it can safely enter your _asm block. In its mind (if it has one), the picture is: [esp] is src0, [esp+4] is dst, [esp+8] is the 4-byte return address, and [esp+12] is src1. It no longer thinks about ecx and edx.
Thus your first instruction in the _asm block, mov ecx, [src0], should be interpreted as mov ecx, [esp], which is the same as

    mov ecx, dword ptr _src0$[esp+8]

and the same goes for the other 2 instructions.
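The frame layout before and after the saving step can be written down as a small model. This is only a sketch for illustration: the stack is represented as a list of labelled 4-byte slots, and the names `Frame`, `on_entry`, `after_prologue` and `slot_at` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The stack as a list of 4-byte slots; index 0 is the slot at [esp].
using Frame = std::vector<std::string>;

// Layout when f is entered: return address at [esp], src1 at [esp+4].
Frame on_entry() {
    return {"return address", "src1"};
}

// Models: sub esp, 8 / mov [esp], edx / mov [esp+4], ecx
Frame after_prologue(Frame f) {
    f.insert(f.begin(), 2, std::string());  // sub esp, 8: two fresh slots on top
    f[0] = "src0";  // mov dword ptr [esp], edx    (edx holds src0)
    f[1] = "dst";   // mov dword ptr [esp+4], ecx  (ecx holds dst)
    return f;
}

// Returns the label of the slot at [esp+offset]; offset must be a multiple of 4.
const std::string& slot_at(const Frame& f, int offset) {
    assert(offset >= 0 && offset % 4 == 0);
    return f.at(offset / 4);
}
```

Calling `slot_at(after_prologue(on_entry()), 12)` yields "src1", matching `_src1$[esp+4]` in the listing, while the same parameter sits at offset 4 before the prologue runs.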
At this point, you might say: aha, it's doing stupid things, I don't want it to waste time and space on that, is there a way?
Well, there is a way - do not use inline assembly... It's convenient, but there is a compromise.
You can write the assembly function f in a .asm source file and make it public. In your C/C++ code, declare it as extern "C" f(...). Then, when your assembly function f begins, you can play directly with ecx and edx.
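For the C++ side, a declaration along the following lines might do. It is a sketch under several assumptions: the struct layout (three floats) is guessed from the 12-byte size, the name `vm_vec_add_ext` and the macro `VM_FASTCALL` are made up, and the body at the bottom is only a stand-in so the snippet links by itself. It assumes the routine adds the two vectors, which the question does not actually show.

```cpp
// Assumption: the 12-byte data is three floats; the question only gives the size.
struct vec3d {
    float x, y, z;
};
static_assert(sizeof(vec3d) == 12, "vec3d is expected to be 12 bytes");

// __fastcall is an MSVC keyword for 32-bit x86; elsewhere the macro is empty.
#if defined(_MSC_VER) && defined(_M_IX86)
#define VM_FASTCALL __fastcall
#else
#define VM_FASTCALL
#endif

// Declaration seen by the C++ code. extern "C" turns off C++ name mangling,
// so the symbol can be matched by a label in the .asm file.
extern "C" void VM_FASTCALL vm_vec_add_ext(vec3d* dest, const vec3d* src0,
                                           const vec3d* src1);

// Stand-in body so that this sketch links on its own. In the real setup this
// definition would be removed and the routine would live in the .asm file.
extern "C" void VM_FASTCALL vm_vec_add_ext(vec3d* dest, const vec3d* src0,
                                           const vec3d* src1) {
    dest->x = src0->x + src1->x;
    dest->y = src0->y + src1->y;
    dest->z = src0->z + src1->z;
}
```

One thing to check when wiring this up: even with extern "C", MSVC decorates __fastcall names, so for three pointer arguments the label in the .asm file would be expected to look like @vm_vec_add_ext@12.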