Added the possibility to access pre-registers in a post-hook #384
Conversation
Looks good to me after minimal testing! My only concern, although I understand the reasoning behind this implementation, is the removal of caching. I can certainly see that resulting in a significant increase in computing time for hooks that have many callbacks, and even more so for functions that have many parameters, because the conventions calculate their offsets on the fly by traversing the whole structure type by type, aligning them, etc. That means not only do the arguments have to be wrapped by Python every time, but quite a lot of looping and calculating is added every time an argument is retrieved or mutated. Perhaps calculating those offsets only once at the convention's construction would be worth considering. I will do more testing and profiling when I get the chance. |
Can't we just add another cache? |
Yeah, that would probably work very well. But first, I would also like to do some measuring, because I don't think it has a high impact on performance. |
I've tested the following code just now with and without cache:

```python
from memory import *
from time import *
from random import *
from sys import *

@Callback(Convention.CDECL, (DataType.POINTER,) * 64, DataType.VOID)
def callback(stack_data):
    t = time()
    for i in range(1000000):
        repr(stack_data)
    print(time() - t)

callback(*(Pointer(randint(0, maxsize)) for i in range(64)))
```

And here are the results I got:
Basically, the larger the index is, the more work is done in GetArgumentPtr, because every time the whole structure is recalculated in order to grab its offset, etc. I can only imagine the result with a custom convention that forwards every call back to Python. But yeah, the cache was negating that effect to an extent. Regardless of whether or not we add a second cache as @CookStar suggested, I think this is definitely something we should look into optimizing. The types are known on construction; there is no reason to repeat that work on the fly, imo. |
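The "calculate the offsets once at construction" idea above can be sketched roughly as follows. This is a hedged illustration only, not DynamicHooks' actual code: the type names, sizes, and naive natural-alignment rule are assumptions for demonstration purposes.

```python
# Sketch: pre-calculating argument stack offsets once at convention
# construction, instead of re-walking the type list on every argument lookup.
# TYPE_SIZES and the alignment rule are illustrative assumptions.

TYPE_SIZES = {"int": 4, "pointer": 4, "double": 8}

def align(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def compute_offsets(arg_types):
    """Walk the argument list once and record each argument's stack offset."""
    offsets = []
    offset = 0
    for t in arg_types:
        size = TYPE_SIZES[t]
        offset = align(offset, size)   # naive natural alignment
        offsets.append(offset)
        offset += size
    return offsets

# Computed once up front...
offsets = compute_offsets(["int", "double", "pointer"])
# ...so every later argument lookup is a single indexed read.
print(offsets)  # [0, 8, 16]
```

With the offsets table built once, retrieving argument `i` reduces to one array access plus the stack pointer, instead of an O(n) traversal per access.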
To be honest, I didn't expect such a big difference in performance. But yeah, the Python stuff steals a lot of CPU time.

```cpp
void* x86MsCdecl::GetArgumentPtr(int iIndex, CRegisters* pRegisters)
{
    return (void *) (pRegisters->m_esp->GetValue<unsigned long>() + m_pOffsets[iIndex]);
}
```

And these are my results (just using 100000 calls):
Not sure if it's worth improving it further. I also changed the test code to only call the callback once. So, if the hook is only called once, the cache obviously slows it down, because it has to be evaluated, but doesn't contain any cached data yet. |
Well, that's surprising. I would have expected all the looping, switching, calculating, etc. at run-time to be much more costly than simply retrieving an int from an array. |
Yeah, that's unfortunately the downside of caching; it's a significant improvement only if it is actually used, otherwise it's pretty much a wasted lookup. Though, perhaps it's a good trade-off if we consider that most plugins hook the same stuff and that, even if there is only one callback, scripters are not particularly known to re-use their data (eh? 👀), so it optimizes those scenarios as well. For the latter, one could argue that this is their responsibility; and they would be entirely right.

However, I will completely backtrack and say that we should keep the cache removed. The reason is quite simple: while the improvement is major at 1m iterations, it is much smaller at 100k, as illustrated by your metrics, and would be next to null at ~10 (and even assuming a server has 10 callbacks on the same hook is most likely an exaggeration). But SP without plugins has many single-callback hooks by default, which would all be slower just to anticipate the possibility that many plugins might manually hook the same functions. So yeah, I think the best call after all is this: if we judge that a function is being hooked very often, let's just provide listeners that re-use the objects, like we did for the buttons, for example.

As for the PR, I've done quite a bit of testing and haven't noticed any issues; everything seems to be working fine on my side. The only issue I stumbled upon is the one referenced above, which isn't directly related to this PR, so I will just go ahead and approve it as is. 👍 |
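The "second cache" idea discussed above can be sketched as a lazy, per-index cache, so a hook that fires only once pays almost nothing while hot hooks reuse their wrappers. This is a hypothetical sketch, not Source.Python's implementation; `make_wrapper` is a stand-in name for the expensive Python wrapping step.

```python
# Sketch: a lazy per-index cache for wrapped arguments. The cache is filled
# on demand, so single-call hooks don't pay an up-front cost, while
# frequently-called hooks reuse the wrapper objects.
# make_wrapper is a hypothetical stand-in for the costly conversion.

def make_wrapper(index):
    return "wrapped-arg-%d" % index  # placeholder for the expensive wrapping

class StackData:
    def __init__(self, num_args):
        self.num_args = num_args
        self._cache = {}  # index -> wrapped argument, filled lazily

    def __getitem__(self, index):
        try:
            return self._cache[index]
        except KeyError:
            value = self._cache[index] = make_wrapper(index)
            return value

args = StackData(4)
assert args[2] is args[2]  # second lookup returns the same cached object
```

The trade-off matches the measurements in the thread: the first access per index still pays the full wrapping cost plus a failed lookup, so the cache only wins when indexes are accessed repeatedly.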
* Preserve the registers from the pre-hook, so you can optionally use them in a post-hook
* Speed up GetArgumentPtr by pre-calculating the stack offsets

Source-Python-Dev-Team/Source.Python#384 |
Ran this code again, and Ayuto/DynamicHooks@ed07f4e had quite a good impact after all: 81.87904047966003. Thanks for implementing it! :) |
This PR adds the possibility to access registers in a post-hook that were saved before the pre-hook was called. It fixes the issue that is explained here:
https://forums.sourcepython.com/viewtopic.php?p=7542#p7542
It's up to the programmer to decide whether to access the pre-hook state or the post-hook state, because there are certainly scenarios where you need one or the other.
See also:
https://github.com/Ayuto/DynamicHooks/tree/pre_post_registers
The only problem I can think of with this implementation is that it won't work very well for recursive functions. To fix that as well, we would need a stack (similar to what we did with the return address). Oh, and if the stack is modified between the pre- and post-hook, that won't be fixed either.
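The stack-based fix for recursion mentioned above could be sketched like this. This is an illustrative sketch only, not the PR's actual code; the class, method, and register names are assumptions, and real register snapshots would live on the C++ side.

```python
# Sketch: saving the pre-hook register state on a stack so that nested
# (recursive) invocations each see their own snapshot in the post-hook.
# Hook, on_pre_hook, on_post_hook, and the register dicts are illustrative.

class Hook:
    def __init__(self):
        self._saved_registers = []  # one snapshot per active nesting level

    def on_pre_hook(self, registers):
        # Push a copy of the registers before the original function runs.
        self._saved_registers.append(dict(registers))

    def on_post_hook(self):
        # Pop the snapshot belonging to this invocation: the innermost
        # post-hook fires first, so LIFO order matches the call nesting.
        return self._saved_registers.pop()

hook = Hook()
hook.on_pre_hook({"esp": 100})          # outer call
hook.on_pre_hook({"esp": 80})           # recursive inner call
assert hook.on_post_hook()["esp"] == 80   # inner post sees inner pre-state
assert hook.on_post_hook()["esp"] == 100  # outer post sees outer pre-state
```

A single saved snapshot (as in the current implementation) would instead be overwritten by the inner call, which is exactly the recursion problem described above.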
@jordanbriere What do you think?
Before we merge it, we definitely need to do some more tests and recompile DynamicHooks for Linux (Windows is already included in one of the commits).