
Added the possibility to access pre-registers in a post-hook #384


Merged

Ayuto merged 6 commits into master from pre_post_registers on Feb 6, 2021

Conversation

Ayuto
Member

@Ayuto Ayuto commented Feb 1, 2021

This PR adds the possibility to access registers in a post-hook that were saved before the pre-hook was called. It fixes the issue that is explained here:
https://forums.sourcepython.com/viewtopic.php?p=7542#p7542

It's up to the programmer to decide whether they want to access the pre-hook state or the post-hook state, because there are certainly scenarios where either one is needed.

See also:
https://github.com/Ayuto/DynamicHooks/tree/pre_post_registers

The only problem I can think of with this implementation is that it won't work very well for recursive functions. To fix that as well, we would need a stack (similar to what we did with the return address). Oh, and if the stack is modified, even that won't fix the problem.
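The recursion issue can be illustrated with a hypothetical Python model (not the actual C++ implementation): with a single saved-register slot, a nested pre-hook overwrites the outer save before the outer post-hook runs, while a stack, like the one used for the return address, keeps the two saves separate.

```python
# Hypothetical model of register preservation across pre/post hooks.
# A single slot (this implementation) is overwritten by a nested
# pre-hook call; a stack (like the return-address fix) is not.

class SingleSlot:
    def __init__(self):
        self.saved = None

    def pre(self, registers):
        self.saved = registers      # overwrites any earlier save

    def post(self):
        return self.saved


class SlotStack:
    def __init__(self):
        self.saved = []

    def pre(self, registers):
        self.saved.append(registers)

    def post(self):
        return self.saved.pop()     # matches the innermost pre-hook


# Simulate a recursive call: pre(outer), pre(inner), post, post.
single, stack = SingleSlot(), SlotStack()
for hooks in (single, stack):
    hooks.pre('outer registers')
    hooks.pre('inner registers')

print(single.post())  # 'inner registers' twice: the outer state is lost
print(single.post())
print(stack.post())   # 'inner registers', then 'outer registers'
print(stack.post())
```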

@jordanbriere What do you think?

Before we merge it, we definitely need to do some more tests and recompile DynamicHooks for Linux (Windows is already included in one of the commits).

from entities.hooks import EntityCondition
from entities.hooks import EntityPostHook
from entities.hooks import EntityPreHook

from memory.hooks import use_pre_registers

@EntityPreHook(EntityCondition.is_player, 'drop_weapon')
def pre_on_drop_weapon(stack_data):
    print(f'PRE: this = {stack_data[0].address}')

@EntityPostHook(EntityCondition.is_player, 'drop_weapon')
def post_on_drop_weapon(stack_data, ret_val):
    print(f'POST FALSE: this = {stack_data[0].address}')
    with use_pre_registers(stack_data):
        print(f'POST CORRECT: this = {stack_data[0].address}')
Output:

PRE: this = 540996256
POST FALSE: this = 11524119
POST CORRECT: this = 540996256

@Ayuto Ayuto requested a review from jordanbriere February 1, 2021 20:26
@jordanbriere
Contributor

Looks good to me after minimal testing!

My only concern, although I understand the reasoning behind it with this implementation, is the removal of caching. I can certainly see that resulting in a significant increase in computing time for hooks that have many callbacks, and more so for functions that have many parameters, because the conventions calculate their offsets on the fly by traversing the whole structure type by type, aligning them, etc. This means that not only do the arguments have to be wrapped by Python every time, but there is also quite a lot of looping and calculating every time an argument is retrieved or mutated. Perhaps calculating those offsets only once on the convention's construction would be worth considering.

I will do more testings and profiling when I get the chance.

@CookStar
Contributor

CookStar commented Feb 2, 2021

Can't we just add another cache?

@Ayuto
Member Author

Ayuto commented Feb 2, 2021

Yeah, that would probably work very well. But first, I would also like to do some measuring, because I don't think it has a high impact on performance.

@jordanbriere
Contributor

I've tested the following code just now with and without cache:

from memory import Callback, Convention, DataType, Pointer
from time import time
from random import randint
from sys import maxsize

@Callback(Convention.CDECL, (DataType.POINTER,) * 64, DataType.VOID)
def callback(stack_data):
    t = time()
    for i in range(1000000):
        repr(stack_data)
    print(time() - t)

callback(*(Pointer(randint(0, maxsize)) for i in range(64)))

And here are the results I got:

Without cache: 94.56286525726318
With cache: 67.78554844856262

Basically, the larger the index is, the more work is done in GetArgumentPtr, because every time the whole structure is recalculated in order to grab its offset, etc. I can only imagine the result with a custom convention that forwards every call back to Python. The cache was negating that effect to an extent, but regardless of whether we add a second cache as @CookStar suggested or not, I think this is definitely something we should look into optimizing. The types are known on construction; there is no reason to repeat that work on the fly, imo.
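As a rough Python sketch of what's being described (hypothetical names, not SP's actual code): computing an argument's offset means walking every preceding type and aligning as you go, so the cost grows with the index, while a per-index cache does that walk only once.

```python
# Hypothetical sketch: computing a stack offset by walking the whole
# signature type by type (the slow path), versus caching it per index.

def align(offset, alignment):
    """Round offset up to a power-of-two alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


def compute_offset(arg_sizes, index):
    """Walk all preceding types, aligning each one; O(index) per access."""
    offset = 0
    for size in arg_sizes[:index]:
        offset = align(offset, size) + size
    return align(offset, arg_sizes[index])


class Convention:
    """Hypothetical convention that caches each offset after one walk."""

    def __init__(self, arg_sizes):
        self.arg_sizes = arg_sizes
        self._offsets = {}

    def argument_offset(self, index):
        # Repeated accesses skip the O(index) walk entirely.
        if index not in self._offsets:
            self._offsets[index] = compute_offset(self.arg_sizes, index)
        return self._offsets[index]


conv = Convention([4, 8, 4, 4])
print([conv.argument_offset(i) for i in range(4)])  # [0, 8, 16, 20]
```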

@Ayuto
Member Author

Ayuto commented Feb 4, 2021

To be honest, I didn't expect such a big difference in performance. But yeah, the Python stuff steals a lot of CPU time.
The offset calculation also takes a little bit of time, but the cache negates its performance impact almost completely, because on the first call to repr every argument is cached and there are no further calls to GetArgumentPtr. I improved the CDECL convention a little by pre-calculating every offset and saving the results in a dynamically allocated array. GetArgumentPtr now looks like this:

void* x86MsCdecl::GetArgumentPtr(int iIndex, CRegisters* pRegisters)
{
	return (void *) (pRegisters->m_esp->GetValue<unsigned long>() + m_pOffsets[iIndex]);
}
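A Python analogue of that improvement (hypothetical names; the real code is the C++ above): all offsets are computed once when the convention is constructed, so each lookup is just a base-plus-index read, mirroring the `m_pOffsets` array.

```python
# Hypothetical sketch of the improved convention: precompute all stack
# offsets on construction so argument lookup is a plain array index.

class ImprovedCdecl:
    def __init__(self, arg_sizes, alignment=4):
        self.offsets = []
        offset = 0
        for size in arg_sizes:
            self.offsets.append(offset)
            # Simplified cdecl rule: each slot rounds up to the alignment.
            offset += -(-size // alignment) * alignment

    def argument_ptr(self, esp, index):
        # Equivalent of: esp + m_pOffsets[iIndex]
        return esp + self.offsets[index]


conv = ImprovedCdecl([4, 4, 8])
print(hex(conv.argument_ptr(0x1000, 2)))  # 0x1008
```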

And these are my results (just using 100000 calls):

# Many calls to stack_data
Without cache:                          10.5
With cache:                             7.7
Without cache and improved convention:  9.3
With cache and improved convention:     7.6

Not sure if it's worth improving it. I also changed the test code to only call repr once, but the hooked function repeatedly:

# Many calls to the hooked function
Without cache:                          12.5
With cache:                             14.2
Without cache and improved convention:  11.3
With cache and improved convention:     14.0

So, if the hook is only called once, the cache obviously slows it down, because it has to be evaluated, but doesn't contain cached data.
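The trade-off can be made concrete with a counting sketch (hypothetical, not SP's actual cache): when the same argument is read many times per hook call, the cache collapses the walks to one; when each hook call reads an argument only once, the cache adds a lookup without ever saving a walk.

```python
# Hypothetical model counting offset walks vs. cache lookups.

class CountingConvention:
    def __init__(self, use_cache):
        self.use_cache = use_cache
        self.cache = {}
        self.walks = 0    # stands in for the O(index) offset calculation
        self.lookups = 0  # cache-maintenance overhead

    def offset(self, index):
        if self.use_cache:
            self.lookups += 1
            if index in self.cache:
                return self.cache[index]
        self.walks += 1
        result = index * 4
        if self.use_cache:
            self.cache[index] = result
        return result


# Repeated access inside one hook call: the cache pays for itself.
hot = CountingConvention(use_cache=True)
for _ in range(1000):
    hot.offset(3)
print(hot.walks, hot.lookups)  # 1 1000

# One-shot access (cache rebuilt for every hook call): the cached path
# does the same walk plus a lookup, so it can only be slower.
one_shot = CountingConvention(use_cache=True)
one_shot.offset(3)
print(one_shot.walks, one_shot.lookups)  # 1 1
```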

@jordanbriere
Contributor

With cache: 7.7
With cache and improved convention: 7.6

Well, that's surprising. I would have expected all the looping, switching, calculating, etc. at run-time to be much more costly than simply retrieving an int from base + (index * sizeof). Either there is something else in play here, or the compiler does a very good job at optimizing it and deserves a raise. 😄

So, if the hook is only called once, the cache obviously slows it down, because it has to be evaluated, but doesn't contain cached data.

Yeah, that's unfortunately the downside of caching; it's a significant improvement only if it is used, otherwise it's pretty much a wasted lookup. Though, perhaps it's a good trade-off if we consider that most plugins hook the same stuff and that, even if there is only one callback, scripters are not particularly known to re-use their data (eh? 👀), so it optimizes those scenarios as well. For the latter, one could argue that this is their responsibility; and they would be entirely right.

However, I will completely backtrack and say that we should keep the cache removed. The reason is quite simple: while the improvement is major at 1M iterations, it is far smaller at 100K, as illustrated by your metrics, and would be next to nil at ~10 (and even 10 callbacks on the same hook is probably an exaggeration for a real server). Meanwhile, SP without plugins has many single-callback hooks by default, which would all be slower just to anticipate the possibility that many plugins might hook the same functions. So yeah, I think the best call after all is: if we judge that a function is hooked very often, let's just provide listeners that re-use the objects, like we did for the buttons, for example.

As for the PR, I've done quite a bit of testing and haven't noticed any issues; everything seems to be working fine on my side. The only issue I stumbled upon is the one referenced above, which isn't directly related to this PR, so I will just go ahead and approve it as is. 👍

Ayuto added a commit to Ayuto/DynamicHooks that referenced this pull request Feb 6, 2021
* Preserve the registers from the pre-hook, so you can use it optionally in a post-hook

* Speed up GetArgumentPtr by pre-calculating the stack offsets

Source-Python-Dev-Team/Source.Python#384
@Ayuto Ayuto merged commit bcea09e into master Feb 6, 2021
@Ayuto Ayuto deleted the pre_post_registers branch February 6, 2021 13:06
@jordanbriere
Contributor

Ran this code again and Ayuto/DynamicHooks@ed07f4e had quite a good impact after all:

81.87904047966003

Thanks for implementing it! :)
