
Added the possibility to access pre-registers in a post-hook #384


Merged

Ayuto merged 6 commits into master from pre_post_registers on Feb 6, 2021

Conversation

Ayuto
Member

@Ayuto Ayuto commented Feb 1, 2021

This PR adds the possibility to access registers in a post-hook that were saved before the pre-hook was called. It fixes the issue that is explained here:
https://forums.sourcepython.com/viewtopic.php?p=7542#p7542

It's up to the programmer to decide whether they want to access the pre-hook state or the post-hook state, because there are certainly scenarios where either one is needed.

See also:
https://github.com/Ayuto/DynamicHooks/tree/pre_post_registers

The only problem I can think of with this implementation is that it won't work very well for recursive functions. To fix that as well, we would need a stack (similar to what we did with the return address). Oh, and if the stack is modified, even that won't fix the problem.
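The recursion issue can be illustrated with a hypothetical Python model (not the actual C++ implementation): with a single saved-register slot, a nested pre-hook overwrites the outer save before the outer post-hook runs, while a stack, like the one used for the return address, keeps the two saves separate.

```python
# Hypothetical model of register preservation across pre/post hooks.
# A single slot (this implementation) is overwritten by a nested
# pre-hook call; a stack (like the return-address fix) is not.

class SingleSlot:
    def __init__(self):
        self.saved = None

    def pre(self, registers):
        self.saved = registers      # overwrites any earlier save

    def post(self):
        return self.saved


class SlotStack:
    def __init__(self):
        self.saved = []

    def pre(self, registers):
        self.saved.append(registers)

    def post(self):
        return self.saved.pop()     # matches the innermost pre-hook


# Simulate a recursive call: pre(outer), pre(inner), post, post.
single, stack = SingleSlot(), SlotStack()
for hooks in (single, stack):
    hooks.pre('outer registers')
    hooks.pre('inner registers')

print(single.post())  # 'inner registers' twice: the outer state is lost
print(single.post())
print(stack.post())   # 'inner registers', then 'outer registers'
print(stack.post())
```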

@jordanbriere What do you think?

Before we merge it, we definitely need to do some more tests and recompile DynamicHooks for Linux (Windows is already included in one of the commits).

from entities.hooks import EntityCondition
from entities.hooks import EntityPostHook
from entities.hooks import EntityPreHook

from memory.hooks import use_pre_registers

@EntityPreHook(EntityCondition.is_player, 'drop_weapon')
def pre_on_drop_weapon(stack_data):
    print(f'PRE: this = {stack_data[0].address}')

@EntityPostHook(EntityCondition.is_player, 'drop_weapon')
def post_on_drop_weapon(stack_data, ret_val):
    print(f'POST FALSE: this = {stack_data[0].address}')
    with use_pre_registers(stack_data):
        print(f'POST CORRECT: this = {stack_data[0].address}')
Output:

PRE: this = 540996256
POST FALSE: this = 11524119
POST CORRECT: this = 540996256

@Ayuto Ayuto requested a review from jordanbriere February 1, 2021 20:26
@jordanbriere
Contributor

Looks good to me after minimal testing!

My only concern, although I understand the reasoning behind it with this implementation, is the removal of caching. I can certainly see that resulting in a significant increase in computing time for hooks that have many callbacks, and more so for functions that have many parameters, because the conventions calculate their offsets on the fly by traversing the whole structure type by type, aligning them, etc. This means that not only do the arguments have to be wrapped by Python every time, but there is also quite a lot of looping and calculating every time an argument is retrieved or mutated. Perhaps calculating those offsets only once on the convention's construction would be worth considering.

I will do more testings and profiling when I get the chance.

@CookStar
Contributor

CookStar commented Feb 2, 2021

Can't we just add another cache?

@Ayuto
Member Author

Ayuto commented Feb 2, 2021

Yeah, that would probably work very well. But first, I would also like to do some measuring, because I don't think it has a high impact on performance.

@jordanbriere
Contributor

I've tested the following code just now with and without cache:

from memory import Callback, Convention, DataType, Pointer
from time import time
from random import randint
from sys import maxsize

@Callback(Convention.CDECL, (DataType.POINTER,) * 64, DataType.VOID)
def callback(stack_data):
    t = time()
    for i in range(1000000):
        repr(stack_data)
    print(time() - t)

callback(*(Pointer(randint(0, maxsize)) for i in range(64)))

And here are the results I got:

Without cache: 94.56286525726318
With cache: 67.78554844856262

Basically, the larger the index is, the more work is done in GetArgumentPtr, because every time the whole structure is recalculated in order to grab its offset, etc. I can only imagine the result with a custom convention that forwards every call back to Python. The cache was negating that effect to an extent, but regardless of whether we add a second cache as @CookStar suggested or not, I think this is definitely something we should look into optimizing. The types are known on construction; there is no reason to repeat that work on the fly, imo.
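As a rough Python sketch of what's being described (hypothetical names, not SP's actual code): computing an argument's offset means walking every preceding type and aligning as you go, so the cost grows with the index, while a per-index cache does that walk only once.

```python
# Hypothetical sketch: computing a stack offset by walking the whole
# signature type by type (the slow path), versus caching it per index.

def align(offset, alignment):
    """Round offset up to a power-of-two alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


def compute_offset(arg_sizes, index):
    """Walk all preceding types, aligning each one; O(index) per access."""
    offset = 0
    for size in arg_sizes[:index]:
        offset = align(offset, size) + size
    return align(offset, arg_sizes[index])


class Convention:
    """Hypothetical convention that caches each offset after one walk."""

    def __init__(self, arg_sizes):
        self.arg_sizes = arg_sizes
        self._offsets = {}

    def argument_offset(self, index):
        # Repeated accesses skip the O(index) walk entirely.
        if index not in self._offsets:
            self._offsets[index] = compute_offset(self.arg_sizes, index)
        return self._offsets[index]


conv = Convention([4, 8, 4, 4])
print([conv.argument_offset(i) for i in range(4)])  # [0, 8, 16, 20]
```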

@Ayuto
Member Author

Ayuto commented Feb 4, 2021

To be honest, I didn't expect such a big difference in performance. But yeah, the Python stuff steals a lot of CPU time.
The offset calculation also takes a little bit of time, but the cache negates its performance impact almost completely, because on the first call to repr every argument is cached and there are no further calls to GetArgumentPtr. I improved the CDECL convention a little by pre-calculating every offset and saving the results in a dynamically allocated array. GetArgumentPtr now looks like this:

void* x86MsCdecl::GetArgumentPtr(int iIndex, CRegisters* pRegisters)
{
	return (void *) (pRegisters->m_esp->GetValue<unsigned long>() + m_pOffsets[iIndex]);
}
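A Python analogue of that improvement (hypothetical names; the real code is the C++ above): all offsets are computed once when the convention is constructed, so each lookup is just a base-plus-index read, mirroring the `m_pOffsets` array.

```python
# Hypothetical sketch of the improved convention: precompute all stack
# offsets on construction so argument lookup is a plain array index.

class ImprovedCdecl:
    def __init__(self, arg_sizes, alignment=4):
        self.offsets = []
        offset = 0
        for size in arg_sizes:
            self.offsets.append(offset)
            # Simplified cdecl rule: each slot rounds up to the alignment.
            offset += -(-size // alignment) * alignment

    def argument_ptr(self, esp, index):
        # Equivalent of: esp + m_pOffsets[iIndex]
        return esp + self.offsets[index]


conv = ImprovedCdecl([4, 4, 8])
print(hex(conv.argument_ptr(0x1000, 2)))  # 0x1008
```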

And these are my results (just using 100000 calls):

# Many calls to stack_data
Without cache:                          10.5
With cache:                             7.7
Without cache and improved convention:  9.3
With cache and improved convention:     7.6

Not sure if it's worth improving it. I also changed the test code to only call repr once, but the hooked function repeatedly:

# Many calls to the hooked function
Without cache:                          12.5
With cache:                             14.2
Without cache and improved convention:  11.3
With cache and improved convention:     14.0

So, if the hook is only called once, the cache obviously slows it down, because it has to be evaluated, but doesn't contain cached data.
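The trade-off can be made concrete with a counting sketch (hypothetical, not SP's actual cache): when the same argument is read many times per hook call, the cache collapses the walks to one; when each hook call reads an argument only once, the cache adds a lookup without ever saving a walk.

```python
# Hypothetical model counting offset walks vs. cache lookups.

class CountingConvention:
    def __init__(self, use_cache):
        self.use_cache = use_cache
        self.cache = {}
        self.walks = 0    # stands in for the O(index) offset calculation
        self.lookups = 0  # cache-maintenance overhead

    def offset(self, index):
        if self.use_cache:
            self.lookups += 1
            if index in self.cache:
                return self.cache[index]
        self.walks += 1
        result = index * 4
        if self.use_cache:
            self.cache[index] = result
        return result


# Repeated access inside one hook call: the cache pays for itself.
hot = CountingConvention(use_cache=True)
for _ in range(1000):
    hot.offset(3)
print(hot.walks, hot.lookups)  # 1 1000

# One-shot access (cache rebuilt for every hook call): the cached path
# does the same walk plus a lookup, so it can only be slower.
one_shot = CountingConvention(use_cache=True)
one_shot.offset(3)
print(one_shot.walks, one_shot.lookups)  # 1 1
```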

@jordanbriere
Contributor

With cache: 7.7
With cache and improved convention: 7.6

Well, that's surprising. I would have expected all the looping, switching, calculating, etc. at run-time to be much more costly than simply retrieving an int from base + (index * sizeof). Either there is something else in play here, or the compiler does a very good job at optimizing it and deserves a raise. 😄

So, if the hook is only called once, the cache obviously slows it down, because it has to be evaluated, but doesn't contain cached data.

Yeah, that's unfortunately the downside of caching; it's a significant improvement only if it is used, otherwise it's pretty much a wasted lookup. Though, perhaps it's a good trade-off if we consider that most plugins hook the same stuff and that, even if there is only one callback, scripters are not particularly known to re-use their data (eh? 👀), so it optimizes those scenarios as well. For the latter, one could argue that this is their responsibility; and they would be entirely right.

However, I will completely backtrack and say that we should keep the cache removed. The reason is quite simple: while the improvement is major at 1M iterations, it is far smaller at 100K, as illustrated by your metrics, and would be next to nil at ~10 (and even 10 callbacks on the same hook is probably an exaggeration for a real server). Meanwhile, SP without plugins has many single-callback hooks by default, which would all be slower just to anticipate the possibility that many plugins might hook the same functions. So yeah, I think the best call after all is: if we judge that a function is hooked very often, let's just provide listeners that re-use the objects, like we did for the buttons, for example.

As for the PR, I've done quite a bit of testing and haven't noticed any issues; everything seems to be working fine on my side. The only issue I stumbled upon is the one referenced above, which isn't directly related to this PR, so I will just go ahead and approve it as is. 👍

Ayuto added a commit to Ayuto/DynamicHooks that referenced this pull request Feb 6, 2021
* Preserve the registers from the pre-hook, so you can use it optionally in a post-hook

* Speed up GetArgumentPtr by pre-calculating the stack offsets

Source-Python-Dev-Team/Source.Python#384
@Ayuto Ayuto merged commit bcea09e into master Feb 6, 2021
@Ayuto Ayuto deleted the pre_post_registers branch February 6, 2021 13:06
@jordanbriere
Contributor

Ran this code again and Ayuto/DynamicHooks@ed07f4e had quite a good impact after all:

81.87904047966003

Thanks for implementing it! :)
