Migrate to Neuron Runtime 2.X in model.py #93
Conversation
inferentia/README.md
Outdated
> to/from the model in correct order irrespective of how they are
> specified in `config.pbtxt`. The indices should be consecutive
> integers starting from 0. Additionally, `--neuron-core-range`
> provides the neuron cores to be used while serving these models.
specifies the neuron ...
@tanmayv25 how does the user specify non-contiguous neuron cores?
We don't support that because I believe it is not important. The data-parallel operations need to be contiguous to derive the best performance from caching.
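For illustration, a contiguous range such as `0-3` can be parsed into a core list in a few lines. This is a hypothetical sketch only: the function name and the `start-end` string format are assumptions, not the PR's actual `--neuron-core-range` handling.

```python
def parse_core_range(spec: str) -> list:
    """Parse a contiguous core range like '0-3' into [0, 1, 2, 3].

    Hypothetical sketch; the real --neuron-core-range parsing in
    model.py may differ.
    """
    start, _, end = spec.partition("-")
    cores = list(range(int(start), int(end) + 1))
    if not cores:
        raise ValueError(f"empty or reversed core range: {spec!r}")
    return cores
```

Because the range is a single `start-end` pair, non-contiguous sets cannot be expressed, which matches the behavior described above.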
```python
dt, shape = self.output_dict[name]
output_tensor = pb_utils.Tensor(name,
    merged_result.astype(pb_utils.triton_string_to_numpy(dt)))
for i in range(len(self.output_dict)):
```
There is an edge case where the outputs in the model config can be `output__0` and `output__2` (1 is skipped). In this case `range` would not be a good idea.
Why not just use `for i in self.output_dict`? Similarly for input.
We do not care about the order in which the outputs are read as long as they use the right index.
We weed out this edge case in the validation logic in `__validate_input_dict`/`validate_output_dict`. Basically, we don't support skipping indices.

> We do not care about the order in which the outputs are read as long as they use the right index.

I think that is true. I will fix it.
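The failure mode discussed above can be shown with a toy dict (the values here are illustrative stand-ins, not the PR's actual data): `range(len(...))` assumes the keys are consecutive, while iterating the dict itself does not.

```python
# Toy stand-in for self.output_dict when OUTPUT__1 is skipped in the
# model config: only indices 0 and 2 are present.
output_dict = {0: "TYPE_FP32", 2: "TYPE_FP32"}

# Buggy pattern: range(len(output_dict)) == range(2), so it probes
# key 1, which does not exist.
missing = []
for i in range(len(output_dict)):
    if i not in output_dict:
        missing.append(i)

# Safe pattern: iterate only the keys that actually exist.
seen = [i for i in output_dict]
```

With the skipped index, `missing` contains `1` under the `range`-based loop, while the direct iteration visits exactly the declared indices `0` and `2`.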
> Basically, we don't support skipping of indices.

Why do we want to skip it? The user should be allowed to ask for a subset of the outputs from the model, like with all other backends.
We cannot do this for inputs, as the order does matter for placing the input in the list.
```python
type=int,
default=4,
help='The number of available neuron cores')
parser.add_argument('--neuron_core_range',
```
We should set a default core range. Maybe just the first core?
I would want the user to be aware of this option and not silently introduce limited execution.
```python
config_input['name'], config_input['data_type'],
config_input['dims']
]
expected_input_count = expected_input_count + 1
```
`+=`?
```python
self.input_dict[config_input['name']] = [
    config_input['data_type'], config_input['dims']
index = self._validate_and_get_index(config_input['name'])
self.input_dict[index] = [
```
Do you need a dict if index is used as key? List seems to be sufficient.
We need to handle cases like `INPUT__0` and `INPUT__2`, where `INPUT__1` is missing. Using a dict helps detect missing indices with minimal iteration over the input lists.
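A dict-based check for skipped indices might look like the following. This is a hypothetical sketch of the validation described above, not the PR's actual `__validate_input_dict` implementation; the function name and error message are assumptions.

```python
def validate_consecutive_indices(index_dict, kind="input"):
    """Raise if the dict's integer keys are not exactly 0..N-1.

    Hypothetical sketch; the real validation logic in model.py
    may differ.
    """
    missing = set(range(len(index_dict))) - set(index_dict)
    if missing:
        raise ValueError(
            f"{kind} indices must be consecutive starting from 0; "
            f"missing {sorted(missing)}")
```

Because the keys are the indices themselves, a single set difference finds every gap without scanning the config lists repeatedly, e.g. `{0: ..., 2: ...}` fails with `missing [1]`.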
inferentia/README.md
Outdated
> Additionally, `--neuron-core-range` specifies the neuron cores to
> be used while serving these models. Currently, only
> `torch.neuron.DataParallel()` mode is supported.
can you specify what that means? Something like:
> Currently, the core used is specified using `torch.neuron.DataParallel()`
LGTM. Small nit.