You can set up the simulator by installing its Python dependencies. We recommend starting with a fresh Python environment.
```bash
# Create and activate new Python environment
conda create -n sim python=3.11
conda activate sim

# Install dependencies
pip install -r requirements.txt
```

SplitwiseSim takes in a hierarchical set of YAML configuration files as input, and it produces several CSV files as output. It uses Hydra for configuration management. You can learn more about configuration management from the Hydra docs.
The top-level configuration file for SplitwiseSim is config.yaml, which points to lower-level configurations specified by other files in the configs/ directory. Specifically, config.yaml captures the following key components:
- cluster: the provisioned server SKUs in the cluster, along with their respective counts.
- trace: request trace that specifies the set of requests that arrive into the cluster.
- router: the cluster-level router that routes incoming requests to application-level schedulers; currently a no-op.
- arbiter: the cluster-level arbiter that manages compute resources between applications to support autoscaling; currently a no-op.
- application: the logical endpoint that the requests target, which specifies the model and the set of instances on which the request runs; currently, we support only one application.
- model_repo: the set of models (LLMs) available to run in the cluster; used for dynamic model instantiation.
- orchestrator_repo: the set of application resource orchestrators (i.e., schedulers and allocators) in the cluster; used for dynamic application management.
- hardware_repo: the set of available SKUs that can be provisioned in the cluster; used for dynamic server instantiation.
- performance_model: an analytical model that helps estimate request runtimes with different batch, model, and hardware configurations.
- start_state: starting state for the cluster, which helps simplify evaluation.
Several other aspects can be configured; please see config.yaml for details.
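Because Hydra composes the configuration, individual components can also be overridden from the command line for a single run instead of editing config.yaml. A minimal sketch, assuming the default run.py entry point; the override values below are placeholders, not config names that necessarily ship with the repository:

```bash
# Sketch only: select a different cluster config group and trace file for one run.
# "my_cluster" and "my_trace" are placeholder names for configs/traces you have defined.
python run.py cluster=my_cluster trace.filename=my_trace
```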
SplitwiseSim generates the following key outputs:
- Summary of application-level metrics (`summary.csv`)
- Per-request metrics for each completed request for each application (`detailed/{application_id}.csv`)
- Request node-level metrics (`request_nodes.csv`)
- Instance-level execution metrics (in `instances/`, with `debug` enabled)
We provide various utility functions to process outputs, as shown in notebooks/example.ipynb and notebooks/plots.ipynb.
Simply modify config.yaml as follows and execute python run.py:
- Use the controller us3-dp in config.yaml
- Use the annotated trace ES_26_dp.csv in configs/trace/enterprise_sydney.yaml
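Alternatively, the same selections can usually be passed as Hydra command-line overrides rather than edited into the files; the exact group and key names below (controller, trace, trace.filename) are assumptions based on the config layout described above, so verify them against config.yaml first:

```bash
# Assumed override names; confirm against config.yaml and configs/ before relying on this.
python run.py controller=us3-dp trace=enterprise_sydney trace.filename=ES_26_dp
```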
The following knobs in config.yaml allow you to adjust the execution configuration:
- feed_async: True/False to enable/disable the insertion of async requests whenever memory utilisation falls below 0.5.
- feed_async_granularity: Specify the number of async requests to insert at a time.
- scaling_level: Specify 0 for no scaling, 1 for scaling from/to spot only, and 2 for inter-model scaling along with spot donations.
- scaling_interval: The number of seconds to wait between two scaling events per model endpoint. Use -1 to disable this knob, i.e., place no restriction on how frequently scaling events may occur.
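Since Hydra lets any configuration value be overridden on the command line, these knobs can also be set per run. A sketch, assuming the knobs live at the top level of config.yaml; the values shown are illustrative only:

```bash
# Illustrative values, not recommendations; key paths assume top-level knobs in config.yaml.
python run.py feed_async=True feed_async_granularity=4 scaling_level=2 scaling_interval=300
```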
All short-term scaling scripts should still run as they are!
Run with:

```bash
python3 run_kunal.py trace.filename=ES_26 \
    short_term_scaling=False \
    long_term_scaling=True \
    global_arbiter.arima_traces=$PWD/traces/forecasts/ \
    global_arbiter.post_processing_strategy=<STRATEGY>
```

where `STRATEGY` can be `immediate`, `delay_changes`, `keep_maximum_instances`, or `keep_minimum_instances`.
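For example, to run with the `keep_maximum_instances` strategy:

```bash
python3 run_kunal.py trace.filename=ES_26 \
    short_term_scaling=False \
    long_term_scaling=True \
    global_arbiter.arima_traces=$PWD/traces/forecasts/ \
    global_arbiter.post_processing_strategy=keep_maximum_instances
```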
Run with:

```bash
python3 run.py trace.filename=final_data_day_1 short_term_scaling=True long_term_scaling=True \
    global_arbiter.arima_traces=$PWD/traces/forecasts/ \
    controller.regions.0.arbiter=global_arbiter_ARIMA_checking \
    controller.regions.1.arbiter=global_arbiter_ARIMA_checking \
    controller.regions.2.arbiter=global_arbiter_ARIMA_checking \
    global_arbiter.arima_aware_arbiter=True

python3 run.py trace.filename=final_data_day_1 short_term_scaling=True long_term_scaling=True \
    global_arbiter.arima_traces=$PWD/traces/forecasts/ \
    controller.regions.0.arbiter=global_arbiter_memory_utilization \
    controller.regions.1.arbiter=global_arbiter_memory_utilization \
    controller.regions.2.arbiter=global_arbiter_memory_utilization \
    global_arbiter.arima_aware_arbiter=True

python3 run.py trace.filename=final_data_day_1 short_term_scaling=True long_term_scaling=True \
    global_arbiter.arima_traces=$PWD/traces/forecasts/ \
    controller.regions.0.arbiter=global_arbiter_short_term_scaling \
    controller.regions.1.arbiter=global_arbiter_short_term_scaling \
    controller.regions.2.arbiter=global_arbiter_short_term_scaling \
    global_arbiter.arima_aware_arbiter=True
```