
# Configuring and running experiments

## Overview

The large number of component variants supported in this repository means that many components and their parameters must be configured before running a specific experiment. We rely on Hydra's configuration features to make this process easier.

At the core, three main Hydra configs provide the base configuration for the main types of experiments: train.yaml (generic training), eval.yaml (running evaluation), and unlearn.yaml (unlearning training). These are then extended by experiment-specific configs and command-line overrides. We provide experiment configs for common use cases, such as LLaMA-2 unlearning on TOFU or LLaMA-2 evaluation on MUSE, which set the required datasets, models, and base train and eval configs to make things easier.
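
For intuition, an experiment config is essentially a Hydra defaults list that composes the base config with specific model, trainer, and data choices, plus a few experiment-level values. The sketch below is illustrative only; the key names are assumptions based on the overrides shown in this document, so check the files under `configs/experiment/` for the real schema.

```yaml
# Hypothetical experiment config sketch (illustrative, not the repo's exact schema).
# @package _global_
defaults:
  - override /model: Llama-2-7b-hf   # choose a model config group entry
  - override /trainer: NPO           # choose the (un)learning trainer
  - override /data: unlearn          # choose the unlearning data format

forget_split: forget05
retain_split: retain95
task_name: ???                       # Hydra's marker for a mandatory value
```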

Experiment output directories are constructed from the task mode (train / eval / unlearn) and the user-provided task name, as ./saves/${mode}/${task_name}. The experiment logs report where the model checkpoints, logs, and evaluation dumps are stored.


## Table of Contents

- [Overview](#overview)
- [Example Commands](#example-commands)
- [Commonly Overridden Arguments](#commonly-overridden-arguments)
- [Simple Finetuning](#simple-finetuning)
- [Distributed Training](#distributed-training)


## Example Commands

```bash
# runs finetuning using experiment details from configs/experiment/finetune/tofu/default.yaml
python src/train.py --config-name=train.yaml experiment=finetune/tofu/default task_name=SAMPLE_TRAIN

# runs unlearning training using experiment details from configs/experiment/unlearn/tofu/default.yaml
# the output directory will be constructed as: saves/unlearn/SAMPLE_UNLEARN
python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default task_name=SAMPLE_UNLEARN

# runs an evaluation using experiment details from configs/experiment/eval/muse/default.yaml
# note: eval.yaml is the default config set in src/eval.py, so --config-name can be omitted
python src/eval.py --config-name=eval.yaml experiment=eval/muse/default task_name=SAMPLE_EVAL

# an extensively filled out configuration for an unlearning experiment
python src/train.py --config-name=unlearn.yaml experiment=unlearn/muse/default data_split=Books \
trainer=NPO trainer.method_args.retain_loss_type=KL task_name=llama2_books_NPO_KL \
retain_logs_path=saves/eval/muse_books_retain/MUSE_EVAL.json

# an even more extensively filled out configuration for an unlearning experiment
python src/train.py --config-name=unlearn.yaml \
experiment=unlearn/tofu/default \
task_name=NPO_unlearn_tofu_llama_8 \
model=Llama-3.1-8B-Instruct \
model.model_args.pretrained_model_name_or_path=saves/finetune/path_model_llama \
trainer=NPO trainer.args.per_device_train_batch_size=4 \
forget_split=forget05 retain_split=retain95 \
retain_logs_path=saves/eval/tofu_retain95/TOFU_EVAL.json \
paths.output_dir=saves/unlearn/NPO/evals
```

> [!NOTE]
> The unlearning experiments support evaluation during unlearning finetuning, but only when a single accelerator process is used. With multiple processes, checkpoints must be stored and evaluated after training.


## Commonly Overridden Arguments

To understand the structure of an evaluation config and the parameters available for overriding, refer to `configs/experiment/examples/tofu_eval.yaml`.

To understand the structure of an unlearning config and the parameters available for overriding, refer to `configs/experiment/examples/muse_unlearn.yaml`.

The following tables list the most commonly used arguments while running experiments.

### Model Settings

| Argument | Description and examples |
|---|---|
| `model` | Selects the model. Example: `model=Llama-2-7b-hf` |
| `model.model_args.pretrained_model_name_or_path` | Specifies the model checkpoint path or HuggingFace model ID. |
| `model.tokenizer_args.pretrained_model_name_or_path` | Specifies the tokenizer location. Make sure this matches the model above, providing the model path if needed. |
| `model.template_args` | Optional chat templating parameters (e.g., start/end tags). Example: `apply_chat_template: false`, `user_start_tag: "[INST] "` |
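
The override paths above imply the nesting sketched below. This is an assumption for illustration (with placeholder values), not the repo's exact schema; see the files under `configs/model/` for the real layout.

```yaml
# Hypothetical model config sketch (e.g. configs/model/Llama-2-7b-hf.yaml);
# nesting inferred from the override paths above, values illustrative.
model_args:
  pretrained_model_name_or_path: meta-llama/Llama-2-7b-hf
tokenizer_args:
  pretrained_model_name_or_path: meta-llama/Llama-2-7b-hf
template_args:
  apply_chat_template: false
  user_start_tag: "[INST] "
```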

### Trainer Settings

| Argument | Description and examples |
|---|---|
| `trainer` | Selects the overall trainer or unlearning method, i.e. the finetuning algorithm. Example: `trainer=NPO` or `trainer=finetune` |
| `trainer.args` | Main training hyperparameters such as `per_device_train_batch_size`, `per_device_eval_batch_size`, `gradient_accumulation_steps`, `learning_rate`, `num_train_epochs`, `optim`, and other HuggingFace `TrainingArguments`. |
| `trainer.method_args` | Method-specific parameters for unlearning trainers. Example: `retain_loss_type`, NPO hyperparameters such as `gamma`, `alpha`, `beta`, etc. |
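
Similarly, a trainer config plausibly nests its HuggingFace `TrainingArguments` under `args` and its method-specific knobs under `method_args`, as sketched below. Parameter names are taken from the examples in this document; the values and file name are illustrative.

```yaml
# Hypothetical trainer config sketch (e.g. configs/trainer/NPO.yaml);
# nesting inferred from the override paths, values illustrative.
args:                        # forwarded as HuggingFace TrainingArguments
  per_device_train_batch_size: 4
  gradient_accumulation_steps: 4
  learning_rate: 1.0e-5
  num_train_epochs: 10
method_args:                 # consumed by the unlearning method itself
  retain_loss_type: KL
  beta: 0.1
```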

### Data Settings

| Argument | Description and examples |
|---|---|
| `data` | Overall data configuration/format. Example: `data=unlearn`, `data=finetune`. |
| `data.forget`, `data.retain`, `data.anchor`, etc. | Set the sub-datasets within the overall dataset, e.g. `data.forget=MUSE_forget data.retain=MUSE_retain`. Set which sub-dataset to index over (the others are randomly sampled) with `data.anchor=forget`. |
| `data_split` / `forget_split` / `retain_split` | Custom to specific datasets; used to populate dataset paths. `data_split` specifies the overall dataset split or type, e.g. `data_split=News` or `data_split=Books`. `forget_split`/`retain_split` select sub-parts of the dataset, e.g. `forget_split=forget01 retain_split=retain99`. |
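
An unlearning data config would follow the same pattern. The sketch below is illustrative only, with the nesting inferred from the overrides above and the sub-dataset names taken from the MUSE examples; see `configs/data/` for the actual files.

```yaml
# Hypothetical unlearning data config sketch; nesting inferred from the
# override paths, sub-dataset names from the MUSE examples above.
anchor: forget        # index over the forget set; others are randomly sampled
forget: MUSE_forget
retain: MUSE_retain
```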

### Experiment Settings

| Argument | Description and examples |
|---|---|
| `task_name` | Experiment identifier used to generate the output path. Example: `task_name=llama2_books_NPO_KL`. |
| `eval` | Selects the evaluation benchmark configuration. Example: `eval=muse`. |
| `retain_logs_path` | Path to the eval logs of the retain model, used in some evaluation metrics. Example: `retain_logs_path=saves/eval/muse_books_retain/MUSE_EVAL.json`. |
| `paths` | Attributes that decide the path configuration, e.g. `paths.output_dir=$LOCAL_PATH`. |

## Simple Finetuning

In addition to unlearning-based finetuning, we also support simple finetuning on a given dataset.

These use src/train.py with the train.yaml config to set up a standard supervised training environment. Parameters such as learning rate, batch size, and optimizer settings can be adjusted via experiment-specific configs or command-line overrides.

Example:

```bash
python src/train.py --config-name=train.yaml experiment=finetune/tofu/default \
  trainer.args.learning_rate=5e-5 task_name=llama3.2-1B_finetune_example
```

## Distributed Training

Distributed training configurations enable scaling experiments across multiple devices or nodes. In most cases, the default distributed settings from `configs/accelerate/default_config.yaml` are sufficient. You can run distributed training with the command below, which uses DeepSpeed (our default setup):

```bash
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
  --config_file configs/accelerate/default_config.yaml --main_process_port 18765 \
  src/train.py --config-name=unlearn.yaml experiment=unlearn/muse/default task_name=DISTRIBUTED_TRAIN
```
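
For reference, an Accelerate config that enables DeepSpeed generally looks like the sketch below. This is a generic illustration of Accelerate's config format, not necessarily the exact contents of this repo's `configs/accelerate/default_config.yaml`:

```yaml
# Generic Accelerate + DeepSpeed config sketch (illustrative; check
# configs/accelerate/default_config.yaml for the repo's actual settings).
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: bf16
num_machines: 1
num_processes: 2   # should match the number of visible GPUs
```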

You may also simply run `CUDA_VISIBLE_DEVICES=0,1,.. python ...` to leverage Accelerate's DDP or model parallelism. For model parallelism, set `device_map="auto"` in the `model_args` when loading the model, as sketched below.
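
Assuming `model_args` are forwarded to HuggingFace's `from_pretrained`, the model-parallel setting would live here (equivalently, pass `model.model_args.device_map=auto` on the command line):

```yaml
# Hypothetical placement of the model-parallel flag inside the model config;
# device_map is a real from_pretrained argument, the surrounding layout is assumed.
model_args:
  device_map: auto   # lets HuggingFace shard the model across visible GPUs
```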

> [!CAUTION]
> Training runs that use multiple Accelerate processes cannot run evaluations during training. To work around this, you may use DDP/model parallelism (see #94), or run the evaluation code directly on a saved model checkpoint with a single GPU, as below:

```bash
CUDA_VISIBLE_DEVICES=0 python src/eval.py experiment=eval/muse/default task_name=SAMPLE_EVAL \
  model.model_args.pretrained_model_name_or_path=saves/unlearn/muse_unlearn_exp
```