
feature/Recipes: "Train on this dataset" card with training configuration #6178

@LeoBorcherding

Summary

Add a Training Configuration card to the Recipe Studio graph as a first-class node type.
A completed dataset generation pipeline can connect directly into this card, which holds a
full training config and launches a training run on the generated output.

Motivation

This card is the first step toward a closed-loop pipeline:

Generate dataset -> Train model -> Benchmark model -> Regenerate dataset -> Benchmark ...
repeat until a target benchmark score is reached

Proposed design

A new node type in the Recipe Studio graph: Train

  • Connects downstream of the recipe output node
  • Card holds the full training configuration: base model, LoRA rank, epochs, batch size,
    learning rate, max seq length, output adapter path
  • Dataset source is auto-wired from the upstream recipe output when artifact_path is set
    on the completed run, with an override option for manual control
  • When the graph runs, dataset generation completes first, then the training job starts
    on the freshly generated output
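
To make the proposal concrete, here is a minimal sketch of what the Train node's config and run ordering could look like. Everything except `artifact_path` is an assumption for illustration: the names `TrainNodeConfig`, `CompletedRun`, `datasetOverride`, `resolveDatasetPath`, and `runGraph` are hypothetical and not taken from the existing codebase.

```typescript
// Hypothetical config held by the Train card; field names are illustrative only.
interface TrainNodeConfig {
  baseModel: string;
  loraRank: number;
  epochs: number;
  batchSize: number;
  learningRate: number;
  maxSeqLength: number;
  outputAdapterPath: string;
  // Manual override; when set, it wins over the auto-wired recipe output.
  datasetOverride?: string;
}

// Assumed minimal view of a completed recipe run.
interface CompletedRun {
  artifact_path?: string;
}

// Pick the dataset the training job should use:
// the manual override if present, otherwise the upstream artifact_path.
function resolveDatasetPath(
  config: TrainNodeConfig,
  run: CompletedRun,
): string | undefined {
  if (config.datasetOverride) {
    return config.datasetOverride;
  }
  return run.artifact_path;
}

// Run generation first, then start training on the freshly generated output.
// `generate` and `train` are placeholders for whatever the graph runner provides.
async function runGraph(
  generate: () => Promise<CompletedRun>,
  train: (datasetPath: string, config: TrainNodeConfig) => Promise<void>,
  config: TrainNodeConfig,
): Promise<void> {
  const run = await generate();
  const datasetPath = resolveDatasetPath(config, run);
  if (datasetPath === undefined) {
    throw new Error("No dataset available: artifact_path is unset and no override was given");
  }
  await train(datasetPath, config);
}
```

Keeping the override check in one small function means the card UI and the graph runner can share the same precedence rule.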

Data to wire up

  • Recipe output: RecipePayload.run.artifact_path, run.dataset_name, run.output_formats
  • Training input: TrainingConfigState.datasetSource, uploadedFile, dataset, datasetFormat
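
One possible mapping between these fields is sketched below. The field names come from the lists above, but their types, the `"recipe"` source tag, the choice of the first entry in `output_formats`, and the helper name `wireRecipeToTraining` are all assumptions; the real `RecipePayload` and `TrainingConfigState` definitions may differ.

```typescript
// Assumed shapes; only the field names are taken from the issue text.
interface RecipeRun {
  artifact_path?: string;
  dataset_name?: string;
  output_formats?: string[];
}

interface RecipePayload {
  run: RecipeRun;
}

interface TrainingConfigState {
  datasetSource: string; // assumed to be a string tag
  uploadedFile: string | null;
  dataset: string | null;
  datasetFormat: string | null;
}

// Hypothetical helper: copy the recipe output into the training input state.
// If the run has no artifact_path yet, the current state is returned unchanged.
function wireRecipeToTraining(
  payload: RecipePayload,
  current: TrainingConfigState,
): TrainingConfigState {
  const { artifact_path, dataset_name, output_formats } = payload.run;
  if (!artifact_path) {
    return current;
  }
  return {
    ...current,
    datasetSource: "recipe", // hypothetical tag for "came from a recipe run"
    uploadedFile: artifact_path,
    dataset: dataset_name ?? current.dataset,
    // Assumption: take the first listed output format as the dataset format.
    datasetFormat: output_formats?.[0] ?? current.datasetFormat,
  };
}
```

Returning a new state object instead of mutating the existing one keeps the manual override path simple: the card can discard the wired state and fall back to whatever the user entered.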
