
Thinking Agent

Overview

ThinkingAgent is a systematic evaluation framework that automatically rates overthinking behavior in large language models, i.e. the tendency of a model to rely on internal reasoning rather than interacting with its environment.

Getting Started

First, clone the repository and install the required packages:

git clone https://github.com/AlexCuadron/ThinkingAgent.git
cd ThinkingAgent
pip install -r requirements.txt

The framework consists of two main components:

  1. format_message.py: Processes and formats interaction logs into a standardized format
  2. analyze_agent_think.py: Analyzes the formatted interactions and produces overthinking scores
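
A typical run chains the two stages. The sketch below drives them via subprocess; the --input and --output flags are assumptions for illustration, not documented options of these scripts:

import subprocess

# Stage 1: standardize raw interaction logs (flag names are assumptions).
subprocess.run(
    ["python", "format_message.py", "--input", "raw_logs/", "--output", "formatted/"],
    check=True,
)

# Stage 2: score the formatted interactions for overthinking (flag names are assumptions).
subprocess.run(
    ["python", "analyze_agent_think.py", "--input", "formatted/"],
    check=True,
)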

Configuration

The framework uses a config.toml file to configure the LLM settings:

[llm]
model = "claude-3-5-sonnet-20241022"
api_key = ""  # Set via environment variable LLM_API_KEY
temperature = 0.0
max_output_tokens = 4096
num_retries = 3
retry_min_wait = 4
retry_max_wait = 10
retry_multiplier = 2
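
A minimal sketch of loading this file in Python 3.11+ (where tomllib is in the standard library), pulling the API key from the LLM_API_KEY environment variable as the comment above suggests. The helper is illustrative rather than the framework's actual load_config, and the exponential-backoff reading of the retry_* keys is an assumption:

import os
import tomllib  # standard library since Python 3.11

def load_llm_config(path="config.toml"):
    """Read the [llm] table and apply the LLM_API_KEY environment override."""
    with open(path, "rb") as f:
        llm = tomllib.load(f)["llm"]
    # An empty api_key in the file means "read it from the environment".
    llm["api_key"] = llm["api_key"] or os.environ["LLM_API_KEY"]
    return llm

llm = load_llm_config()

# With the values above, exponential backoff between retries would wait
# 4s, 8s, then 10s (doubling each time, capped at retry_max_wait).
wait = llm["retry_min_wait"]
for attempt in range(llm["num_retries"]):
    print(f"retry {attempt + 1}: wait {min(wait, llm['retry_max_wait'])}s")
    wait *= llm["retry_multiplier"]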

Evaluation

The evaluation process follows these steps:

  1. Data Collection: Gather interaction logs from models performing agentic tasks
  2. Message Formatting: Use format_message.py to standardize the interaction format
  3. Analysis: Run analyze_agent_think.py to evaluate overthinking behaviors
  4. Scoring: Generate scores (0-10) for each interaction based on:
    • 0-3: Always interacting with the environment
    • 4-7: Sometimes relies on internal reasoning
    • 8-10: Completely relies on internal reasoning
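
For reference, a score can be mapped back to its rubric band like this (a sketch; the repository may name the bands differently):

def overthinking_band(score: int) -> str:
    """Map a 0-10 overthinking score to the rubric bands above."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 3:
        return "always interacting with the environment"
    if score <= 7:
        return "sometimes relies on internal reasoning"
    return "completely relies on internal reasoning"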

Usage

To analyze a set of interactions:

# Assumed import; the exact module layout is not documented in this README.
from analyze_agent_think import LLM, analyze_responses, load_config

# Load configuration and initialize the LLM
config = load_config()
llm = LLM(config)

# Analyze a directory of interactions
analyze_responses("path/to/interactions", iteration_number=None)
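
The temperature of 0.0 in config.toml keeps the evaluating LLM's ratings as deterministic as possible, so repeated runs over the same interactions should produce stable scores.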
