diff --git a/.github/workflows/claude-code-review.yml b/.github/workflows/claude-code-review.yml new file mode 100644 index 000000000..4caf96a2f --- /dev/null +++ b/.github/workflows/claude-code-review.yml @@ -0,0 +1,54 @@ +name: Claude Code Review + +on: + pull_request: + types: [opened, synchronize] + # Optional: Only run on specific file changes + # paths: + # - "src/**/*.ts" + # - "src/**/*.tsx" + # - "src/**/*.js" + # - "src/**/*.jsx" + +jobs: + claude-review: + # Optional: Filter by PR author + # if: | + # github.event.pull_request.user.login == 'external-contributor' || + # github.event.pull_request.user.login == 'new-developer' || + # github.event.pull_request.author_association == 'FIRST_TIME_CONTRIBUTOR' + + runs-on: ubuntu-latest + permissions: + contents: read + pull-requests: read + issues: read + id-token: write + + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 1 + + - name: Run Claude Code Review + id: claude-review + uses: anthropics/claude-code-action@v1 + with: + claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} + prompt: | + Please review this pull request and provide feedback on: + - Code quality and best practices + - Potential bugs or issues + - Performance considerations + - Security concerns + - Test coverage + + Use the repository's CLAUDE.md for guidance on style and conventions. Be constructive and helpful in your feedback. + + Use `gh pr comment` with your Bash tool to leave your review as a comment on the PR. + + # See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md + # or https://docs.anthropic.com/en/docs/claude-code/sdk#command-line for available options + claude_args: '--allowed-tools "Bash(gh issue view:*),Bash(gh search:*),Bash(gh issue list:*),Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr list:*)"' + diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml new file mode 100644 index 000000000..ae36c007f --- /dev/null +++ b/.github/workflows/claude.yml @@ -0,0 +1,50 @@ +name: Claude Code + +on: + issue_comment: + types: [created] + pull_request_review_comment: + types: [created] + issues: + types: [opened, assigned] + pull_request_review: + types: [submitted] + +jobs: + claude: + if: | + (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) || + (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) || + (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) || + (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude'))) + runs-on: ubuntu-latest + permissions: + contents: read + pull-requests: read + issues: read + id-token: write + actions: read # Required for Claude to read CI results on PRs + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 1 + + - name: Run Claude Code + id: claude + uses: anthropics/claude-code-action@v1 + with: + claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} + + # This is an optional setting that allows Claude to read CI results on PRs + additional_permissions: | + actions: read + + # Optional: Give a custom prompt to Claude. If this is not specified, Claude will perform the instructions specified in the comment that tagged it. + # prompt: 'Update the pull request description to include a summary of changes.' 
+ + # Optional: Add claude_args to customize behavior and configuration + # See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md + # or https://docs.anthropic.com/en/docs/claude-code/sdk#command-line for available options + # claude_args: '--model claude-opus-4-1-20250805 --allowed-tools Bash(gh pr:*)' + diff --git a/.gitignore b/.gitignore index ab64fddf3..a55667e96 100644 --- a/.gitignore +++ b/.gitignore @@ -14,8 +14,15 @@ notebooks/.env pages/research/local_research/ .DS_Store .vscode +agent.toml +content_suggestions/ +.claude/ +.qodo/ # app .next node_modules -prompts \ No newline at end of file +prompts + +#mcp +.mcp.json diff --git a/README.md b/README.md index d1b6d7ca3..9da0484bf 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,20 @@ # Prompt Engineering Guide +
+ Sponsored by     +
+ Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools. Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, learning guides, lectures, references, and tools related to prompt engineering for LLMs. 🌐 [Prompt Engineering Guide (Web Version)](https://www.promptingguide.ai/) -🎉 We are excited to launch our brand new prompt engineering courses under the DAIR.AI Academy. [Join Now](https://dair-ai.thinkific.com/bundles/pro)! +🎉 We are excited to launch our new prompt engineering, RAG, and AI Agents courses under the DAIR.AI Academy. [Join Now](https://dair-ai.thinkific.com/bundles/pro)! + +The courses are meant to complement this guide and provide a more hands-on approach to learning about prompt engineering, context engineering, and AI Agents. -Use code BLACKFRIDAY to get an extra 35% off. This offer ends on 29th November 2024. +Use code PROMPTING20 to get an extra 20% off. Happy Prompting! diff --git a/components/AnnouncementBar.tsx b/components/AnnouncementBar.tsx new file mode 100644 index 000000000..79f186ce6 --- /dev/null +++ b/components/AnnouncementBar.tsx @@ -0,0 +1,34 @@ +import React from 'react'; +import Link from 'next/link'; + +const AnnouncementBar: React.FC = () => { + return ( +
+ 🚀 Master Prompt Engineering and build AI Agents in our NEW courses! Use PROMPTING20 for 20% off ➜{' '} + + Enroll now + +
+ ); +}; + +export default AnnouncementBar; diff --git a/components/ContentFileNames.tsx b/components/ContentFileNames.tsx index 837ad452d..a0a64cbfb 100644 --- a/components/ContentFileNames.tsx +++ b/components/ContentFileNames.tsx @@ -15,15 +15,14 @@ const ContentFileNames = ({ section = 'research', lang = 'en' }) => { return ( {fileNames.map(({ slug, title }, index) => ( - } - title={title} - href={`/${section}/${slug}`} - > - {/* Add your desired content here, or an empty fragment if no content is needed */} - <> - + + } + title={title} + href={`/${section}/${slug}`} + children={<>} + /> + ))} ); diff --git a/components/TabsComponent.tsx b/components/TabsComponent.tsx index 8d52df68d..30b02556e 100644 --- a/components/TabsComponent.tsx +++ b/components/TabsComponent.tsx @@ -31,13 +31,13 @@ response = client.chat.completions.create( }; return ( - tab.model)}> - {tabsData.map((tab, index) => ( + tab.model)} children={ + tabsData.map((tab, index) => (
{renderCodeBlock(tab)}
- ))} -
+ )) + } /> ); }; diff --git a/components/copy-to-clipboard.tsx b/components/copy-to-clipboard.tsx index d5d08822b..5f206cc66 100644 --- a/components/copy-to-clipboard.tsx +++ b/components/copy-to-clipboard.tsx @@ -1,3 +1,5 @@ +'use client' + import type { ComponentProps, ReactElement } from 'react' import { useCallback, useEffect, useState } from 'react' import { CheckIcon } from './check' diff --git a/components/pre.tsx b/components/pre.tsx index 67b7a0206..ef2310249 100644 --- a/components/pre.tsx +++ b/components/pre.tsx @@ -1,3 +1,5 @@ +'use client' + import cn from 'clsx' import type { ComponentProps, ReactElement } from 'react' import { useCallback, useRef } from 'react' @@ -16,6 +18,7 @@ export const Pre = ({ }: ComponentProps<'pre'> & { filename?: string hasCopyCode?: boolean + children?: React.ReactNode }): ReactElement => { const preRef = useRef(null); @@ -31,7 +34,7 @@ export const Pre = ({ const renderChildren = () => { if (React.isValidElement(children) && children.type === 'code') { - return children.props.children; + return children.props.children as React.ReactNode; } return children; }; diff --git a/img/4o-image-generation/4o_image_generation.png b/img/4o-image-generation/4o_image_generation.png new file mode 100644 index 000000000..f661b6d3d Binary files /dev/null and b/img/4o-image-generation/4o_image_generation.png differ diff --git a/img/4o-image-generation/art_deco_starship.png b/img/4o-image-generation/art_deco_starship.png new file mode 100644 index 000000000..a0248d1c3 Binary files /dev/null and b/img/4o-image-generation/art_deco_starship.png differ diff --git a/img/4o-image-generation/combine_images.png b/img/4o-image-generation/combine_images.png new file mode 100644 index 000000000..2983e4813 Binary files /dev/null and b/img/4o-image-generation/combine_images.png differ diff --git a/img/4o-image-generation/combined.png b/img/4o-image-generation/combined.png new file mode 100644 index 000000000..9a0e32a97 Binary files /dev/null and b/img/4o-image-generation/combined.png differ diff --git a/img/4o-image-generation/image_gen_API.JPG b/img/4o-image-generation/image_gen_API.JPG new file mode 100644 index 000000000..639502760 Binary files /dev/null and b/img/4o-image-generation/image_gen_API.JPG differ diff --git a/img/4o-image-generation/inpainting_combined.png b/img/4o-image-generation/inpainting_combined.png new file mode 100644 index 000000000..92806f079 Binary files /dev/null and b/img/4o-image-generation/inpainting_combined.png differ diff --git a/img/4o-image-generation/meerkat.png b/img/4o-image-generation/meerkat.png new file mode 100644 index 000000000..92ec72b63 Binary files /dev/null and b/img/4o-image-generation/meerkat.png differ diff --git a/img/4o-image-generation/sam_and_jony.png b/img/4o-image-generation/sam_and_jony.png new file mode 100644 index 000000000..83bb14c8a Binary files /dev/null and b/img/4o-image-generation/sam_and_jony.png differ diff --git a/img/4o-image-generation/sam_and_jony_ghiblified.png b/img/4o-image-generation/sam_and_jony_ghiblified.png new file mode 100644 index 000000000..d62d8e5f5 Binary files /dev/null and b/img/4o-image-generation/sam_and_jony_ghiblified.png differ diff --git a/img/4o-image-generation/t-shirt.png b/img/4o-image-generation/t-shirt.png new file mode 100644 index 000000000..fced2b3d5 Binary files /dev/null and b/img/4o-image-generation/t-shirt.png differ diff --git a/img/4o-image-generation/teapot_1.png b/img/4o-image-generation/teapot_1.png new file mode 100644 index 000000000..c6f7c2fb9 Binary files 
/dev/null and b/img/4o-image-generation/teapot_1.png differ diff --git a/img/4o-image-generation/teapot_2.png b/img/4o-image-generation/teapot_2.png new file mode 100644 index 000000000..2b83a836b Binary files /dev/null and b/img/4o-image-generation/teapot_2.png differ diff --git a/img/4o-image-generation/text_edit_after.png b/img/4o-image-generation/text_edit_after.png new file mode 100644 index 000000000..227ec2659 Binary files /dev/null and b/img/4o-image-generation/text_edit_after.png differ diff --git a/img/4o-image-generation/text_edit_before.png b/img/4o-image-generation/text_edit_before.png new file mode 100644 index 000000000..4b0c3375c Binary files /dev/null and b/img/4o-image-generation/text_edit_before.png differ diff --git a/img/4o-image-generation/text_edit_combined.png b/img/4o-image-generation/text_edit_combined.png new file mode 100644 index 000000000..e89ad44b4 Binary files /dev/null and b/img/4o-image-generation/text_edit_combined.png differ diff --git a/img/4o-image-generation/text_in_images.png b/img/4o-image-generation/text_in_images.png new file mode 100644 index 000000000..9032b4704 Binary files /dev/null and b/img/4o-image-generation/text_in_images.png differ diff --git a/img/4o-image-generation/text_prompt.JPG b/img/4o-image-generation/text_prompt.JPG new file mode 100644 index 000000000..f37f199c8 Binary files /dev/null and b/img/4o-image-generation/text_prompt.JPG differ diff --git a/img/4o-image-generation/text_prompt_1.JPG b/img/4o-image-generation/text_prompt_1.JPG new file mode 100644 index 000000000..5e748857d Binary files /dev/null and b/img/4o-image-generation/text_prompt_1.JPG differ diff --git a/img/4o-image-generation/text_prompt_2.JPG b/img/4o-image-generation/text_prompt_2.JPG new file mode 100644 index 000000000..daf8f4313 Binary files /dev/null and b/img/4o-image-generation/text_prompt_2.JPG differ diff --git a/img/4o-image-generation/text_prompt_3.JPG b/img/4o-image-generation/text_prompt_3.JPG new file mode 100644 index 000000000..c1fa26c4f Binary files /dev/null and b/img/4o-image-generation/text_prompt_3.JPG differ diff --git a/img/4o-image-generation/tool_select.JPG b/img/4o-image-generation/tool_select.JPG new file mode 100644 index 000000000..5e811bb16 Binary files /dev/null and b/img/4o-image-generation/tool_select.JPG differ diff --git a/img/agents/Screenshot 2025-10-15 at 11.23.08.png b/img/agents/Screenshot 2025-10-15 at 11.23.08.png new file mode 100644 index 000000000..8b0b6e5c8 Binary files /dev/null and b/img/agents/Screenshot 2025-10-15 at 11.23.08.png differ diff --git a/img/agents/agent-components.png b/img/agents/agent-components.png new file mode 100644 index 000000000..edf948ced Binary files /dev/null and b/img/agents/agent-components.png differ diff --git a/img/agents/cs-persistent-storage.png b/img/agents/cs-persistent-storage.png new file mode 100644 index 000000000..3bb59976c Binary files /dev/null and b/img/agents/cs-persistent-storage.png differ diff --git a/img/agents/cs-planning.png b/img/agents/cs-planning.png new file mode 100644 index 000000000..9cf4f0d11 Binary files /dev/null and b/img/agents/cs-planning.png differ diff --git a/img/agents/cs-subagents.png b/img/agents/cs-subagents.png new file mode 100644 index 000000000..5f9a304fb Binary files /dev/null and b/img/agents/cs-subagents.png differ diff --git a/img/agents/cs-verification-agent.png b/img/agents/cs-verification-agent.png new file mode 100644 index 000000000..e83248445 Binary files /dev/null and b/img/agents/cs-verification-agent.png differ diff --git 
a/img/agents/customer-support-deep-agent.png b/img/agents/customer-support-deep-agent.png new file mode 100644 index 000000000..585537455 Binary files /dev/null and b/img/agents/customer-support-deep-agent.png differ diff --git a/img/agents/parallelization.png b/img/agents/parallelization.png new file mode 100644 index 000000000..757a3a90b Binary files /dev/null and b/img/agents/parallelization.png differ diff --git a/img/agents/prompt-chaining.png b/img/agents/prompt-chaining.png new file mode 100644 index 000000000..a8d09ec97 Binary files /dev/null and b/img/agents/prompt-chaining.png differ diff --git a/img/agents/routing.png b/img/agents/routing.png new file mode 100644 index 000000000..9f7162104 Binary files /dev/null and b/img/agents/routing.png differ diff --git a/img/agents/simple-dr-agent.png b/img/agents/simple-dr-agent.png new file mode 100644 index 000000000..76fe0daa0 Binary files /dev/null and b/img/agents/simple-dr-agent.png differ diff --git a/img/agents/task-planner-agent.png b/img/agents/task-planner-agent.png new file mode 100644 index 000000000..e96c7ff3d Binary files /dev/null and b/img/agents/task-planner-agent.png differ diff --git a/img/context-engineering-guide/context-engineering-diagram.jpg b/img/context-engineering-guide/context-engineering-diagram.jpg new file mode 100644 index 000000000..72600f307 Binary files /dev/null and b/img/context-engineering-guide/context-engineering-diagram.jpg differ diff --git a/img/context-engineering-guide/context-engineering-workflow.jpg b/img/context-engineering-guide/context-engineering-workflow.jpg new file mode 100644 index 000000000..71f1003ee Binary files /dev/null and b/img/context-engineering-guide/context-engineering-workflow.jpg differ diff --git a/img/deep-research/deep_research_OAI_post.JPG b/img/deep-research/deep_research_OAI_post.JPG new file mode 100644 index 000000000..6e95eacf8 Binary files /dev/null and b/img/deep-research/deep_research_OAI_post.JPG differ diff --git a/img/deep-research/deep_research_benchmark.JPG b/img/deep-research/deep_research_benchmark.JPG new file mode 100644 index 000000000..d6a84dcff Binary files /dev/null and b/img/deep-research/deep_research_benchmark.JPG differ diff --git a/img/deep-research/deep_research_browsecomp.JPG b/img/deep-research/deep_research_browsecomp.JPG new file mode 100644 index 000000000..c26d1f9be Binary files /dev/null and b/img/deep-research/deep_research_browsecomp.JPG differ diff --git a/img/deep-research/deep_research_clarify.JPG b/img/deep-research/deep_research_clarify.JPG new file mode 100644 index 000000000..b160d9bef Binary files /dev/null and b/img/deep-research/deep_research_clarify.JPG differ diff --git a/img/deep-research/deep_research_flowchart.JPG b/img/deep-research/deep_research_flowchart.JPG new file mode 100644 index 000000000..94f426619 Binary files /dev/null and b/img/deep-research/deep_research_flowchart.JPG differ diff --git a/img/deep-research/deep_research_pass_rate.JPG b/img/deep-research/deep_research_pass_rate.JPG new file mode 100644 index 000000000..93be4aa2c Binary files /dev/null and b/img/deep-research/deep_research_pass_rate.JPG differ diff --git a/img/deep-research/deep_research_prompt.JPG b/img/deep-research/deep_research_prompt.JPG new file mode 100644 index 000000000..5b3e5fcf6 Binary files /dev/null and b/img/deep-research/deep_research_prompt.JPG differ diff --git a/img/deep-research/deep_research_word_cloud.JPG b/img/deep-research/deep_research_word_cloud.JPG new file mode 100644 index 000000000..bdb0a2739 Binary files 
/dev/null and b/img/deep-research/deep_research_word_cloud.JPG differ diff --git a/img/reasoning-llms/agentic_rag.JPG b/img/reasoning-llms/agentic_rag.JPG new file mode 100644 index 000000000..a586c34da Binary files /dev/null and b/img/reasoning-llms/agentic_rag.JPG differ diff --git a/img/reasoning-llms/hybrid_reasoning_models.JPG b/img/reasoning-llms/hybrid_reasoning_models.JPG new file mode 100644 index 000000000..ccecb5ab0 Binary files /dev/null and b/img/reasoning-llms/hybrid_reasoning_models.JPG differ diff --git a/img/reasoning-llms/llm_as_a_judge.JPG b/img/reasoning-llms/llm_as_a_judge.JPG new file mode 100644 index 000000000..69476a116 Binary files /dev/null and b/img/reasoning-llms/llm_as_a_judge.JPG differ diff --git a/img/reasoning-llms/orchestrator_worker_LI_1.JPG b/img/reasoning-llms/orchestrator_worker_LI_1.JPG new file mode 100644 index 000000000..2fe58ce1b Binary files /dev/null and b/img/reasoning-llms/orchestrator_worker_LI_1.JPG differ diff --git a/pages/_app.tsx b/pages/_app.tsx index d7311fde9..4a5ee78ab 100644 --- a/pages/_app.tsx +++ b/pages/_app.tsx @@ -2,6 +2,7 @@ import '@fortawesome/fontawesome-svg-core/styles.css'; import type { AppProps } from 'next/app'; import Script from 'next/script'; import { Analytics } from '@vercel/analytics/react'; +import AnnouncementBar from '../components/AnnouncementBar'; import './style.css'; function MyApp({ Component, pageProps }: AppProps) { @@ -16,6 +17,7 @@ function MyApp({ Component, pageProps }: AppProps) { `} + diff --git a/pages/_meta.ar.json b/pages/_meta.ar.json index 8af657500..36d42f07f 100644 --- a/pages/_meta.ar.json +++ b/pages/_meta.ar.json @@ -12,14 +12,6 @@ "notebooks": "دفاتر ملاحظات", "datasets": "مجموعات البيانات", "readings": "قراءات إضافية", - "course": { - "title": "🎓 دورة هندسة التلقين", - "type": "page" - }, - "services": { - "title": "خدمات", - "type": "page" - }, "about": { "title": "حول الدليل", "type": "page" diff --git a/pages/_meta.ca.json b/pages/_meta.ca.json index 2ec28958f..9097f7994 100644 --- a/pages/_meta.ca.json +++ b/pages/_meta.ca.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Additional Readings", - "course":{ - "title": "Prompt Engineering Course", - "type": "page" - }, - "services": { - "title": "Services", - "type": "page" - }, "about": { "title": "About", "type": "page" diff --git a/pages/_meta.de.json b/pages/_meta.de.json index 63080ca1e..560d4e483 100644 --- a/pages/_meta.de.json +++ b/pages/_meta.de.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datensätze", "readings": "Zusatzlektüre", - "course":{ - "title": "Prompt Engineering Kurs", - "type": "page" - }, - "services": { - "title": "Services", - "type": "page" - }, "about": { "title": "Über", "type": "page" diff --git a/pages/_meta.en.json b/pages/_meta.en.json index cc9409811..a1a9463e4 100644 --- a/pages/_meta.en.json +++ b/pages/_meta.en.json @@ -1,7 +1,8 @@ { "index": "Prompt Engineering", "introduction": "Introduction", - "techniques": "Techniques", + "techniques": "Prompting Techniques", + "agents": "AI Agents", "guides": "Guides", "applications": "Applications", "prompts": "Prompt Hub", @@ -13,13 +14,39 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Additional Readings", - "course":{ - "title": "🎓 Prompt Engineering Course", - "type": "page" - }, - "services": { - "title": "Services", - "type": "page" + "courses":{ + "title": "🎓 Courses", + "type": "menu", + "items": { + "intro-prompt-engineering": { + "title": "Intro to Prompt Engineering", + 
"href": "https://dair-ai.thinkific.com/courses/introduction-prompt-engineering" + }, + "advanced-prompt-engineering": { + "title": "Advanced Prompt Engineering", + "href": "https://dair-ai.thinkific.com/courses/advanced-prompt-engineering" + }, + "intro-ai-agents": { + "title": "Intro to AI Agents", + "href": "https://dair-ai.thinkific.com/courses/introduction-ai-agents" + }, + "agents-with-n8n": { + "title": "Building Effective AI Agents with n8n", + "href": "https://dair-ai.thinkific.com/courses/agents-with-n8n" + }, + "rag-systems": { + "title": "Build RAG Systems", + "href": "https://dair-ai.thinkific.com/courses/introduction-rag" + }, + "advanced-agents": { + "title": "Building Advanced AI Agents", + "href": "https://dair-ai.thinkific.com/courses/advanced-ai-agents" + }, + "all-courses": { + "title": "See all →", + "href": "https://dair-ai.thinkific.com/pages/courses" + } + } }, "about": { "title": "About", diff --git a/pages/_meta.es.json b/pages/_meta.es.json index c365eb646..c3a821534 100644 --- a/pages/_meta.es.json +++ b/pages/_meta.es.json @@ -12,15 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Lecturas Adicionales", - - "course": { - "title": "Curso de Ingeniería de Prompt", - "type": "page" - }, - "services": { - "title": "Servicios", - "type": "page" - }, "about": { "title": "Acerca de", "type": "page" diff --git a/pages/_meta.fi.json b/pages/_meta.fi.json index 2ec28958f..9097f7994 100644 --- a/pages/_meta.fi.json +++ b/pages/_meta.fi.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Additional Readings", - "course":{ - "title": "Prompt Engineering Course", - "type": "page" - }, - "services": { - "title": "Services", - "type": "page" - }, "about": { "title": "About", "type": "page" diff --git a/pages/_meta.fr.json b/pages/_meta.fr.json index 0efde5f7d..4bd21b59d 100644 --- a/pages/_meta.fr.json +++ b/pages/_meta.fr.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Lectures supplémentaires", - "course":{ - "title": "Prompt Engineering Course", - "type": "page" - }, - "services": { - "title": "Services", - "type": "page" - }, "about": { "title": "À propos", "type": "page" diff --git a/pages/_meta.it.json b/pages/_meta.it.json index 97c559624..cdf5d094c 100644 --- a/pages/_meta.it.json +++ b/pages/_meta.it.json @@ -12,14 +12,6 @@ "notebooks": "Notebook", "datasets": "Dataset", "readings": "Letture", - "course":{ - "title": "Corso Prompt Engineering", - "type": "page" - }, - "services": { - "title": "Servizi", - "type": "page" - }, "about": { "title": "Informazioni", "type": "page" diff --git a/pages/_meta.jp.json b/pages/_meta.jp.json index 2ec28958f..9097f7994 100644 --- a/pages/_meta.jp.json +++ b/pages/_meta.jp.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Additional Readings", - "course":{ - "title": "Prompt Engineering Course", - "type": "page" - }, - "services": { - "title": "Services", - "type": "page" - }, "about": { "title": "About", "type": "page" diff --git a/pages/_meta.kr.json b/pages/_meta.kr.json index 2ec28958f..9097f7994 100644 --- a/pages/_meta.kr.json +++ b/pages/_meta.kr.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Additional Readings", - "course":{ - "title": "Prompt Engineering Course", - "type": "page" - }, - "services": { - "title": "Services", - "type": "page" - }, "about": { "title": "About", "type": "page" 
diff --git a/pages/_meta.pt.json b/pages/_meta.pt.json index 7c3e8748e..efd931050 100644 --- a/pages/_meta.pt.json +++ b/pages/_meta.pt.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Conjuntos de dados", "readings": "Leituras Adicionais", - "course":{ - "title": "Curso de Engenharia Prompt", - "type": "page" - }, - "services": { - "title": "Serviços", - "type": "page" - }, "about": { "title": "Sobre", "type": "page" diff --git a/pages/_meta.ru.json b/pages/_meta.ru.json index 6a9b4fa1d..b4f6a4127 100644 --- a/pages/_meta.ru.json +++ b/pages/_meta.ru.json @@ -12,14 +12,6 @@ "notebooks": "Notebooks", "datasets": "Datasets", "readings": "Дополнительные статьи", - "course": { - "title": "Курс по инженерии промптов", - "type": "page" - }, - "services": { - "title": "Услуги", - "type": "page" - }, "about": { "title": "О нас", "type": "page" diff --git a/pages/_meta.tr.json b/pages/_meta.tr.json index 821bd3089..bd207228b 100644 --- a/pages/_meta.tr.json +++ b/pages/_meta.tr.json @@ -12,14 +12,6 @@ "notebooks": "Notlar", "datasets": "Veri Kümeleri", "readings": "Ek Okumalar", - "course":{ - "title": "İstem Mühendisliği Kursu", - "type": "page" - }, - "services": { - "title": "Hizmetler", - "type": "page" - }, "about": { "title": "Hakkında", "type": "page" diff --git a/pages/_meta.zh.json b/pages/_meta.zh.json index e1df05c11..337151b1c 100644 --- a/pages/_meta.zh.json +++ b/pages/_meta.zh.json @@ -12,14 +12,6 @@ "notebooks": "Prompt Engineering 笔记本", "datasets": "数据集", "readings": "阅读推荐", - "course":{ - "title": "提示工程课程", - "type": "page" - }, - "services": { - "title": "服务", - "type": "page" - }, "about": { "title": "关于", "type": "page" diff --git a/pages/about.en.mdx b/pages/about.en.mdx index f601c169f..612cef410 100644 --- a/pages/about.en.mdx +++ b/pages/about.en.mdx @@ -1,9 +1,15 @@ # About -The Prompt Engineering Guide is a project by [DAIR.AI](https://github.com/dair-ai). It aims to educate researchers and practitioners about prompt engineering. +The Prompt Engineering Guide is a project by [DAIR.AI](https://github.com/dair-ai). It aims to educate researchers and practitioners about prompt engineering, context engineering, RAG, and AI Agents. DAIR.AI aims to democratize AI research, education, and technologies. Our mission is to enable the next-generation of AI innovators and creators. +## Sponsorship + +We are open to sponsorship opportunities to help us continue building and maintaining this guide. If you're interested in sponsoring this project, please reach out to us at [hello@dair.ai](mailto:hello@dair.ai). + +## Contributions + We welcome contributions from the community. Look out for the Edit buttons. License information [here](https://github.com/dair-ai/Prompt-Engineering-Guide#license). diff --git a/pages/agents.en.mdx b/pages/agents.en.mdx new file mode 100644 index 000000000..95283e4e1 --- /dev/null +++ b/pages/agents.en.mdx @@ -0,0 +1,12 @@ +# Agents + +import { Callout } from 'nextra/components' +import ContentFileNames from 'components/ContentFileNames' + +In this section, we provide an overview of LLM-based agents, including definitions, common design patterns, tips, use cases, and applications. + + +This content is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing agentic systems
+ + + diff --git a/pages/agents/_meta.en.json b/pages/agents/_meta.en.json new file mode 100644 index 000000000..c09c9ca61 --- /dev/null +++ b/pages/agents/_meta.en.json @@ -0,0 +1,8 @@ +{ + "introduction": "Introduction to Agents", + "components": "Agent Components", + "ai-workflows-vs-ai-agents": "AI Workflows vs AI Agents", + "context-engineering": "Context Engineering for AI Agents", + "context-engineering-deep-dive": "Context Engineering Deep Dive", + "deep-agents": "Deep Agents" +} \ No newline at end of file diff --git a/pages/agents/ai-workflows-vs-ai-agents.en.mdx b/pages/agents/ai-workflows-vs-ai-agents.en.mdx new file mode 100644 index 000000000..df6fa7cad --- /dev/null +++ b/pages/agents/ai-workflows-vs-ai-agents.en.mdx @@ -0,0 +1,229 @@ +# AI Workflows vs. AI Agents + +import { Callout } from 'nextra/components' + +![AI Workflows vs. AI Agents](../../img/agents/task-planner-agent.png) + +Agentic systems represent a paradigm shift in how we orchestrate Large Language Models (LLMs) and tools to accomplish complex tasks. This guide explores the fundamental distinction between **AI workflows** and **AI Agents**, helping you understand when to use each approach in your AI applications. + + +This content is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing agentic systems. + + +## What Are Agentic Systems? + +Agentic systems can be categorized into two main types: + +### 1. AI Workflows + +**AI workflows** are systems where LLMs and tools are orchestrated through **predefined code paths**. These systems follow a structured sequence of operations with explicit control flow. + +**Key Characteristics:** + +Key characteristics of AI workflows include: + +- Predefined steps and execution paths +- High predictability and control +- Well-defined task boundaries +- Explicit orchestration logic + +**When to Use Workflows:** + +Use AI workflows in the following scenarios: + +- Well-defined tasks with clear requirements +- Scenarios requiring predictability and consistency +- Tasks where you need explicit control over execution flow +- Production systems where reliability is critical + +### 2. AI Agents + +**AI agents** are systems where LLMs **dynamically direct their own processes** and tool usage, maintaining autonomous control over how they accomplish tasks. + +**Key Characteristics:** + +Key characteristics of AI agents include: + +- Dynamic decision-making +- Autonomous tool selection and usage +- Reasoning and reflection capabilities +- Self-directed task execution + +**When to Use Agents:** + +Use AI agents in the following scenarios: + +- Open-ended tasks with variable execution paths +- Complex scenarios where the number of steps is difficult to define upfront +- Tasks requiring adaptive reasoning +- Situations where flexibility outweighs predictability + +## Common AI Workflow Patterns + +### Pattern 1: Prompt Chaining + + + +Prompt chaining involves breaking down a complex task into sequential LLM calls, where each step's output feeds into the next. + +**Example: Document Generation Workflow** + +![Prompt Chaining](../../img/agents/prompt-chaining.png) + +This workflow demonstrates a prompt chaining pattern for document generation that begins when a chat message is received. 
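+Before walking through the n8n nodes in detail, here is what the same chaining idea looks like as a minimal code sketch (illustrative only; the `call_llm` helper, prompts, and gating logic are assumptions, not the n8n implementation):
+
+```python
+# Illustrative prompt chaining: each step's output feeds the next LLM call.
+def call_llm(prompt: str, model: str = "gpt-4.1-mini") -> str:
+    """Placeholder for a single LLM call via your provider's SDK or HTTP API."""
+    raise NotImplementedError
+
+def generate_document(topic: str) -> str:
+    # Step 1: draft an outline
+    outline = call_llm(f"Write a structured outline for a document about: {topic}")
+
+    # Step 2: gate check the outline against predefined criteria
+    grade = call_llm(f"Grade this outline from 1 to 5 for clarity and coverage. Reply with a single digit.\n{outline}")
+    if int(grade.strip()) < 4:
+        # Failed validation: adjust the outline before continuing
+        outline = call_llm(f"Revise this outline to improve clarity and coverage:\n{outline}")
+
+    # Step 3: expand the approved outline into full sections
+    draft = call_llm(f"Expand each section of this outline into prose:\n{outline}", model="gpt-4o")
+
+    # Step 4: refine and polish the final document
+    return call_llm(f"Refine and polish this draft for tone and consistency:\n{draft}")
+```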
The system first uses GPT-4.1-mini to generate an initial outline, then checks the outline against predefined criteria. A manual "Set Grade" step evaluates the quality, followed by a conditional "If" node that determines the next action based on the grade. If the outline passes validation, it proceeds to expand the outline sections using GPT-4o and then refines and polishes the final document. If the outline fails validation, the workflow branches to an "Edit Fields" step for manual adjustments before continuing, ensuring quality control throughout the multi-stage document creation process. + +**Prompt Chaining Use Cases:** +- Content generation pipelines +- Multi-stage document processing +- Sequential validation workflows + +### Pattern 2: Routing + +Routing directs different requests to specialized LLM chains or agents based on query classification. + +**Example: Customer Support Router** + +![Routing](../../img/agents/routing.png) + +This workflow illustrates a routing pattern for intelligent query distribution in a customer support system. When a chat message is received, it's first processed by a Query Classifier using GPT-4.1-mini along with a Structured Output Parser to categorize the request type. Based on the classification, a "Route by Type" switch directs the query to one of three specialized LLM chains: a General LLM Chain for basic inquiries, a Refund LLM Chain for payment-related issues, or a Support LLM Chain for technical assistance. Each query type receives specialized handling while maintaining a unified response system, optimizing both accuracy and efficiency in customer service operations. + +**Routing Use Cases:** +- Customer support systems +- Multi-domain question answering +- Request prioritization and delegation +- Resource optimization by routing to appropriate models + +**Benefits:** +- Efficient resource utilization +- Specialized handling for different query types +- Cost optimization through selective model usage + +### Pattern 3: Parallelization + +Parallelization executes multiple independent LLM operations simultaneously to improve efficiency. + +**Example: Content Safety Pipeline** + +![Parallelization](../../img/agents/parallelization.png) + +**Parallelization Use Cases:** +- Content moderation systems +- Multi-criteria evaluation +- Concurrent data processing +- Independent verification tasks + +**Advantages:** +- Reduced latency +- Better resource utilization +- Improved throughput + +## AI Agents: Autonomous Task Execution + +AI agents combine LLMs with autonomous decision-making capabilities, enabling them to perform complex tasks through reasoning, reflection, and dynamic tool usage. + +**Example: Task Planning Agent** + +**Scenario**: User asks "Add a meeting with John tomorrow at 2 PM" + + +![Task Planning Agent](../../img/agents/task-planner-agent.png) + +This workflow demonstrates an autonomous Task Planner agent that showcases agent behavior with dynamic decision-making capabilities. When a chat message is received, it's routed to a Task Planner agent that has access to three key components: a Chat Model (Reasoning LLM) for understanding and planning, a Memory system for maintaining context across interactions, and a Tool collection. The agent can autonomously select from multiple tools including add_update_tasks (to append or update tasks in a Google Sheet) and search_task (to read and search existing tasks from the sheet). 
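+A rough code sketch of this kind of agent loop might look like the following (illustrative only; the stubbed tools, message format, and `call_llm` decision helper are assumptions rather than the n8n implementation, though the tool names mirror the workflow above):
+
+```python
+import json
+
+def call_llm(messages: list[dict]) -> dict:
+    """Placeholder for an LLM call that returns either a tool call,
+    e.g. {"tool": "search_task", "args": {"query": "..."}}, or {"final": "..."}."""
+    raise NotImplementedError
+
+# Stubbed tools mirroring the workflow above (would read/write Google Sheets in practice)
+def add_update_tasks(task: str, status: str) -> str:
+    return f"Saved task '{task}' with status '{status}'"
+
+def search_task(query: str) -> str:
+    return f"No existing task matches '{query}'"
+
+TOOLS = {"add_update_tasks": add_update_tasks, "search_task": search_task}
+
+def run_agent(user_request: str, max_steps: int = 5) -> str:
+    messages = [
+        {"role": "system", "content": "You are a task planner. Use the available tools as needed."},
+        {"role": "user", "content": user_request},
+    ]
+    for _ in range(max_steps):
+        decision = call_llm(messages)      # the model decides the next step
+        if "final" in decision:
+            return decision["final"]       # the agent chose to stop and answer
+        tool = TOOLS[decision["tool"]]     # agent-selected tool...
+        result = tool(**decision["args"])  # ...executed in agent-selected order
+        messages.append({"role": "tool", "content": json.dumps({"result": result})})
+    return "Stopped after reaching the step limit."
+```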
Unlike predefined workflows, the agent independently determines which tools to use, when to use them, and in what sequence based on the user's request, exemplifying the flexibility and autonomy that distinguishes AI agents from traditional AI workflows. + + +**Key Insight**: The agent determines which tools to use and in what order, based on the request context—not on predefined rules. + + +**AI Agent Use Cases:** + +- Deep research systems +- Agentic RAG systems +- Coding agents +- Data analysis and processing +- Content generation and editing +- Customer support and assistance +- Interactive chatbots and virtual assistants + + +**Core Components:** + +Here is a list of key components for building AI Agents: + +1. **Tool Access**: Integration with external systems (Google Sheets, search APIs, databases) +2. **Memory**: Context retention across interactions for continuity +3. **Reasoning Engine**: Decision-making logic for tool selection and task planning +4. **Autonomy**: Self-directed execution without predefined control flow + +### How Agents Differ from Workflows + +| Aspect | AI Workflows | AI Agents | +|--------|-------------|-----------| +| **Control Flow** | Predefined, explicit | Dynamic, autonomous | +| **Decision Making** | Hard-coded logic | LLM-driven reasoning | +| **Tool Usage** | Orchestrated by code | Self-selected by agent | +| **Adaptability** | Fixed paths | Flexible execution | +| **Complexity** | Lower, more predictable | Higher, more capable | +| **Use Cases** | Well-defined tasks | Open-ended problems | + + +## Design Considerations + +### Choosing Between Workflows and Agents + +**Use AI Workflows when:** +- Task requirements are clear and stable +- Predictability is essential +- You need explicit control over execution +- Debugging and monitoring are priorities +- Cost management is critical + +**Use AI Agents when:** +- Tasks are open-ended or exploratory +- Flexibility is more important than predictability +- The problem space is complex with many variables +- Human-like reasoning is beneficial +- Adaptability to changing conditions is required + +### Hybrid Approaches + +Many production systems combine both approaches: +- **Workflows for structure**: Use workflows for reliable, well-defined components +- **Agents for flexibility**: Deploy agents for adaptive, complex decision-making +- **Example**: A workflow routes requests to specialized agents, each handling open-ended subtasks + +We will introduce an example of this in an upcoming article. + +## Best Practices + +### For AI Workflows + +1. **Clear Step Definition**: Document each stage in the workflow +2. **Error Handling**: Implement fallback paths for failures +3. **Validation Gates**: Add checks between critical steps +4. **Performance Monitoring**: Track latency and success rates per step + +### For AI Agents + +1. **Tool Design**: Provide clear, well-documented tools with explicit purposes +2. **Memory Management**: Implement effective context retention strategies +3. **Guardrails**: Set boundaries on agent behavior and tool usage +4. **Observability**: Log agent reasoning and decision-making processes +5. **Iterative Testing**: Continuously evaluate agent performance on diverse scenarios + +We will discuss these more extensively in future articles. + +## Conclusion + +Understanding the distinction between AI workflows and AI agents is crucial for building effective agentic systems. 
Workflows provide control and predictability for well-defined tasks, while agents offer flexibility and autonomy for complex, open-ended problems. + +The choice between workflows and agents—or a combination of both—depends on your specific use case, performance requirements, and tolerance for autonomous decision-making. By aligning your system design with task characteristics, you can build more effective, efficient, and reliable AI applications. + + +This content is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing agentic systems. + + +## Additional Resources + +- [Anthropic: Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) +- [Prompt Engineering Guide](https://www.promptingguide.ai/) +- [Building Effective AI Agents with n8n](https://dair-ai.thinkific.com/courses/agents-with-n8n) diff --git a/pages/agents/components.en.mdx b/pages/agents/components.en.mdx new file mode 100644 index 000000000..f7ffdec73 --- /dev/null +++ b/pages/agents/components.en.mdx @@ -0,0 +1,54 @@ +# Agent Components + +import { Callout } from 'nextra/components' + +AI agents require three fundamental capabilities to effectively tackle complex tasks: planning abilities, tool utilization, and memory management. Let's dive into how these components work together to create functional AI agents. + +![Agent Components](../../img/agents/agent-components.png) + +## Planning: The Brain of the Agent + +At the core of any effective AI agent is its planning capability, powered by large language models (LLMs). Modern LLMs enable several crucial planning functions: + +- Task decomposition through chain-of-thought reasoning +- Self-reflection on past actions and information +- Adaptive learning to improve future decisions +- Critical analysis of current progress + +While current LLM planning capabilities aren't perfect, they're essential for task completion. Without robust planning abilities, an agent cannot effectively automate complex tasks, which defeats its primary purpose. + + +Learn how to build with AI agents in our new course. [Join now!](https://dair-ai.thinkific.com/courses/introduction-ai-agents) +Use code PROMPTING20 to get an extra 20% off. + + +## Tool Utilization: Extending the Agent's Capabilities + +The second critical component is an agent's ability to interface with external tools. A well-designed agent must not only have access to various tools but also understand when and how to use them appropriately. Common tools include: + +- Code interpreters and execution environments +- Web search and scraping utilities +- Mathematical calculators +- Image generation systems + +These tools enable the agent to execute its planned actions, turning abstract strategies into concrete results. The LLM's ability to understand tool selection and timing is crucial for handling complex tasks effectively. + +## Memory Systems: Retaining and Utilizing Information + +The third essential component is memory management, which comes in two primary forms: + +1. Short-term (Working) Memory + - Functions as a buffer for immediate context + - Enables in-context learning + - Sufficient for most task completions + - Helps maintain continuity during task iteration + +2. 
Long-term Memory + - Implemented through external vector stores + - Enables fast retrieval of historical information + - Valuable for future task completion + - Less commonly implemented but potentially crucial for future developments + +Memory systems allow agents to store and retrieve information gathered from external tools, enabling iterative improvement and building upon previous knowledge. + +The synergy between planning capabilities, tool utilization, and memory systems forms the foundation of effective AI agents. While each component has its current limitations, understanding these core capabilities is crucial for developing and working with AI agents. As the technology evolves, we may see new memory types and capabilities emerge, but these three pillars will likely remain fundamental to AI agent architecture. diff --git a/pages/agents/context-engineering-deep-dive.en.mdx b/pages/agents/context-engineering-deep-dive.en.mdx new file mode 100644 index 000000000..46e0029c3 --- /dev/null +++ b/pages/agents/context-engineering-deep-dive.en.mdx @@ -0,0 +1,284 @@ +# Context Engineering Deep Dive: Building a Deep Research Agent + +import { Callout } from 'nextra/components' + +[Context engineering](https://www.promptingguide.ai/guides/context-engineering-guide) requires significant iteration and careful design decisions to build reliable AI agents. This guide takes a deep dive into the practical aspects of context engineering through the development of a basic deep research agent, exploring some of the techniques and design patterns that improve agent reliability and performance. + + +This content is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing agentic systems. + + +## The Reality of Context Engineering + +Building effective AI agents requires substantial tuning of system prompts and tool definitions. The process involves spending hours iterating on: + +- System prompt design and refinement +- Tool definitions and usage instructions +- Agent architecture and communication patterns +- Input/output specifications between agents + + +Don't underestimate the effort required for context engineering. It's not a one-time task but an iterative process that significantly impacts agent reliability and performance. + +## Agent Architecture Design + +### The Original Design Problem + +![deep-research-agent](../../img/agents/simple-dr-agent.png) + +Let's look at a basic deep research agent architecture. The initial architecture connects the web search tool directly to the deep research agent. This design places too much burden on a single agent responsible for: + +- Managing tasks (creating, updating, deleting) +- Saving information to memory +- Executing web searches +- Generating final reports + +**Consequences of this design:** +- Context grew too long +- Agent forgot to execute web searches +- Task completion updates were missed +- Unreliable behavior across different queries + +### The Improved Multi-Agent Architecture + +The solution involved separating concerns by introducing a dedicated search worker agent: + +**Benefits of the multi-agent design:** + +1. **Separation of Concerns**: The parent agent (Deep Research Agent) handles planning and orchestration, while the search worker agent focuses exclusively on executing web searches +2. 
**Improved Reliability**: Each agent has a clear, focused responsibility, reducing the likelihood of missed tasks or forgotten operations +3. **Model Selection Flexibility**: Different agents can use different language models optimized for their specific tasks + - Deep Research Agent: Uses Gemini 2.5 Pro for complex planning and reasoning + - Search Worker Agent: Uses Gemini 2.5 Flash for faster, more cost-effective search execution + +If you are using models from other providers like OpenAI, you can leverage GPT-5 (for planning and reasoning) and GPT-5-mini (for search execution) for similar performance. + + +**Design Principle**: Separating agent responsibilities improves reliability and enables cost-effective model selection for different subtasks. + + +## System Prompt Engineering + +Here is the full system prompt for the deep research agent we built in n8n: + +```md +You are a deep research agent who will help with planning and executing search tasks to generate a deep research report. + +## GENERAL INSTRUCTIONS + +The user will provide a query, and you will convert that query into a search plan with multiple search tasks (3 web searches). You will execute each search task and maintain the status of those searches in a spreadsheet. + +You will then generate a final deep research report for the user. + +For context, today's date is: {{ $now.format('yyyy-MM-dd') }} + +## TOOL DESCRIPTIONS + +Below are some useful instructions for how to use the available tools. + +Deleting tasks: Use the delete_task tool to clear up all the tasks before starting the search plan. + +Planning tasks: You will create a plan with the search tasks (3 web searches) and add them to the Google Sheet using the append_update_task tool. Make sure to keep the status of each task updated after completing each search. Each task begins with a todo status and will be updated to a "done" status once the search worker returns information regarding the search task. + +Executing tasks: Use the Search Worker Agent tool to execute the search plan. The input to the agent are the actual search queries, word for word. + +Use the tools in the order that makes the most sense to you but be efficient. +``` + +Let's break it down into parts and discuss why each section is important: + + +### High-Level Agent Definition + +The system prompt begins with a clear definition of the agent's role: + +```md +You are a deep research agent who will help with planning and executing search tasks to generate a deep research report. +``` + +### General Instructions + +Provide explicit instructions about the agent's workflow: + +```md +## GENERAL INSTRUCTIONS + +The user will provide a query, and you will convert that query into a search plan with multiple search tasks (3 web searches). You will execute each search task and maintain the status of those searches in a spreadsheet. + +You will then generate a final deep research report for the user. 
+``` + +### Providing Essential Context + +**Current Date Information:** + +Including the current date is crucial for research agents to get up-to-date information: + +```md +For context, today's date is: {{ $now.format('yyyy-MM-dd') }} +``` + +**Why this matters:** +- LLMs typically have knowledge cutoffs months or years behind the current date +- Without current date context, agents often search for outdated information +- This ensures agents understand temporal context for queries like "latest news" or "recent developments" + +In n8n, you can dynamically inject the current date using built-in functions with customizable formats (date only, date with time, specific timezones, etc.). + +## Tool Definitions and Usage Instructions + +### The Importance of Detailed Tool Descriptions + +Tool definitions typically appear in two places: + +1. **In the system prompt**: Detailed explanations of what tools do and when to use them +2. **In the actual tool implementation**: Technical specifications and parameters + + +**Key Insight**: The biggest performance improvements often come from clearly explaining tool usage in the system prompt, not just defining tool parameters. + + +### Example Tool Instructions + +The system prompt also includes detailed instructions for using the available tools: + +```md +## TOOL DESCRIPTIONS + +Below are some useful instructions for how to use the available tools. + +Deleting tasks: Use the delete_task tool to clear up all the tasks before starting the search plan. + +Planning tasks: You will create a plan with the search tasks (3 web searches) and add them to the Google Sheet using the append_update_task tool. Make sure to keep the status of each task updated after completing each search. Each task begins with a todo status and will be updated to a "done" status once the search worker returns information regarding the search task. + +Executing tasks: Use the Search Worker Agent tool to execute the search plan. The input to the agent are the actual search queries, word for word. + +Use the tools in the order that makes the most sense to you but be efficient. +``` + + +Initially, without explicit status definitions, the agent would use different status values across runs: +- Sometimes "pending", sometimes "to-do" +- Sometimes "completed", sometimes "done", sometimes "finished" + +Be explicit about allowed values. This eliminates ambiguity and ensures consistent behavior. + +Note that the system prompt also includes this instruction: + +```md +Use the tools in the order that makes most sense to you, but be efficient. +``` + +What's the reasoning behind this decision? + +This provides flexibility for the agent to optimize its execution strategy. During testing, the agent might: +- Execute only 2 searches instead of 3 if it determines that's sufficient +- Combine redundant search queries +- Skip searches that overlap significantly + +Here is a specific instruction you can use, if you require all search tasks to be executed: + +```md +You MUST execute a web search for each and every search task you create. +Do NOT skip any tasks, even if they seem redundant. +``` + +**When to use flexible vs. rigid approaches:** +- **Flexible**: During development and testing to observe agent decision-making patterns +- **Rigid**: In production when consistency and completeness are critical + +## Context Engineering Iteration Process + +### The Iterative Nature of Improving Context + +Context engineering is not a one-time effort. The development process involves: + +1. 
**Initial implementation** with basic system prompts +2. **Testing** with diverse queries +3. **Identifying issues** (missed tasks, wrong status values, incomplete searches) +4. **Adding specific instructions** to address each issue +5. **Re-testing** to validate improvements +6. **Repeating** the cycle + +### What's Still Missing + +Even after multiple iterations, there are opportunities for further improvement: + +**Search Task Metadata:** +- Augmenting search queries +- Search type (web search, news search, academic search, PDF search) +- Time period filters (today, last week, past month, past year, all time) +- Domain focus (technology, science, health, etc.) +- Priority levels for task execution order + +**Enhanced Search Planning:** +- More detailed instructions on how to generate search tasks +- Preferred formats for search queries +- Guidelines for breaking down complex queries +- Examples of good vs. bad search task decomposition + +**Date Range Specification:** +- Start date and end date for time-bounded searches +- Format specifications for date parameters +- Logic for inferring date ranges from time period keywords + +Based on the recommended improvements, it's easy to appreciate that web search for AI agents is a challenging effort that requires a lot of context engineering. + + +## Advanced Considerations + +### Sub-Agent Communication + +When designing multi-agent systems, carefully consider: + +**What information does the sub-agent need?** +- For the search worker: Just the search query text +- Not the full context or task metadata +- Keep sub-agent inputs minimal and focused + +**What information should the sub-agent return?** +- Search results and relevant findings +- Error states or failure conditions +- Metadata about the search execution + +### Context Length Management + +As agents execute multiple tasks, context grows: +- Task history accumulates +- Search results add tokens +- Conversation history expands + +**Strategies to manage context length:** +- Use separate agents to isolate context +- Implement memory management tools +- Summarize long outputs before adding to context +- Clear task lists between research queries + +### Error Handling in System Prompts + +Include instructions for failure scenarios: + +```text +ERROR HANDLING: +- If search_worker fails, retry once with rephrased query +- If task cannot be completed, mark status as "failed" with reason +- If critical errors occur, notify user and request guidance +- Never proceed silently when operations fail +``` + +## Conclusion + +Context engineering is a critical practice for building reliable AI agents that requires: + +- **Significant iteration time** spent tuning prompts and tool definitions +- **Careful architectural decisions** about agent separation and communication +- **Explicit instructions** that eliminate assumptions +- **Continuous refinement** based on observed behavior +- **Balance between flexibility and control** + +The deep research agent example demonstrates how thoughtful context engineering transforms an unreliable prototype into a robust, production-ready system. By applying these principles—clear role definitions, explicit tool instructions, essential context provision, and iterative improvement—you can build AI agents that consistently deliver high-quality results. + + +Learn how to build production-ready AI agents with hands-on examples and templates. [Join our comprehensive course!](https://dair-ai.thinkific.com/courses/agents-with-n8n) +Use code PROMPTING20 to get an extra 20% off. 
+ diff --git a/pages/agents/context-engineering.en.mdx b/pages/agents/context-engineering.en.mdx new file mode 100644 index 000000000..c50a03a85 --- /dev/null +++ b/pages/agents/context-engineering.en.mdx @@ -0,0 +1,242 @@ +# Why Context Engineering? + +import { Callout } from 'nextra/components' + +[Context engineering](https://www.promptingguide.ai/guides/context-engineering-guide) is a critical practice for building reliable and effective AI agents. This guide explores the importance of context engineering through a practical example of building a deep research agent. + +Context engineering involves carefully crafting and refining the prompts, instructions, and constraints that guide an AI agent's behavior to achieve desired outcomes. + + +This content is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing agentic systems. + + +## What is Context Engineering? + +[Context engineering](https://www.promptingguide.ai/guides/context-engineering-guide) is the process of designing, testing, and iterating on the contextual information provided to AI agents to shape their behavior and improve task performance. Unlike simple prompt engineering for single LLM calls, context engineering for agents involves (but not limited to): + +- **System prompts** that define agent behavior and capabilities +- **Task constraints** that guide decision-making +- **Tool descriptions** that clarify when and how to use available functions/tools +- **Memory management** for tracking state across multiple steps +- **Error handling** patterns for robust execution + +## Building a Deep Research Agent: A Case Study + +Let's explore context engineering principles through an example: a minimal deep research agent that performs web searches and generates reports. + +![Agent Workflow](../../img/agents/simple-dr-agent.png) + +### The Context Engineering Challenge + +When building the first version of this agent system, the initial implementation revealed several behavioral issues that required careful context engineering: + +#### Issue 1: Incomplete Task Execution + +**Problem**: When running the agentic workflow, the orchestrator agent often creates three search tasks but only executes searches for two of them, skipping the third task without explicit justification. + +**Root Cause**: The agent's system prompt lacked explicit constraints about task completion requirements. The agent made assumptions about which searches were necessary, leading to inconsistent behavior. + +**Solution**: Two approaches are possible: + +1. **Flexible Approach** (current): Allow the agent to decide which searches are necessary, but require explicit reasoning for skipped tasks +2. **Strict Approach**: Add explicit constraints requiring search execution for all planned tasks + +Example system prompt enhancement: + +```text +You are a deep research agent responsible for executing comprehensive research tasks. + +TASK EXECUTION RULES: +- For each search task you create, you MUST either: + 1. Execute a web search and document findings, OR + 2. 
Explicitly state why the search is unnecessary and mark it as completed with justification + +- Do NOT skip tasks silently or make assumptions about task redundancy +- If you determine tasks overlap, consolidate them BEFORE execution +- Update task status in the spreadsheet after each action +``` + +#### Issue 2: Lack of Debugging Visibility + +**Problem**: Without proper logging and state tracking, it was difficult to understand why the agent made certain decisions. + +**Solution**: For this example, it helps to implement a task management system using a spreadsheet or text file (for simplicity) with the following fields: + +- Task ID +- Search query +- Status (todo, in_progress, completed) +- Results summary +- Timestamp + +This visibility enables: +- Real-time debugging of agent decisions +- Understanding of task execution flow +- Identification of behavioral patterns +- Data for iterative improvements + +### Context Engineering Best Practices + +Based on this case study, here are key principles for effective context engineering: + +#### 1. Eliminate Prompt Ambiguity + +**Bad Example:** +```text +Perform research on the given topic. +``` + +**Good Example:** +```text +Perform research on the given topic by: +1. Breaking down the query into 3-5 specific search subtasks +2. Executing a web search for EACH subtask using the search_tool +3. Documenting findings for each search in the task tracker +4. Synthesizing all findings into a comprehensive report +``` + +#### 2. Make Expectations Explicit + +Don't assume the agent knows what you want. Be explicit about: +- Required vs. optional actions +- Quality standards +- Output formats +- Decision-making criteria + +#### 3. Implement Observability + +Build debugging mechanisms into your agentic system: +- Log all agent decisions and reasoning +- Track state changes in external storage +- Record tool calls and their outcomes +- Capture errors and edge cases + + +Pay close attention to every run of your agentic system. Strange behaviors and edge cases are opportunities to improve your context engineering efforts. + + +#### 4. Iterate Based on Behavior + +Context engineering is an iterative process: + +1. **Deploy** the agent with initial context +2. **Observe** actual behavior in production +3. **Identify** deviations from expected behavior +4. **Refine** system prompts and constraints +5. **Test** and validate improvements +6. **Repeat** + +#### 5. Balance Flexibility and Constraints + +Consider the tradeoff between: +- **Strict constraints**: More predictable but less adaptable +- **Flexible guidelines**: More adaptable but potentially inconsistent + +Choose based on your use case requirements. + +## Advanced Context Engineering Techniques + +### Layered Context Architecture + +Context engineering applies to all stages of the AI agent build process. Depending on the AI Agent, it's sometimes helpful to think of context as a hierarchical structure. For our basic agentic system, we can organize context into hierarchical layers: + +1. **System Layer**: Core agent identity and capabilities +2. **Task Layer**: Specific instructions for the current task +3. **Tool Layer**: Descriptions and usage guidelines for each tool +4. **Memory Layer**: Relevant historical context and learnings + +### Dynamic Context Adjustment + +Another approach is to dynamically adjust context based on the task complexity, available resources, previous execution history, and error patterns. 
Based on our example, we can adjust context based on: + +- Task complexity +- Available resources +- Previous execution history +- Error patterns + +### Context Validation + +Evaluation is key to ensuring context engineering techniques are working as they should for your AI agents. Before deployment, validate your context design: + +- **Completeness**: Does it cover all important scenarios? +- **Clarity**: Is it unambiguous? +- **Consistency**: Do different parts align? +- **Testability**: Can you verify the behavior? + +## Common Context Engineering Pitfalls + +Below are a few common context engineering pitfalls to avoid when building AI agents: + +### 1. Over-Constraint + +**Problem**: Too many rules make the agent inflexible and unable to handle edge cases. + +**Example**: +```text +NEVER skip a search task. +ALWAYS perform exactly 3 searches. +NEVER combine similar queries. +``` + +**Better Approach**: +```text +Aim to perform searches for all planned tasks. If you determine that tasks are redundant, consolidate them before execution and document your reasoning. +``` + +### 2. Under-Specification + +**Problem**: Vague instructions lead to unpredictable behavior. + +**Example**: +```text +Do some research and create a report. +``` + +**Better Approach**: +```text +Execute research by: +1. Analyzing the user query to identify key information needs +2. Creating 3-5 specific search tasks covering different aspects +3. Executing searches using the search_tool for each task +4. Synthesizing findings into a structured report with sections for: + - Executive summary + - Key findings per search task + - Conclusions and insights +``` + +### 3. Ignoring Error Cases + +**Problem**: Context doesn't specify behavior when things go wrong. + +**Solution**: In some cases, it helps to add error handling instructions to your AI Agents: +```text +ERROR HANDLING: +- If a search fails, retry once with a rephrased query +- If retry fails, document the failure and continue with remaining tasks +- If more than 50% of searches fail, alert the user and request guidance +- Never stop execution completely without user notification +``` + +## Measuring Context Engineering Success + +Track these metrics to evaluate context engineering effectiveness: + +1. **Task Completion Rate**: Percentage of tasks completed successfully +2. **Behavioral Consistency**: Similarity of agent behavior across similar inputs +3. **Error Rate**: Frequency of failures and unexpected behaviors +4. **User Satisfaction**: Quality and usefulness of outputs +5. **Debugging Time**: Time required to identify and fix issues + +It's important to not treat context engineering as a one-time activity but an ongoing practice that requires: + +- **Systematic observation** of agent behavior +- **Careful analysis** of failures and edge cases +- **Iterative refinement** of instructions and constraints +- **Rigorous testing** of changes + +We will be covering these principles in more detail in upcoming guides. By applying these principles, you can build AI agent systems that are reliable, predictable, and effective at solving complex tasks. + + + +Learn how to build production-ready AI agents in our comprehensive course. [Join now!](https://dair-ai.thinkific.com/courses/agents-with-n8n) +Use code PROMPTING20 to get an extra 20% off. 
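As an addendum, here is one way the error-handling policy sketched earlier in this guide (retry once with a rephrased query, document failures, escalate when more than half of the searches fail) could also be enforced in the agent runner itself. The `search`, `rephrase`, and `notify_user` callables are assumed placeholders for your own tools, not a prescribed API.

```python
from typing import Callable, Dict, List

def run_searches(
    queries: List[str],
    search: Callable[[str], str],
    rephrase: Callable[[str], str],
    notify_user: Callable[[str], None],
) -> List[Dict[str, str]]:
    """Executes planned searches with retry, failure logging, and escalation."""
    results, failures = [], 0
    for query in queries:
        try:
            results.append({"query": query, "status": "completed", "result": search(query)})
        except Exception:
            try:
                retry_query = rephrase(query)  # retry once with a rephrased query
                results.append({"query": retry_query, "status": "completed", "result": search(retry_query)})
            except Exception as err:
                failures += 1  # document the failure and continue with remaining tasks
                results.append({"query": query, "status": "failed", "reason": str(err)})
    if queries and failures / len(queries) > 0.5:
        notify_user(f"{failures}/{len(queries)} searches failed; please advise before continuing.")
    return results
```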
+ diff --git a/pages/agents/deep-agents.en.mdx b/pages/agents/deep-agents.en.mdx new file mode 100644 index 000000000..b86e0844e --- /dev/null +++ b/pages/agents/deep-agents.en.mdx @@ -0,0 +1,78 @@ +# Deep Agents + +import { Callout } from 'nextra/components' + +Most agents today are shallow. + +They easily break down on long, multi-step problems (e.g., deep research or agentic coding). + +That’s changing fast! + +We’re entering the era of "Deep Agents", systems that strategically plan, remember, and delegate intelligently for solving very complex problems. + +We at the [DAIR.AI Academy](https://dair-ai.thinkific.com/) and other folks from [LangChain](https://docs.langchain.com/labs/deep-agents/overview), [Claude Code](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk), as well as more recently, individuals like [Philipp Schmid](https://www.philschmid.de/agents-2.0-deep-agents), have been documenting this idea. + +Here is an example of a deep agent built to power the [DAIR.AI Academy's](https://dair-ai.thinkific.com/) customer support system intended for students to ask questions regarding our trainings and courses: + +![deep-agent](../../img/agents/customer-support-deep-agent.png) + + +This post is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing deep agents. + + +Here’s roughly the core idea behind Deep Agents (based on my own thoughts and notes that I've gathered from others): + +## Planning + +![cs-planning](../../img/agents/cs-planning.png) + +Instead of reasoning ad-hoc inside a single context window, Deep Agents maintain structured task plans they can update, retry, and recover from. Think of it as a living to-do list that guides the agent toward its long-term goal. To experience this, just try out Claude Code or Codex for planning; the results are significantly better once you enable it before executing any task. + +We have also written recently on the power of brainstorming for longer with Claude Code, and this shows the power of planning, expert context, and human-in-the-loop (your expertise gives you an important edge when working with deep agents). Planning will also be critical for long-horizon problems (think agents for scientific discovery, which comes next). + +## Orchestrator & Sub-agent Architecture + +![cs-subagents](../../img/agents/cs-subagents.png) + +One big agent (typically with a very long context) is no longer enough. I've seen [arguments](https://cognition.ai/blog/dont-build-multi-agents) against multi-agent systems and in favor of monolithic systems, but I'm skeptical about this. + +The orchestrator-sub-agent architecture is one of the most powerful LLM-based agentic architectures you can leverage today for any domain you can imagine. An orchestrator manages specialized sub-agents such as search agents, coders, KB retrievers, analysts, verifiers, and writers, each with its own clean context and domain focus. + +The orchestrator delegates intelligently, and subagents execute efficiently. The orchestrator integrates their outputs into a coherent result. Claude Code popularized the use of this approach for coding and sub-agents, which, it turns out, are particularly useful for efficiently managing context (through separation of concerns). 
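A minimal sketch of this pattern is shown below. The sub-agent names, their system prompts, and the `call_llm` helper are assumptions for illustration; in practice each specialist would run with its own tools and context window.

```python
from typing import Dict

def call_llm(system_prompt: str, user_input: str) -> str:
    """Placeholder for a call to your LLM provider of choice."""
    raise NotImplementedError

# Each sub-agent gets a narrow role and a clean context of its own.
SUB_AGENTS: Dict[str, str] = {
    "searcher": "Run web searches for the given question and return key findings as bullets.",
    "retriever": "Answer strictly from the internal knowledge-base excerpts you are given.",
    "writer": "Turn the verified findings into a concise, well-structured reply.",
    "verifier": "Check the draft against the provided sources and list any unsupported claims.",
}

def run_sub_agent(name: str, task: str) -> str:
    return call_llm(SUB_AGENTS[name], task)

def orchestrate(user_query: str) -> str:
    """The orchestrator delegates to specialists and integrates their outputs."""
    findings = run_sub_agent("searcher", user_query)
    kb_context = run_sub_agent("retriever", user_query)
    draft = run_sub_agent("writer", f"Question: {user_query}\nFindings: {findings}\nKB: {kb_context}")
    review = run_sub_agent("verifier", f"Draft: {draft}\nSources:\n{findings}\n{kb_context}")
    # Naive acceptance check for the sketch; a real system would loop until the verifier passes.
    if "no unsupported claims" in review.lower():
        return draft
    return run_sub_agent("writer", f"Revise the draft.\nDraft: {draft}\nVerifier feedback: {review}")
```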
+ +I wrote a few notes on the power of using orchestrator and subagents [here](https://x.com/omarsar0/status/1960877597191245974) and [here](https://x.com/omarsar0/status/1971975884077965783). + +## Context Retrieval and Agentic Search + +![persistent-storage](../../img/agents/cs-persistent-storage.png) + +Deep Agents don’t rely on conversation history alone. They store intermediate work in external memory like files, notes, vectors, or databases, letting them reference what matters without overloading the model’s context. High-quality structured memory is a thing of beauty. + +Take a look at recent works like [ReasoningBank](https://arxiv.org/abs/2509.25140) and [Agentic Context Engineering](https://arxiv.org/abs/2510.04618) for some really cool ideas on how to better optimize memory building and retrieval. Building with the orchestrator-subagents architecture means that you can also leverage hybrid memory techniques (e.g., agentic search + semantic search), and you can let the agent decide what strategy to use. + +## Context Engineering + +One of the worst things you can do when interacting with these types of agents is underspecified instructions/prompts. Prompt engineering was and is important, but we will use the new term [context engineering](https://www.promptingguide.ai/guides/context-engineering-guide) to emphasize the importance of building context for agents. The instructions need to be more explicit, detailed, and intentional to define when to plan, when to use a sub-agent, how to name files, and how to collaborate with humans. Part of context engineering also involves efforts around structured outputs, system prompt optimization, compacting context, evaluating context effectiveness, and [optimizing tool definitions](https://www.anthropic.com/engineering/writing-tools-for-agents). + +Read our previous guide on context engineering to learn more: [Context Engineering Deep Dive](https://www.promptingguide.ai/guides/context-engineering-guide) + +## Verification + +![verification agent](../../img/agents/cs-verification-agent.png) + +Next to context engineering, verification is one of the most important components of an agentic system (though less often discussed). Verification boils down to verifying outputs, which can be automated (LLM-as-a-Judge) or done by a human. Because of the effectiveness of modern LLMs at generating text (in domains like math and coding), it's easy to forget that they still suffer from hallucination, sycophancy, prompt injection, and a number of other issues. Verification helps with making your agents more reliable and more production-ready. You can build good verifiers by leveraging systematic evaluation pipelines. + +## Final Words + +This is a huge shift in how we build with AI agents. Deep agents also feel like an important building block for what comes next: personalized proactive agents that can act on our behalf. I will write more on proactive agents in a future post. + +I've been teaching these ideas to agent builders over the past couple of months. If you are interested in more hands-on experience for how to build deep agents check out the new course in our academy: https://dair-ai.thinkific.com/courses/agents-with-n8n + + +The figures you see in the post describe an agentic RAG system that students need to build for the course final project. 
+ + +This post is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing deep agents. + + +*Written by Elvis Saravia (creator of the Prompting Engineering Guide and co-founder of the DAIR.AI Academy)* \ No newline at end of file diff --git a/pages/agents/introduction.en.mdx b/pages/agents/introduction.en.mdx new file mode 100644 index 000000000..ae76db71d --- /dev/null +++ b/pages/agents/introduction.en.mdx @@ -0,0 +1,49 @@ +# Introduction to AI Agents + +import { Callout } from 'nextra/components' + +Agents are revolutionizing the way we approach complex tasks, leveraging the power of large language models (LLMs) to work on our behalf and achieve remarkable results. In this guide we will dive into the fundamentals of AI agents, exploring their capabilities, design patterns, and potential applications. + +## What is an Agent? + +![Agent Components](../../img/agents/agent-components.png) + +In this guide, we refer to an agent as an LLM-powered system designed to take actions and solve complex tasks autonomously. Unlike traditional LLMs, AI agents go beyond simple text generation. They are equipped with additional capabilities, including: + +* **Planning and reflection:** AI agents can analyze a problem, break it down into steps, and adjust their approach based on new information. +* **Tool access:** They can interact with external tools and resources, such as databases, APIs, and software applications, to gather information and execute actions. +* **Memory:** AI agents can store and retrieve information, allowing them to learn from past experiences and make more informed decisions. + +This lecture discusses the concept of AI agents and their significance in the realm of artificial intelligence. + +## Why build with Agents? + +While large language models (LLMs) excel at simple, narrow tasks like translation or email generation, they fall short when dealing with complex, broader tasks that require multiple steps, planning, and reasoning. These complex tasks often necessitate access to external tools and information beyond the LLM's knowledge base. + +For example, developing a marketing strategy might involve researching competitors, analyzing market trends, and accessing company-specific data. These actions necessitate real-world information, the latest insights, and internal company data, which a standalone LLM might not have access to. + +AI agents bridge this gap by combining the capabilities of LLMs with additional features such as memory, planning, and external tools. + +By leveraging these abilities, AI agents can effectively tackle complex tasks like: + +* Developing marketing strategies +* Planning events +* Providing customer support + + +Learn how to build with AI agents in our new course. [Join now!](https://dair-ai.thinkific.com/courses/introduction-ai-agents) +Use code PROMPTING20 to get an extra 20% off. + + +## Common Use Cases for AI Agents + +Here is a non-exhaustive list of common use cases where agents are being applied in the industry: + +* **Recommendation systems:** Personalizing suggestions for products, services, or content. +* **Customer support systems:** Handling inquiries, resolving issues, and providing assistance. +* **Research:** Conducting in-depth investigations across various domains, such as legal, finance, and health. 
+* **E-commerce applications:** Facilitating online shopping experiences, managing orders, and providing personalized recommendations. +* **Booking:** Assisting with travel arrangements and event planning. +* **Reporting:** Analyzing vast amounts of data and generating comprehensive reports. +* **Financial analysis:** Analyzing market trends, assess financial data, and generate reports with unprecedented speed and accuracy. + diff --git a/pages/applications/context-caching.en.mdx b/pages/applications/context-caching.en.mdx index 9f7763191..8d2e22ca2 100644 --- a/pages/applications/context-caching.en.mdx +++ b/pages/applications/context-caching.en.mdx @@ -1,6 +1,6 @@ # Context Caching with Gemini 1.5 Flash -import {Cards, Card} from 'nextra-theme-docs' +import {Cards, Card, Callout} from 'nextra-theme-docs' import {CodeIcon} from 'components/icons' Google recently released a new feature called [context-caching](https://ai.google.dev/gemini-api/docs/caching?lang=python) which is available via the Gemini APIs through the Gemini 1.5 Pro and Gemini 1.5 Flash models. This guide provides a basic example of how to use context-caching with Gemini 1.5 Flash. @@ -53,3 +53,8 @@ The notebook can be found below: href="https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/gemini-context-caching.ipynb" /> + + +Learn more about caching methods in our new AI courses. [Join now!](https://dair-ai.thinkific.com/) +Use code PROMPTING20 to get an extra 20% off. + diff --git a/pages/applications/finetuning-gpt4o.en.mdx b/pages/applications/finetuning-gpt4o.en.mdx index 8a84c38e0..bd6222ac3 100644 --- a/pages/applications/finetuning-gpt4o.en.mdx +++ b/pages/applications/finetuning-gpt4o.en.mdx @@ -1,5 +1,7 @@ # Fine-Tuning with GPT-4o Models +import { Callout } from 'nextra/components' + OpenAI recently [announced](https://openai.com/index/gpt-4o-fine-tuning/) the availability of fine-tuning for its latest models, GPT-4o and GPT-4o mini. This new capability enables developers to customize the GPT-4o models for specific use cases, enhancing performance and tailoring outputs. ## Fine-Tuning Details and Costs @@ -28,4 +30,9 @@ This demonstration highlights the potential of fine-tuning in enhancing model pe Once the fine-tuning process is complete, developers can access and evaluate their custom models through the OpenAI playground. The playground allows for interactive testing with various inputs and provides insights into the model's performance. For more comprehensive evaluation, developers can integrate the fine-tuned model into their applications via the OpenAI API and conduct systematic testing. -OpenAI's introduction of fine-tuning for GPT-4o models unlocks new possibilities for developers seeking to leverage the power of LLMs for specialized tasks. \ No newline at end of file +OpenAI's introduction of fine-tuning for GPT-4o models unlocks new possibilities for developers seeking to leverage the power of LLMs for specialized tasks. + + +Learn more about advanced methods in our new AI courses. [Join now!](https://dair-ai.thinkific.com/) +Use code PROMPTING20 to get an extra 20% off. 
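As a small addendum, a minimal sketch of calling a fine-tuned model through the OpenAI Python SDK could look like the following. The model identifier is a placeholder; use the `ft:` name returned when your own fine-tuning job completes.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder identifier; replace with the one your fine-tuning job returns.
FINE_TUNED_MODEL = "ft:gpt-4o-2024-08-06:your-org:support-bot:abc123"

response = client.chat.completions.create(
    model=FINE_TUNED_MODEL,
    messages=[
        {"role": "system", "content": "You are a support assistant for our product."},
        {"role": "user", "content": "How do I reset my password?"},
    ],
)
print(response.choices[0].message.content)
```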
+ diff --git a/pages/applications/function_calling.en.mdx b/pages/applications/function_calling.en.mdx index 24bd1c354..584ed7c6f 100644 --- a/pages/applications/function_calling.en.mdx +++ b/pages/applications/function_calling.en.mdx @@ -1,6 +1,6 @@ # Function Calling with LLMs -import {Cards, Card} from 'nextra-theme-docs' +import {Cards, Card, Callout} from 'nextra-theme-docs' import {CodeIcon} from 'components/icons' ## Getting Started with Function Calling @@ -111,6 +111,11 @@ You can then choose to call an external weather API for the actual weather. Once ## Notebooks + +Learn more about function calling in our new AI courses. [Join now!](https://dair-ai.thinkific.com/) +Use code PROMPTING20 to get an extra 20% off. + + Here is a notebook with a simple example that demonstrates how to use function calling with the OpenAI APIs: @@ -147,4 +152,4 @@ Below is a list of use cases that can benefit from the function calling capabili - [OpenAI's Function Calling](https://platform.openai.com/docs/guides/function-calling) - [How to call functions with chat models](https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models) - [Pushing ChatGPT's Structured Data Support To Its Limits](https://minimaxir.com/2023/12/chatgpt-structured-data/) -- [Math Problem Solving with Function Calling](https://github.com/svpino/openai-function-calling/blob/main/sample.ipynb) \ No newline at end of file +- [Math Problem Solving with Function Calling](https://github.com/svpino/openai-function-calling/blob/main/sample.ipynb) diff --git a/pages/applications/generating.en.mdx b/pages/applications/generating.en.mdx index 28719cd8d..938dd173d 100644 --- a/pages/applications/generating.en.mdx +++ b/pages/applications/generating.en.mdx @@ -1,4 +1,7 @@ # Generating Data + +import { Callout } from 'nextra/components' + LLMs have strong capabilities to generate coherent text. Using effective prompt strategies can steer the model to produce better, consistent, and more factual responses. LLMs can also be especially useful for generating data which is really useful to run all sorts of experiments and evaluations. For example, we can use it to generate quick samples for a sentiment classifier like so: *Prompt:* @@ -41,4 +44,9 @@ Q: I just got some terrible news. A: Negative ``` -This is very useful. We actually use this example for a different test in another section of the guides. \ No newline at end of file +This is very useful. We actually use this example for a different test in another section of the guides. + + +Learn more about advanced prompting methods in our new AI courses. [Join now!](https://dair-ai.thinkific.com/) +Use code PROMPTING20 to get an extra 20% off. + diff --git a/pages/course.en.mdx b/pages/courses.en.mdx similarity index 92% rename from pages/course.en.mdx rename to pages/courses.en.mdx index aa5717f3f..e239fc1cc 100644 --- a/pages/course.en.mdx +++ b/pages/courses.en.mdx @@ -4,8 +4,6 @@ import { Callout } from 'nextra/components' Learn more about advanced prompt engineering techniques and best practices in our new AI courses. [Join now!](https://dair-ai.thinkific.com/) - -Use code BLACKFRIDAY to get an extra 35% off. This offer ends on 29th November 2024. Our hands-on courses are built to compliment this prompt engineering guide. They are designed to help expand your skills and knowledge by teaching you how to effectively apply the concepts learned in this guide to real-world use cases and applications. 
diff --git a/pages/guides/4o-image-generation.en.mdx b/pages/guides/4o-image-generation.en.mdx new file mode 100644 index 000000000..2a63011ce --- /dev/null +++ b/pages/guides/4o-image-generation.en.mdx @@ -0,0 +1,187 @@ +## OpenAI 4o Image Generation Guide + +A practical guide to using the 4o Image Generation Model + +![A stylized title in front of an OpenAI logo, behind frosted glass.](../../img/4o-image-generation/4o_image_generation.png) + +### What is the 4o Image Generation model? + +4o Image Generation is OpenAI’s latest image model embedded into ChatGPT. It can create photorealistic outputs, take images as inputs and transform them, and follow detailed instructions, including generating text into images. OpenAI has confirmed that the model is autoregressive, and uses the same architecture as the GPT-4o LLM. The model essentially generates images in the same way as the LLM generates text. This enables improved capabilities in rendering text on top of images, more granular image editing, and editing images based on image inputs. + +### How to access 4o Image Generation + +Access 4o Image Generation in the ChatGPT application (web or mobile) by prompting with text, or by selecting “Create an image” from the tools. The model is also accessible in Sora, or via OpenAI API with gpt-image-1. + +Text prompting: “Generate an image of…” +![text_prompt](../../img/4o-image-generation/text_prompt_3.JPG) + +Selecting "Create an image" from the toolbox: +![tool_select](../../img/4o-image-generation/tool_select.JPG) + +With the OpenAI API [OpenAI API](https://platform.openai.com/docs/guides/images-vision?api-mode=responses). +![Screenshot of the OpenAI API documentation page](../../img/4o-image-generation/image_gen_API.JPG) + +**The 4o image generation is accessible with these models:** +- gpt-4o +- gpt-4o-mini +- gpt-4.1 +- gpt-4.1-mini +- gpt-4.1-nano +- o3 + +### What can the 4o image generation model do? + +**Create images in aspect ratios of:** +- Square 1:1 1024x1024 (default) +- Landscape 3:2 1536x1024 +- Portrait 2:3 1024x1536 + +**Use reference images in the file types:** +- PNG +- JPEG +- WEBP +- Non-animated GIF + +**Edit images by:** + +**Inpainting** (only images generated in that chat) +![Example of inpainting.](../../img/4o-image-generation/inpainting_combined.png) + +**Prompting** (“what would it look like during the winter?”) +![Example image before text prompt revision](../../img/4o-image-generation/text_edit_combined.png) + +**Reference images & transfer the style** +The model is very good at retexturing and changing image styles when provided a reference image. The ability to ‘Ghiblify’ images went viral when the model was launched. + +![Image of Sam Altman and Jony Ive](../../img/4o-image-generation/sam_and_jony.png) ![Image of Sam Altman and Jony Ive Ghiblified](../../img/4o-image-generation/sam_and_jony_ghiblified.png) + +**Transparent backgrounds (png)** +Needs to be specified in the prompt by mentioning “transparent PNG” or “transparent background”. 
+![Example of a sticker with a transparent background, suitable for use as a PNG.](../../img/4o-image-generation/inpainting_combined.png) + +**Generate text in images** +![An image of the DAIR.AI Academy text generated with 4o Image Generation.](../../img/4o-image-generation/text_in_images.png) + +**Generate the same image in different styles** +![Photorealistic teapot.](../../img/4o-image-generation/teapot_1.png) ![Teapot in the style of Van Gogh.](../../img/4o-image-generation/teapot_2.png) + +**Combine images** +![Meerkat and a T-shirt](../../img/4o-image-generation/combine_images.png) +![Combined.](../../img/4o-image-generation/combined.png) + +### Prompting Tips for 4o Image Generation + +#### Detailed prompts give you more control. +If your prompt is not descriptive, ChatGPT often fills in additional details. This can be useful for quick tests or exploration, but if you have something specific in mind, write a detailed and descriptive prompt. + + + If you are struggling with descriptions, ask o3 to write 3 varied prompts optimized for 4o image generation based on your own description, with the details filled in. Then select the parts you like most and use that as the prompt. + + +#### Lighting, Composition, Style +Define these in your prompt if you have a specific goal in mind. The model is quite good at estimating them based on the general information in a prompt, but when you need specific results you must define them accurately. If you want the image to resemble a photo taken with a specific camera and lens type, add it to the prompt. + +Other details to consider: +- Subject +- Medium +- Environment +- Color +- Mood + +#### Select different models for different image generation tasks +4o is fastest for one-off edits or simple image generation tasks. + +If you expect the generation to take multiple steps, use a reasoning model. If you are iteratively adding or removing elements when doing creative exploration, the reasoning model will perform better at keeping the consistent elements of an image ‘in mind’. E.g., your image needs a specific style, font, colors, etc. You can find an example in this [link to a thumbnail creation process](https://chatgpt.com/share/68404206-5710-8007-8262-6efaba15a852). + +#### Image aspect ratio +It helps to specify the aspect ratio you want in your prompt, even when using a reference image. The model can select the correct aspect ratio if it has clues in the prompt (e.g. images of rockets are often 2:3), but defaults to 1:1 when not clearly instructed otherwise. + +*Prompt to test:* +``` +A high-resolution photograph of a majestic Art Deco-style rocket inspired by the scale and grandeur of the SpaceX Starship, standing on a realistic launch pad during golden hour. The rocket has monumental vertical lines, stepped geometric ridges like the American Radiator Building, and a mirror-polished metallic surface reflecting a vivid sunset sky. The rocket is photorealistic, awe-inspiring, and elegant, bathed in cinematic warm light with strong shadows and a vast landscape stretching to the horizon. +``` + +![A photorealistic, Art Deco-style rocket on a launchpad at sunset, generated from the provided test prompt.](../../img/4o-image-generation/art_deco_starship.png) + +#### Be aware of consistency in the model’s generations +This can be good if you want to change minor details on an image, but a challenge if you want to be more creative. The model ‘remembers’ images generated in the same chat. 
For independent and different image generation tasks it's good to start fresh in a new chat every time. + + + If the first few iterations on an image are not even close to what you were going for, **ask the model to output the prompt that was used in generating the image**, and try to see if you spot the misplaced emphasis. Then start a new chat and continue generating with a revised prompt. + + + +#### Generating multiple images with one prompt +Reasoning models such as o3 and o4-mini can generate multiple images with a single prompt, but this needs to be explicitly stated in the prompt, and does not always work. Example: [Chat Link](https://chatgpt.com/share/68496cf8-0120-8007-b95f-25a940298c09) + +*Prompt to test:* +``` +Generate an image of [decide this yourself], in the style of an oil painting by Van Gogh. Use a 3:2 aspect ratio. Before you generate the image, recite the rules of this image generation task. Then send the prompt to the 4o Image Generation model. Do not use DALL-E 3. If the 4o Image Generation model is timed out, tell me how much time is left until you can queue the next prompt to the model. + +Rules: +- Use only the aspect ratio mentioned earlier. +- Output the prompt you sent to the image generation model exactly as you sent it, do this every time in between image generations +- Create three variations with a different subject, but the same rules. After an image is generated, immediately start creating the next one, without ending your turn or asking me for confirmation for moving forward. +``` + +#### Enforcing strict prompt adherence is difficult +Prompts with multiple components sometimes get changed somewhere between the chat model and the 4o Image Generation model. If you have generated multiple images in the same chat, the previously generated images may affect outputs despite the changes you make in the prompts. + +### Limitations +- ChatGPT can change your initial prompt before it is sent to the image 4o Image Generation model. This is more likely to happen in multi-turn generation tasks, if the prompt lacks description, or when using a long prompt. +- It is not clear what the generation amount per user or subscription are. OpenAI has stated that the system is dynamic, so it likely depends on your subscription, and server load in your region. +- Generations on the free tier often get queued, and can take a long time to generate. +- Generated images may have a yellow tint. +- Generated images may be too dark if dark elements are in the prompt or reference image(s). +- Generation refusals: The image generation is subject to the same general rules as the rest of OpenAI’s services: [Usage Policies](https://openai.com/policies/usage-policies/). If prohibited subjects are detected inside the prompt, reference images or the generated output image, the generation often gets refused and the partially generated image is deleted. +- No upscaling feature inside ChatGPT. +- The model can make errors in cropping, and output images with only a part of the generated image. +- Hallucinations similar to LLMs. +- Generating images with many concepts or individual subjects at once is difficult. +- Generating images which visualize graph data is not precise. +- Difficulty in generating non-Latin language text in images. +- Requests to edit specific portions of an image generation, such as typos are not always effective. 
+- Model naming: This model has been given multiple names, which can get confusing: Imagegen, gpt-image-1, 4o Image Generation, image_gen.text2im… +- In some cases the aspect ratio will be wrong, regardless of being specified in the prompt. + +### Tips & Best Practices + + + **Use ChatGPT Personalization:** To avoid switching to the older DALL-E 3 model, add this instruction to the ‘What traits should ChatGPT have’ section in your settings: + > "Never use the DALL-E tool. Always generate images with the new image gen tool. If the image tool is timed out, tell me instead of generating with DALL-E." + + +- If you hit the generation limit, ask ChatGPT how much time is left until you can generate more images. The backend has this information available for the user. +- Image generation and editing works best when you use clear terms like "draw" or "edit" in your prompt. +- Using reasoning models to generate images gives you the added benefit of seeing how the model reasons through the prompt creation and revision process. Open the thinking traces to see what the model is focusing on. + +### Use Cases to try + +- **Generating a logo:** Use reference images and detailed descriptions. This is often a multi-turn task, so use a reasoning model. [Example Chat](https://chatgpt.com/share/6848aaa7-be7c-8007-ba6c-c69ec1eb9c25). +- **Generating marketing assets:** Use your existing visual assets as references and prompt the model to change text, products, or environments. +- **Generating coloring book pages:** Use the 2:3 aspect ratio to create custom coloring book pages. [Example Chat](https://chatgpt.com/share/684ac538-25c4-8007-861a-3fe682df47ab). +- **Sticker images:** Remember to mention a transparent background. [Example Chat](https://chatgpt.com/share/684960b3-dc00-8007-bf16-adfae003dde5). +- **Material transfer:** Use a reference image for a material and apply it to a subject from a second image or prompt. [Example Chat](https://chatgpt.com/share/684ac8d5-e3f8-8007-9326-ea6291a891e3). +- **Interior design:** Take a picture of a room and prompt for specific furniture and feature changes. [Example Chat](https://chatgpt.com/share/684ac69f-6760-8007-83b9-2e8094e5ae31). 
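If you prefer to run these use cases through the API instead of ChatGPT, a minimal sketch with the OpenAI Python SDK could look like the example below. The `size` and `background` parameters follow the gpt-image-1 documentation at the time of writing; treat them as assumptions and check the current API reference before relying on them.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Portrait 2:3 sticker with a transparent background, as described above.
result = client.images.generate(
    model="gpt-image-1",
    prompt="Die-cut sticker of a raccoon eating a strawberry, bold white outline, transparent background",
    size="1024x1536",          # 1024x1024, 1536x1024, or 1024x1536
    background="transparent",  # PNG/WebP output only
)

with open("sticker.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```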
+ +### Prompt & Chat Examples +- [Course thumbnail image generation process](https://chatgpt.com/share/68404206-5710-8007-8262-6efaba15a852) +- [Subject revision in multi-turn image generation](https://chatgpt.com/share/6848a5e1-3730-8007-8a16-56360794722c) +- [Textured icon on a transparent background](https://chatgpt.com/share/6848a7ab-0ab4-8007-843d-e19e3f7daec8) +- [Logo design for a drone flower delivery start-up](https://chatgpt.com/share/6848aaa7-be7c-8007-ba6c-c69ec1eb9c25) +- [White outline sticker of a raccoon eating a strawberry](https://chatgpt.com/share/684960b3-dc00-8007-bf16-adfae003dde5) +- [Generate multiple images with one prompt](https://chatgpt.com/share/68496cf8-0120-8007-b95f-25a940298c09) +- [Editing an image with a text prompt (summer to winter)](https://chatgpt.com/share/684970b8-9718-8007-a591-db40ad5f13ae) +- [A bumblebee napping in the style of Studio Ghibli](https://chatgpt.com/share/68497515-62e8-8007-b927-59d4b5e9a876) +- [Interior design by adding furniture to your own images](https://chatgpt.com/share/684ac69f-6760-8007-83b9-2e8094e5ae31) +- [Material transfer using two reference images](https://chatgpt.com/share/684ac8d5-e3f8-8007-9326-ea6291a891e3) + +### References +- [Introducing 4o Image Generation](https://openai.com/index/introducing-4o-image-generation/) +- [Addendum to GPT-4o System Card: Native Image Generation](https://cdn.openai.com/11998be9-5319-4302-bfbf-1167e093f1fb/Native_Image_Generation_System_Card.pdf) +- [Gpt-image-1 in the OpenAI API](https://openai.com/index/image-generation-api/) +- [OpenAI Docs: gpt-image-1](https://platform.openai.com/docs/models/gpt-image-1) +- [OpenAI Docs: Image Generation Guide](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1) +- [More prompt and image examples from OpenAI](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1&gallery=open) + +import { Callout } from 'nextra/components' diff --git a/pages/guides/_meta.en.json b/pages/guides/_meta.en.json index 2c1e56fa2..8629717c4 100644 --- a/pages/guides/_meta.en.json +++ b/pages/guides/_meta.en.json @@ -1,3 +1,5 @@ { - "optimizing-prompts": "Optimizing Prompts" + "optimizing-prompts": "Optimizing Prompts", + "deep-research": "OpenAI Deep Research", + "reasoning-llms": "Reasoning LLMs" } \ No newline at end of file diff --git a/pages/guides/context-engineering-guide.en.mdx b/pages/guides/context-engineering-guide.en.mdx new file mode 100644 index 000000000..964464da7 --- /dev/null +++ b/pages/guides/context-engineering-guide.en.mdx @@ -0,0 +1,289 @@ +# **Context Engineering Guide** + +## **Table of Contents** + +* [What is Context Engineering?](#what-is-context-engineering) +* [Context Engineering in Action](#context-engineering-in-action) + * [System Prompt](#system-prompt) + * [Instructions](#instructions) + * [User Input](#user-input) + * [Structured Inputs and Outputs](#structured-inputs-and-outputs) + * [Tools](#tools) + * [RAG & Memory](#rag--memory) + * [States & Historical Context](#states--historical-context) +* [Advanced Context Engineering](#advanced-context-engineering-wip) +* [Resources](#resources) + + + +## **What is Context Engineering?** + +A few years ago, many, even top AI researchers, claimed that prompt engineering would be dead by now. + +Obviously, they were very wrong, and in fact, prompt engineering is now even more important than ever. It is so important that it is now being rebranded as ***context engineering***. 
+ +Yes, another fancy term to describe the important process of tuning the instructions and relevant context that an LLM needs to perform its tasks effectively. + +Much has been written already about context engineering ([Ankur Goyal](https://x.com/ankrgyl/status/1913766591910842619), [Walden Yan](https://cognition.ai/blog/dont-build-multi-agents), [Tobi Lutke](https://x.com/tobi/status/1935533422589399127), and [Andrej Karpathy](https://x.com/karpathy/status/1937902205765607626)), but I wanted to write about my thoughts on the topic and show you a concrete step-by-step guide putting context engineering into action in developing an AI agent workflow. + +I am not entirely sure who coined context engineering, but we will build on this figure from [Dex Horthy](https://x.com/dexhorthy/status/1933283008863482067) that briefly explains a bit about what context engineering is. + +![A diagram showing overlapping aspects of context engineering](../../img/context-engineering-guide/context-engineering-diagram.jpg) + +I like the term context engineering as it feels like a broader term that better explains most of the work that goes into prompt engineering, including other related tasks. + +The doubt about prompt engineering being a serious skill is that many confuse it with blind prompting (a short task description you use in an LLM like ChatGPT). In blind prompting, you are just asking the system a question. In prompt engineering, you have to think more carefully about the context and structure of your prompt. Perhaps it should have been called context engineering from early on. + +Context engineering is the next phase, where you architect the full context, which in many cases requires going beyond simple prompting and into more rigorous methods to obtain, enhance, and optimize knowledge for the system. + +From a developer's point of view, context engineering involves an iterative process to optimize instructions and the context you provide an LLM to achieve a desired result. This includes having formal processes (e.g., eval pipelines) to measure whether your tactics are working. + +Given the fast evolution of the AI field, I suggest a broader definition of context engineering: ***the process of designing and optimizing instructions and relevant context for the LLMs and advanced AI models to perform their tasks effectively.*** This encompasses not only text-based LLMs but also optimizing context for multimodal models, which are becoming more widespread. This can include all the prompt engineering efforts and the related processes such as: + +* Designing and managing prompt chains (when applicable) +* Tuning instructions/system prompts +* Managing dynamic elements of the prompt (e.g., user inputs, date/time, etc.) +* Searching and preparing relevant knowledge (i.e., RAG) +* Query augmentation +* Tool definitions and instructions (in the case of agentic systems) +* Preparing and optimizing few-shot demonstrations +* Structuring inputs and outputs (e.g., delimiters, JSON schema) +* Short-term memory (i.e., managing state/historical context) and long-term memory (e.g., retrieving relevant knowledge from a vector store) +* And the many other tricks that are useful to optimize the LLM system prompt to achieve the desired tasks. + +In other words, what you are trying to achieve in context engineering is optimizing the information you are providing in the context window of the LLM. 
This also means filtering out noisy information, which is a science on its own, as it requires systematically measuring the performance of the LLM. + +Everyone is writing about context engineering, but here we are going to walk you through a concrete example of what context engineering looks like when building AI agents. + + +## **Context Engineering in Action** + +Let’s look at a concrete example of some recent context engineering work I did for a multi-agent deep research application I built for personal use. + +I built the agentic workflow inside of n8n, but the tool doesn’t matter. The complete agent architecture I built looks like the following: + +![An image of an n8n workflow displaying a multi-agent deep research application](../../img/context-engineering-guide/context-engineering-workflow.jpg) + +The Search Planner agent in my workflow is in charge of generating a search plan based on the user query. + +### **System Prompt** + +Below is the system prompt I have put together for this subagent: + +``` +You are an expert research planner. Your task is to break down a complex research query (delimited by ) into specific search subtasks, each focusing on a different aspect or source type. + +The current date and time is: {{ $now.toISO() }} + +For each subtask, provide: +1. A unique string ID for the subtask (e.g., 'subtask_1', 'news_update') +2. A specific search query that focuses on one aspect of the main query +3. The source type to search (web, news, academic, specialized) +4. Time period relevance (today, last week, recent, past_year, all_time) +5. Domain focus if applicable (technology, science, health, etc.) +6. Priority level (1-highest to 5-lowest) + +All fields (id, query, source_type, time_period, domain_focus, priority) are required for each subtask, except time_period and domain_focus which can be null if not applicable. + +Create 2 subtasks that together will provide comprehensive coverage of the topic. Focus on different aspects, perspectives, or sources of information. + +Each substask will include the following information: + +id: str +query: str +source_type: str # e.g., "web", "news", "academic", "specialized" +time_period: Optional[str] = None # e.g., "today", "last week", "recent", "past_year", "all_time" +domain_focus: Optional[str] = None # e.g., "technology", "science", "health" +priority: int # 1 (highest) to 5 (lowest) + +After obtaining the above subtasks information, you will add two extra fields. Those correspond to start_date and end_date. Infer this information given the current date and the time_period selected. start_date and end_date should use the format as in the example below: + +"start_date": "2024-06-03T06:00:00.000Z", +"end_date": "2024-06-11T05:59:59.999Z", +``` + +There are many parts to this prompt that require careful consideration about what exact context we are providing the planning agent to carry out the task effectively. As you can see, it’s not just about designing a simple prompt or instruction; this process requires experimentation and providing important context for the model to perform the task optimally. + +Let’s break down the problem into core components that are key to effective context engineering. + +### **Instructions** + +The instruction is the high-level instructions provided to the system to instruct it exactly what to do. + +``` +You are an expert research planner. Your task is to break down a complex research query (delimited by ) into specific search subtasks, each focusing on a different aspect or source type. 
+``` + +Many beginners and even experienced AI developers would stop here. Given that I shared the full prompt above, you can appreciate how much more context we need to give the system for it to work as we want. That’s what context engineering is all about; it informs the system more about the problem scope and the specifics of what exactly we desire from it. + +### **User Input** + +The user input wasn’t shown in the system prompt, but below is an example of how it would look. + +``` + What's the latest dev news from OpenAI? +``` + +Notice the use of the delimiters, which is about structuring the prompt better. This is important to avoid confusion and adds clarity about what the user input is and what things we want the system to generate. Sometimes, the type of information we are inputting is related to what we want the model to output (e.g., the query is the input, and subqueries are the outputs). + +### **Structured Inputs and Outputs** + +In addition to the high-level instruction and the user input, you might have noticed that I spent a considerable amount of effort on the details related to the subtasks the planning agent needs to produce. Below are the detailed instructions I have provided to the planning agent to create the subtasks given the user query. + +``` +For each subtask, provide: +1. A unique string ID for the subtask (e.g., 'subtask_1', 'news_update') +2. A specific search query that focuses on one aspect of the main query +3. The source type to search (web, news, academic, specialized) +4. Time period relevance (today, last week, recent, past_year, all_time) +5. Domain focus if applicable (technology, science, health, etc.) +6. Priority level (1-highest to 5-lowest) + +All fields (id, query, source_type, time_period, domain_focus, priority) are required for each subtask, except time_period and domain_focus which can be null if not applicable. + +Create 2 subtasks that together will provide comprehensive coverage of the topic. Focus on different aspects, perspectives, or sources of information. +``` + +If you look closely at the instructions above, I have decided to structure a list of the required information I want the planning agent to generate, along with some hints/examples to better help steer the data generation process. This is crucial to give the agent additional context on what is expected. As an example, if you don’t tell it that you want the priority level to be on a scale of 1-5, then the system might prefer to use a scale of 1-10. Again, this context matters a lot\! + +Next, let’s talk about structured outputs. In order to get consistent outputs from the planning agent, we are also providing some context on the subtask format and field types that we expect. Below is the example we are passing as additional context to the agent. This will provide the agent with hints and clues on what we expect as the output: + +``` +Each substask will include the following information: + +id: str +query: str +source_type: str # e.g., "web", "news", "academic", "specialized" +time_period: Optional[str] = None # e.g., "today", "last week", "recent", "past_year", "all_time" +domain_focus: Optional[str] = None # e.g., "technology", "science", "health" +priority: int # 1 (highest) to 5 (lowest) +``` + +In addition to this, inside of n8n, you can also use a tool output parser, which essentially is going to be used to structure the final outputs. 
The option I am using is providing a JSON example as follows: + +``` +{ + "subtasks": [ + { + "id": "openai_latest_news", + "query": "latest OpenAI announcements and news", + "source_type": "news", + "time_period": "recent", + "domain_focus": "technology", + "priority": 1, + "start_date": "2025-06-03T06:00:00.000Z", + "end_date": "2025-06-11T05:59:59.999Z" + }, + { + "id": "openai_official_blog", + "query": "OpenAI official blog recent posts", + "source_type": "web", + "time_period": "recent", + "domain_focus": "technology", + "priority": 2, + "start_date": "2025-06-03T06:00:00.000Z", + "end_date": "2025-06-11T05:59:59.999Z" + }, +... +} +``` + +Then the tool will automatically generate the schema from these examples, which in turn allows the system to parse and generate proper structured outputs, as shown in the example below: + +``` +[ + { + "action": "parse", + "response": { + "output": { + "subtasks": [ + { + "id": "subtask_1", + "query": "OpenAI recent announcements OR news OR updates", + "source_type": "news", + "time_period": "recent", + "domain_focus": "technology", + "priority": 1, + "start_date": "2025-06-24T16:35:26.901Z", + "end_date": "2025-07-01T16:35:26.901Z" + }, + { + "id": "subtask_2", + "query": "OpenAI official blog OR press releases", + "source_type": "web", + "time_period": "recent", + "domain_focus": "technology", + "priority": 1.2, + "start_date": "2025-06-24T16:35:26.901Z", + "end_date": "2025-07-01T16:35:26.901Z" + } + ] + } + } + } +] +``` + +This stuff looks complicated, but many tools today enable structured output functionalities out of the box, so it’s likely you won’t need to implement it yourself. n8n makes this part of context engineering a breeze. This is one underrated aspect of context engineering that I see many AI devs ignore for some reason. Hopefully, context engineering sheds more light on these important techniques. This is a really powerful approach, especially when your agent is getting inconsistent outputs that need to be passed in a special format to the next component in the workflow. + +### **Tools** + +We are using n8n to build our agent, so it’s easy to put in the context the current date and time. You can do it like so: + +``` +The current date and time is: {{ $now.toISO() }} +``` + +This is a simple, handy function that’s being called in n8n, but it’s typical to build this as a dedicated tool that can help with making things more dynamic (i.e., only get the date and time if the query requires it). That’s what context engineering is about. It forces you, the builder, to make concrete decisions about what context to pass and when to pass it to the LLM. This is great because it eliminates assumptions and inaccuracies from your application. + +The date and time are important context for the system; otherwise, it tends not to perform well with queries that require knowledge of the current date and time. For instance, if I asked the system to search for the latest dev news from OpenAI that happened last week, it would just guess the dates and time, which would lead to suboptimal queries and, as a result, inaccurate web searches. When the system has the correct date and time, it can better infer date ranges, which are important for the search agent and tools. I added this as part of the context to allow the LLM to generate the date range: + +``` +After obtaining the above subtasks information, you will add two extra fields. Those correspond to start_date and end_date. Infer this information given the current date and the time_period selected. 
start_date and end_date should use the format as in the example below: + +"start_date": "2024-06-03T06:00:00.000Z", +"end_date": "2024-06-11T05:59:59.999Z", +``` + +We are focusing on the planning agent of our architecture, so there aren’t too many tools we need to add here. The only other tool that would make sense to add is a retrieval tool that retrieves relevant subtasks given a query. Let’s discuss this idea below. + +### **RAG & Memory** + +This first version of the deep research application I have built doesn’t require the use of short-term memory, but we have built a version of it that caches subqueries for different user queries. This is useful to achieve some speed-ups/optimizations in the workflow. If a similar query was already used by a user before, it is possible to store those results in a vector store and search over them to avoid the need to create a new set of subqueries for a plan that we already generated and exists in the vector store. Remember, every time you call the LLM APIs, you are increasing latency and costs. + +This is clever context engineering as it makes your application more dynamic, cheaper, and efficient. You see, context engineering is not just about optimizing your prompt; it’s about choosing the right context for the goals you are targeting. You can also get more creative about how you are maintaining that vector store and how you pull those existing subtasks into context. Creative and novel context engineering is the moat\! + +### **States & Historical Context** + +We are not showing it in v1 of our deep research agent, but an important part of this project was to optimize the results to generate the final report. In many cases, the agentic system might need to revise all or a subset of the queries, subtasks, and potentially the data it’s pulling from the web search APIs. This means that the system will take multiple shots at the problem and needs access to the previous states and potentially all the historical context of the system. + +What does this mean in the context of our use case? In our example, it could be giving the agent access to the state of the subtasks, the revisions (if any), the past results from each agent in the workflow, and whatever other context is necessary to help in the revision phase. For this type of context, what we are passing would depend on what you are optimizing for. Lots of decision-making will happen here. Context engineering isn’t always straightforward, and I think you can start to imagine how many iterations this component will require. This is why I continue to emphasize the importance of other areas, such as evaluation. If you are not measuring all these things, how do you know whether your context engineering efforts are working? + + +## **Advanced Context Engineering \[WIP\]** + +There are many other aspects of context engineering we are not covering in this article, such as context compression, context management techniques, context safety, and evaluating context effectiveness (i.e., measuring how effective that context is over time). We will be sharing more ideas about these topics in future articles. + +Context can dilute or become inefficient (i.e., be filled with stale and irrelevant information), which requires special evaluation workflows to capture these issues. + +I expect that context engineering continues to evolve as an important set of skills for AI developers/engineers. 
Beyond manual context engineering, there are also opportunities to build methods that automate the processing of effective context engineering. I’ve seen a few tools that have attempted this, but there needs to be more progress in this area. + + + This content is based on our new course ["Building Effective AI Agents with n8n"](https://dair-ai.thinkific.com/courses/agents-with-n8n), which provides comprehensive insights, downloadable templates, prompts, and advanced tips into designing and implementing agentic systems. + + Use code PROMPTING20 for 20% off Pro membership. + + +## **Resources** + +Below are some recommended readings from other folks who have recently written about context engineering: + +* [https://rlancemartin.github.io/2025/06/23/context\_engineering/](https://rlancemartin.github.io/2025/06/23/context_engineering/) +* [https://x.com/karpathy/status/1937902205765607626](https://x.com/karpathy/status/1937902205765607626) +* [https://www.philschmid.de/context-engineering](https://www.philschmid.de/context-engineering) +* [https://simple.ai/p/the-skill-thats-replacing-prompt-engineering?](https://simple.ai/p/the-skill-thats-replacing-prompt-engineering?) +* [https://github.com/humanlayer/12-factor-agents](https://github.com/humanlayer/12-factor-agents) +* [https://blog.langchain.com/the-rise-of-context-engineering/](https://blog.langchain.com/the-rise-of-context-engineering/) + + +import { Callout } from 'nextra/components' diff --git a/pages/guides/deep-research.en.mdx b/pages/guides/deep-research.en.mdx new file mode 100644 index 000000000..bb00fbabd --- /dev/null +++ b/pages/guides/deep-research.en.mdx @@ -0,0 +1,190 @@ +## OpenAI Deep Research Guide + +