Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress: Reflection, Tool Use, Planning, and Multi-agent Collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output.

Here, I'd like to discuss Reflection. It's relatively quick to implement, and I've seen it lead to surprising performance gains. You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection.

Take the task of asking an LLM to write code. We can prompt it to generate the desired code directly to carry out some task X. Then, we can prompt it to reflect on its own output, perhaps as follows:

"Here's code intended for task X: [previously generated code]. Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it."

Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context including (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements. This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks, including producing code, writing text, and answering questions.
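The generate/critique/rewrite loop described above can be sketched in a few lines. This is a minimal sketch, assuming a generic `llm()` function that wraps whatever chat-model API you use; the stub below returns canned strings purely for illustration, and all names are my own, not from the post:

```python
def llm(prompt: str) -> str:
    """Stub standing in for a real chat-model API call; replace with your
    provider's client. Returns canned strings purely for illustration."""
    if "constructive criticism" in prompt:
        return "Consider handling the empty-list case to avoid ZeroDivisionError."
    return "def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0"

def reflect_and_rewrite(task: str, rounds: int = 2) -> str:
    """Generate a draft, then repeatedly critique it and rewrite it
    using the critique, as described in the post."""
    draft = llm(f"Write code to carry out this task: {task}")
    for _ in range(rounds):
        # Ask the model to critique its own previous output.
        critique = llm(
            f"Here's code intended for task: {task}\n\n{draft}\n\n"
            "Check the code carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        # Feed the prior code plus the critique back in for a rewrite.
        draft = llm(
            f"Task: {task}\n\nPrevious code:\n{draft}\n\n"
            f"Reviewer feedback:\n{critique}\n\n"
            "Rewrite the code, using the feedback to improve it."
        )
    return draft
```

In a real application, `llm()` would call your model provider, and you might stop the loop early once the critique reports no remaining issues.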
And we can go beyond self-reflection by giving the LLM tools that help evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases, or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement.

Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses.

Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications' results. If you're interested in learning more about Reflection, I recommend:

- Self-Refine: Iterative Refinement with Self-Feedback, by Madaan et al. (2023)
- Reflexion: Language Agents with Verbal Reinforcement Learning, by Shinn et al. (2023)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, by Gou et al. (2024)

[Original text: https://lnkd.in/g4bTuWtU ]
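The unit-test idea can be implemented as a small tool the workflow calls between the generate and reflect steps: run the candidate code against a few test cases and return failure messages for the model to reflect on. A minimal sketch (the function and case format are my own illustration, not from the post):

```python
def run_unit_tests(code: str, cases: list) -> list:
    """Execute LLM-generated code and return a list of failure messages.

    cases is a list of (function_name, args_tuple, expected_result).
    Only run code you trust or have sandboxed: exec() executes it directly.
    """
    namespace = {}
    exec(code, namespace)
    failures = []
    for fn_name, args, expected in cases:
        try:
            got = namespace[fn_name](*args)
            if got != expected:
                failures.append(
                    f"{fn_name}{args} returned {got!r}, expected {expected!r}"
                )
        except Exception as exc:
            failures.append(f"{fn_name}{args} raised {exc!r}")
    return failures

# Any failure messages can go back into the next critique prompt, e.g.:
# "The code failed these tests: ... Reflect on why, and fix it."
```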
How to Make LLM Output More Human-Like
Summary
Making large language model (LLM) output more human-like means training and guiding AI systems to produce responses that reflect human reasoning, creativity, and social awareness, rather than just following rigid patterns. This involves techniques that encourage the model to think critically, self-reflect, and simulate conversations between multiple perspectives for richer, nuanced answers.
- Implement self-reflection: Prompt the model to review and critique its own responses, allowing it to spot errors and suggest improvements before presenting the final output.
- Simulate agentic debate: Set up prompts where multiple agents challenge each other’s answers and work toward agreement, encouraging deeper reasoning and more creative solutions.
- Refine with alignment techniques: Adjust the model’s instructions, training data, and rules so that its responses match human preferences for correctness and natural communication.
Making an eye-popping AI demo is easy. Production is hard. Just ask the Google Gemini team! The problem is alignment. Here are five potential solutions.

Alignment is the art and science of getting LLMs to answer more like humans do, or more "correctly." Often, alignment is what separates a "cool demo," where you control the script, from a production system, where you don't. Here's a quick roundup of alignment techniques, arranged roughly from "common" to "exotic."

1. Prompt engineering: Improve the "system prompt" of the model. 95% of AI applications live here, maybe for good reason. It's super easy to do and can get you surprisingly far. Here are some best practices: https://lnkd.in/gnXQtGXC

2. Retrieval-Augmented Generation: Extract some content from a reference document or database, insert it into the model prompt dynamically, and ask the LLM to answer based on it. Tools like LangChain and LlamaIndex are leading in this space.

3. Fine-tuning: Use a dataset with example inputs and outputs to make the model better at your specific task. This requires more skill, because you are changing model weights, but it's getting easier thanks to free tools like Axolotl (https://lnkd.in/gRa7VArS) and Trainer from HuggingFace (https://lnkd.in/gfijn6KA), as well as many commercial offerings. There are also some interesting advances with the technique, like the move from RLHF to DPO, which I've written about.

4. Ensembling: Run several instances of your model (or several models) in parallel, and have them vote. It's effective and not (very) technically hard, but computationally expensive. I wrote about this a couple of weeks ago as well.

5. Representation learning: Add vectors directly to the model activations to change the model to be happier, more honest, less power-seeking, etc. Thanks to Ilya Kulyatin for bringing this fascinating approach to my attention. Representation learning paper: https://lnkd.in/gTMgjqTD and code: https://lnkd.in/gtfkQHNP

Alignment was what set apart GPT-3 (~thousands of users) from ChatGPT (~180M users). It is the bridge that takes us from a cool demo to a robust production AI system. It just might make the difference in your use case.

Which alignment techniques are you using? Which techniques did I miss? If you found this helpful, please repost.
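The ensembling technique above is straightforward to sketch: send the same prompt to several model instances in parallel and take a majority vote over their (lightly normalized) answers. A minimal illustration, assuming you already have the list of answer strings back from the models:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among parallel model runs,
    plus the fraction of runs that agreed with it.

    Answers are normalized (trimmed, lowercased) so trivial formatting
    differences don't split the vote; real systems often need smarter
    normalization, or an LLM judge to break ties.
    """
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(answers)
```

The agreement ratio is useful in production: a low ratio signals an uncertain answer you may want to escalate or regenerate.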
🚀 My favorite prompting trick that you probably haven't seen: Simulating Agentic Behavior With a Prompt 🤖

After spending what is now likely thousands of hours prompting #LLMs, one thing I've found that can vastly improve the quality of outputs is something I haven't seen talked about much:

✨ "Instantiate two agents competing to find the real answer to the given problem and poke holes in the other agent's answers until they agree, which they are loathe to do." ✨

This works especially well with #CLAUDE3 and #Opus. For a more advanced version that often works even better:

✨ "Instantiate two agents competing to find the real answer and poke holes in the other's answer until they agree, which they are loathe to do. Each agent has unique skills and perspective and thinks about the problem from different vantage points. Agent 1: Top-down agent. Agent 2: Bottom-up agent. Both agents: Excellent at thinking counterfactually, thinking step by step, thinking from first principles, thinking laterally, and thinking about second-order implications; highly skilled at simulating in their mental model and thinking critically before answering, having looked at the problem from many directions." ✨

This often solves the following issues you will encounter with LLMs:

1️⃣ Models will often pick the most likely answer without giving it proper thought, and will not go back to reconsider. With these kinds of prompts, the second agent forces reconsideration, and the result is a better-considered answer.

2️⃣ Continuing down the wrong path. There's an inertia to an answer, and models can often get stuck, biased toward a particular kind of wrong answer or previous mistake. This agentic prompting improves the issue significantly.

3️⃣ Overall creativity of output and solution suggestions. Having multiple agents considering solutions results in the model exploring solutions that might otherwise be difficult to elicit from it.
If you haven't tried something like this and have a particularly tough problem, try it out and let me know if it helps!
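If you want to reuse this trick programmatically, the instruction can be kept as a template and prepended to each problem before it goes to the model. A trivial helper (the instruction wording is quoted from the post; the function name is mine):

```python
DEBATE_INSTRUCTION = (
    "Instantiate two agents competing to find the real answer to the given "
    "problem and poke holes in the other agent's answers until they agree, "
    "which they are loathe to do."
)

def agentic_debate_prompt(problem: str) -> str:
    """Wrap a problem in the simulated-debate instruction before
    sending it to the model."""
    return f"{DEBATE_INSTRUCTION}\n\nProblem: {problem}"
```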