news

Claude 3.5 core coding prompt revealed, and programmers across the web are excited! Four-step tuning method, latest V2 version released

2024-07-16


New Intelligence Report

Editor: Peach

【New Intelligence Introduction】A core Claude 3.5 coding system prompt has gone viral in the Reddit community. The original author has just released an evolved second version, and some netizens have already added it to their workflows.

A system prompt for coding with Claude Sonnet 3.5 has been going viral on Reddit recently!


A user named ssmith12345uk shared, on the r/ClaudeAI subreddit, his experience coding with Claude and continually adjusting the system prompt to get better results from the model.

He said the system prompt incorporates some ideas from Anthropic's Meta-Prompt and solves some problems he had run into before.

Finally, he shared the full prompt.


Developers in the AI community have been sharing and bookmarking the post, saying this is exactly the prompt programmers want most!



Netizens summarized this: ReAct + Planning + XML is all you need.


Netizens who tried it said the prompt was very helpful in their own projects.


Just yesterday, the original author released the evolved V2 version of the prompt on Reddit, along with detailed instructions and explanations.


Before walking through the system prompt itself, let's first answer a question from netizens: where do you enter it?

You need to create a Project (Pro subscription required); the project's custom instructions page is where you paste the prompt.



Sonnet 3.5's most powerful coding prompt: a 4-step tutorial

Here, the V1 and V2 system prompts are shown side by side so the differences after the upgrade are easier to see.

The V2 system prompt is shown on the right in the image below. Compared with V1, it is essentially a minor revision.

In the latest version, the model is still guided through four steps of CoT reasoning: code review, planning, output, and security review.

In the first paragraph, the role definition of Claude 3.5 remains unchanged.

You are a web development expert proficient in CSS, JavaScript, React, Tailwind, Node.JS, and Hugo/Markdown.

The second sentence, however, was fine-tuned slightly: "Don't apologize unnecessarily. Review the conversation history to avoid repeating previous mistakes."

Next, Claude 3.5 is asked to break the task down into discrete steps during the conversation and, after each stage, suggest a small test to make sure everything is on the right track.
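As a hypothetical illustration of what such a "small test" might look like (the function and behavior below are invented for the example, not taken from the author's post), it could be as simple as a one-assert unit check run after the stage that implemented it:

```Python
# A tiny per-stage check (hypothetical example). Run with: pytest test_slugify.py
import re


def slugify(text: str) -> str:
    # Stand-in for the function the current stage just implemented
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def test_slugify_basic():
    # Verify the stage's single behavior before moving on to the next step
    assert slugify("Hello, World!") == "hello-world"
```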

Provide code only when needed to illustrate an issue or when explicitly requested. If you can answer without code, that's best.

If needed, it will be asked to elaborate further.

The next step is code review: before writing or proposing code, do a comprehensive review of the existing code and describe how it works between <CODE_REVIEW> tags.


After completing the code review, Claude must build a change plan between <PLANNING> tags, asking for additional source files or documentation that may be relevant.

Follow the DRY (Don't Repeat Yourself) principle to avoid code duplication and balance the maintainability and flexibility of the code.
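For readers unfamiliar with DRY, here is a minimal, self-contained illustration (not from the author's post) of the kind of duplication the plan is meant to avoid:

```Python
# Before: the same discount logic is duplicated in two places
def price_for_member(base):
    return round(base * 0.9, 2)

def price_for_student(base):
    return round(base * 0.9, 2)

# After (DRY): one shared helper, parameterized where behavior may diverge later
def discounted_price(base, rate=0.9):
    return round(base * rate, 2)
```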

Also, in this step, possible trade-offs and implementation choices are proposed, and relevant frameworks and libraries are considered and suggested. If we haven’t agreed on a plan, stop at this step.

Once agreement is reached, code is generated between <OUTPUT> tags.

Here, the Reddit author also reminds Claude 3.5 what to pay attention to when outputting code:

Pay attention to variable names, identifiers, and string literals, and check that they are reproduced exactly from the original files. Use double colons and uppercase letters (e.g., ::UPPERCASE::) to mark items named by convention. Maintain the existing coding style and use idioms appropriate to the language. When generating code blocks, specify the programming language after the first backticks, for example: ```JavaScript, ```Python
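To make these two conventions concrete, here is a hypothetical snippet of the kind of code the model might emit inside its output stage: the fence declares the language after the first backticks, and ::UPPERCASE:: marks a value named by convention rather than copied from an existing file (all identifiers here are invented for illustration):

```Python
# Hypothetical illustration only: ::API_BASE_URL:: is a convention-named
# placeholder for the reader to substitute, not a literal value from the project.
import requests


def fetch_items():
    # Existing variable and function names from the project would be reproduced
    # exactly; only convention-named items get the ::UPPERCASE:: marker.
    response = requests.get("::API_BASE_URL::/v1/items", timeout=10)
    response.raise_for_status()
    return response.json()
```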

Finally, a security and operational review of the planning and output is required, paying special attention to items that could compromise data or introduce vulnerabilities.

For sensitive changes (e.g. input handling, currency calculations, authentication), conduct a thorough review and provide the analysis between <SECURITY_REVIEW> tags.
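Putting the four stages together, a condensed skeleton of the kind of system prompt described above (paraphrased for illustration, not the author's verbatim text) might look like this when stored for reuse:

```Python
# A paraphrased skeleton of the four-stage coding system prompt described above.
# Wording is condensed for illustration; it is not the author's exact text.
SYSTEM_PROMPT = """\
You are a web development expert (CSS, JavaScript, React, Tailwind, Node.JS, Hugo/Markdown).
Don't apologize unnecessarily. Review the conversation history to avoid repeating mistakes.
Break work into discrete steps and suggest a small test after each stage.
Provide code only to illustrate an issue or when explicitly asked.

1. Before writing code, review the existing code between <CODE_REVIEW> tags.
2. Then build a change plan between <PLANNING> tags (DRY, trade-offs, relevant
   frameworks and libraries). STOP here until the plan is agreed.
3. Once agreed, produce code between <OUTPUT> tags. Reproduce names exactly,
   mark convention-named items as ::UPPERCASE::, and declare the language after
   the first backticks of each code block.
4. Review the planning and output for security and operational risks between
   <SECURITY_REVIEW> tags, especially input handling, currency calculations,
   and authentication.
"""
```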

Author Analysis

Next, in a long explanation, the Reddit author marks his claims as either "superstition" (unverified hunches) or things he has actually verified.


This is an example of a guided "chain of thought" prompt, telling Claude what steps to take and in what order. It is used as a system prompt, i.e., the first set of instructions the model receives.

The use of XML tags to separate the steps was inspired by Anthropic's metaprompt.

The author believes that Claude is particularly sensitive to XML tags, which may be related to how the model was trained. For that reason he prefers to handle HTML separately or at the end of a session (one of his "superstition"-tagged claims).

Project address: https://github.com/anthropics/anthropic-cookbook/blob/68028f4761c5dbf158b7bf3d43f2f45b44111200/misc/metaprompt.ipynb#

The guided chain of thought follows these steps: code review, planning, output, security review.

1. Code Review

Bring structured code analysis into context to inform subsequent planning.

The goal is to prevent the LLM from making local changes to the code without considering the wider context. The author is confident, from his testing, that this approach works.

2. Planning

This step produces a high-level design and implementation plan that can be reviewed before generating code.

The "stop" here avoids filling the context with generated code that is not needed and does not meet our needs, or we modify it back and forth repeatedly.

It will usually present some relevant and appropriate options.

At this stage, you can dive into the details of the plan to refine it further (e.g., "tell me more about step 3", "can we reuse implementation Y", "show me a code snippet", "what do you think of this library", etc.).

3. Output

Once the plan is agreed upon, we can move on to the code generation phase.

The variable-naming instruction exists because, in long sessions, the author often ran into variable names being lost or hallucinated in regenerated code. The current prompt seems to have solved this problem.

As he put it: "At some point I might export old conversations and do some statistical analysis, but for now I'm happy with how well this works."

The code-fencing instruction exists because the author switched to a front end that could not infer the correct syntax highlighting; he verified that specifying the language is the right fix.

4. Security Review

The author prefers to perform a security review after the fact and finds this step very helpful.

It provides a “second pair of eyes” and may suggest new improvements.

Answering questions from netizens

Finally, the Reddit author also responded to questions from netizens.

Should I use this prompt on Claude.ai? / Where should I enter the prompt?

We don't know exactly what Sonnet 3.5's official system prompt is, but assuming Pliny, who previously leaked Claude's official prompt, is correct, this prompt certainly helps. The author speculates that Anthropic's system prompt may include automated CoT, or that input may be automatically processed through a metaprompt, though he flags this as "superstition".

Either way, use this prompt and you should get good results, unless you're using Artifacts.


Assuming again that Pliny's excerpt about Artifacts is correct, the author strongly recommends turning off Artifacts when doing non-trivial coding tasks that don't involve Artifacts.

If you use a tool that allows you to directly set system prompts, the author reminds you to remember to adjust the temperature parameter.
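For example, a minimal sketch of doing this through the Anthropic Python SDK might look like the following; the model ID, the low temperature value, and reusing the SYSTEM_PROMPT constant sketched earlier are illustrative assumptions, not the author's settings:

```Python
# Minimal sketch: pass the coding prompt as a system prompt via the Anthropic SDK.
# Assumes ANTHROPIC_API_KEY is set in the environment and SYSTEM_PROMPT is the
# skeleton sketched earlier (or your own variant).
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=2048,
    temperature=0.2,  # the author reminds you to tune this; 0.2 is just an example
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Review this React component: ..."}],
)

print(message.content[0].text)
```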

We don't need such complicated prompts now / I dumped a lot of code into Sonnet and it just worked


Automated chain of thought (CoT) and default prompts do solve a lot of problems, but try testing against a simple "You are a helpful AI" prompt.

The author says he has run such a test and found that simple prompts were less effective on complex problems.

He also mentioned that early tests showed how sensitive results are to the system prompt, with different prompts producing significantly different outcomes, and that he may run more batch tests in the future to verify this further.

He acknowledged that Sonnet 3.5 excels at basic tasks, but stressed that even for high-performance models, proper guidance still helps.

This prompt is too long and will cause the AI ​​to hallucinate/forget/lose coherence/focus


The author measured this prompt at approximately 546 tokens, an acceptable length against a 200,000-token context window.

Structured prompts maintain high-quality context, helping to keep the conversation coherent and reduce the risk of AI hallucinations.

Because the model predicts the next token from the entire context, a high-quality conversation that isn't polluted by unnecessary back-and-forth code can continue longer before you need to start a new one. That means more productive interactions within the same conversation.

This prompt is over-engineered

The author said, maybe.

Users have integrated it into their workflows

Netizens exclaimed that the model's performance really did improve after using it.

"If this prompt is more effective, it means that the work done by the Anthropic team in combining CoT or ReAct system prompts with LLM basic capabilities has been effective."


This is for coding assistants! For a task like this, it makes sense to give some guidance.


Some netizens have already integrated parts of this prompt into their workflows. Below is what one of them always loads first in a new conversation.



However, some netizens said that this prompt is too complicated.


“In my experience, it’s not necessary to use such comprehensive prompts. Claude 3.5 Sonnet handles this sort of thing fairly automatically, with only occasional prompt clarification.”


Role prompts: a complete waste of time

Simon Willison, co-creator of the Django framework, said that "You are an expert in xxx" prompts have been a complete waste of time since the end of 2022.

The amount of "superstition" involved in LLM's tips is quite staggering!


This conclusion comes from a year-long study conducted by the Learnprompting team and co-authors from OpenAI and Microsoft.

Paper address: https://arxiv.org/pdf/2406.06608

For the project, they analyzed more than 1,500 papers on prompting and categorized the methods into 58 different prompting techniques, analyzing each one.


The study found that the effect of role prompting was shockingly poor.

One explanation: older models may have appeared to give better responses or reasoning when a role prompt nudged them into a better region of parameter space, but newer models may already operate in that improved space by default.

Of course, this is only an educated guess.

Back in October 2022, when Learnprompting published the first guide on prompting techniques (before ChatGPT launched), role prompting was the hottest topic and the core trick everyone later recommended for getting better ChatGPT results.


It is important to acknowledge that these models are evolving rapidly, and techniques that worked last year may not work today.

And tips and tricks that work today may not work next year.

To clarify this, the Learnprompting team tested gpt-4-turbo on 2,000 MMLU questions using about a dozen different role prompts.

In particular, one example prompt creates a "genius" role:

Another tip for the "idiot" character - you are a fool...

"genius...": "You are a genius level Ivy league Professor. Your work is of the highest grade. You always think out your problem solving steps in incredible detail. You always get problems correct and never make mistakes. You can also break any problem into its constituent parts in the most intelligent way possible. Nothing gets past you. You are omniscient, omnipotent, and omnipresent. You are a mathematical God."
 "idiot...": "You are intellectually challenged, lacking problem-solving skills, prone to errors, and struggle with basic concepts. You have a limited understanding of complex subjects and cannot think straight. You can't solve problems well, in fact, you can't solve them at all. You are a terrible, dumb, stupid, and idiotic person. You fail at everything you do. You are a nobody and can't do anything correctly."

As shown in the figure below, answer accuracy with the various role prompts is no higher than with strategies such as zero-shot CoT and two-shot CoT.

Whether it's a math novice, a careless student, a knowledgeable AI, a police officer, or an Ivy League math professor, it's all useless.


What's even more interesting: GPT-4 prompted as a "genius" recorded the lowest accuracy of all, 58.7%.

GPT-4 prompted as an "idiot" scored higher than the "genius" version.


Another study, from a team at the University of Michigan, illustrates how different social-role prompts affect the model's overall performance.

They tested 2,457 MMLU questions and found that the best-performing roles (shown in red) were: Policeman, Helpful Assistant, Partner, Mentor, AI Language Model, and Chatbot.

Paper address: https://arxiv.org/pdf/2311.10054

Regarding the "superstition" suggested by the large model, Willison made a vivid and interesting metaphor:

I liken this situation to a dog that finds a burger in a bush and then for the next few years checks that bush every time it passes by to see if there is a burger. We need to be more rational than the dog.

However, he clarified that in some cases assigning a specific role to an AI language model is useful, but stressed that this should be based on sound reasoning and the specific situation.


Some netizens also said that "think about it step by step" remains an eternal truth.


References:

https://www.reddit.com/r/ClaudeAI/comments/1dwra38/sonnet_35_for_coding_system_prompt/