I keep having the same conversation with people struggling with Claude Code.
Someone tells me it "forgets" their instructions. Or it hallucinates fixes. Or it ignores the rules they put in CLAUDE.md. And when I ask what their setup looks like, it's always the same thing: a massive system prompt with every rule for every language, stuffed into context.
It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something?
Skills are md files, but they are not just that. They are also scripts; that is the real addition. You can make a skill that is just a prompt, but that misses where the value is.
You're packaging the tool with the skill, or multiple tools to do a single thing.
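To make that concrete, here's roughly the shape, with made-up names: a folder holding a SKILL.md (frontmatter with a name and description, then instructions like "when asked to resize an image, run scripts/resize.py") and the script itself, which is where the actual work happens. A minimal sketch of such a bundled script, assuming Pillow is available:

    # scripts/resize.py - hypothetical example of a script a skill might bundle.
    # The SKILL.md only tells the agent when to run this and with what arguments.
    import argparse
    from PIL import Image  # assumes Pillow is installed

    def main():
        p = argparse.ArgumentParser(description="Resize an image to a target width.")
        p.add_argument("input")
        p.add_argument("output")
        p.add_argument("--width", type=int, required=True)
        args = p.parse_args()

        img = Image.open(args.input)
        ratio = args.width / img.width
        img.resize((args.width, round(img.height * ratio))).save(args.output)
        print(f"wrote {args.output}")

    if __name__ == "__main__":
        main()

The md file is the wrapper; the script is the tool.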
In the end it's still an .md file pointing to a script, which ends up being just a prompt for the agent, one the agent may or may not pick up, may or may not discover, may or may not forget after context compaction, etc.
There's no inherent magic to skills, or any fundamental difference between them and "just feeding in different prompts and steps". It literally is just feeding in different prompts and steps.
In my experience it's trivial to have the skill systematically call the script and perform the action correctly. This has not been a challenge for me.
Also, the may-or-may-not-pick-it-up, may-or-may-not-discover-it problem is solved as well. It's handled by my router, which I wrote about here - https://vexjoy.com/posts/the-do-router/
So these are solved problems to me. There are many more problems which are not solved, which are the interesting space to continue with.
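For anyone not clicking through: the general idea is to pick the skill deterministically instead of hoping the model discovers it. This is only a toy illustration of that idea, not the actual do-router, and the skill names and descriptions are invented:

    # Toy illustration of the routing idea only, not the do-router from the post.
    # Match the request against each skill's description and decide which skill
    # to load, instead of hoping the model stumbles onto it.
    SKILLS = {  # hypothetical names and descriptions
        "resize-image": "resize scale or shrink an image file",
        "dependency-map": "list the runtime dependencies of a code module",
    }

    def route(request):
        words = set(request.lower().split())
        scored = [(len(words & set(desc.split())), name) for name, desc in SKILLS.items()]
        best = max(scored)
        return best[1] if best[0] > 0 else None

    print(route("please shrink this image to 800px"))  # -> resize-image

The real version can be as involved as you like; the point is that skill selection happens in code you control, not in the model's head.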
It doesn't require any major improvement to the underlying model. As long as they tinker with system prompts and built-in tools/settings, the coding agent will evolve in unpredictable ways out of my control.
That's a rational argument. In practice, what we're actually doing for the most part is managing context, and creating programs to run parts of tasks, so really the system prompts and builtin tools and settings have very little relevance.
i don't understand this mcp/skill distinction? one of the mcps i use indexes the runtime dependencies of code modules so that claude can refactor without just blindly grepping.
how would that be a "skill"? just wrap the mcp in a cli?
fwiw this may be a skill issue, pun intended, but i can't seem to get claude to trigger skills, whereas it reaches for mcps more... i wonder if im missing something. I'm plenty productive in claude though.
So MCPs are a bunch of, essentially, skill-type objects. But the MCP has to declare all of them, and load information about all of them, up front.
So a Skill is just a smaller granularity level of that concept. It's just one of the individual things an MCP can do.
This is about context management at some level. When you need to do a single thing within that full list of potential things, you don't need the instructions about a ton of other unrelated things in the context.
So it's just not that deep. It would be a python script (or whatever) that the skill calls, which returns the runtime dependencies and gives them back to the LLM so it can refactor without blindly grepping.
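Sketching it in python rather than erlang, since the shape is the same either way; deps.py and everything in it is made up for illustration, and it only sees static imports (which is exactly where the macros/reflection objection below bites):

    # deps.py: print the modules each given Python file imports, as one JSON blob.
    # The skill's md file just tells the agent to run this and read the output.
    import ast
    import json
    import sys

    def imported_modules(path):
        tree = ast.parse(open(path, encoding="utf-8").read())
        mods = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                mods.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                mods.add(node.module)
        return sorted(mods)

    if __name__ == "__main__":
        print(json.dumps({path: imported_modules(path) for path in sys.argv[1:]}))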
no that makes no sense. the skill doesn't do anything by itself, the mcp (can be) attached to a deterministic oracle that can return correct information.
So my nano banana image generation skill contains a python script that does all the actual work. The skill just knows how to call the python script.
We're attaching tools to the md files. This is at the granular level of how to hammer a nail, how to use a screwdriver, etc. And then the agent, the handyman, has his toolbox of skills to call on depending on what he needs.
let's say i'm in erlang. you gonna include a script to unpack erlang bytecode across all active modules and look through them for a function call? oorrr... have that code running on localhost:4000 so that it's a single invocation away, versus having the llm copypasta the entire script you provided and pray for the best?
But for sure, there are places it makes sense and there are places it doesn't. I'm arguing to use it maximally in the places where it makes sense.
People are not doing this. They are leaving the LLM to do everything. I am arguing it is better to move everything you possibly can into tools, and have the LLM focus only on the bits that a program doesn't make sense for.
In our experience, a lot of it is feel and dev preference. After talking to quite a few developers, we've found the skill is the easiest to get started with, but we also have a CLI tool and an MCP server. You can check out the docs if you'd prefer to try those - feedback welcome: https://www.ensue-network.ai/docs#cli-tool
yeah but a skill without the mcp server is just going to be super inefficient at certain things.
again, going back to my example: a skill to do a dependency graph would have to do a complex search. and in some languages the dependencies might be hidden by macros/reflection etc, which would obscure a result obtained by grep.
how would you do this with a skill, which is just a text file nudging the llm, whereas the MCP's server goes out and does things?
that seems token inefficient. why have the llm do a full round trip: load the skill, which contains potentially hundreds of lines of code, then copy and paste the code back into the compiler, when it could just run it?
not that i care too too much about small amounts of tokens but depleting your context rapidly seems bad. what is the positive tradeoff here?
I don't understand. The Skill runs the tools. Wherever a problem can be handled by a program instead of the LLM, I think we should maximally do that.
That uses fewer tokens. The LLM just calls the script, gets the response, and then uses that to continue reasoning.
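To be concrete about what actually lands in context with the deps.py sketch above (illustrative, not a real transcript): the agent issues one command and reads one line back.

    python scripts/deps.py src/server.py
    {"src/server.py": ["asyncio", "json", "logging"]}

The script body never enters the context; only the invocation and its compact output do.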
Your approach essentially matches mine, but I call them plans. I agree with you that the other tools don't seem to add any value compared to this structure.
I think at this point in time, we both have it right.
There is certainly a level where at any time you could be building some abstraction that is no longer required in a month, or 3.
I feel that way too. I have a lot of these things.
But the reality is, it doesn't really happen that often in my actual experience. Everyone as a whole is slow to understand what these things mean, so for now you get quite a bit of time out of an improved, customized system of your own.
My somewhat naive heuristic would be that memory abstractions are a complete misstep in terms of optimization. There is no "super claude mem" or "continual claude" until there actually is.
I tend to agree with you; however, compacting has gotten much worse.
So... it's tough. I think memory abstractions are generally a mistake and generally not needed; however, compacting has gotten so bad recently that they are required until Claude Code releases a version with improved compacting.
But I don't do memory abstraction like this at all. I use skills to manage plans, and the plans are the memory abstraction.
But that is more than memory. That is also about having a detailed set of things that must occur.
I understand the use case for plannotator. I understand why you did it that way.
I am working alone. So I am instead having plans automatically update. Same conception, but without a human in the mix.
But I am utilizing skills heavily here. I also have a python script which manages how the LLM calls the plans so it's all deterministic. It happens the same way every time.
That's my big push right now. Every single thing I do, I try to make as much of it as deterministic as possible.
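Not the actual script, but a toy sketch of the deterministic-plan idea, with plan.md and the subcommands made up: the plan is a markdown checklist, and the agent is only ever told to go through this script, so reading the next step and marking it done happen the same way every run.

    # plan_step.py: toy sketch only. The plan lives in plan.md as a markdown
    # checklist; the agent calls this script instead of editing the plan
    # freehand, so the flow is identical every time.
    import re
    import sys

    PLAN = "plan.md"

    def checklist(lines):
        # (index, text) for every "- [ ]" / "- [x]" line in the plan
        return [(i, l) for i, l in enumerate(lines) if re.match(r"- \[[ x]\]", l)]

    if __name__ == "__main__":
        cmd = sys.argv[1] if len(sys.argv) > 1 else "next"
        lines = open(PLAN, encoding="utf-8").read().splitlines()
        if cmd == "next":
            pending = [l for _, l in checklist(lines) if l.startswith("- [ ]")]
            print(pending[0] if pending else "PLAN COMPLETE")
        elif cmd == "done":
            for i, l in checklist(lines):
                if l.startswith("- [ ]"):
                    lines[i] = l.replace("- [ ]", "- [x]", 1)
                    break
            open(PLAN, "w", encoding="utf-8").write("\n".join(lines) + "\n")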
Perhaps I can release it as a standalone github skill, and then do a blog post on it or something.
I'm also working on real projects, so a lot of my priority goes to building new skills, not to maintaining the current ones I have as github repos.
That would probably be a lot of work for little gain. Would you be open to asking Claude to summarize your approach and just putting it into a paste? I'm less interested in specific implementations and more in approaches: what the tradeoffs are and where it applies best.
So I wrote up how I solve this.