I keep having the same conversation with people struggling with Claude Code.
Someone tells me it "forgets" their instructions. Or it hallucinates fixes. Or it ignores the rules they put in CLAUDE.md. And when I ask what their setup looks like, it's always the same thing: a massive system prompt with every rule for every language, stuffed into context.
It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something?
Skills are md files, but they are not just that. They are also scripts; that is the real addition. You can make a skill that is just a prompt, but that misses where the value is.
You're packaging the tool with the skill, or multiple tools to do a single thing.
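To make that concrete, here's roughly the shape, with made-up names: a folder holding a SKILL.md (frontmatter with a name and description, then instructions like "when asked to resize an image, run scripts/resize.py") and the script itself, which is where the actual work happens. A minimal sketch of such a bundled script, assuming Pillow is available:

    # scripts/resize.py - hypothetical example of a script a skill might bundle.
    # The SKILL.md only tells the agent when to run this and with what arguments.
    import argparse
    from PIL import Image  # assumes Pillow is installed

    def main():
        p = argparse.ArgumentParser(description="Resize an image to a target width.")
        p.add_argument("input")
        p.add_argument("output")
        p.add_argument("--width", type=int, required=True)
        args = p.parse_args()

        img = Image.open(args.input)
        ratio = args.width / img.width
        img.resize((args.width, round(img.height * ratio))).save(args.output)
        print(f"wrote {args.output}")

    if __name__ == "__main__":
        main()

The md file is the wrapper; the script is the tool.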
In the end it's still an .md file pointing to a script, which ends up being just a prompt for the agent, one the agent may or may not pick up, may or may not discover, may or may not forget after context compaction, etc.
There's no inherent magic to skills, or any fundamental difference between them and "just feeding in different prompts and steps". It literally is just feeding in different prompts and steps.
In my experience it's trivial to have the skill systematically call the script and perform the action correctly. This has not been a challenge for me.
Also, the may-or-may-not-pick-it-up, may-or-may-not-discover-it problem is solved as well. It's handled by my router, which I wrote about here - https://vexjoy.com/posts/the-do-router/
So these are solved problems to me. There are many more problems which are not solved, which are the interesting space to continue with.
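For anyone not clicking through: the general idea is to pick the skill deterministically instead of hoping the model discovers it. This is only a toy illustration of that idea, not the actual do-router, and the skill names and descriptions are invented:

    # Toy illustration of the routing idea only, not the do-router from the post.
    # Match the request against each skill's description and decide which skill
    # to load, instead of hoping the model stumbles onto it.
    SKILLS = {  # hypothetical names and descriptions
        "resize-image": "resize scale or shrink an image file",
        "dependency-map": "list the runtime dependencies of a code module",
    }

    def route(request):
        words = set(request.lower().split())
        scored = [(len(words & set(desc.split())), name) for name, desc in SKILLS.items()]
        best = max(scored)
        return best[1] if best[0] > 0 else None

    print(route("please shrink this image to 800px"))  # -> resize-image

The real version can be as involved as you like; the point is that skill selection happens in code you control, not in the model's head.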
It doesn't require any major improvement to the underlying model. As long as they tinker with system prompts and built-in tools/settings, the coding agent will evolve in unpredictable ways out of my control.
That's a rational argument. In practice, what we're actually doing for the most part is managing context, and creating programs to run parts of tasks, so really the system prompts and builtin tools and settings have very little relevance.
i don't understand this mcp/skill distinction? one of the mcps i use indexes the runtime dependencies of code modules so that claude can refactor without just blindly grepping.
how would that be a "skill"? just wrap the mcp in a cli?
fwiw this may be a skill issue, pun intended, but i can't seem to get claude to trigger skills, whereas it reaches for mcps more... i wonder if im missing something. I'm plenty productive in claude though.
So MCPs are a bunch of, essentially, skill-type objects. But the MCP has to declare all of them, and load information about all of them, up front.
So a Skill is just a smaller granularity level of that concept. It's just one of the individual things an MCP can do.
This is about context management at some level. When you need to do a single thing within that full list of potential things, you don't need the instructions about a ton of other unrelated things in the context.
So it's just not that deep. It would be a python script (or whatever) that the skill calls, which returns the runtime dependencies and gives them back to the LLM so it can refactor without blindly grepping.
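Sketching it in python rather than erlang, since the shape is the same either way; deps.py and everything in it is made up for illustration, and it only sees static imports (which is exactly where the macros/reflection objection below bites):

    # deps.py: print the modules each given Python file imports, as one JSON blob.
    # The skill's md file just tells the agent to run this and read the output.
    import ast
    import json
    import sys

    def imported_modules(path):
        tree = ast.parse(open(path, encoding="utf-8").read())
        mods = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                mods.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                mods.add(node.module)
        return sorted(mods)

    if __name__ == "__main__":
        print(json.dumps({path: imported_modules(path) for path in sys.argv[1:]}))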
no that makes no sense. the skill doesn't do anything by itself, the mcp (can be) attached to a deterministic oracle that can return correct information.
So my nano banana image generation skill contains a python script that does all the actual work. The skill just knows how to call the python script.
We're attaching tools to the md files. This is at the granular level of how to hammer a nail, how to use a screwdriver, etc. And then the agent, the handyman, has his toolbox of skills to call on depending on what he needs.
let's say i'm in erlang. you gonna include a script to unpack erlang bytecode across all active modules and look through them for a function call? oorrr... have that code running on localhost:4000 so that it's a single invocation away, versus having the llm copypasta the entire script you provided and pray for the best?
But for sure, there are places it makes sense and there are places it doesn't. I'm arguing to use it maximally in the places where it makes sense.
People are not doing this. They are leaving the LLM to do everything. I am arguing it is better to move everything you possibly can into tools, and have the LLM focus only on the bits that a program doesn't make sense for.
In our experience, a lot of it is feel and dev preference. After talking to quite a few developers, we've found the skill is the easiest to get started with, but we also have a CLI tool and an MCP server. You can check out the docs if you'd prefer to try those - feedback welcome: https://www.ensue-network.ai/docs#cli-tool
yeah but a skill without the mcp server is just going to be super inefficient at certain things.
again, going back to my example: a skill to do a dependency graph would have to do a complex search. and in some languages the dependencies might be hidden by macros/reflection etc, which would obscure a result obtained by grep.
how would you do this with a skill, which is just a text file nudging the llm, whereas the MCP's server goes out and does things?
that seems token inefficient. why have the llm do a full round trip: load the skill, which contains potentially hundreds of lines of code, then copy and paste the code back into the compiler, when it could just run it?
not that i care too too much about small amounts of tokens but depleting your context rapidly seems bad. what is the positive tradeoff here?
I don't understand. The Skill runs the tools. Wherever a problem can be handled by a program instead of the LLM, I think we should maximally do that.
That uses fewer tokens. The LLM just calls the script, gets the response, and then uses that to continue reasoning.
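To be concrete about what actually lands in context with the deps.py sketch above (illustrative, not a real transcript): the agent issues one command and reads one line back.

    python scripts/deps.py src/server.py
    {"src/server.py": ["asyncio", "json", "logging"]}

The script body never enters the context; only the invocation and its compact output do.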
Your approach essentially matches mine, but I call them plans. I agree with you that the other tools don't seem to add any value compared to this structure.
I think at this point in time, we both have it right.
There is certainly a level where at any time you could be building some abstraction that is no longer required in a month, or 3.
I feel that way too. I have a lot of these things.
But the reality is, it doesn't really happen that often in my actual experience. Everyone as a whole is slow to understand what these things mean, so for now you get quite a bit of time out of an improved, customized system of your own.
My somewhat naive heuristic would be that memory abstractions are a complete misstep in terms of optimization. There is no "super claude mem" or "continual claude" until there actually is.
I tend to agree with you; however, compacting has gotten much worse.
So... it's tough. I think memory abstractions are generally a mistake and generally not needed; however, compacting has gotten so bad recently that they are required until Claude Code releases a version with improved compacting.
But I don't do memory abstraction like this at all. I use skills to manage plans, and the plans are the memory abstraction.
But that is more than memory. That is also about having a detailed set of things that must occur.
I understand the use case for plannotator. I understand why you did it that way.
I am working alone. So I am instead having plans automatically update. Same conception, but without a human in the mix.
But I am utilizing skills heavily here. I also have a python script which manages how the LLM calls the plans so it's all deterministic. It happens the same way every time.
That's my big push right now. Every single thing I do, I try to make as much of it as deterministic as possible.
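Not the actual script, but a toy sketch of the deterministic-plan idea, with plan.md and the subcommands made up: the plan is a markdown checklist, and the agent is only ever told to go through this script, so reading the next step and marking it done happen the same way every run.

    # plan_step.py: toy sketch only. The plan lives in plan.md as a markdown
    # checklist; the agent calls this script instead of editing the plan
    # freehand, so the flow is identical every time.
    import re
    import sys

    PLAN = "plan.md"

    def checklist(lines):
        # (index, text) for every "- [ ]" / "- [x]" line in the plan
        return [(i, l) for i, l in enumerate(lines) if re.match(r"- \[[ x]\]", l)]

    if __name__ == "__main__":
        cmd = sys.argv[1] if len(sys.argv) > 1 else "next"
        lines = open(PLAN, encoding="utf-8").read().splitlines()
        if cmd == "next":
            pending = [l for _, l in checklist(lines) if l.startswith("- [ ]")]
            print(pending[0] if pending else "PLAN COMPLETE")
        elif cmd == "done":
            for i, l in checklist(lines):
                if l.startswith("- [ ]"):
                    lines[i] = l.replace("- [ ]", "- [x]", 1)
                    break
            open(PLAN, "w", encoding="utf-8").write("\n".join(lines) + "\n")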
Perhaps I can release it as a standalone github skill, and then do a blog post on it or something.
I'm also working on real projects, so a lot of my priority goes to building new skills, not to maintaining the current ones I have as github repos.
That would probably be a lot of work for little gain. Would you be open to asking Claude to summarize your approach and just putting it into a paste? I'm less interested in specific implementations and more in approaches: what the tradeoffs are and where it applies best.
So I wrote up how I solve this.