heyitsguay's comments

Agreed. I've never seen a concrete answer with an outcome that can be explained in clear, simple terms.

I work in insurance - regulated, human capital heavy, etc.

Three examples for you:

- Our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 minutes per policy. It is more accurate and consistent than our humans.

- Our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.

- Our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow-up options or resolution. This will likely save us around $500k this year.
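
For a rough idea of the mechanics of that first one, here's a minimal sketch assuming a Pydantic schema and an OpenAI-style structured-output client. The field names are illustrative, not our actual ontology:

    from pydantic import BaseModel
    from openai import OpenAI

    # Hypothetical slice of a coverage ontology; a real schema would
    # carry many more policy details.
    class CoverageLimits(BaseModel):
        policy_number: str
        per_occurrence_limit: float | None
        aggregate_limit: float | None
        deductible: float | None

    client = OpenAI()

    def extract_limits(policy_text: str) -> CoverageLimits:
        # Structured output pins the model to the schema, so downstream
        # systems receive typed fields instead of free text.
        completion = client.beta.chat.completions.parse(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user",
                       "content": "Extract the coverage limits:\n\n" + policy_text}],
            response_format=CoverageLimits,
        )
        return completion.choices[0].message.parsed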

We also now increasingly share prototypes as a way to discuss ideas. The cost to vibe code something illustrative is very low, and it's often much higher fidelity to have the conversation with something visual than with a written document.


Thanks for that. It's a really interesting data point. My takeaway, which I've already felt and I feel like anyone dealing with insurance would anyway, is that the industry is wildly outdated. Which I guess offers a lot of low hanging fruit where AI could be useful. Other than the email drafting, it really seems like all of that should have been handled by just normal software decades ago.

A big win for 'normal software' here is to have authentication as a multi-party/agent approval process. Have the client of the insurance company request the automated delivery of certified documents to some other company's email.
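
To make that concrete, a sketch of what such an approval record might look like (the field names and required parties here are hypothetical):

    from dataclasses import dataclass, field

    # Hypothetical multi-party approval record: the certificate only
    # goes out once every required party has signed off.
    @dataclass
    class CertificateRequest:
        requester: str           # the insurance company's client
        recipient_email: str     # the third party asking for proof of coverage
        document_id: str
        required_approvers: frozenset = frozenset({"client_admin", "account_manager"})
        approvals: set = field(default_factory=set)

        def approve(self, party: str) -> None:
            if party in self.required_approvers:
                self.approvals.add(party)

        def authorized(self) -> bool:
            return self.required_approvers <= self.approvals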

>our policy agent extracts all coverage limits and policy details into a data ontology

Aren't you worried about the agent missing or hallucinating policy details?


Management has decreed that won't happen so it won't.

What an uncharitable and nasty comment for something they clearly addressed in theirs:

> It is more accurate and consistent than our humans.

So, errors can clearly happen, but they happen less often than they used to.

> It will draft a reply or an email

"draft" clearly implies a human will will double-check.


> "draft" clearly implies a human will will double-check.

The wording does imply this, but since the whole point was to free the human from reading all the details and relevant context about the case, how would this double-checking actually happen in reality?


> the whole point was to free the human from reading all the details and relevant context about the case

That's your assumption.

My read of that comment is that it's much easier to verify and approve (or modify) the message than it is to write it from scratch. The second sentence does confirm a person then modifies it in half the cases, so there is some manual work remaining.

It doesn't need to be all or nothing.


The “double checking” is a step to make sure there’s someone low-level to blame. Everyone knows the “double-checking” in most of these systems will be cursory at best, for most double-checkers. It’s a miserable job to do much of, and with AI, it’s a lot of what a person would be doing. It’ll be half-assed. People will go batshit crazy otherwise.

On the off chance it’s not for that reason, productivity requirements will be increased until you must half-ass it.


The real question is how do you enforce that the human is reviewing and double-checking?

When the AI gets "good enough" and the review becomes largely rubber-stamping (and 50% unmodified drafts is pretty close to that), you run the risk that a good percentage of the reviews are approved without real checks.

This is why nuclear operators and security screeners have regular "awareness checks". Is something like this also being done, and if so, what is the failure rate of these checks?
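
For what it's worth, one way such a check could be implemented, as a sketch (all names here are hypothetical): seed known-bad drafts into the review queue at a low rate and track how often reviewers catch them.

    import random

    SEED_RATE = 0.02  # fraction of reviews that are planted canaries

    def next_review_item(queue, canaries):
        # Occasionally swap in a draft containing a known, deliberate error.
        if canaries and random.random() < SEED_RATE:
            item = canaries.pop()
            item["is_canary"] = True
        else:
            item = queue.pop()
        return item

    def record_review(item, approved, audit_log):
        # An approved canary means the reviewer rubber-stamped it.
        if item.get("is_canary"):
            audit_log.append({"caught": not approved})

    def check_failure_rate(audit_log) -> float:
        # Fraction of planted errors that slipped through review.
        return sum(not e["caught"] for e in audit_log) / max(len(audit_log), 1)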


I think it's a good comment, given that the best agents seem to hallucinate on something like 10% of simple tasks and more than 70% of complex ones.

>So, errors can clearly happen, but they happen less often than they used to.

That's if you take the comment at face value. I'm sorry, but I've been around this industry long enough to be sceptical of self-serving statements like these.

>"draft" clearly implies a human will will double-check.

I'm even more sceptical of that working in practice.


Years ago I worked at an insurance company where the whole job was doing this - essentially reading through long PDFs with mostly unrelated information and extracting 3-4 numbers of interest. It paid terribly, and few people who worked there cared about doing a good job. I’m sure mistakes were constantly being made.

> our policy agent extracts all coverage limits and policy details into a data ontology.

Are they using some software for this or was this built in-house?


I think we are at the stage of the "AI bubble" that is equivalent to saying it is 1997 and 18% of U.S. households have internet access. Obviously the internet is not working out, or 90%+ of households would have internet access if it was going to be as big a deal as some claim.

I work at a place that is doing nothing like this, and it seems obvious to me we are going to get put out of business in the long run. This is just adding a power law on top of a power law. Winner take all. What I currently do will be done by software engineers and agents in 10 years or less. Gemini is already much smarter than I am. I am going to end up at a factory or Walmart if I can get in.

The "AI bubble" is a mass delusion of people in denial of this reality. There is no bubble. The market has just priced all this forward as it should. There is a domino effect of automation that hasn't happened yet because your company still has to interface with stupid companies like mine that are betting on the hand loom. Just have to wait for us to bleed out and then most people will never get hired for white collar work again.

It amuses me when someone says who is going to want the factory jobs in the US if we reshore production? Me and all the other very average people who get displaced out of white collar work and don't want to be homeless is who.

"More valuable" work is just 2026 managerial class speak for "place holder until the agent can take over the task".


That sounds a lot like "LLMs are finally powerful enough technology to overcome our paper/PDF-based business". Solving problems that frankly had no business existing in 2020.

Thanks for this answer! I appreciate the clarity, I can see the economic impact for your company. Very cool.

Here's some anecdata from the B2B SaaS company I work at

- Product team is generating some code with LLMs but everything has to go through human review and developers are expected to "know" what they committed - so it hasn't been a major time saver but we can spin up quicker and explore more edge cases before getting into the real work

- Marketing team is using LLMs to generate initial outlines and drafts - but even low stakes/quick turn around content (like LinkedIn posts and paid ads) still need to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human review before customers/the public sees it

- Similarly the Sales team can generate outreach messaging slightly faster but they still have to review for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much 'magic' and accurate-enough when you need to analyze any transcripts. You can still fall back on the actual recording for clarification.

- We're able to spin up demos much faster with 'synthetic' content/sites/visuals that are good-enough for a sales call but would never hold up in production

---

All that being said - the value seems to be speeding up discovery of actual work, but someone still needs to actually do the work. We have customers, we built a brand, we're subject to SLAs and other regulatory frameworks so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers in regard to the LLM features (RAG) that we've added to the product if that helps.


This makes a lot of sense and is consistent with the lens that LLMs are essentially better autocomplete.

Lately, it seems like all the blogs have shifted away from talking about productivity and are now talking about how much they "enjoy" working with LLMs.

If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian From Middle Management getting to Enjoy Programming Again instead of having to blame his kids for not having any time on the weekends, I guess we have no choice but to oblige him his little treat.


It’s the honeymoon period with crack all over again. Everyone feels great until their teeth start falling out.

I went through a similar cycle. Going back to simplicity wasn't about laziness for me; it was because I started working across a bunch more systems and didn't want to do my whole custom setup on all of them, especially ephemeral stuff like containers allocated on a cluster for a single job. So rather than using my fancy setup sometimes and fumbling through the defaults at other times, I just got used to operating more efficiently with the defaults.


You can apply your dotfiles to servers you SSH into rather easily. I'm not sure what your workflow is like but frameworks like zsh4humans have this built in, and there are tools like sshrc that handle it as well. Just automate the sync on SSH connection. This also applies to containers if you ssh into them.


I'm guessing you haven't worked in Someone Else's environment?

The amount of shit you'll get for "applying your dotfiles" on a client machine or a production server is going to be legendary.

Same with containers, please don't install random dotfiles inside them. The whole point of a container is to be predictable.


Do you have experience with these tools? Some such as sshrc only apply temporarily per session and don't persist or affect other users. I keep plain 'ssh' separate from shell functions that apply dotfiles and use each where appropriate. You can also set up temporary application yourself pretty easily.


Someone else's environment? That should never happen. You should get your own user account and that's it.


Sometimes we need to use service accounts, so while you do have your own account, all the interesting things happen in svc_foo, to which you cannot add your .files.


I don’t even get an account on someone else’s server. There’s no need for me to log in anywhere unless it’s an exceptional situation.


This doesn't make sense.

You said you were already using someone else's environment.

You can't later say that you don't.

Whether or not shell access makes sense depends on what you are doing, but a well written application server running in a cloud environment doesn't need any remote shell account.

It's just that approximately zero typical monolithic web applications meet that level of quality, and given that 90% of "developers" are clueless, they can often convince management that being stupid is OK.


They do get to work on someone else's server; they do not get a separate account on that server. The client would not be happy to have them mess around with the environment.


By definition, if the client Alice gives contractor Mallory access to user account alice, that's worse than giving them an account called mallory.

Accounts are basically free. Not having accounts; that's expensive.


They specifically mentioned service accounts. If they’re given a user account to log in as, they still might have to get into and use the service account, and its environment, from there. If the whole purpose was to get into the service account, and the service account is already set up for remote debug, then the client might prefer to skip the creation of the practically useless user account.


That's still not professional, but then again 99.9% of companies aren't.


Could you help me understand what assumptions about the access method you have in place that make this seem unprofessional?

Let's assume they need access to the full service account environment for the work, which means they need to login or run commands as the service account.

This is a bit outside my domain, so this is a genuine question. I've worked on single user and embedded systems where this isn't possible, so I find the "unprofessional" statement very naive.


If, in the year 2025, you are still using a shared account called "root" (password: "password"), and it's not a hardware switch or something (and even they support user accounts these days), I'm sorry, but you need to do better. If you're the vendor, you need to do better, if you're the client, you need to make it an issue with the vendor and tell them they need to do better. I know, it's easy for me to say from the safety of my armchair at 127.0.0.1. I've got some friends in IT doing support that have some truly horrifying stories. But holy shit why does some stuff suck so fucking much still. Sorry, I'm not mad at you or calling you names, it's the state of the industry. If there were more pushback on broken busted ass shit where this would be a problem, I could sleep better at night, knowing that there's somebody else that isn't being tortured.


It’s 2025. I don’t even have the login password to any server; they’re not unicorns, they’re cattle.

If something is wrong with a server, we terminate it and spin up a new one. No need for anyone to log in.

In very rare cases it might be relevant to log in to a running server, but I haven’t done that in years.


In other replies you explicitly state how rare it is that you log in to other systems.

Aren't you therefore optimizing for 1% of the cases, but sabotaging the 99%?


The defaults are unbearable. I prefer using chezmoi to feel at home anywhere. There's no reason I can't at least have my aliases.

I'd rather take the pain of writing scripts to automate this for multiple environments than suffer the death by a thousand cuts which are the defaults.


chezmoi is the right direction, but I don't want to have to install something on the other server; I should just be able to ssh to a new place and have everything already set up, via LocalCommand and Host * in my ~/.ssh/config.
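
For reference, that approach can look roughly like this (the rsync invocation and bundle path are illustrative, not a recommendation):

    # ~/.ssh/config
    Host *
        PermitLocalCommand yes
        # Runs on the local machine once the connection is up; pushes a
        # minimal dotfiles bundle before the interactive session starts.
        # The inner PermitLocalCommand=no stops rsync's own ssh from
        # recursively triggering this LocalCommand again.
        LocalCommand rsync -a -e "ssh -o PermitLocalCommand=no" ~/.dotfiles-lite/ %r@%h:~/ 2>/dev/null || true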


What problems?


Pass a law requiring cloud compute providers to accept a maximum user budget and be unable to charge more than that, and see how quickly the big cloud providers figure it out.


So tell me again why we need a law? Can you cite one instance where any of the cloud providers refused to give a refund to someone?


The person who signs up for the free tier and is charged.

https://medium.com/%40akshay.kannan.email/amazon-is-refusing...


There is no such thing as “signing up for a free tier”; at least, there wasn’t before July of this year. Some services have free tiers for a certain amount of time, and others have an unlimited free tier that resets every month.


What do you do with agents?


I use them as an intelligence layer over disk cleanup tools; to manage deployments and cloud configs; for big repo organization workflows; to manage my KDE system settings; as editors on documents all over my filesystem (to add comments for revision, not to rewrite, since that's not consistent enough); to do deep research on topics and save reports; and to look at my Google Analytics and SEO data and suggest changes to my pages. Frankly, if I had my druthers I wouldn't use a mouse: the agent would use visual tracking (eye/hand) along with words and body language to just quickly figure out what I want.


> they can manage my KDE system settings

Why do you even have KDE installed if AI has replaced GUIs?


You’re saying you’ve found a useful assistant for menial tasks. That’s not consistent with the strong claims you were making upthread.


My claim is that the "useful assistant for menial tasks" is the Wright brothers flyer to what we'll have in a few years. If you have voice chat with an agent on your phone that can just do everything you'd need an app for, what's the point of an app? And it's gonna happen, because if your app doesn't let people's agents handle their business and your competitors' do, people are gonna switch if they can. The computer interfaces of the future are going to be made for agents first.


> My claim is that the "useful assistant for menial tasks" is the Wright brothers flyer to what we'll have in a few years.

I agree with that.

But what you originally wrote was, "The AI bundling problem is over. The user interface problem is over." It would probably make more sense to say "...will be over."

People tend to be sensitive to those kinds of claims because there's a lot of hype around all this at the moment. So when people seem to imply that what we have right now is much more capable than it actually is, there tends to be pushback.


Are there any? Concretely. Genuinely curious.


"Unfortunately, my content guidelines prohibit me from describing my activities with your mother last night"


Seems cool, but the image classification model benchmark choice is kinda weak given all the fun tools we have now. I wonder how Tversky probes do on top of DINOv3 for building a classifier for some task.


Crawl, walk, run.

No sense spending large amounts of compute on algorithms for new math unless you can prove it can crawl.


It's the same amount of effort benchmarking, just a better choice of backbone that enables better choices of benchmark tasks. If the claim is that a Tversky projection layer beats a linear projection layer today, then one can test whether that's true with foundation embedding models today.

It's also a more natural question to ask, since building projections on top of frozen foundation model embeddings is both common in an absolute sense, and much more common, relatively, than building projections off of tiny frozen networks like a ResNet-50.
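
To be clear about how cheap that baseline is, here's a sketch of a linear probe over precomputed frozen embeddings (it assumes features are already extracted; a Tversky projection head would slot in where nn.Linear sits):

    import torch
    import torch.nn as nn

    # Assumes `feats` (N, D) are precomputed frozen backbone embeddings,
    # e.g. from DINOv3, with integer class labels `labels` of shape (N,).
    def train_linear_probe(feats, labels, num_classes, epochs=100, lr=1e-3):
        probe = nn.Linear(feats.shape[1], num_classes)
        opt = torch.optim.AdamW(probe.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(probe(feats), labels)
            loss.backward()
            opt.step()
        return probe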


Do you have references on adaptive methods for image recognition?


I don't have an exact reference, but there are plenty of hints that support the claim (computing more with the same weights). In fact, I wouldn't even call them hints, since they aren't subtle at all. For one, animal brains are perfect examples of this. But in the ML space, we can think about this purely from the mathematical perspective.

I think it might be confusing because neurons are neurons right? And they can only hold so much memory, so what's the difference? Well, that difference is architecture and training.

Let's think about signals for a moment, and to help understand this, let's move to small dimensions[0], like 2D or 3D. (I'll use 3D, but you'll see why this can still ruin visualization.) We're talking about universal approximators, so we can think of these as finite-length strings with fixed endpoints. Our goal is then to untangle these strings. Oh no, this bundle has a knot! We can't actually untangle this string just by stretching, and we also have a rule that we can't cut and glue things. We'd be stuck if we didn't have a trick up our sleeves: we can move into a higher dimension and untangle these strings there[1]. We'll need at least 2N dimensions. To the flatlander this will look like a cut, but it isn't.

The reason this needs to be understood is that we need to know where those extra dimensions come from. They come from architecture and training. But let's just think about the architecture. While we're learning these relationships, we need the capacity to perform these higher-dimensional movements, but once we've uncovered the relationships, we don't necessarily need it anymore. The dimensionality required depends on the relationship itself, not on the data.

This is true for all models and is fundamentally why things like distillation even work. It is also why the FFN layer after attention in a transformer needs to project into a higher dimension before returning (4x is typical, and I think you can reason about why that gives more flexibility than 2x). It's also related to the latent manifold hypothesis.
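
Concretely, that post-attention block usually looks like this (a standard sketch in PyTorch; d_model and the 4x expansion are just the conventional choices):

    import torch.nn as nn

    class FeedForward(nn.Module):
        """Post-attention FFN: expand to a higher dimension, apply a
        nonlinearity, then project back down to the model dimension."""
        def __init__(self, d_model: int, expansion: int = 4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, expansion * d_model),  # up-project
                nn.GELU(),
                nn.Linear(expansion * d_model, d_model),  # back down
            )

        def forward(self, x):
            return self.net(x)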

If you ever wondered if math is useful to machine learning, I hope this gives some motivation to learn more. You don't need math to build good models, but even a little math goes a long way to help make better models.

[0] Note, we're doing a significant amount of simplification here. There's a lot of depth and complexity to all of this but I think this will be sufficient to point anyone in (mostly) the right direction.

[1] Think about a Klein bottle. In 4D it has a single surface. But the 3D projection of this shape makes it look like it is intersecting itself. Unfortunately we can't really visualize the 4D version :(


I've done something along these lines! https://github.com/heyitsguay/trader

The challenge for me was consistency in translating free text from dialogs into classic, deterministic game state changes. But what's satisfying is that the conversations aren't just window dressing, they're part of the game mechanic.
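
The shape of what ended up working, roughly (names here are simplified for illustration, not the repo's actual API): force the model to emit a constrained action object, then let deterministic game rules decide what actually happens.

    from typing import Literal
    from pydantic import BaseModel, ValidationError

    # Hypothetical action schema: the model must emit one of these,
    # never a free-form state mutation.
    class TradeAction(BaseModel):
        kind: Literal["buy", "sell", "decline"]
        item: str
        price: int

    def apply_dialog_turn(llm_json: str, state: dict) -> dict:
        try:
            action = TradeAction.model_validate_json(llm_json)
        except ValidationError:
            return state  # malformed output is rejected; state untouched
        # Deterministic rules decide the outcome, not the model.
        if action.kind == "buy" and state["gold"] >= action.price:
            state["gold"] -= action.price
            state["inventory"].append(action.item)
        elif action.kind == "sell" and action.item in state["inventory"]:
            state["gold"] += action.price
            state["inventory"].remove(action.item)
        return state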


> deterministic game state changes

I found this to be the actual strenuous work in LLM-based development. While it appears like AI has made everything easy and free, the particular challenge of consistently getting deterministic outputs takes serious programming effort. It feels like an entirely new job role. In other words, I wouldn't do this for free; it takes too much effort.

