Hacker News | gillesjacobs's comments


They save money through cheap labour and by batching large quantities of samples for analysis. For the consumer this means long wait times and potentially expired DNA samples.

I tried two samples with Nebula and waited 11 months in total. Both samples failed. I got a refund on the service but spent 50 USD in postage for the sample kit.


Nice framing for PMs, but technically it is way too rosy. MCP is real but still full of low-utility services and security issues, so “skills as plug-ins” is not production-ready. A2A protocols were only just announced this year (Google, etc.), and actual inter-agent interoperability is still research-grade, with debugging across agents being a nightmare. Orchestration layers (skills, workflows, multi-agent) look clean in diagrams but turn into brittle state machines under load. LLM “confidence scores” are basically uncalibrated logits dressed up as probabilities.
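
To make that last point concrete, here is a minimal sketch of how such “confidence scores” are typically produced (pure numpy, made-up numbers):

    import numpy as np

    # Raw model outputs (logits) for three candidate answers; values are made up.
    logits = np.array([3.2, 1.1, 0.3])

    # Softmax turns logits into numbers that sum to 1 and look like probabilities...
    probs = np.exp(logits) / np.exp(logits).sum()
    print(probs)  # ~[0.85, 0.10, 0.05]

    # ...but without calibration (e.g. temperature scaling), nothing guarantees
    # the model is actually right ~85% of the time it reports 0.85.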

In short: nice industry roadmap, but we are nowhere near robust, trustworthy multi-agent systems yet.


The idea of giving an LLM with a tool any kind of control over an actual user's account remains (though you put this more elegantly) batshit insane to me.

Even assuming you've correctly auth'd the user contacting you (big assumption!), allowing that user to very literally prompt a 'semi-confident thing with tools' - however many layers of abstraction away the tool is - feels very, very far away from a real-world, sensible implementation right now.

Just shoot the tool prompts over to a human operator, if it's so necessary! Sense-check!


Extracting structure and elements from HTML should be trivial; there are probably multiple libraries for it in your programming language of choice. Be happy you have machine-readable semantic documents, that's the best-case scenario in NLP. I used to convert the chunks to Markdown as it was more token-efficient and LLMs are often heavily preference-trained on Markdown, but I'm not sure that matters anymore with current input pricing and LLM performance gains.
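
For illustration, the HTML-to-Markdown step is a few lines in Python (a sketch assuming the beautifulsoup4 and markdownify libraries; any equivalents work):

    from bs4 import BeautifulSoup        # pip install beautifulsoup4
    from markdownify import markdownify  # pip install markdownify

    html = "<article><h1>Title</h1><p>Some <b>bold</b> text.</p></article>"

    # Parse once so you can select just the semantic element you care about...
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")

    # ...then convert that chunk to token-efficient Markdown.
    md = markdownify(str(article), heading_style="ATX")
    print(md)  # -> "# Title ... Some **bold** text."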

If you have scanned documents, last I checked Gemini Flash was very good cost/performance-wise for document extraction. Mistral OCR claims better performance in its own benchmarks, but people I know have used it, and other benchmarks beg to differ. Personally I use Azure Document Intelligence a lot for the bounding-boxes feature, but Gemini Flash apparently has this covered too.

https://getomni.ai/blog/ocr-benchmark
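
The bounding-box workflow is roughly this (a sketch using the older azure-ai-formrecognizer SDK, which is what I know; the newer azure-ai-documentintelligence package has a near-identical shape):

    from azure.ai.formrecognizer import DocumentAnalysisClient  # pip install azure-ai-formrecognizer
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(
        "https://<your-resource>.cognitiveservices.azure.com/",
        AzureKeyCredential("<your-key>"),
    )

    with open("scan.pdf", "rb") as f:
        poller = client.begin_analyze_document("prebuilt-layout", document=f)
    result = poller.result()

    # Every line comes with polygon coordinates you can keep as chunk metadata.
    for page in result.pages:
        for line in page.lines:
            print(line.content, line.polygon)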

Sidenote: what you want for RAG is not OCR as in extracting text. The task for RAG preprocessing is typically called Document Layout Analysis or End-to-End Document Parsing/Extraction.

Good RAG is multimodal and aware of semantic document structure and layout, so your pipeline needs to extract and recognize text sections, footers/headers, images, and tables. When working with PDFs, you want accurate bounding boxes in your metadata for referring your users to the retrieved sources, etc.
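
As a sketch of what that per-chunk metadata can look like (field names are my own invention, not a standard):

    # Hypothetical chunk record for a layout-aware RAG index.
    chunk = {
        "text": "Q3 revenue grew 12% year over year...",
        "type": "paragraph",  # vs. "table", "header", "figure_caption"
        "page": 7,
        "bbox": [72.0, 340.5, 523.0, 410.2],  # x0, y0, x1, y1 in PDF points
        "section_path": ["3. Results", "3.2 Revenue"],
    }
    # At answer time, "page" + "bbox" let you highlight the exact source
    # region in the PDF viewer instead of just citing a page number.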


Yeah, thanks for pointing out the OCR! We also found that for complex PDFs, you first need to use OCR to convert them into Markdown and then run PageIndex. However, most OCR tools process each page independently, which causes them to lose the overall document structure. For example, existing OCR tools often generate incorrect heading levels, which is a big problem if you want to build a tree structure from them. You could check out PageIndex-OCR, the first long-context OCR model that can produce Markdown with more accurate heading-level recognition.
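
For example (illustrative, not the output of any specific tool): a per-page OCR pass that sees "2.1 Methods" at the top of a page has no context for its nesting level and may emit

    # 2.1 Methods    <- wrongly emitted as a top-level heading

whereas a long-context pass that already saw "# 2. Experiments" on the previous page can emit the correct "## 2.1 Methods".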


I am always on the lookout for new document extraction tools, but can't seem to find any benchmarks for PageIndex-OCR. There are several like OmniDocBench and readoc. So... Got benchmark?


Try DocuPipe. It blows Gemini out of the water in terms of extraction accuracy. They also generate a page + bounding box for every extracted field.


> Sidenote: What you want for RAG is not OCR as-in extracting text. The task for RAG preprocessing is typically called Document Layout Analysis or End-to-End Document Parsing/Extraction.

Got it. Indeed, I need to do End-to-End Document Parsing/Extraction.


A suspicious lack of any performance metrics on the many standard RAG/QA benchmarks out there, except for their highly fine-tuned, dataset-specific MAFIN2.5 system. I would love to see this approach vs. a similarly well-tuned structured hybrid retriever (vector similarity + text matching), which is the common way of building domain-specific RAG. The FinanceBench GPT-4o+Search system never mentions what the retrieval approach is [1,2], so I will have to assume it is the dumbest retriever possible, to oversell the improvement.

PageIndex does not state to what degree the semantic structuring is rule-based (document structure) or also inferred by an ML model. In any case, structuring chunks using semantic document structure is nothing new and pretty common, as is adding generated titles and summaries to the chunk nodes. But I find it dubious that prompt-based retrieval on structured chunk metadata works robustly, and if it does perform well, it is because of the extra prompt-engineering work done on chunk-metadata generation and retrieval. This introduces two LLM-based components that can lead to highly variable output versus a traditional vector chunker and retriever. There are many more knobs to tune in a text prompt and an LLM-based chunker than in a sentence/paragraph chunker and a vector+text similarity hybrid retriever.

You will have to test retrieval and generation performance for your application regardless, but with so many LLM-based components this will lead to increased iteration time and cost vs. embeddings. The advantage of PageIndex is that you can probably make it really domain-specific. Claims of improved retrieval time are dubious: vector databases (even with hybrid search) are highly efficient, definitely more efficient than prompting an LLM to select relevant nodes.
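
For reference, the hybrid baseline I mean is only a few lines (a sketch; embed() stands in for whatever embedding model you use, and rank-bm25 is just one common text-matching choice):

    import numpy as np
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    docs = ["chunk one about revenue...", "chunk two about costs..."]

    # Text-matching side: BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([d.split() for d in docs])

    # Vector side: embed() is a placeholder for your embedding model.
    doc_vecs = np.array([embed(d) for d in docs])

    def hybrid_search(query, alpha=0.5, k=5):
        bm25_scores = bm25.get_scores(query.split())
        q = embed(query)
        cos = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        # Normalize both score ranges to [0, 1] before mixing.
        norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-9)
        scores = alpha * norm(cos) + (1 - alpha) * norm(bm25_scores)
        return np.argsort(scores)[::-1][:k]  # indices of the top-k chunks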

1. https://pageindex.ai/blog/Mafin2.5
2. https://github.com/VectifyAI/Mafin2.5-FinanceBench


I've had many friends in the Belgian hacker scene who were threatened with legal action after responsible disclosure. To my knowledge, these threats always remained empty: if there is one thing more expensive than engineering a fix, it is starting a lawsuit in Belgium.

It is a sad state of affairs that the culture is like this. Ultimately it results in a less secure society, where vulns are anonymously disclosed and shared.


It doesn't do it magically. The "tools" an LLM agent calls to create responses are typically REST APIs for these services.

Previously, many companies gated these APIs, but with the MCP/AI hype they are incentivized to expose what those APIs can do through an agent service.
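
Concretely, "exposing" is often just wrapping the existing REST endpoint in a tool. A minimal sketch with the official MCP Python SDK (the endpoint URL and response field are made up):

    import httpx
    from mcp.server.fastmcp import FastMCP  # pip install mcp

    mcp = FastMCP("orders")

    @mcp.tool()
    def get_order_status(order_id: str) -> str:
        """Look up an order via the company's existing REST API."""
        r = httpx.get(f"https://api.example.com/v1/orders/{order_id}")  # made-up endpoint
        r.raise_for_status()
        return r.json()["status"]  # made-up response field

    if __name__ == "__main__":
        mcp.run()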

Incentives align here: the user wants automation over data and actions on a service they are already using; the company gets AI marketing and a USP in automation features, and still gets to control the output of the agent.


> Previously, many companies gated these APIs, but with the MCP/AI hype they are incentivized to expose what those APIs can do through an agent service.

Why would they be incentivized to do that if they survived all the previous hype waves and still have access gated?

> the user wants automation over data and actions on a service they are already using;

How many users want that? Why didn't companies do all this before, since the need for automation has always been there?


> Why would they be incentivized to do that if they survived all the previous hype waves and still have access gated?

Because they suddenly now don't want to be left out of the whole AI hype/wave.

Is it stupid? Yes. Can we still reap the benefits of these choices driven by stupid motivations? Also yes.


> Because they suddenly now don't want to be left out of the whole AI hype/wave.

https://news.ycombinator.com/item?id=44405491 and https://news.ycombinator.com/item?id=44408434


Looks very cool!

I prototyped something like this with build123d for Python and Cursor + the OCP VSCode plugin.

Build123d is too new, with too few examples out there, unlike OpenSCAD. I can only get it to generate good code with large reasoning models that access the latest docs. No fast iteration for build123d yet.
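
For context, build123d code looks like this (a trivial part in the builder-mode API; double-check names against the current docs):

    from build123d import BuildPart, Box, Cylinder, Mode

    # A 20 x 20 x 5 plate with a 6 mm hole through it.
    with BuildPart() as part:
        Box(20, 20, 5)
        Cylinder(radius=3, height=5, mode=Mode.SUBTRACT)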


Thanks for sharing!

Yeah, I picked OpenSCAD since it's well known to LLMs, though the downside is that it's not a Python lib. For visualization I used trimesh to load the STL data and then plotly to display it.
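
A rough sketch of that viz path (file name illustrative):

    import trimesh
    import plotly.graph_objects as go

    mesh = trimesh.load("model.stl")  # STL exported from OpenSCAD
    x, y, z = mesh.vertices.T         # vertex coordinates
    i, j, k = mesh.faces.T            # triangle indices

    fig = go.Figure(go.Mesh3d(x=x, y=y, z=z, i=i, j=j, k=k))
    fig.show()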


I need an LLM in my fax machine.


The team is currently skiing for two weeks so we'll have to get back to you on that.

