
The author wrote WebKit’s allocator and worked on JavaScriptCore for over a decade. I really enjoyed his posts on the WebKit blog over the years, like this one on the concurrent garbage collector (2017): https://webkit.org/blog/7122/introducing-riptide-webkits-ret...

In Greek mythology Prometheus took fire from the gods and gave it to humans, for the low subscription fee of a liver a day.

The README says it’s optimized for Metal. If it really is using Metal compute shaders, the programming model is apparently fairly similar to WebGPU’s. You could try asking Claude to translate it :)

I sent Claude back to look at the transcript file from before compaction. It was pretty bad at it, but it did eventually recover the prompt and solution from the jsonl file.

E2EE makes it hard to do things like search, which is important for working with teams. For personal messengers, search is usually all on-device with an encrypted index; once an org grows beyond 50 people, that sort of thing breaks down.

1 GB of JSON you can do in one parse ¯\_(ツ)_/¯ big batches are fast

Unconvinced. Any join needs some kind of seek on the secondary relation’s index, or (if you’re stream-joining) a bunch of state to build a temporary index that’s O(n) in size until the end of the batch. On the other hand, summing N numbers needs O(1) memory, and if your data is column-shaped it’s like one CPU instruction to process 8 rows. In a “big data” context there’s usually no traditional b-tree index to join against either. For jobs that process every row in the input set, Mr. Join is so horrible for perf that people end up with a dedicated join job/materialized view so downstream jobs don’t have to redo the work.
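To make the memory asymmetry concrete, here’s a minimal Rust sketch (my own illustration, not code from any of the systems discussed): the aggregate streams over a column with O(1) extra state in a loop the compiler can auto-vectorize, while the hash join has to materialize an O(n) index over one side of the batch before it can probe it with the other.

    use std::collections::HashMap;

    // Aggregating a column-shaped batch: O(1) extra memory, and the optimizer
    // can auto-vectorize this loop so several values are summed per instruction.
    fn sum_column(values: &[i64]) -> i64 {
        values.iter().sum()
    }

    // Hash join: one side must be materialized into an index (O(n) extra state,
    // held until the end of the batch) before the other side can probe it.
    fn hash_join(left: &[(u64, i64)], right: &[(u64, i64)]) -> Vec<(u64, i64, i64)> {
        let mut index: HashMap<u64, i64> = HashMap::with_capacity(left.len());
        for &(key, payload) in left {
            index.insert(key, payload);
        }
        right
            .iter()
            .filter_map(|&(key, payload)| index.get(&key).map(|&l| (key, l, payload)))
            .collect()
    }

    fn main() {
        let col: Vec<i64> = (0..1_000_000i64).collect();
        println!("sum = {}", sum_column(&col));

        let left: Vec<(u64, i64)> = (0..1_000u64).map(|k| (k, k as i64)).collect();
        let right: Vec<(u64, i64)> = (0..1_000u64).map(|k| (k, -(k as i64))).collect();
        println!("joined rows = {}", hash_join(&left, &right).len());
    }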

On the other hand, now we have DuckDB for all the “small big data”, plus a slew of stuff in the data-x-Rust ecosystem that’s 10-100x faster than the Java equivalents, like DataFusion, Feldera, ByteWax, RisingWave, Materialize, etc.

The point of the article is those don’t actually work that well.

I guarantee those Rust projects have spent more time playing with Rust and library design than on the domain problem they’re trying to solve.


None of the systems I mentioned existed at the time the article was published. I think the author would love DuckDB, which is a very speedy CLI SQL thingy that reads and writes data in all sorts of formats. It fits in great with other Unix CLI stuff.

You could see many of the projects I mentioned as a response to OP and the 2015 “Scalability! But at what COST?” paper, which benchmarked distributed systems to see how many cores they need to beat a single thread. (https://news.ycombinator.com/item?id=26925449)


> None of the systems I mentioned existed at the time the article was published

So Hadoop was doing distributed compute wrong but now they have it figured out?

The point is that there is enormous overhead and complexity in doing it in any kind of distributed system. And your computer has a lot of power you probably aren’t maxing out.

> which is a very speedy CLI SQL thingy that reads and writes data in all sorts of formats.

Do you know about SQLite?


Yeah, I’m a big fan of SQLite :). But on analytical workloads like aggregating every row, DuckDB will outperform SQLite by a wide margin. SQLite is great stuff, but it’s not a very good data Swiss Army knife because it’s very focused on a single core competency: embeddable OLTP with a simple codebase. DuckDB can read/write many more formats, from local disk or via a variety of network protocols. DuckDB also embeds SQLite, so you can use it with SQLite DBs as inputs or outputs.
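For instance, a rough sketch of that last point using the duckdb Rust crate (the file names, table name, and columns here are made up, and I’m assuming the crate’s rusqlite-style API plus loading the sqlite extension is the simplest way to wire it up):

    use duckdb::{Connection, Result};

    fn main() -> Result<()> {
        let conn = Connection::open_in_memory()?;
        // Pull in DuckDB's sqlite extension so an existing SQLite file can be an input.
        conn.execute_batch("INSTALL sqlite; LOAD sqlite;")?;
        // Aggregate straight out of the SQLite database and write the result as Parquet.
        conn.execute_batch(
            "COPY (SELECT category, sum(amount) AS total
                     FROM sqlite_scan('ledger.db', 'entries')
                    GROUP BY category)
             TO 'totals.parquet' (FORMAT parquet);",
        )?;
        Ok(())
    }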

> they were doing distributed compute wrong but now they have it figured out?

Like anything, the future is here but it’s unevenly distributed. Frank McSherry, the first author of “Scalability! But at what COST?”, wrote Timely Dataflow as his answer to that question; ByteWax is based on Timely, as is Materialize. Stuff is still complex, but these more modern systems with performance as their goal are orders of magnitude better than the Hadoop-era Java stuff.


I call BS on those Rust 10-100x claims. Rust and Java are roughly equal in performance. It is just that there are a lot of old NoSQL frameworks in Java which are trash. I also checked out those companies, some of which are doing interesting stuff. None claim things are 100x faster because of Rust. You just hurt your credibility when you say such clearly false things. That's how you end up with a Hadoop cluster which is 236x slower than a batch script.

PS: None of the companies you linked seem to be using a datapath architecture, which is the key to the highest level of performance.


It wasn’t my intention to say “this stuff is 100x faster because Rust”. DuckDB is C++. My intention was to draw a distinction between the Java/Hadoop era of cluster and data systems and the 2020s era, much of which has designs informed by stuff like this article / “Scalability! But at what COST?”. I guess instead of “faster” I should say “more efficient”.

For example, the Kafka ecosystem tends to use Avro as the data transfer serialization, which needs a copy/deserialization step before the data can be used in application logic. Newer stream systems like Timely tend to use zero-copy-capable data transfer formats (Timely’s is called Abomonation), and it’s the same idea in Cap’n Proto or Flatbuffers: it’s infinitely faster to not copy the data as you decode! In my experience this kind of approach is more accessible in systems languages like C++ or Rust, and harder to do in GC languages where the default approach to memory layout and memory management is “don’t worry about it.”
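Here’s a hand-rolled sketch of the difference (the record layout, field names, and offsets are mine, not the Abomonation or Flatbuffers APIs): the owned-struct path copies every field out of the wire buffer up front, while the view path just borrows the buffer and reads fields by offset when they’re needed.

    use std::convert::TryInto;

    // Owned representation (the decode-and-copy path): every field is copied out
    // of the transport buffer into a new struct before the app can touch it.
    struct OwnedRecord {
        id: u64,
        value: f64,
    }

    fn decode_copying(buf: &[u8]) -> OwnedRecord {
        OwnedRecord {
            id: u64::from_le_bytes(buf[0..8].try_into().unwrap()),
            value: f64::from_le_bytes(buf[8..16].try_into().unwrap()),
        }
    }

    // Zero-copy view (the Flatbuffers/Cap'n Proto-style idea): no allocation or
    // copying up front; fields are read straight out of the wire bytes by offset.
    struct RecordView<'a> {
        buf: &'a [u8],
    }

    impl<'a> RecordView<'a> {
        fn id(&self) -> u64 {
            u64::from_le_bytes(self.buf[0..8].try_into().unwrap())
        }
        fn value(&self) -> f64 {
            f64::from_le_bytes(self.buf[8..16].try_into().unwrap())
        }
    }

    fn main() {
        let mut wire = Vec::new();
        wire.extend_from_slice(&42u64.to_le_bytes());
        wire.extend_from_slice(&3.5f64.to_le_bytes());

        let owned = decode_copying(&wire);    // copies both fields
        let view = RecordView { buf: &wire }; // borrows the buffer, copies nothing
        assert_eq!(owned.id, view.id());
        assert_eq!(owned.value, view.value());
        println!("id={} value={}", view.id(), view.value());
    }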


I bought mine from there.

I have my agent run all Docker commands in the main worktree. Sometimes this is awkward, but mostly the Docker stuff is slow-changing. I never run the stuff I’m developing in Docker; I always run it on the host directly.

For my current project (a Postgres proxy like PgBouncer) I had Claude write a benchmark system that’s worktree-aware. I have flags like -a-worktree=… and -b-worktree=… so I can A/B benchmark between worktrees. Works great.


Awesome, that sounds really cool! Yeah, we have some friends who had a lot of luck with just a custom CLI (something Avi did some tests with too); it’s definitely a viable approach to use :)

