A fun use of this kind of approach would be to see whether conversational game NPCs could be generated that stick to the lore of the game and to their character.
To borrow the verification and validation definitions from systems engineering, this question is one of validation. Verification is handled by Lean through its enforcement of the spec's syntax and logic. Validation, though, is the question of whether the Lean spec encodes a true representation of the problem statement (was the right thing specced?). Validation at the highest levels is probably an irreplaceable human activity.
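As a toy illustration of the gap (my own example, not from any real project): the spec below verifies fine, but it forgot to require that the output be a permutation of the input, so an obviously wrong "sort" satisfies it.

    -- A hand-rolled "sorted" predicate.
    inductive Sorted : List Nat → Prop
      | nil : Sorted []
      | single (a : Nat) : Sorted [a]
      | cons (a b : Nat) (rest : List Nat) :
          a ≤ b → Sorted (b :: rest) → Sorted (a :: b :: rest)

    -- The (incomplete) spec: the output must be sorted. Nothing requires it
    -- to contain the same elements as the input.
    def SortSpec (f : List Nat → List Nat) : Prop :=
      ∀ xs, Sorted (f xs)

    -- Verification succeeds; validation has clearly failed: throwing the
    -- input away meets the spec.
    theorem empty_meets_spec : SortSpec (fun _ => []) :=
      fun _ => Sorted.nil

Lean will accept that proof without complaint; noticing that SortSpec says the wrong thing is the validation step, and that part stays human.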
Also, on the verification side, there is a window of failure where Lean itself has a hidden bug. And with automated systems that seek out correctness, the risk is slightly elevated that some overlooked crack of a bug gets exploited in the dev-check-dev loop run by the AI.
I would guess by now none have that internally. As a rule of thumb, every major flash density increase (SLC, TLC, QLC) has also tended to double the internal page size. There were internal transfer performance reasons for the larger sizes too. Low-level 16k-64k flash "pages" are common, sometimes with even larger stripes of pages due to the internal firmware sw/hw design.
Also due to error correction issues. Flash is notoriously unreliable, so you get bit errors _all the time_ (correcting errors is absolutely routine). And you can make more efficient error-correcting codes if you are using larger blocks. This is why HDDs went from 512 to 4096 byte blocks as well.
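One way to see the larger-block effect, ignoring real code constructions entirely, is the statistics alone: give one 4096 B sector the same total correction budget as eight 512 B sectors and it is uncorrectable far less often, because the bit errors average out over more data. A rough Rust Monte Carlo sketch with made-up parameters:

    // Rough Monte Carlo sketch of the pooling effect (illustrative numbers,
    // not a real ECC design): eight 512 B sectors, each able to correct up
    // to 7 bit errors, versus one 4096 B sector able to correct 56, i.e. the
    // same total correction budget over the same data.
    fn main() {
        const SECTOR_BITS: usize = 512 * 8;
        const T_SMALL: u32 = 7;
        const T_LARGE: u32 = 8 * T_SMALL;
        const BER: f64 = 1e-3; // assumed raw bit-error rate
        const TRIALS: usize = 20_000;

        // Tiny xorshift generator so the sketch needs no dependencies.
        let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
        let mut next_f64 = move || {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state >> 11) as f64 / (1u64 << 53) as f64
        };

        let (mut small_fail, mut large_fail) = (0usize, 0usize);
        for _ in 0..TRIALS {
            // Bit errors landing in each of the eight small sectors.
            let mut errors = [0u32; 8];
            for sector in errors.iter_mut() {
                for _ in 0..SECTOR_BITS {
                    if next_f64() < BER {
                        *sector += 1;
                    }
                }
            }
            // Small-sector scheme fails if any one sector blows its budget.
            if errors.iter().any(|&e| e > T_SMALL) {
                small_fail += 1;
            }
            // Large-sector scheme pools the budget over the same 4 KiB.
            if errors.iter().sum::<u32>() > T_LARGE {
                large_fail += 1;
            }
        }
        println!(
            "uncorrectable reads: 8x512B = {:.2}%, 1x4096B = {:.3}%",
            100.0 * small_fail as f64 / TRIALS as f64,
            100.0 * large_fail as f64 / TRIALS as f64
        );
    }

Flipping that around, the larger block needs less parity per byte to hit the same failure rate.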
Garage is really good for core S3; the only thing I ran into was that it didn't support object tagging. That's arguably a more esoteric corner of the S3 API, but minio does support it. Especially if you're just using it to mock an API for tests, object tagging is most likely an unneeded feature anyway.
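For anyone unfamiliar, object tagging is just attaching key/value pairs to an object after upload. Roughly this, with the Rust AWS SDK (bucket, key, and tag names are made up, and builder details can differ between SDK versions), is the call that wasn't supported:

    // Hypothetical CI-ish example: tag an already-uploaded object.
    // Assumes credentials and an endpoint for the store under test are
    // provided via the usual environment variables.
    use aws_config::BehaviorVersion;
    use aws_sdk_s3::types::{Tag, Tagging};

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
        let client = aws_sdk_s3::Client::new(&config);

        let tagging = Tagging::builder()
            .tag_set(Tag::builder().key("pipeline").value("ci").build()?)
            .build()?;

        client
            .put_object_tagging()
            .bucket("test-bucket")   // made-up bucket
            .key("artifact.tar.gz")  // made-up key
            .tagging(tagging)
            .send()
            .await?;
        Ok(())
    }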
Incidentally, there is an open source S3 project in Rust that I have been following. About a year ago I swapped Garage images in for some minio instances used in CI pipelines - lighter weight and faster to come up.
This seems to be around the durability that most databases can reach. Short of more specialized hardware arrangements, a single-computer, embedded database always has a window of data loss. The durability expectation is that some in-flight window of data may be lost, but that on restart it recovers to a consistent state as of the last settled operation, if at all possible.
A related question is whether the code base is mature enough to work as intended when configured for higher durability. Even with Rust, that takes some hard systems testing, and it's often not just a matter of sprinkling flushes around. Further work can try to close the window tighter - maybe with a transaction log - but then you obviously trade some speed for it.
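The basic contract looks something like the Rust sketch below (file name and record format are made up): append to a log and fsync before acknowledging. Anything after the last successful fsync is the in-flight window a crash may eat; recovery replays the log up to the last complete record.

    use std::fs::{File, OpenOptions};
    use std::io::{self, Write};

    // Append one record and only return once it is on stable storage.
    fn append_durably(wal: &mut File, record: &[u8]) -> io::Result<()> {
        // Length-prefix the record; a real log would also checksum it so a
        // torn final write can be detected and dropped during recovery.
        wal.write_all(&(record.len() as u32).to_le_bytes())?;
        wal.write_all(record)?;
        // fsync. Only after this returns is the write considered settled;
        // batching or skipping this call is exactly the speed/durability
        // trade-off mentioned above.
        wal.sync_all()
    }

    fn main() -> io::Result<()> {
        let mut wal = OpenOptions::new()
            .create(true)
            .append(true)
            .open("example.wal")?; // made-up file name
        append_durably(&mut wal, b"set key=value")?;
        println!("acknowledged");
        Ok(())
    }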
There are large bodies of work on optimization in state-space control theory that I strongly suspect have a lot of crossover with AI, and that at the least share very similar mathematical structure.
e.g. optimizing state-space control coefficients looks something like training an LLM's weight matrices...
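To sketch the parallel (the notation is made up, just to show the shape of the problem):

    % Discrete linear state-space model vs. a vanilla recurrent layer.
    \begin{aligned}
      x_{t+1} &= A x_t + B u_t,             & y_t &= C x_t + D u_t \\
      h_{t+1} &= \sigma(W_h h_t + W_u u_t), & y_t &= W_y h_t
    \end{aligned}
    % Fitting (A, B, C, D) against a cost on y_t and training
    % (W_h, W_u, W_y) by gradient descent are the same shape of problem:
    % optimizing matrices inside a recurrence.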
There is indeed a lot of crossover, and a lot of neural networks can be written in a state space form. The optimal control problem should be equivalent to training the weights, as you mention.
However, from what I have seen, this isn't really a useful way of reframing the problem. The optimal control problem is at least as hard as, if not harder than, the original problem of training the neural network, and the latter has mature, performant software for doing it efficiently. That's not to say there isn't good software for optimal control, but it's a more general problem, so off-the-shelf solvers can't leverage the network structure very well.
Some researchers have made interesting theoretical connections like in neural ODEs, but even there the practicality is limited.
I would think that git would need a parallel storage scheme for binaries. Something that does binary chunking and deduplication between revisions, but keeps the same merkle referencing scheme as everything else.
Are there many binaries that people would store in git where this would actually help? I assume most files end up with compression or some other form of randomization between revisions, making deduplication futile.
I don't know; it comes down to the statistics of the data, which make one optimization strategy better than another. Git-annex iirc does file-level dedupe. That would take care of most of the problem if you're storing binaries that are compressed or encrypted. It's a lot of work to go beyond that, which is probably one reason no one has bothered with git yet. But borg and restic both do chunked dedupe, I think.
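The chunked dedupe those tools do is content-defined chunking: pick chunk boundaries where a rolling hash of the trailing bytes hits a pattern, so boundaries depend only on local content and an insertion near the start of a file only disturbs the chunks it touches. A rough Rust sketch; the hash, constants, and the toy comparison in main are all illustrative, not any tool's actual parameters:

    const MIN_CHUNK: usize = 2 * 1024;
    const MAX_CHUNK: usize = 64 * 1024;
    const BOUNDARY_MASK: u64 = (1 << 13) - 1; // ~8 KiB average chunk

    // Deterministic stand-in for a table of 256 random values (gear-hash
    // style), so the sketch needs no dependencies.
    fn gear(b: u8) -> u64 {
        let mut z = (b as u64).wrapping_add(0x9E37_79B9_7F4A_7C15);
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    // Split `data` into content-defined chunks, returned as byte ranges.
    fn chunk(data: &[u8]) -> Vec<std::ops::Range<usize>> {
        let mut chunks = Vec::new();
        let mut start = 0usize;
        let mut hash: u64 = 0;
        for (i, &b) in data.iter().enumerate() {
            // The shift ages old bytes out of the hash, so the boundary
            // test only "sees" a small trailing window of input.
            hash = (hash << 1).wrapping_add(gear(b));
            let len = i + 1 - start;
            if (len >= MIN_CHUNK && (hash & BOUNDARY_MASK) == 0) || len >= MAX_CHUNK {
                chunks.push(start..i + 1);
                start = i + 1;
                hash = 0;
            }
        }
        if start < data.len() {
            chunks.push(start..data.len());
        }
        chunks
    }

    fn main() {
        // Two "revisions" of a blob: rev2 has 64 bytes inserted at the front.
        let mut state = 1u64;
        let rev1: Vec<u8> = (0..200_000)
            .map(|_| {
                state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
                (state >> 56) as u8
            })
            .collect();
        let mut rev2 = vec![0xAAu8; 64];
        rev2.extend_from_slice(&rev1);

        // A real store would key chunks by a cryptographic hash; comparing
        // raw chunk bytes keeps the sketch self-contained.
        let old_chunks: Vec<&[u8]> = chunk(&rev1).into_iter().map(|r| &rev1[r]).collect();
        let new_chunks: Vec<&[u8]> = chunk(&rev2).into_iter().map(|r| &rev2[r]).collect();
        let mut shared = 0;
        for c in &new_chunks {
            if old_chunks.contains(c) {
                shared += 1;
            }
        }
        println!("{} of {} chunks in rev2 already stored for rev1", shared, new_chunks.len());
    }

Most chunks of the second revision come back identical to the first, which is exactly what makes this worthwhile for uncompressed binaries and useless once each revision is recompressed or re-encrypted end to end.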