Garbage Collection for MVCC – MyRocks, InnoDB and Postgres (smalldatum.blogspot.com)
69 points by greghn on July 7, 2023 | 9 comments


Speaking of the Postgres MVCC bloat problem, I hope one day we'll get a storage engine implementation that makes better use of SSD pages.

The thing is that there is an impedance mismatch between MVCC and OS disk management. MVCC works by treating data as immutable pages, but the OS treats data as mutable references, e.g. "update the contents of file X to Y". SSDs, however, work with immutable pages just like the MVCC mechanism: the flash translation layer writes new physical pages rather than overwriting old ones.

On the other hand, if the DB storage engine uses a lower-level API that goes directly to SSD pages, we can remove the impedance mismatch and achieve far better performance.
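
As a rough illustration of that lower-level approach, here is a minimal C sketch (my own toy example, assuming Linux, O_DIRECT support on the filesystem, and a 4 KiB logical block size; the file name and append_page helper are hypothetical, not any real engine's API) of an append-only page store: the engine always writes whole aligned pages and never overwrites one in place, which is closer to what the SSD is doing underneath anyway.

    /* Hypothetical sketch: an append-only page store using O_DIRECT.
     * New tuple versions get new pages; nothing is rewritten in place. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE 4096  /* assumed to match the device's logical block size */

    /* Append one page; returns the page's file offset, or -1 on error. */
    static off_t append_page(int fd, const void *data) {
        void *buf;
        /* O_DIRECT requires block-aligned buffers, offsets and lengths. */
        if (posix_memalign(&buf, PAGE_SIZE, PAGE_SIZE) != 0)
            return -1;
        memcpy(buf, data, PAGE_SIZE);
        off_t off = lseek(fd, 0, SEEK_END);  /* always a PAGE_SIZE multiple here */
        ssize_t n = pwrite(fd, buf, PAGE_SIZE, off);
        free(buf);
        return n == PAGE_SIZE ? off : -1;
    }

    int main(void) {
        int fd = open("pages.dat", O_CREAT | O_WRONLY | O_DIRECT, 0644);
        if (fd < 0) return 1;
        char page[PAGE_SIZE] = "tuple versions live here";
        append_page(fd, page);  /* a new version gets a new page */
        close(fd);
        return 0;
    }

O_DIRECT has real-world caveats (alignment rules, per-filesystem support), so treat this only as a sketch of the page-granular, write-once discipline, not a production design.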

I wonder who is working on that. After a quick search, what I found was a page about OrioleDB [1], which is a Postgres storage engine distributed as an extension. If I understood it correctly, it seems to be going in this direction.

[1] https://www.orioledata.com/


When I see these issues in PG, a lot of them seem like incidental complexity.

Also, I don't believe there's an "impedance mismatch". It's usually a problem of an incorrect abstraction, or an incorrect framing of one. For example, SSDs rarely modify content in place; they stream data to new sectors to distribute write wear, since each cell has a finite number of write cycles in its lifetime.

Also, a system that combines in-place mutation with immutable sharing, cheap snapshots (logically deep copies), and easy garbage collection is copy-on-write. So it would seem that makes more sense as a baseline foundation to build on when implementing MVCC in an RDBMS; see the sketch below. But there we go.
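
To make that concrete, here is a minimal copy-on-write sketch in C (a toy of mine, not any particular engine's design): an update prepends a new immutable version, older versions stay shared so readers keep consistent snapshots, and garbage collection would simply free versions older than the oldest live snapshot.

    /* Toy copy-on-write MVCC: tuple versions chained newest-first.
     * A reader's snapshot is just the transaction id it started at. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Version {
        long xmin;              /* txid that created this version */
        int value;
        struct Version *prev;   /* older version: shared, never mutated */
    } Version;

    /* Write = prepend a new immutable version; nothing changes in place. */
    Version *cow_update(Version *head, long txid, int value) {
        Version *v = malloc(sizeof *v);
        v->xmin = txid; v->value = value; v->prev = head;
        return v;
    }

    /* Snapshot read: newest version visible to a transaction that
     * started at snapshot_txid. */
    const Version *snapshot_read(const Version *head, long snapshot_txid) {
        while (head && head->xmin > snapshot_txid)
            head = head->prev;
        return head;
    }

    int main(void) {
        Version *head = NULL;
        head = cow_update(head, 1, 10);
        head = cow_update(head, 2, 20);   /* old version still reachable */
        printf("tx1 sees %d, tx2 sees %d\n",
               snapshot_read(head, 1)->value,   /* 10 */
               snapshot_read(head, 2)->value);  /* 20 */
        /* GC would free versions older than the oldest live snapshot. */
        return 0;
    }

Readers never block writers here; the cost is the pile of dead versions the GC must eventually reclaim, which is exactly the problem the linked post compares across engines.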


The Andy Pavlo CMU Crew has a great paper summarizing, comparing, and benchmarking different MVCC implementation techniques -- including garbage collection -- here: https://db.cs.cmu.edu/papers/2017/p781-wu.pdf


> YCSB: We modified the YCSB [14] benchmark ... The database contains a single table with 10 million tuples, each with one 64-bit primary key and 10 64-bit integer attributes

I wonder why they chose such an unrepresentative dataset size, ~840MB (10M tuples × 88 bytes).


> I wonder why they chose such an unrepresentative dataset size, ~840MB (10M tuples × 88 bytes).

Yo. This is me. The point of this paper was to evaluate the core MVCC algorithms under different contention scenarios. So you strip out all the internal features of the system that can influence performance but are unrelated to the experiments. This ensures a true apples-to-apples comparison. And since everything is in memory, you don't need a large data set.

IIRC, we ran the same experiments with 100m tuples instead of 10m and it did not change the results.


Thanks for doing the sensitivity analysis. I struggle with that issue a lot -- how can I shorten my benchmark duration (use less data, run for less time) so I can get more work done?


Hm. Dr. Pavlo comments here sometimes; maybe he can provide some illumination.


For anyone wondering (like me), MVCC means “multiversion concurrency control”, which is used to ensure isolation in databases.


Wikipedia has a decent overview of the why and how, as well as the challenges with this approach:

https://en.wikipedia.org/wiki/Multiversion_concurrency_contr...



