More

sbahra · on March 19, 2021

Also up on https://engineering.backtrace.io/2021-01-11-stuff-your-logs/

sbahra · on May 3, 2020

Thanks for sharing!

For single-writer multi-reader scenario this requires no atomic fences or operations on TSO: http://concurrencykit.org/presentations/lpc2015.pdf

Works on any open-addressed scheme (there is also a robin hood implementation).

sbahra · on July 6, 2019

overthewire.org

sbahra · on June 12, 2019

Give us a spin! Would absolutely love more feedback, it has had a big impact for us already. It’s free for open-source projects.

The tool for ASAN/etc... is just a few weeks old but we have been doing crash / error reporting and custom debuggers (built-in static analysis) for the last 5 years.

Welcome to also shoot me a mail at sbahra@backtrace.io.

sbahra · on Oct 31, 2017

Whoops, good catch! I will fix, thanks.

sbahra · on Sept 17, 2017

Great write-up! A few notes worth mentioning (IMO) below:

Garbage collection: This is only true in absence of garbage collection. If you're paying garbage collection cost to begin with, this is not an issue. Also, note things such as http://web.mit.edu/willtor/www/res/threadscan-spaa-15.pdf

This only applies to dynamic lock-free data structures: Or, data structures requiring memory allocation. If you're using bounded buffers and don't require memory allocation, this isn't an issue.

Taxonomy: Not all passive schemes are quiescent-state-based. In QSBR, quiescent points are a first-class entity while that is not the case in EBR (you have only 3 distinct states). Absent extensions you are unable to differentiate one quiescent point from another, which has real implications on being able to implement things like deferred / non-blocking reclamation efficiently. There are some advantages to this, a while ago Paul Khuong contributed a reference counting scheme to epoch reclamation allowing for overlapping protected sections (bounded epoch makes reference counting practical, something you can't do with unbounded quiescent points). It's pretty great and note, it doesn't require a strict notion of quiescence! You'll find this in Concurrency Kit.

It is also worth noting the HP, etc... (pointer-based schemes) can be used to implement QSBR.

For these reasons, I prefer to classify these techniques into two primary families (based on the semantics of the interface rather than the implementation): "passive schemes" and "active schemes". Passive schemes do not require cooperation from the algorithms requiring safe memory reclamation (EBR, QSBR, etc...) while "active schemes" do (HP, PTB, etc... in their textbook form require modification to the algorithm).

On blocking for passive schemes: It is worth noting that QSBR, EBR and other "passive schemes" do have "non-blocking" (informally) interfaces (rcu_call, text-book EBR utilizes limbo lists, etc...) such that writers do not have to block on the fast path, but of course, there is the cost of memory accumulating until a grace period is detected (so, not non-blocking in the formal sense but if you've the memory to spare and sufficient forward progress, becomes a non-issue). In my implementations, I typically use a dedicated thread for forward progress / epoch detection, ensuring the writer has forward progress.

You have much higher granularity and the lock-free progress guarantee in hazard pointers, because it is pointer-based (tied to the forward progress guarantees of the underlying algorithm).

Under-appreciated recent development: An interesting thing to note here is there are schemes such as https://www2.seas.gwu.edu/~seotest/publications/eurosys16ps.... which do not necessitate the same heavy-weight barriers in the presence of invariant TSC+TSO and with a sufficiently high granularity TSC, can provide the same forward progress guarantees as hazard pointers.

On hazard pointers being slow: One interesting thing to note is hazard pointers can also be used to implement "passive" schemes such as proxy collection, to give similar performance properties as EBR and QSBR.

On real world implementations: It is worth mentioning http://concurrencykit.org :-P

hyperpape · on Sept 17, 2017

I'm having trouble understanding your point about GC. The post points out that GC solves some of these problems, at the expense of creating waits. Do you think there's something wrong with the way it describes the issue?

wbl · on Sept 17, 2017

Not GP, but I think what he's getting at is that the GC cost is already paid and doesn't get paid again for reclaiming memory with lock-free algorithms.

sbahra · on Sept 17, 2017

This is quiescent-state-based reclamation, other implementations exist that are cheap. I have seen the technique you're talking about used in K42 (after I had also thought I invented something novel, when I used the same approach as you :-P).

cwzwarich · on Sept 17, 2017

Doesn't the strict definition of QBSR require a thread to keep no references to shared data when it is in quiescent state, whereas epoch-based reclamation allows threads to simply retain no references to older versions of shared data when it advances its epoch?

sbahra · on Sept 19, 2017

I'm referring to the FIFO implementation. On the definition of QSBR though, frankly, I think the term gets overloaded in some of the more recent literature in this area. Both EBR and QSBR require that no hazardous references get carried over across "protected sections" (EBR is explicit read-side protected sections and in QSBR, across a quiescent point).

emerged · on Sept 17, 2017

I think it does fit that description, but for some reason none of the implementations I found were completely without atomic ops. They had at least barriers which aren't free on ARM.

gpderetta · on Sept 19, 2017

wouldn't you need acquire/release barriers around your load and stores of your token (or alternatively acq/rel load/stores)? Otherwise your other operations won't be ordered with regard the epoch marked by the token.

sbahra · on June 16, 2017

Or use backtrace.io to deal with it. It'll transfer the core dump or a derivative to a centralized server.

Transparency: co-founder.

sbahra · on July 19, 2016

Can confirm.

sbahra · on June 16, 2016

It reverts to a bus lock in this split transaction case, not cache line locking (thus stalling whole system).