This looks very interesting and I'd love to add it to the benchmarks! I wanted to try it, but unfortunately I got an installation error on the MacBook where I'm running them:
I've actually added a benchmark for this specific task and added `unic` to it.
It may not be the fairest comparison: with the random FASTQ files I'm generating, the vast majority of the input is unique, so it could be overloading the cuckoo filter.
This shows up a lot in bioinformatics, actually: you want to find the sequences that contain a specific subsequence (grep) and count how many times each unique sequence occurs. The input can be massive (on the order of 1 GB to tens of GB).
You don't really end up using these results in any specific analysis, but they're super helpful for troubleshooting tools or edge cases.
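For anyone unfamiliar with the task, here is a minimal Rust sketch of the "grep, then count unique" step. `count_matching` is a hypothetical name, not anything from the tools benchmarked here, and it matches a plain substring rather than a regex. It also keeps every distinct matching line in memory, which is exactly the pressure point when most of the input is unique.

```rust
use std::collections::HashMap;
use std::io::BufRead;

/// Count how often each distinct line containing `pattern` occurs in `reader`.
/// Hypothetical helper for illustration; plain substring match, no regex.
fn count_matching<R: BufRead>(
    reader: R,
    pattern: &str,
) -> std::io::Result<HashMap<String, u64>> {
    let mut counts = HashMap::new();
    for line in reader.lines() {
        let line = line?; // propagate I/O and UTF-8 errors
        if line.contains(pattern) {
            // One map entry per distinct matching line.
            *counts.entry(line).or_insert(0) += 1;
        }
    }
    Ok(counts)
}
```

To run it on real input you would pass something like `std::io::stdin().lock()` or a `BufReader` around a file.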
Yeah, I'm using it to serialize the output lines as TSV. Rust's `println!` is notoriously slow for this kind of output because it locks stdout on every call and stdout is line-buffered, so using `csv` to serialize the output is a nice way to boost throughput.
Thanks. Nice find! Though it feels weird to have to use a csv crate for that. Ideally the `fast printing` part should be understood, and either used directly, or extracted as a separate, smaller crate.
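My guess is that the fast part is mostly buffering, in which case the standard library alone gets you most of the way. A minimal sketch under that assumption (`write_tsv` is a made-up name, and this is not how the tool above does it):

```rust
use std::io::{self, BufWriter, Write};

/// Write (sequence, count) pairs as tab-separated lines to `out`.
/// Hypothetical helper: the point is the single BufWriter around the sink.
fn write_tsv<W: Write>(out: W, rows: &[(&str, u64)]) -> io::Result<()> {
    // Collect many small writes into larger ones before they hit `out`.
    let mut w = BufWriter::new(out);
    for (seq, count) in rows {
        writeln!(w, "{}\t{}", seq, count)?;
    }
    // Flush explicitly so write errors are reported instead of lost on drop.
    w.flush()
}
```

For stdout you would call it as `write_tsv(io::stdout().lock(), &rows)`, which takes the lock once instead of once per line. Unlike `csv`, this does no quoting or escaping, so it only works when the fields contain no tabs or newlines.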