This is the generic reason that is always given, but I don't think I've ever seen exactly how the service will be improved and why cookies (or any uniquely-identifying data) are the only way to achieve the desired outcomes.
My point is that it's never described, even in the detailed options of those cookie banners that let you tweak things. Sure, there are (some) details about "our partners", advertising, etc., but exactly how the service will be improved is never explained.
The reason is simple and obvious: nobody knows. Companies collect data in the belief it will be useful in improving the service, but generally chuck it into a data-swamp and occasionally rake it over to extract basic info like navigation routes.
I'm also rarely convinced that improvements can't be better determined by focus groups and other similar methods.
"Improving our service" is a glib catch-all that rarely stands up to scrutiny.
There may be specific examples where a cookie is genuinely the best method to improve a feature -- in which case: name the feature, list the metrics, declare success/fail criteria, and stop collecting the cookie after the decision has been made.
You can further add bucketing, and eventually move closer to FLoC.
But this is beside the point, as the spirit of the law only allows "processing for legitimate interests". Whether the data is collected with cookies or on the server is irrelevant. If the thread OP has evaluated their collection[0] as legitimate, they can use whatever technology they like within the guidelines. Otherwise, even cookieless data collection would require consent.
I apologize for the double negative. What I meant is that hashing doesn't improve privacy: if you know the hash and the hashing function, it's easy to build a lookup table of every possible IPv4 address (about 4.3B in total, roughly 3.7B of them publicly routable). The exception is a hash built on some sort of expensive key derivation function, but that doesn't scale.
The hashing doesn’t matter when IPv4 has such a limited dataset. IPv4 has a little under 4.3B addresses, and a cheaper GPU such as the GTX 1080 Ti has a hash rate of around 4300MH/s, so it crushes that in a few seconds at most.
From there, you have a direct correlation between each IP and its resulting hash, meaning you can easily see what the original input was.
You don’t need to break a hash to know what the original input was.
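To make the enumeration argument concrete, here is a minimal sketch. It assumes the site stores an unsalted SHA-256 of the dotted-quad string; that scheme and the helper names `hash_ip` and `build_lookup` are illustrative assumptions, not anything described in the thread. It only walks a /16 so it runs instantly, but the same loop over the full 2**32 space is what the GPU figures above refer to.

```python
import hashlib
import ipaddress

def hash_ip(ip):
    """Unsalted SHA-256 of the dotted-quad string (an assumed logging scheme)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def build_lookup(network):
    """Map hash -> IP for every address in the given network."""
    return {hash_ip(str(ip)): str(ip) for ip in ipaddress.ip_network(network)}

# A /16 (65,536 addresses) keeps the demo fast; the full space is only 2**32.
lookup = build_lookup("192.168.0.0/16")

# What a supposedly anonymised log entry would contain.
stored_hash = hash_ip("192.168.13.37")

# No hash was "broken": the input is simply looked up.
recovered = lookup[stored_hash]
print(recovered)
```

The lookup never inverts the hash function; it only relies on the input space being small enough to enumerate.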
The IPv4 space is very limited and you can easily compute all the hashes. Salting, combined with rotating salts, could work, but no one guarantees that you’re not storing them.
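A rough sketch of the rotating-salt idea, assuming a random key that is regenerated once per day and kept only in memory. The class name `DailyVisitorCounter`, the daily rotation period, and the use of HMAC-SHA-256 are all assumptions made for illustration, not a description of any particular product.

```python
import hashlib
import hmac
import secrets
from datetime import date

class DailyVisitorCounter:
    """Counts unique IPs per day using a salt that is thrown away on rotation."""

    def __init__(self):
        self._day = None
        self._salt = b""
        self._seen = set()

    def _rotate_if_needed(self, today):
        if today != self._day:
            self._day = today
            # The old salt is overwritten and never persisted, so yesterday's
            # digests cannot be recomputed from a list of candidate IPs.
            self._salt = secrets.token_bytes(32)
            self._seen = set()

    def record(self, ip, today=None):
        self._rotate_if_needed(today or date.today())
        digest = hmac.new(self._salt, ip.encode("ascii"), hashlib.sha256).digest()
        self._seen.add(digest)

    def unique_visitors(self):
        return len(self._seen)

counter = DailyVisitorCounter()
counter.record("203.0.113.7", date(2024, 1, 1))
counter.record("203.0.113.7", date(2024, 1, 1))   # same visitor, same day
counter.record("198.51.100.9", date(2024, 1, 1))
print(counter.unique_visitors())
```

This only protects visitors if the salt really is discarded, which is exactly the part an outsider cannot check.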
Why don't you just hash the IP-address and count unique users that way?