Hacker News | tignaj's comments

If you are looking for a fast format with schema support, see STEF: https://www.stefdata.net/

Disclosure: I am the author.


Here is a fairly efficient implementation of a simple union like that, which I did a while back: https://github.com/tigrannajaryan/govariant
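
Roughly, the idea is a tagged union that keeps scalar values out of interface boxes. A minimal sketch of the general approach (illustrative only, not the actual govariant layout):

    package variant

    import "math"

    // Type tag identifying which member of the union is set.
    type Type int

    const (
        TypeEmpty Type = iota
        TypeInt
        TypeFloat64
        TypeString
    )

    // Variant stores scalars inline, so NewInt/NewFloat64 do not
    // allocate the way storing them in an interface{} would.
    type Variant struct {
        typ Type
        num uint64 // raw bits of an int or float64
        str string // only used for string values
    }

    func NewInt(v int) Variant { return Variant{typ: TypeInt, num: uint64(v)} }
    func NewFloat64(v float64) Variant {
        return Variant{typ: TypeFloat64, num: math.Float64bits(v)}
    }
    func NewString(v string) Variant { return Variant{typ: TypeString, str: v} }

    func (v Variant) Type() Type       { return v.typ }
    func (v Variant) Int() int         { return int(v.num) }
    func (v Variant) Float64() float64 { return math.Float64frombits(v.num) }
    func (v Variant) Str() string      { return v.str }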


This is also because Google's Protobuf implementations don't do a very good job of avoiding unnecessary allocations. Gogoproto is better, and it is possible to do better still. Here is a prototype I put together for Go (even if you do not use the laziness part, it is still much faster than Google's implementation): https://github.com/splunk/exp-lazyproto
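
For a flavor of the kind of thing that helps (hypothetical names; the actual exp-lazyproto internals differ): reuse decoded message structs through a sync.Pool instead of allocating one per message in the hot decode path:

    package msgpool

    import "sync"

    // LogRecord is a stand-in for a generated protobuf message type.
    type LogRecord struct {
        TimeUnixNano uint64
        Body         string
        Attributes   []string
    }

    var logRecordPool = sync.Pool{
        New: func() any { return &LogRecord{} },
    }

    // GetLogRecord returns a recycled (or new) LogRecord.
    func GetLogRecord() *LogRecord {
        return logRecordPool.Get().(*LogRecord)
    }

    // PutLogRecord resets the record and returns it to the pool so a
    // hot decode loop does not allocate per message.
    func PutLogRecord(r *LogRecord) {
        r.TimeUnixNano = 0
        r.Body = ""
        r.Attributes = r.Attributes[:0] // keep capacity for reuse
        logRecordPool.Put(r)
    }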


OTel logs aim to record the execution context in the logs.

In languages where the context is passed implicitly (e.g. via thread-local storage / MDC in Java), OTel automatically injects the trace id and span id into logs emitted through your regular logging library (e.g. Log4j). Then in your log backend you can run queries like "show me all log records, across every service in my distributed system, that were part of this particular user request".
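
In Go, where the context is passed explicitly, the same stamping looks roughly like this (a sketch using the public OTel Go trace API):

    package logutil

    import (
        "context"
        "log"

        "go.opentelemetry.io/otel/trace"
    )

    // logWithTrace stamps the active trace/span ids onto a log line so
    // the backend can correlate it with the distributed trace.
    func logWithTrace(ctx context.Context, msg string) {
        sc := trace.SpanContextFromContext(ctx)
        if sc.IsValid() {
            log.Printf("trace_id=%s span_id=%s msg=%q",
                sc.TraceID(), sc.SpanID(), msg)
        } else {
            log.Printf("msg=%q", msg)
        }
    }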

Disclosure: I am an OTel contributor, working on logs (a work in progress, not ready for production use yet).


This. The statelessness of OTLP is by design. I did consider stateful designs, e.g. with shared-state dictionary compression, but eventually decided against them so that intermediaries can remain stateless.

An extension to OTLP that uses shared state (and columnar encoding) to achieve a more compact representation, suitable for the last network leg of the data delivery path, has been proposed and may become a reality in the future: https://github.com/open-telemetry/oteps/pull/171
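
To give a flavor of what "shared state" means here (a toy sketch, not the encoding in the proposal): the sender keeps a string table and ships small indices instead of repeated strings, which is exactly what forces intermediaries to become stateful:

    package dict

    // DictEncoder is a toy shared-state dictionary: both ends must
    // track the same table, so any intermediary has to track it too.
    type DictEncoder struct {
        index map[string]uint32
        next  uint32
    }

    func New() *DictEncoder {
        return &DictEncoder{index: map[string]uint32{}}
    }

    // Encode returns the index for s; the literal string is included
    // only the first time, so the receiver can extend its own table.
    func (e *DictEncoder) Encode(s string) (idx uint32, literal string, isNew bool) {
        if i, ok := e.index[s]; ok {
            return i, "", false
        }
        i := e.next
        e.next++
        e.index[s] = i
        return i, s, true
    }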


Windows has something like 15,000 performance counters and error metrics that can be collected. There isn’t a system on earth that can even approach this. At scale, I have to pick and choose maybe 20-100 counters for fear of overloading a cluster(!) of servers collecting the data… once a minute.

That’s because the protocol overheads cause “write multiplication” of a hundred-to-one or worse. Every byte of metric data ends up as nearly a kilobyte on the wire.

Meanwhile, I did some experiments showing that, with even a tiny bit of crude data-oriented design and delta compression, a single box could collect 10K metrics across 10K endpoints every second without breaking a sweat.
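
The core of that is nothing exotic. A sketch of the idea (illustrative, not my actual experiment code): store each series column-wise and delta-encode timestamps, so a steady once-per-second scrape reduces to runs of near-identical small integers that compress extremely well:

    package tsdb

    // Series stores one metric column-wise; timestamps are kept as
    // deltas from the previous sample, so a steady 1s scrape becomes
    // a run of identical small integers (ideal for varint/RLE).
    type Series struct {
        tsDeltas []int64 // nanosecond deltas between samples
        values   []float64
        lastTS   int64
    }

    func (s *Series) Append(tsNano int64, v float64) {
        s.tsDeltas = append(s.tsDeltas, tsNano-s.lastTS)
        s.lastTS = tsNano
        s.values = append(s.values, v)
    }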

The modern REST / RPC approach is fine for business apps but is an unmitigated disaster for collecting tiny metrics.

Set your goals higher than collecting a hand-picked 1% of the available metrics, 60x less frequently than admins would like…


Here is a oneof implementation for Go that I wrote; hopefully it is less ugly, and it is significantly faster: https://github.com/splunk/exp-lazyproto#oneof-fields
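
For reference, this is roughly the shape the standard Go protobuf generator emits for a oneof, which is what I mean by ugly: each case becomes a wrapper struct behind a one-method interface, so reading needs a type switch and every set allocates a wrapper (names below are illustrative, not real generated code):

    package oneof

    // Roughly the generated shape for:
    //   oneof kind { int64 int_val = 1; string str_val = 2; }
    type isValue_Kind interface{ isValue_Kind() }

    type Value struct {
        Kind isValue_Kind
    }

    type Value_IntVal struct{ IntVal int64 }
    type Value_StrVal struct{ StrVal string }

    func (*Value_IntVal) isValue_Kind() {}
    func (*Value_StrVal) isValue_Kind() {}

    // Reading the field requires a type switch on an interface value.
    func kindName(v *Value) string {
        switch v.Kind.(type) {
        case *Value_IntVal:
            return "int"
        case *Value_StrVal:
            return "string"
        default:
            return "none"
        }
    }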


Article author here. Good to see it on HN; someone else submitted it (thanks :-)).

If you are interested in the topic, you may also be interested in a research library I wrote recently: https://github.com/splunk/exp-lazyproto, which among other things exploits the partial (de)serialization technique. It is just a prototype for now; one day I may do a production-quality implementation.
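
The gist of partial deserialization, as a sketch (illustrative; the real library's API differs): hold on to the raw wire bytes and decode a sub-message only when it is first touched, so untouched messages are never parsed at all:

    package lazy

    // Payload is a stand-in for a nested protobuf message.
    type Payload struct {
        // fields decoded from the wire bytes
    }

    // decode is a stand-in for a real wire-format parser.
    func decode(raw []byte, p *Payload) error {
        _ = raw // parse raw into p here
        return nil
    }

    // LazyMessage keeps the encoded bytes and defers decoding until
    // the sub-message is actually accessed.
    type LazyMessage struct {
        raw     []byte
        decoded *Payload
    }

    func (m *LazyMessage) Get() (*Payload, error) {
        if m.decoded == nil {
            p := &Payload{}
            if err := decode(m.raw, p); err != nil {
                return nil, err
            }
            m.decoded = p
        }
        return m.decoded, nil
    }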


How relevant is this (excellent) post in 2022? Has the tech changed at all on this front?


Please submit a bug at https://github.com/open-telemetry/opentelemetry-go/issues. We want to make the SDKs rock-solid.


Here is the draft plan for logs: https://github.com/open-telemetry/opentelemetry-specificatio...

Logs are not going to be part of the OpenTelemetry 1.0 release (only traces and metrics will be). Logs are coming later (no specific timeline yet).

Disclaimer: I work on the OpenTelemetry spec and wrote most of the linked doc. Comments/issues/PRs are welcome in the repo.


Disclaimer: I work on the OpenTelemetry spec.

Many tracing solutions have settled on 128-bit (16-byte) trace ids. Here is Jaeger's rationale: https://github.com/jaegertracing/jaeger/issues/858

It is also recommended by the W3C: https://www.w3.org/TR/trace-context/#trace-id
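
Concretely, a W3C trace id is 16 random bytes rendered as 32 hex characters. A minimal sketch in Go:

    package traceid

    import (
        "crypto/rand"
        "encoding/hex"
    )

    // New returns a random 128-bit trace id in the 32-hex-char form
    // used by the W3C traceparent header.
    func New() (string, error) {
        var id [16]byte
        if _, err := rand.Read(id[:]); err != nil {
            return "", err
        }
        return hex.EncodeToString(id[:]), nil
    }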


BigBrotherBird (now OpenZipkin... thanks legal, sigh) used 128b trace_ids when we first built it at Twitter. I don’t recall the reasoning, but that’s the first system I know of that chose that size.

Dapper used 64b IDs for span and trace, but being locked inside the Googleplex probably limited its influence on compatibility issues.

My point is that 128b is the common standard now, and that’s all I really care about: that the standard exists and APM systems conform to it. To that end, I am very pro-OTel.

Thanks for your work.


Neither Jaeger nor the W3C seems to present any actual justification for 16-byte trace identifiers, just FUD.

