Tracing
Vitess tracing #
Vitess allows you to generate trace events from major server components: vtgate, vttablet, and vtctld. Starting with v24, OpenTelemetry is the recommended tracing backend, exporting traces via OTLP/gRPC to any compatible backend. The legacy OpenTracing-based backends (opentracing-jaeger and opentracing-datadog) are deprecated and will be removed in v25.
OpenTelemetry (Recommended) #
OpenTelemetry traces can be received by any OTLP-compatible backend, including Jaeger (v1.35+), Grafana Tempo, and Datadog Agent.
Configuring OpenTelemetry tracing #
To enable OpenTelemetry tracing, add the following flags to vtgate, vttablet, vtctld, or any other Vitess component:
--tracer opentelemetry --otel-endpoint localhost:4317
The available OpenTelemetry flags are:
- `--otel-endpoint`: OpenTelemetry collector endpoint (`host:port` for gRPC). Defaults to `localhost:4317`.
- `--otel-insecure`: Use an insecure connection to the collector. Defaults to `false`.
- `--tracing-sampling-rate`: Sampling rate for traces (0.0 to 1.0). Defaults to `0.1`.
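Putting the flags together, a full invocation might look like the following sketch (the endpoint hostname and sampling rate are illustrative; combine with your usual vtgate flags):

```shell
# Illustrative vtgate invocation with OpenTelemetry tracing enabled.
# otel-collector.internal and the 0.05 sampling rate are placeholders.
vtgate \
  --tracer opentelemetry \
  --otel-endpoint otel-collector.internal:4317 \
  --tracing-sampling-rate 0.05 \
  # ...your usual vtgate flags...
```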
Running Jaeger with OTLP support #
Jaeger v1.35 and later natively supports OTLP ingestion on port 4317. You can run Jaeger with OTLP support using Docker:
$ docker run -d --name jaeger \
-p 4317:4317 \
-p 16686:16686 \
jaegertracing/all-in-one:latest
Port 4317 receives OTLP/gRPC traces from Vitess, and port 16686 provides the Jaeger web UI.
OpenTracing (Deprecated) #
The following OpenTracing-based tracing backends are deprecated as of v24 and will be removed in v25:
- `opentracing-jaeger`: Uses the archived jaeger-client-go library. The Jaeger project recommends migrating to OpenTelemetry.
- `opentracing-datadog`: Uses the OpenTracing bridge in dd-trace-go.
These backends still work in v24 but log a deprecation warning at startup. The following flags are also deprecated:
- `--jaeger-agent-host`
- `--tracing-sampling-type`
Configuring OpenTracing (Legacy) #
If you are still using the legacy OpenTracing backends, you can configure them with:
--tracer opentracing-jaeger --jaeger-agent-host 127.0.0.1:6831 --tracing-sampling-rate 0.0
There are a few things to note:
- `--jaeger-agent-host` should point to the `hostname:port` or `ip:port` of the tracing collector running the Jaeger compact Thrift protocol.
- The tracing sample rate (`--tracing-sampling-rate`) is expressed as a fraction from 0.0 (no sampling) to 1.0 (100% of all events are sent to the server). If set to zero, you can pass custom span contexts to trace only specific queries. This is recommended for large installations, because even a small sampling fraction of a non-trivial production Vitess system generates a volume of trace events that is typically very hard to organize and consume.
Migrating from OpenTracing to OpenTelemetry #
To migrate from `opentracing-jaeger` to `opentelemetry`:
- Make sure your Jaeger deployment is v1.35 or later (older versions don't support OTLP).
- Replace the tracing flags:
| Before | After |
|---|---|
| `--tracer opentracing-jaeger` | `--tracer opentelemetry` |
| `--jaeger-agent-host host:6831` | `--otel-endpoint host:4317` |
To migrate from opentracing-datadog, configure the Datadog Agent to accept OTLP traces and use --tracer opentelemetry with --otel-endpoint pointing to the Agent's OTLP endpoint.
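As a sketch, one way to enable the Datadog Agent's OTLP/gRPC receiver is via environment variables on the containerized Agent. The environment variable and image below follow Datadog's documented conventions, but verify them against your Agent version:

```shell
# Illustrative: run the Datadog Agent with its OTLP/gRPC receiver
# listening on port 4317, so Vitess can export traces to it.
docker run -d --name dd-agent \
  -e DD_API_KEY=<your API key> \
  -e DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317 \
  -p 4317:4317 \
  gcr.io/datadoghq/agent:latest
```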
Instrumenting queries #
You can instrument your queries to choose which queries (or application actions) generate trace events. This is useful when --tracing-sampling-rate is set to 0.0 and you want to trace only specific operations.
The SpanContext id you need to instrument your Vitess queries with has a very specific format. It is recommended to use one of the Jaeger/OpenTracing client libraries (or the OpenTelemetry SDK for the new backend) to generate these. For OpenTracing, the format is a base64 string of a JSON object that looks like this:
{"uber-trace-id":"{trace-id}:{span-id}:{parent-span-id}:{flags}"}
These format requirements are strict, so generating the values by hand is error-prone; it is more convenient to use the client libraries instead.
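For illustration only, the encoding can be sketched from the shell. Note that the random ids below do not belong to any real parent trace, so in practice you should let a tracing client library create the span and ids for you:

```shell
# Sketch: hand-build a Jaeger-format SpanContext value for Vitess.
trace_id=$(openssl rand -hex 8)   # 64-bit trace id as 16 hex chars
span_id=$(openssl rand -hex 8)    # 64-bit span id as 16 hex chars
# parent-span-id 0 = root span; flags 1 = sampled
ctx=$(printf '{"uber-trace-id":"%s:%s:0:1"}' "$trace_id" "$span_id" | base64 | tr -d '\n')
echo "$ctx"
```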
Once you have the SpanContext string in its encoded base64 format, you can then generate your SQL query/queries related to this span to send them to Vitess. To inform Vitess of the SpanContext, use a special SQL comment style:
/*VT_SPAN_CONTEXT=<base64 value>*/ SELECT * from product;
A few additional notes:
- The underlying tracing libraries are very particular about the base64 value, so if you have any formatting problems (including trailing spaces between the base64 value and the closing of the comment), you will get warnings in your `vtgate` logs.
- When testing with, for example, the `mysql` CLI tool, make sure you are using the `-c` (or `--comments`) flag, since the default is `--skip-comments`, which strips your comments before they ever reach the server (vtgate).
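For example, an instrumented query from the `mysql` CLI might look like the following sketch (the host and port are placeholders for your vtgate; substitute your encoded SpanContext for `<base64 value>`):

```shell
# Illustrative: --comments keeps the VT_SPAN_CONTEXT comment intact
# so vtgate can read it; host/port are placeholders for your vtgate.
mysql --comments -h 127.0.0.1 -P 15306 \
  -e '/*VT_SPAN_CONTEXT=<base64 value>*/ SELECT * FROM product;'
```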
Inspecting trace spans #
Once you have configured tracing and instrumented (or enabled sampling for) some queries, you can access the tracing backend's web UI to look at the recorded spans.
If you are using the local Docker container version of Jaeger, you can access the web UI in your browser at http://localhost:16686/.
You should be able to search for and find spans based on the trace-id or span-id with which your query/queries were instrumented. Once you find a query, you will be able to see the trace events emitted by different parts of the code as the query moves through vtgate and the vttablet(s) involved in the query.