I'm sending a Datadog trace using `gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer` like so:
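```go
import (
	"context"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func doSomeStuff(ctx context.Context) error {
	span, ctx := tracer.StartSpanFromContext(ctx, "do_some_stuff")
	err := theActualOperation()
	span.Finish(tracer.WithError(err)) // the span is marked as an error only when err != nil
	return err
}
```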
In Datadog I can see both successful (when `err == nil`) and failed operations, e.g. successful ones:
```
{
  account_id: a_happy_caller
  duration: ...
  language: go
  process_id: 1
}
```
and failed ones:

```
{
  account_id: an_unhappy_caller
  duration: ...
  error: {
    fingerprint: ...
    message: rpc error: code = Internal desc = ...
  }
  issue: {
    age: ...
    first_seen: ...
    first_seen_version: ...
    id: ...
  }
  language: go
  process_id: 1
}
```
I can successfully search for traces that failed using a query like "`do_some_stuff` with an error status":

```
operation_name:do_some_stuff AND status:error
```
or, alternatively, "`do_some_stuff` with some value for `error.fingerprint`":

```
operation_name:do_some_stuff AND @error.fingerprint:*
```
However, this doesn't seem to work in dashboard visualizations, where I want to measure the latency of successful runs separately from any measure of failed runs (our failed runs heavily distort the measure for "normal" runs). To this end, I tried a simple "Query Value" widget for failed runs with

```
p50:trace.do_some_stuff{status:error}
```

which doesn't work, and alternatively

```
p50:trace.do_some_stuff{@error.fingerprint:*}
```

which results in a query error.
I've tried all kinds of selectors and brooded for hours over the advanced filtering reference, but I can't filter on the presence (or content) of the trace's `error` field, nor on the status I would have expected to be able to filter by. Nor can I find an example of how to approach this measurement scenario, i.e. computing measures over a filtered subset of traces.
How can I measure the latency using only the traces of successful runs, or only those of failed runs? Or must I send different messages to achieve this?