TL;DR: Graphite graphs look wrong when viewing a time period longer than a few
hours. hitcount is the cure.
It turns out counters are bucketed as hits per time slice, not as permanently
incrementing numbers.
http://code.hootsuite.com/accurate-counting-with-graphite-and-statsd/ does a
lot to explain how this works, including why, when you view a counter over
many hours or days, the numbers start to look very different from what you
expect.
To work around this, wrap the stat in a hitcount call.
This example tells me how many posts failed to publish, per hour, across all publishing platforms:
hitcount(sum(stats.posts.by_platform.stuck.*), "1hours")
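
If you want the numbers outside of a dashboard, the same expression can be passed as the target of Graphite's render API. A minimal sketch, assuming a Graphite server at the hypothetical host graphite.example.com and a 24-hour window picked only for illustration; it just builds the URL and does not make a request.

```python
from urllib.parse import urlencode

# Hypothetical host: replace with your own Graphite server.
GRAPHITE_HOST = "https://graphite.example.com"


def render_url(target, time_from="-24h", fmt="json"):
    """Build a render API URL for a Graphite target expression."""
    query = urlencode({"target": target, "from": time_from, "format": fmt})
    return f"{GRAPHITE_HOST}/render?{query}"


# The same expression as above, URL-encoded by urlencode().
stuck_per_hour = 'hitcount(sum(stats.posts.by_platform.stuck.*), "1hours")'
url = render_url(stuck_per_hour)
print(url)
```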
Get a Ratio of Two Stats!
For example, we increment a "published" stat every time we publish a post, and
a "stuck" stat every time we fail to publish (after giving up on retries).
divideSeries does the job. This is currently my most useful graph: the
fraction of all posts that got stuck, per hour (code formatted for
readability):
divideSeries(
    hitcount(
        sum(stats.posts.by_platform.stuck.*),
        "1hours"),
    hitcount(
        sum(stats.posts.by_platform.*.*),
        "1hours")
)
Or, to convert that to a percentage:
scale(
    divideSeries(
        hitcount(
            sum(stats.posts.by_platform.stuck.*),
            "1hours"),
        hitcount(
            sum(stats.posts.by_platform.*.*),
            "1hours")
    ),
    100.0
)
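
To make the arithmetic concrete, here is a small Python sketch of what the divideSeries and scale expressions above compute, point by point. The function names and hourly numbers are made up, and leaving a gap (None) where the total is zero is my understanding of how Graphite handles a zero divisor, so treat that part as an assumption.

```python
def divide_points(dividend, divisor):
    """Divide two series point by point.

    Yields None where either value is missing or the divisor is 0
    (assumed to mirror how the graph shows a gap for those hours).
    """
    out = []
    for top, bottom in zip(dividend, divisor):
        if top is None or not bottom:
            out.append(None)
        else:
            out.append(top / bottom)
    return out


def scale_points(values, factor):
    """Multiply every present value by a constant, keeping gaps as None."""
    return [None if v is None else v * factor for v in values]


# Made-up hourly hitcounts. The total includes stuck and published posts.
stuck_per_hour = [2, 0, 0, 1]
total_per_hour = [100, 80, 0, 50]

ratio = divide_points(stuck_per_hour, total_per_hour)
percent = scale_points(ratio, 100.0)
print(percent)
```

The third hour had no posts at all, so there is no meaningful ratio and the result is a gap rather than a zero.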