Question #223956 “Graphite-Web Refactoring Help Request” : Questions : Graphite

Paula (paula-ke) said on 2013-03-12:

#1

Hi Chris, If all goes well with my project to ingest mega quantities of Zabbix metrics, our next challenge will be to clone a decent UI for users to customize their reports. Currently we build semi-custom reports using a JSON wrapper around OpenTSDB, Python WSGI, jQuery and Highcharts. Graphite is one of the contenders, so I am very interested in this work. I believe we are facing the same issues: a metadata database for metrics, tags and tag values; a JSON output wrapper [done]; an OpenTSDB query generator [from whatever metadata database]; a results pipe that is time-normalized; and possibly time-aggregator plug-ins in the TSD pipe.

Thanks,

Paula Keezer

Tim Kuhlman (timkuhlman) said on 2013-03-13:

#2

Great timing, I just started modifying Graphite to use Vertica as a backend. I would definitely benefit from a good abstraction layer for the backend and was planning on building one up as I go. Clearly it would be best if we could come up with something that works well for both use cases. I'll update this with more info as I start to make progress.

Tim

Chris Larsen (clarsen575) said on 2013-03-14:

#3

Hi Paula:
I'm going to encourage our TSDB users to work on Graphite as I think it has the greatest potential for a UI, though TSDB should have its own basic, packaged UI too. Graphite just has awesome features already such as searching, functions and the dashboard.

Tim: That's great! If you, or any other Python power coders, could work up a quick design doc on what would need to change in Graphite I bet everyone would be happy.

I'd also be curious about how you're using Vertica to store time-series data and what kind of performance you get versus what we can get with HBase.

Dieter P (dieter-plaetinck) said on 2013-03-17:

#4

re: "Nodes/Leaves should ID a timeseries with a generic ID field and provide a display_name field for GUI use"
1) In graphite only leaves in the tree point to timeseries. (i.e. if you have a metric name "foo.bar.baz" then "foo.bar" is nothing). I think this is sensible. do you want to change this? (if so what would "foo.bar" be?)
2) what are reasons why the display name would be different from the metric name? (assuming the metric name would still be all nodes in the tree "as.we.are.used.to" and which would translate into the TSUID) i don't see a need for this, especially in graphite where you interact so closely with the metric names (when building graphs and dashboards) that hiding their names seems to be disadvantageous (probably read the tagging section below first)

=== tagging ===
I was actually going to start a separate discussion, but since you bring it up here...
First, you should know about:
* https://github.com/Dieterbe/graph-explorer/tree/master/structured_metrics : a library that converts the graphite metric list into a tag space of metrics
* https://github.com/Dieterbe/graph-explorer: a graphite dashboard that takes this tag space and provides a query language so you can filter metrics and group them into graphs by tag(s)
I will refer to these as 'G-E'

to take the last example from http://www.euphoriaaudio.com/opentsdb/http-api-meta.html
that metric can be written as:
{
        "name": "tsd.http.latency_50pct",
        "display_name": "HTTP Latency 50pct",
        "tags": {"host": "hobbes-64bit", "type": "all"}
}
1) as you can see, I brought down the markup for tags substantially. I find the syntax demonstrated in your opentsdb RFC quite overengineered, which is also evident because so many fields are just empty.
2) one thing that I learned with G-E is that the more information you can capture in tags, the better (because it's structured data, clearly defined, so more usable. "name" has no clear meaning for metrics and so can only be used for text filtering ). Luckily, there's no need for a "name" attribute, if you add additional tags such as protocol=http, what=seconds, type=latency_50pct. this gives more power for filtering, aggregating and grouping metrics when composing graphs. As a rule of thumb, I would say never have a 'name' attribute, always aim to structure data in more specific tags.
(the canonical opentsdb example metric "mysql.bytes_sent schema=foo host=db2" becomes "service=mysql what=bytes type=sent schema=foo host=db2")

Furthermore:

1) I would argue that the display of metric names should not be configured at the metric level as in your examples, but can easily be generated.
* in a composer interface you can just list all the tags in a predefined order: In G-E it's just "%what %type %target_type <other tags alphabetically sorted> %server %plugin"
* this becomes more apparent when viewing a graph: say you are plotting these two metrics on one graph:
{'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'sent'}
{'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'received'}

for this graph, the tags 'what', 'service' and 'server' are constant, and the 'type' tag is variable. So the graph title can be computed to be "host1 mysql bytes" and the entries on the legend would be 'sent' and 'received'. This is what G-E does, it's trivial to implement and creates a non-redundant display of information independent of how you group metrics into graphs.

2) nowadays, many metrics in graphite have names that are just too unclear. often they don't specify the unit of measurement (seconds? ms? bits? bytes? elements in a queue? an amount of errors?), prefixes used (M, G, etc) and how it should be interpreted (is this a number per second? per flushinterval (like statsd counts)? etc). G-E solves this by making the 'what' and 'target_type' tags mandatory and clearly defined.

3) the current tree based organisational paradigm is a bit too simple. There's basically no way to organise your tree to support all ways of later querying it (so that you can later do "give me all metrics related to service mysql, or all metrics that are an amount of errors"), which causes people to spend too much time trying.
(see also the amounts of statsd issues/PR's related to suffixes, prefixes and namespacing). a tag based system makes this moot.
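
The title/legend computation described in point 1 above could be sketched roughly as follows. This is only an illustration, not G-E's actual code: the function name `title_and_legend` and the `order` tuple are assumptions, and the order was picked simply so that the example metrics yield the title quoted above.

```python
def title_and_legend(metrics, order=("server", "service", "what", "type")):
    """Split tags into constant ones (graph title) and variable ones (legend).

    `metrics` is a list of tag dicts. `order` is a hypothetical display order;
    tags not listed in it are appended alphabetically.
    """
    keys = set()
    for tags in metrics:
        keys.update(tags)

    # A tag is constant if every metric carries it with the same value.
    constant = {k for k in keys if len({tags.get(k) for tags in metrics}) == 1}

    def rank(key):
        return (order.index(key) if key in order else len(order), key)

    title = " ".join(metrics[0][k] for k in sorted(constant, key=rank))
    variable = sorted(keys - constant, key=rank)
    legend = [" ".join(tags[k] for k in variable if k in tags) for tags in metrics]
    return title, legend


metrics = [
    {"what": "bytes", "service": "mysql", "server": "host1", "type": "sent"},
    {"what": "bytes", "service": "mysql", "server": "host1", "type": "received"},
]
title, legend = title_and_legend(metrics)
```

With the two example metrics, `title` comes out as "host1 mysql bytes" and `legend` as the two 'type' values, independent of how the metrics were grouped.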

This is why I'm in favor of deprecating the tree based method entirely and moving towards a completely tag-based database and query method.
This can actually be implemented more easily than one would think: actual metrics would still be stored based on a key/filename/id; this would either be a hash of all tag key/value pairs, sorted, or whatever the 'name' tag says (and if you specify 'foo.bar.baz' that's the name tag. this gives instant backwards compatibility). for all incoming metrics, just store all tags in a database along with the id of the metric, so it's easy to query for metrics, but because of the hashing, no lookups are needed when storing time series data. this also has the benefit of being compatible with different (existing or not) backends such as ceres, whisper, etc; they don't have to implement the tagging.
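
A minimal sketch of that id scheme, under stated assumptions: SHA-1 is just one possible hash (the proposal only says "a hash"), the `key=value` canonical form is made up for illustration, and the in-memory `TagIndex` class is a hypothetical stand-in for the real tag database.

```python
import hashlib


def metric_id(tags):
    """Derive a storage key from a tag dict without any lookup.

    If a 'name' tag is present it is used as-is (backwards compatible with
    'foo.bar.baz' style names); otherwise the sorted key/value pairs are hashed.
    """
    if "name" in tags:
        return tags["name"]
    canonical = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class TagIndex:
    """Hypothetical in-memory stand-in for the tag database."""

    def __init__(self):
        self._tags_by_id = {}

    def register(self, tags):
        mid = metric_id(tags)
        self._tags_by_id[mid] = dict(tags)
        return mid

    def find(self, **wanted):
        """Return ids of all metrics whose tags match every given pair."""
        return sorted(
            mid
            for mid, tags in self._tags_by_id.items()
            if all(tags.get(k) == v for k, v in wanted.items())
        )


index = TagIndex()
sent = index.register({"service": "mysql", "what": "bytes", "type": "sent"})
recv = index.register({"service": "mysql", "what": "bytes", "type": "received"})
legacy = index.register({"name": "foo.bar.baz"})
mysql_ids = index.find(service="mysql")
```

Because the id depends only on the tags themselves, a writer can compute the key for an incoming data point directly, and the storage backend (whisper, ceres, ...) never needs to know about tags.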

=== events ===
quote: "B) We’re also adding annotation support to track/mark events. Same thing, Graphite could store notes in the DB or get the info from OpenTSDB".
I think there's no benefit of deep integration between an event/change management system with a metrics database/management system, because events/changes are inherently very different things than metrics. They have different requirements wrt ingestion, storage, management, GUI's, etc. (I believe deep integration leads to feature creep, scope dilution, and harder integration with other software i.e. monolithic software)
They do go alongside on graphs, which is AFAICT the only place where metrics and events meet. That's why I think it's sensible to have a separate change/event management system, and have a timeseries graphing widget where they can be rendered together (as directed by dashboard software)
From this conviction, I've written:
* https://github.com/Dieterbe/anthracite change/event management system (inspired by graphite's philosophy)
* https://github.com/Dieterbe/timeserieswidget to render graphs and events/changes in a "rich" way (with annotation text etc), as you would expect it supports and targets graphite and anthracite.

Btw, are you going to monitorama? I will, as will a bunch of other graphite devs.

Chris Larsen (clarsen575) said on 2013-03-18:

#5

Wow, thanks a ton for all of your comments Dieter:

> re: "Nodes/Leaves should ID a timeseries with a generic ID field and provide a display_name field for GUI use"
> 1) In graphite only leaves in the tree point to timeseries. (i.e. if you have a metric name "foo.bar.baz" then "foo.bar" is nothing). I
> think this is sensible. do you want to change this? (if so what would "foo.bar" be?)

No change, a leaf is the only thing that could point to a timeseries, but each node (or branch) can have metadata (like # of child branches/leaves) or a different name for other types of storage systems.

> 2) what are reasons why the display name would be different from the metric name? (assuming the metric name would still be all
> nodes in the tree "as.we.are.used.to" and which would translate into the TSUID) i don't see a need for this, especially in graphite
> where you interact so closely with the metric names (when building graphs and dashboards) that hiding their names seems to be
> disadvantageous (probably read the tagging section below first)

For us data geeks, "as.we.are.used.to" is perfectly fine, but if you want to share the information with less technical users, having an optional name override is very useful. Or if you abbreviate a metric, e.g. "if.tx.eth0", it would be nice to display on the graph "Interface Transmits: Ethernet 0". I'm thinking of the broader audience. For situations where no display name is provided, everything just defaults to the node name.

> === tagging ===
> I was actually going to start a separate discussion, but since you bring it up here...
> First, you should know about:
> * https://github.com/Dieterbe/graph-explorer/tree/master/structured_metrics : a library that converts the graphite metric list into
> a tag space of metrics
> * https://github.com/Dieterbe/graph-explorer: a graphite dashboard that takes this tag space and provides a query language so you
> can filter metrics and group them into graphs by tag(s) I will refer to these as 'G-E'

That’s some neat work and really helps with the naming restrictions inherent in Graphite.

> to take the last example from http://www.euphoriaaudio.com/opentsdb/http-api-meta.html
> that metric can be written as:
> {
> "name": "tsd.http.latency_50pct",
> "display_name": "HTTP Latency 50pct",
> "tags": {"host": "hobbes-64bit", "type": "all"} }
> 1) as you can see, I brought down the markup for tags substantially. I find the syntax demonstrated in your opentsdb RFC quite
> overengineered, which is also evident because so many fields are just empty.

I'm sorry if this bit was a little confusing. An actual timeseries data point can be written to OpenTSDB (with the upcoming JSON format) via:

{
   "metric":"tsd.http.latency_50pct",
   "timestamp":1341144000,
   "value":42,
   "tags":{"host":"hobbes-64bit","type":"all"}
}

So apart from the timestamp and value, the only data needed to submit a data point is the metric name and one or more tag key/value pairs. The metadata is user-supplied information associated with a single timeseries, metric, tag name or tag value. When a brand new timeseries data point is written, a metadata entry is generated with default values, which is why the fields are blank. Then users can go in and add their description, a display name override, notes, or custom key/values. It's very useful for documenting things like where the data came from, who's in charge of the data source, what the data means, etc. When you have millions of unique timeseries and many users working with the system, this kind of data can be invaluable and I think Graphite users could benefit from having it as well. But no one is forced to use it.
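
For illustration, building that payload from Python might look like the sketch below. The helper name `make_datapoint` is hypothetical, the JSON format was still described as upcoming at the time of writing, and nothing is sent anywhere since no endpoint is specified here.

```python
import json


def make_datapoint(metric, timestamp, value, tags):
    """Build one data point in the JSON shape shown above.

    Rejects an empty tag set, since at least one tag pair is needed.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    return {
        "metric": metric,
        "timestamp": int(timestamp),
        "value": value,
        "tags": dict(tags),
    }


point = make_datapoint(
    "tsd.http.latency_50pct",
    1341144000,
    42,
    {"host": "hobbes-64bit", "type": "all"},
)
payload = json.dumps(point)
```

Note that no metadata appears in the payload: in the scheme described above, the metadata entry is created separately with default values and filled in later by users.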

> 2) one thing that I learned with G-E is that the more information you can capture in tags, the better (because it's structured data,
> clearly defined, so more usable. "name" has no clear meaning for metrics and so can only be used for text filtering ). Luckily, there's
> no need for a "name" attribute, if you add additional tags such as protocol=http, what=seconds, type=latency_50pct. this gives more
> power for filtering, aggregating and grouping metrics when composing graphs. As a rule of thumb, I would say never have a 'name'
> attribute, always aim to structure data in more specific tags.
> (the canonical opentsdb example metric "mysql.bytes_sent schema=foo host=db2" becomes "service=mysql what=bytes type=sent schema=foo host=db2")

The "name" field is only used in the general UID meta object for OpenTSDB and reflects either a METRIC, a TAGK or a TAGV name so it's valid in this situation but you're absolutely right that it's a terrible tag name, e.g. "name=something".

OpenTSDB's schema is more efficient when you have fewer tags, hence the combination of a Graphite-style dotted syntax for the metric and tags for differentiation and aggregation. Metadata can also be used to break these up into tags like you're suggesting and make it easier to filter and search (I'm pushing everything into ElasticSearch and it works great).

> Furthermore:
> 1) I would argue that the display of metric names should not be configured at the metric level as in your examples, but can easily be
> generated.
> * in a composer interface you can just list all the tags in a predefined order: In G-E it's just "%what %type %target_type <other tags
> alphabetically sorted> %server %plugin"
> * this becomes more apparent when viewing a graph: say you are plotting these two metrics on one graph:
> {'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'sent'}
> {'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'received'}
> for this graph, the tags 'what', 'service' and 'server' are constant, and the 'type' tag is variable. So the graph title can be computed to
> be "host1 mysql bytes" and the entries on the legend would be 'sent' and 'received'. This is what G-E does, it's trivial to implement
> and creates a non-redundant display of information independent of how you group metrics into graphs.

That's a neat idea and I think a combo of the two would work really well.

> 2) nowadays, many metrics in graphite have names that are just too unclear. often they don't specify the unit of measurement
> (seconds? ms?, bits? bytes? elements in a queue? an amount of errors?), prefixes used (M,G, etc) and how it should be interpreted
> (is this a number per second? per flushinterval (like statsd counts), etc). G-E solves this by making the 'what' and 'target_type' tags
> mandatory and clearly defined.

We don't want to mandate any tags in OpenTSDB as per the tag size issue, so that is another example of what the metadata is meant to solve, particularly with the "units" field. It's something that could be bolted on to Graphite directly so that folks could upgrade and fill out the meta without having to rename their existing data files.

> 3) the current tree based organisational paradigm is a bit too simple. There's basically no way to organise your tree to support all
> ways of later querying it (so that you can later do "give me all metrics related to service mysql, or all metrics that are an amount of
> errors"), which causes people to spend too much time trying.
> (see also the amounts of statsd issues/PR's related to suffixes, prefixes and namespacing). a tag based system makes this moot.

You're absolutely correct. That was a pain when I was trying to get OpenTSDB info into Graphite for the default composer view. I wound up writing a tree creation rule system that lets users define a number of trees with information they want to see. E.g. Netops could build a tree with only network related metrics, Sysops could write one for their system metrics, etc. But we also have full-text searching so that would need to be integrated with Graphite as well.

=== events ===
> quote: "B) We’re also adding annotation support to track/mark events. Same thing, Graphite could store notes in the DB or get the
> info from OpenTSDB".
> I think there's no benefit of deep integration between an event/change management system with a metrics database/management
> system, because events/changes are inherently very different things than metrics. They have different requirements wrt ingestion,
> storage, management, GUI's, etc. (I believe deep integration leads to feature creep, scope dilution, and harder integration with
> other software i.e. monolithic software)

Right, we don't intend to use Graphite or OpenTSDB for tracking all events or for change management. But we do want the ability to let users place a simple note associated with a timeseries in the storage system. The note could be something as simple as "I started up process X on this box, it pegged the CPU until I stopped it, watch out". Or folks could write scripts to comb their change/event systems and put notes in the appropriate spots, e.g. "<a href='link'>Change # Upgrade Web Servers</a>". The goal is to have these pop up when a graph is rendered, so that a user can hover or click to get to the details.

> They do go alongside on graphs, which is AFAICT the only place where metrics and events meet. That's why I think it's sensible to
> have a separate change/event management system, and have a timeseries graphing widget where they can be rendered together
> (as directed by dashboard software)
> From this conviction, I've written:
> * https://github.com/Dieterbe/anthracite change/event management system (inspired by graphite's philosophy)
> * https://github.com/Dieterbe/timeserieswidget to render graphs and events/changes in a "rich" way (with annotation text etc), as
> you would expect it supports and targets graphite and anthracite.

That's really cool and I like your timeseries widget display, that's exactly what we're going for. For us to achieve maximum performance, it made sense to stash timeseries-related annotations inline with the data points so that queries don't have to make another call each time data is fetched.

> Btw, are you going to monitorama? I will, as will a bunch of other graphite devs.

Shoot, I would like to but I was just called out to my firm's headquarters so I won't be around on those dates. I'm Connecticut-based so if y'all have any other meetups I'd really like to pow-wow with Graphite devs. Thanks!

Wow, thanks a ton for all of your comments Dieter:

> re: "Nodes/Leaves should ID a timeseries with a generic ID field and provide a display_name field for GUI use"
> 1) In graphite only leaves in the tree point to timeseries. (i.e. if you have a metric name "foo.bar.baz" then "foo.bar" is nothing). I 
> think this is sensible.  do you want to change this? (if so what would "foo.bar" be?)

No change, a leaf is the only thing that could point to a timeseries, but each node (or branch) can have metadata (like # of child branches/leaves) or a different name for other types of storage systems.

> 2) what are reasons why the display name would be different from the metric name? (assuming the metric name would still be all 
> nodes in the tree "as.we.are.used.to" and which would translate into the TSUID) i don't see a need for this, especially in graphite 
> where you interact so closely with the metric names (when building graphs and dashboards) that hiding their names seems to be 
> disadvantageous (probably read the tagging section below first)

For us data geeks, "as.we.are.used.to" is perfectly fine, but if you want to share the information with less technical users, having an optional name override is very useful. Or if you abbreviate a metric, e.g. "if.tx.eth0", it would be nice to display on the graph "Interface Transmits: Ethernet 0". I'm thinking of the broader audience. For situations where no display name is provided, everything just defaults to the node name.

> === tagging ===
> I was actually going to start a separate discussion, but since you bring it up here...
> First, you should know about:
> * https://github.com/Dieterbe/graph-explorer/tree/master/structured_metrics : a library that converts the graphite metric list into 
> a tag space of metrics
> * https://github.com/Dieterbe/graph-explorer: a graphite dashboard that takes this tag space and provides a query language so you 
> can filter metrics and group them into graphs by tag(s) I will refer to these as 'G-E'

That’s some neat work and really helps with the naming restrictions inherit in Graphite.

> to take the last example from http://www.euphoriaaudio.com/opentsdb/http-api-meta.html
> that metric can be written as:
> {
>         "name": "tsd.http.latency_50pct",
>         "display_name": "HTTP Latency 50pct",
>         "tags": {'host': 'hobbes-64bit', 'type': 'all'} }
> 1) as you can see, I brought down the markup for tags substantially. I find the syntax demonstrated in your opentsdb RFC quite 
> overengineered, which is also evident because so many fields are just empty.

I'm sorry if this bit was a little confusing. An actual timeseries data point can be written to OpenTSDB (with the upcoming JSON format) via:

{ 
   "metric":"tsd.http.latency_50pct",
   "timestamp":1341144000,
  "value":42,
   "tags":{"host":"hobbes-64bit","type":"all"}
}

So the only data needed to submit a data point is the metric name and one or more sets of tags. The metadata is user supplied information associated with a single timeseries, metric, tag name or a tag value. When a brand new timeseries data point is written, a metadata entry is generated with default values, which is why the fields are blank. Then users can go in and add their description, a display name override, notes, or custom key/values. It's very useful for documenting things like: Where the data came from, who's in charge of the data source, what the data means, etc. When you have millions of unique timeseries and many users working with the system, this kind of data can be invaluable and I think Graphite users could benefit from having it as well. But no one is forced to use it.

> 2) one thing that I learned with G-E is that the more information you can capture in tags, the better (because it's structured data, 
> clearly defined, so more usable.   "name" has no clear meaning for metrics and so can only be used for text filtering ). Luckily, there's 
> no need for a "name" attribute, if you add additional tags such as protocol=http, what=seconds, type=latency_50pct.  this gives more 
> power for filtering, aggregating and grouping metrics when composing graphs.  As a rule of thumb, I would say never have a 'name' 
> attribute, always aim to structure data in more specific tags.
> (the canonical opentsdb example metric "mysql.bytes_sent schema=foo host=db2" becomes "service=mysql what=bytes type=sent schema=foo foo=db2")

The "name" field is only used in the general UID meta object for OpenTSDB and reflects either a METRIC, a TAGK or a TAGV name so it's valid in this situation but you're absolutely right that it's a terrible tag name, e.g. "name=something".

OpenTSDB's schema is more efficient when you have fewer tags, hence the combination of a Graphite style dotted syntax for the metric and tags for differentiation and aggregation. Metadata can also be used to break these up into tags like you're suggesting and make it easier to filter and search (I'm pushing everything into ElasticSearch and it works great)

> Furthermore:
> 1) I would argue that the display of metric names should not be configured at the metric level as in your examples, but can easily be 
> generated.
> * in a composer interface you can just list all the tags in a predefined order: In G-E it's just "%what %type %target_type <other tags 
> alphabetically sorted> %server %plugin")
> * this becomes more apparent when viewing a graph: say you are plotting these two metrics on one graph:
> {'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'sent'}
> {'what': 'bytes', 'service': 'mysql', 'server':'host1', 'type': 'received'}
> for this graph, the tags 'what', 'service' and 'server' are constant, and the 'type' tag is variable.  So the graph title can be computed to 
> be "host1 mysql bytes" and the entries on the legend would be 'sent' and 'received'.  This is what G-E does, it's trivial to implement 
> and creates a non-reduntant display of information independent of how you group metrics into graphs.

That's a neat idea and I think a combo of the two would work really well.

> 2) nowadays, many metrics in graphite have names that are just to unclear. often they don't specify the unit of measurement 
> (seconds? ms?, bits? bytes? elements in a queue? an amount of errors?), prefixes used (M,G, etc) and how it should be interpreted 
> (is this a number per second? per flushinterval (like statsd counts), etc).  G-E solves this by making the 'what' and 'target_type' tags 
> mandatory and clearly defined.

We don't want to mandate any tags in OpenTSDB as per the tag size issue, so that is another example of what the metadata is meant to solve, particularly with the "units" field. It's something that could be bolted on to Graphite directly so that folks could upgrade and fill out the meta without having to rename their existing data files.

> 3) the current tree based organisational paradigm is a bit too simple.  There's basically no way to organise your tree to support all 
> ways of later querying it (so that you can later do "give me all metrics related to service mysql, or all metrics that are an amount of 
> errors", which causes people spending too much time trying to.
> (see also the amounts of statsd issues/PR's related to suffixes, prefixes and namespacing). a tag based system makes this moot.

You're absolutely correct. That was a pain when I was trying to get OpenTSDB info into Graphite for the default composer view. I wound up writing a tree creation rule system that lets users define a number of trees with information they want to see. E.g. Netops could build a tree with only network related metrics, Sysops could write one for their system metrics, etc. But we also have full-text searching so that would need to be integrated with Graphite as well.

=== events ===
> quote: "B) We’re also adding annotation support to track/mark events. Same thing, Graphite could store notes in the DB or get the 
> info from OpenTSDB".
> I think there's no benefit of deep integration between an event/change management system with a metrics database/management 
> system, because events/changes are inherently very different things than metrics.   They have different requirements wrt ingestion, 
> storage, management, GUI's, etc.  (I believe deep integration leads to feature creep, scope dilution, and harder integration with 
> other software i.e. monolithic software)

Right, we don't intend to use Graphite or OpenTSDB for tracking all events or for change management. But we do want the ability to let users place a simple note associated with a timeseries in the storage system. The note could be something as simple as "I started up process X on this box, it pegged the CPU until I stopped it, watch out". Or folks could write scripts to comb their change/event systems and put notes in the appropriate spots, e.g. "<a href='link'>Change # Upgrade Web Servers</a>". The goal is to have these pop up when a graph is rendered, so a user can hover or click to get to the details.

> They do go alongside on graphs, which is AFAICT the only place where metrics and events meet.  That's why I think it's sensible to 
> have a separate change/event management system, and have a timeseries graphing widget where they can be rendered together 
> (as directed by dashboard software)
> From this conviction, I've written:
> * https://github.com/Dieterbe/anthracite change/event management system (inspired by graphite's philosophy)
> * https://github.com/Dieterbe/timeserieswidget to render graphs and events/changes in a "rich" way (with annotation text etc), as 
> you would expect it supports and targets graphite and anthracite.

That's really cool and I like your timeseries widget display, that's exactly what we're going for. For us to achieve maximum performance, it made sense to stash timeseries-related annotations inline with the data points so that queries don't have to make another call each time data is fetched.

> Btw, are you going to monitorama? I will, as will a bunch of other graphite devs.

Shoot, I would like to but I was just called out to my firm's headquarters so I won't be around on those dates. I'm Connecticut-based so if y'all have any other meetups I'd really like to pow-wow with Graphite devs. Thanks!

Revision history for this message

Dieter P (dieter-plaetinck) said on 2013-03-26:

#6

> For us data geeks, "as.we.are.used.to" is perfectly fine, but if you want to share the information with less technical users,
> having an optional name override is very useful. Or if you abbreviate a metric, e.g. "if.tx.eth0", it would be nice to display on
> the graph "Interface Transmits: Ethernet 0". I'm thinking of the broader audience. For situations where no display name is
> provided, everything just defaults to the node name.

ok so, in accordance with the later section about tag-based naming and context-based name computation, we could just provide a mapping for every tag to a "friendly" version: friendlies = {'if': 'interface', 'tx': 'transmits', 'eth0': 'Ethernet 0'}. this way we can still leverage the name computation, but only with friendly names. (note: this probably sounds harder than it really is, i'm doing this stuff with G-E and it's pretty easy to generate names in basically any format you want)
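
a rough sketch of that mapping in python, assuming dotted names like the "if.tx.eth0" example from the quote. the `friendly_name` helper is hypothetical (it is not taken from G-E); anything without a friendly version just falls back to the node name itself:

```python
friendlies = {'if': 'interface', 'tx': 'transmits', 'eth0': 'Ethernet 0'}


def friendly_name(metric, mapping, sep="."):
    """Replace each node of a dotted metric name with its friendly version, if any."""
    return " ".join(mapping.get(node, node) for node in metric.split(sep))


print(friendly_name("if.tx.eth0", friendlies))  # interface transmits Ethernet 0
print(friendly_name("if.rx.eth0", friendlies))  # interface rx Ethernet 0
```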

> It's very useful for documenting things like: Where the data came from, who's in charge of the data source, what the data
> means, etc. When you have millions of unique timeseries and many users working with the system, this kind of data can be
> invaluable and I think Graphite users could benefit from having it as well. But no one is forced to use it.

OK, but that doesn't translate into any particular datastructures AFAICT. One could, for example, have a pure tag-based metrics database (without the extra metadata), and a separate database for additional metadata by merely referencing the tag keys/values.

> The "name" field is only used in the general UID meta object for OpenTSDB and reflects either a METRIC, a TAGK or a TAGV
> name so it's valid in this situation but you're absolutely right that it's a terrible tag name, e.g. "name=something".
> OpenTSDB's schema is more efficient when you have fewer tags, (...)

ok so basically it seems you're using it only as a hack because things get slow if you have too many tags ;)
I still stand by my "split everything into tags, and avoid a 'name' tag/attribute" stance. we just need a decent database that can deal with tags properly.

> (...) the "units" field. It's something that could be bolted on to Graphite directly so that folks could upgrade and
> fill out the meta without having to rename their existing data files.

IMHO all tags are an intrinsic part of the identity of a metric. if my code is reporting a metric in bits, but then i change it to report in bytes, then i just want to update the tag in the same place i'm reporting from; so that any metric submitted always has the correct unit that applies to that metric:
* less work (no need to update the unit somewhere else)
* no possibility that the wrong unit is used when looking at a different timerange than the one the unit applies to (or a timerange with the unit switch in the middle)
* no ambiguity in tools that work with the data *before* it hits the storage system. for example statsd has a backend that feeds a websocket in addition to graphite, so you can view data as it gets generated on the originating host.
the fact that you get a different metric if you change the unit is not a problem: if you query for "tx eth0 server123 in bits" both metrics would match and both would show up (and one of them gets automatically scaled). the UI is transparent to this.
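
to illustrate the "both match, one gets scaled" part, here is a small python sketch. the tag names ('what', 'device', 'unit'), the `query` function and the sample data are all invented for this example; it only handles bits and bytes:

```python
# Conversion factors to bits; only the two units from the example are covered.
BITS_PER_UNIT = {"bits": 1, "bytes": 8}


def query(series_list, wanted_tags, unit):
    """Return every series matching wanted_tags, with values scaled to `unit`."""
    results = []
    for tags, points in series_list:
        if all(tags.get(k) == v for k, v in wanted_tags.items()):
            factor = BITS_PER_UNIT[tags["unit"]] / BITS_PER_UNIT[unit]
            scaled = [(ts, value * factor) for ts, value in points]
            results.append((tags, scaled))
    return results


# The same measurement, reported in bits first and in bytes after a code change.
stored = [
    ({"what": "tx", "device": "eth0", "server": "server123", "unit": "bits"}, [(0, 800)]),
    ({"what": "tx", "device": "eth0", "server": "server123", "unit": "bytes"}, [(60, 100)]),
]

wanted = {"what": "tx", "device": "eth0", "server": "server123"}
for tags, points in query(stored, wanted, "bits"):
    print(tags["unit"], points)
# bits [(0, 800.0)]
# bytes [(60, 800.0)]
```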

BTW to avoid any confusion, i'm not an official graphite dev, i just contributed code and worked on a bunch of related projects. the opinions expressed are mine, not of the graphite devs (though i will try to convince/discuss (with) them, based on what i learned with G-E :-)

Revision history for this message

Launchpad Janitor (janitor) said on 2013-03-27:

#7

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Revision history for this message

FRLinux (frlinux-frlinux) said on 2013-04-22:

#8

Hey guys, is this still active? I pulled the code from GitHub and this is something I definitely want to test.

Revision history for this message

James Stewart (amorphic) said on 2013-07-04:

#9

Hello all,

Just came across this thread while looking into using a proprietary TSDB as a backend to Graphite in my day job. It seems to me that if I am going to do the work to plug in this backend, it would be worth abstracting the interface (as discussed above) such that any other backend (OpenTSDB or otherwise) could also be plugged in.

We can potentially branch off on our own and do this, but obviously I'd prefer to stay compatible with the trunk and also share my work with others who might use it if possible. Given that my employer might be in a position to sponsor my work on this, is there still interest? Chris I'm particularly interested in your opinions and those of the OpenTSDB team, being that OpenTSDB is the most obvious candidate for 'plugging'.

Cheers,

James

Revision history for this message

Chris Larsen (clarsen575) said on 2013-08-06:

#10

Hi James, I'd love it if you could help and drum up some interest in abstracting the web interface. I know some of the authors have said that they'd rather abstract at the Carbon layer, but I don't think that would work very well as it still expects a "flat", "dotted" style time series name and expects each series to be an individual file. I'd rather see Graphite Web be completely source agnostic and able to accept non-flat metrics as well.

Revision history for this message

Dieter P (dieter-plaetinck) said on 2013-08-10:

#11

> is there still interest?

hell yeah.

> carbon layer (...) and expects each series to be an individual file.

that's a whisper (and maybe ceres) implementation detail. not a carbon design goal or anything.

In case anyone cares, at work I wrote https://github.com/vimeo/carbon-tagger ; it's a daemon that sits in (y)our carbon pipeline and maintains a tag database for all metrics flowing through, it can do this because the metric id's ("names" if you will) contain tag key/value pairs. with our dashboard (http://vimeo.github.io/graph-explorer/) we can then do some interesting things based on the tags.
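
To give an idea of what a tag database built from metric ids can look like, here is a minimal Python sketch. It is not carbon-tagger's code, and the id format is an assumption made for this example: dot-separated key=value pairs, with no dots inside values. `parse_metric_id` and `TagIndex` are hypothetical names.

```python
from collections import defaultdict


def parse_metric_id(metric_id):
    """Parse 'key=value.key=value' into a dict (assumed format, see above)."""
    tags = {}
    for node in metric_id.split("."):
        key, sep, value = node.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"not a key=value pair: {node!r}")
        tags[key] = value
    return tags


class TagIndex:
    """In-memory inverted index from (key, value) pairs to metric ids."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, metric_id):
        for pair in parse_metric_id(metric_id).items():
            self._by_tag[pair].add(metric_id)

    def find(self, **tags):
        """Return the ids that carry every requested key=value pair."""
        sets = [self._by_tag.get(pair, set()) for pair in tags.items()]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
index.add("service=mysql.server=host1.what=bytes.type=sent")
index.add("service=mysql.server=host2.what=errors.type=read")
print(sorted(index.find(service="mysql", what="errors")))
# ['service=mysql.server=host2.what=errors.type=read']
```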

Personally I think metric identifiers should be 100% key/value pairs; others seem to find it's better to have a dot string with some associated key/value tags. Implementing both in graphite-web would probably get too messy, so maybe it's better to leave that to specialized dashboards.

Revision history for this message

Peter C. Norton (spacey-launchpad-net) said on 2013-08-10:

#12

I haven't had a chance to read this whole thread yet, but I have implemented a non-whisper/non-ceres/non-rrd backend so thought I'd chime in. As a hackday project, I have added support to graphite-web to query metrics from kairosdb. I could help with abstracting for pluggable backends as well. I guess a good place to start is what the requirements are.

The main changes are here: https://github.com/pcn/graphite-web/tree/add_kairosdb_support

This relies on a kairosdb glue library where I put most of the details of getting data in a format appropriate for graphite, but the summary is:

1) In master, there is an idea of "time intervals" which allow data to be queried from >1 physical location in a single backend. E.g. host1 has data from yesterday and host2 has data from today. Their data can be joined and presented. Since kairosdb stores the data in a backend where this doesn't really make sense, I haven't gotten into the details of this, but this is my understanding and it should be documented.

2) Graphite requires that data be in even steps. Kairosdb offers good grouping and aggregation functions, so it was pretty easy to turn data that could be spaced unevenly to be grouped into appropriate time intervals. This is in the glue library, and isn't in the graphite-web code.

3) graphite-web implements search functionality using a better-than-globbing/better-than-fnmatch feature; you just have to give it a list of metrics that the backend knows about. I didn't know that so I implemented something inferior that I should go and fix soon.
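
For point 2, the even-step requirement can be illustrated with a small Python sketch. This is not the code from my glue library and it does not use kairosdb's own aggregators; it just averages unevenly spaced points into fixed buckets in plain Python, leaving empty buckets as None. The function name and signature are made up for the example.

```python
def to_even_steps(points, start, end, step):
    """Average (timestamp, value) points into fixed-width buckets.

    Bucket i covers [start + i*step, start + (i+1)*step); empty buckets are None.
    """
    n = (end - start) // step
    sums = [0.0] * n
    counts = [0] * n
    for ts, value in points:
        if start <= ts < start + n * step:
            i = (ts - start) // step
            sums[i] += value
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Three unevenly spaced points, normalized to 10-second steps.
print(to_even_steps([(0, 1), (5, 3), (25, 10)], start=0, end=30, step=10))
# [2.0, None, 10.0]
```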

If you look at the commits in my branch the work required to abstract these things should be pretty apparent, along with some other details that don't really stand out in my memory (this was a month ago, and I need to fix some of this up soon). I'd be happy to talk about making an API for this if I can help.

-Peter

Revision history for this message

James Stewart (amorphic) said on 2013-08-12:

#13

Thanks Chris, Dieter and Peter for your replies. I'm really glad to hear that there's still interest in the idea of abstracting the backend to graphite-web.

I'm definitely of the opinion that the abstraction layer be built into graphite-web rather than carbon. There are already 3 data sources (whisper, ceres and rrd) but last time I looked they each had specific code in graphite-web. Perhaps the first step would be looking at how they might be abstracted via a single interface.

There is definitely merit to the idea of moving away from dot-separated metric names. However being that the dot-separated names are quite integral to graphite-web, I think there will be work enough in just making the backend pluggable.

Peter your kairosdb interface sounds like a perfect candidate. I envisage a standard plugin interface so that when somebody wishes to add a new backend, (like you did for kairosdb) there is a clearly-defined means of doing so.
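
As a starting point for discussion, a plugin interface might look something like the Python sketch below. Everything here is hypothetical: the `Backend` class, its `find`/`fetch` methods and the `BACKENDS` registry are names I made up, not existing graphite-web APIs, and the in-memory backend exists only to show the shape of an implementation.

```python
import fnmatch
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical contract every storage backend would implement."""

    @abstractmethod
    def find(self, pattern):
        """Return the metric names matching a glob-style pattern."""

    @abstractmethod
    def fetch(self, name, start, end):
        """Return ((start, end, step), values) for one metric, in even steps."""


class InMemoryBackend(Backend):
    """Toy backend: each series is a list of values starting at t=0."""

    def __init__(self, step, series):
        self.step = step
        self.series = series

    def find(self, pattern):
        # Note: fnmatch's '*' also matches dots, which is cruder than real search.
        return sorted(n for n in self.series if fnmatch.fnmatchcase(n, pattern))

    def fetch(self, name, start, end):
        values = self.series[name]
        lo, hi = start // self.step, end // self.step
        return (start, end, self.step), values[lo:hi]


# A simple registry so new backends can be plugged in by name.
BACKENDS = {}


def register(name, backend):
    BACKENDS[name] = backend


register("memory", InMemoryBackend(60, {"servers.host1.cpu": [1, 2, 3, 4]}))
backend = BACKENDS["memory"]
print(backend.find("servers.*.cpu"))               # ['servers.host1.cpu']
print(backend.fetch("servers.host1.cpu", 60, 180))  # ((60, 180, 60), [2, 3])
```

The existing whisper, ceres and rrd code paths could then each be wrapped in a class of this shape, which would be a reasonable test of whether the interface is general enough.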

I've forked graphite-web and will also take a look at Peter's kairosdb code. When I have a proposal for how this might be implemented I'll post back here for comments. Hopefully some of you will be able to lend advice and possibly help out with the implementation!

Cheers,
James
