Per session statistics

Asked by Neil Katin

MySQL has a "show profiles" command that shows a bunch of information (which is somewhat inaccurate, as Mark Callaghan pointed out, referring to this post: http://blogs.mysql.com/peterg/2008/11/06/show-profile-information_schemaprofiling/).

The OurDelta patches add some additional statistics, aggregated by user and by table (and, soon to go away, by client).

But it appears there is no way to access these new statistics for the current session. Am I correct in this?

Assuming I'm right, would the OurDelta team be open to a patch that added a new INFORMATION_SCHEMA table with current-session statistics?

Question information

Language: English
Status: Answered
For: OurDelta
Assignee: No assignee
Arjen Lentz (arjen-lentz) said: #1

Hi Neil

On 01/05/2009, at 7:14 AM, Neil Katin wrote:
> MySQL has a "show profiles" command that shows a bunch of information
> (which is somewhat inaccurate, as Mark Callaghan pointed out, referring to this post: http://blogs.mysql.com/peterg/2008/11/06/show-profile-information_schemaprofiling/).
>
> The OurDelta patches add some additional statistics, aggregated by
> user and by table (and, soon to go away, by client).

As far as I'm concerned, the per-client/host stuff won't go away.
It might not be useful for Google, but it's useful for the rest of the
world.

Some fixing will be needed; there appears to be a memory leak
somewhere which we need to track down. But there's now a runtime toggle
to switch off the stats, and it's turned off by default - also to
reduce the little bit of time overhead created by the additional
code when people don't need it.

> But it appears there is no way to access these new statistics for
> the current session. Am I correct in this?
> Assuming I'm right, would the OurDelta team be open to a patch that
> added a new INFORMATION_SCHEMA table with current-session statistics?

What info specifically are you trying to find out about within the
current connection?

Cheers,
Arjen.
--
Arjen Lentz, Director @ Open Query (http://openquery.com)
Affordable Training and ProActive Support for MySQL & related
technologies

Follow our blog at http://openquery.com/blog/
OurDelta: free enhanced builds for MySQL @ http://ourdelta.org

Neil Katin (d-launchpad-askneil-com) said: #2

Hi Arjen. Thanks for the fast response.

I wanted the same information as is kept in the USER_STATISTICS table; in particular, I wanted an indication of rows read/written. We're hoping to use this to get an indication of which operations were "expensive", to tune our application and our pricing models. I know row counts are an imperfect metric, but they seem better than counting selects.

Neil Katin (d-launchpad-askneil-com) said: #3

> As far as I'm concerned, the per-client/host stuff won't go away.
> It might not be useful for Google, but it's useful for the rest of the
> world.

> Some fixing will be needed; there appears to be a memory leak
> somewhere which we need to track down

I had actually tracked this one down, since our servers had encountered it. The root cause was that the CLIENT field is 16 characters long, while hostnames are allowed to be up to 64 bytes, and the incoming hostname was checked against the 16-character CLIENT field. Any hostname longer than 16 characters therefore never matched an existing entry; it looked like a new client and was inserted as a new row in the CLIENT_STATISTICS data structure, so a new row was added for each connection.

CLIENT was 16 characters because it shared the same data structure as USER_STATISTICS. The reasonable fix seemed to be either to make a separate structure with CLIENT as 64 bytes, or to truncate hostnames to 16 characters to match the stored value.
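A minimal sketch of the failure mode and of the truncation fix; the names and layout here are hypothetical and only illustrate the class of bug, not the actual patch code:

/* Illustrative sketch only - the names are hypothetical, not the actual
   patch code. It shows the class of bug described above: a hostname of
   up to 64 bytes compared against a key stored in 16 characters. */
#include <stdio.h>
#include <string.h>

#define CLIENT_NAME_LEN 16   /* key width shared with USER_STATISTICS */
#define HOSTNAME_LEN    64   /* what the server actually allows */

struct client_stats {
    char client[CLIENT_NAME_LEN + 1];   /* truncated on insert */
    unsigned long connections;
};

/* Buggy lookup: the full-length hostname is compared against the
   truncated stored key, so any hostname longer than 16 characters
   never matches, and the caller inserts a fresh row on every
   connection - the unbounded growth seen as a "memory leak". */
struct client_stats *find_buggy(struct client_stats *rows, size_t n,
                                const char *hostname)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(rows[i].client, hostname) == 0)
            return &rows[i];
    return NULL;
}

/* The truncation fix: cut the hostname down to the stored key width
   before the comparison, so lookup and storage agree. */
struct client_stats *find_fixed(struct client_stats *rows, size_t n,
                                const char *hostname)
{
    char key[CLIENT_NAME_LEN + 1];
    snprintf(key, sizeof(key), "%s", hostname);   /* truncates safely */
    for (size_t i = 0; i < n; i++)
        if (strcmp(rows[i].client, key) == 0)
            return &rows[i];
    return NULL;
}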

Arjen Lentz (arjen-lentz) said: #4

Neil - re the leak, would you care to check whether there's a bug on this in the Percona patches project? If not, please create it, and if there is, add this info to it. That'd be fab, thanks.

Arjen Lentz (arjen-lentz) said: #5

Hmm... tracking that could be quite high overhead.
Using the enhanced slow query log stats, shouldn't you be able to catch the queries you want anyway?
It's not just about time and rows; it's also about index use, tmp table use, sort buffers, etc. All of that is there.

Neil Katin (d-launchpad-askneil-com) said: #6

> Neil - re the leak, would you care to check whether there's a bug on this in the Percona patches project?
> If not, please create it, and if there is, add this info to it. That'd be fab, thanks.

I'll go look at the Percona bug site, but there was already an OurDelta bug I reported on this in mid-March: https://bugs.launchpad.net/ourdelta/+bug/344447. I assumed from Mark's comments that the problem was well understood and that the feature
(host statistics aggregation) was going away. But from your comment I gather you feel differently and could use an
actual fix; if so, I'll see if I can put one together.

> Hmm... tracking that could be quite high overhead.

I guess I'm missing the issue. The code is already collecting all this data, so it can't be an instrumentation
issue. I'm just trying to aggregate it in a new way: by current session, in addition to by user/host/table.
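A minimal sketch of this idea (hypothetical names, not the actual patch code): the counters the userstat patch already gathers per statement would simply be folded into the per-connection object as well, so an INFORMATION_SCHEMA handler could later read them back for the current session.

#include <stdint.h>

struct session_stats {          /* would live in the per-connection THD */
    uint64_t rows_read;
    uint64_t rows_changed;
    uint64_t busy_time_us;
};

struct statement_stats {        /* already collected per statement */
    uint64_t rows_read;
    uint64_t rows_changed;
    uint64_t busy_time_us;
};

/* Called once at the end of each statement, in the same place where the
   per-user/per-client/per-table totals are already updated - no new
   instrumentation, just one more aggregation target. */
static void accumulate_session_stats(struct session_stats *sess,
                                     const struct statement_stats *stmt)
{
    sess->rows_read    += stmt->rows_read;
    sess->rows_changed += stmt->rows_changed;
    sess->busy_time_us += stmt->busy_time_us;
}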

> Using the enhanced slow query log stats, shouldn't you be able to catch the queries you want anyway?

If I just wanted to catch a few expensive queries, you would be right. I guess I didn't explain our use case well.

My interest is more of an accounting issue. I would like to find metrics I can collect to figure out the relative
cost of various features. We know (on a given connection) which customers we did work for and what sorts
of things we did. We're looking for better visibility into how "expensive" that work was, on an aggregated basis.
We're trying to better understand the underlying infrastructure costs of the various features we have.

So it's not that any individual query is taking "too long" (the classic tuning problem). It's that we want to better
understand which features are costing us the most, and for that we need to be able to match some metric
of "underlying cost/effort" to our high-level set of requests.

We thought about assigning each of our customers a different MySQL user; that way the by-user aggregation
would give us some of the info. But we also wanted to understand the costs of various requests in our system,
and we didn't want to create a huge number of artificial MySQL user accounts.

I could turn on the slow query log for all requests, but that would be very high overhead.

Is that more clear?

Launchpad Janitor (janitor) said: #7

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Arjen Lentz (arjen-lentz) said: #8

Hi Neil

Yep, a fix for the leak in the userstat patch would be great, thanks.

Re keeping track of usage per session... if you care to write up a patch for the patch, I think that could be good.
Perhaps it's something that people might want to switch on separately. If you check out the current version of the patch, there's a global toggle to switch userstat on. You could make that into a tri-state, with =2 switching this on.
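Something along these lines, as a rough sketch - the variable and constant names here are assumptions for illustration, not the real patch code:

/* Rough sketch of the tri-state toggle (names are assumptions):
   0 = off, 1 = per-user/client/table stats, 2 = also per-session. */
enum userstat_mode {
    USERSTAT_OFF     = 0,
    USERSTAT_GLOBAL  = 1,   /* existing user/client/table aggregation */
    USERSTAT_SESSION = 2    /* additionally track the current session */
};

extern enum userstat_mode userstat_running;   /* the global toggle */

static void update_statistics(void /* ...per-statement counters... */)
{
    if (userstat_running == USERSTAT_OFF)
        return;                 /* near-zero overhead when disabled */

    /* existing aggregation by user, client and table happens here */

    if (userstat_running == USERSTAT_SESSION) {
        /* also fold the counters into the connection, as in the
           accumulate_session_stats() sketch earlier in this thread */
    }
}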
