Python MySQL Connection Pooling

Some helpful tips

Asked by Eric Heller on 2011-02-13

Hello,

I've been puzzling over this package for the last few days, and I think I *finally* understand how to make this work as expected!

That said, this is problematic, because the package is rather confusing to use until you understand how. I think the code needs some major refactoring and renaming of objects and methods to make this more clear.

I know Nick has answered a bunch of questions to people trying to understand PySQLPool, but those didn't help me either. Sometimes you just have to read the source!

Here are some things I find helpful to understand:

1. Calling PySQLPool.getNewConnection() does NOT create a *connection* to the SQL server. This method might be more properly called something like `makeConnectionInformation()` The PySqlConnection object returned isn't alive or active or anything. It just holds the information to make a connection to the SQL server via MySQLdb.

2. You don't have to call PySQLPool.getNewPool(). This is basically a no-op. It returns a new Pool object that you can't really use anywhere!

3. Calling PySQLPool.getNewQuery() does not create a connection. What it DOES do is create a new Pool object. This method might more accurately be called `getNewPool()`. Once you've created a PySqlQuery, you've got a pool.

4. Calling a method like `.query()` on a PySqlQuery object finally DOES establish a connection. It asks the pool for a free existing connection, if one is available. Otherwise it establishes a new connection, up to `maxActiveConnections` set on the PySqlPool object. It blocks until it can get a connection from the Pool.

Once it gets the connection, it executes the query you specified in the method `.query()` This call blocks until the query has finished and the result is returned from the server. Then the connection is returned back to the Pool, and control returns to your code.

5. The `.query...()` methods handle the creation of a MySQLdb Cursor object for you. If you're used to using MySQLdb, you probably do something like:

 c = db.cursor(MySQLdb.cursors.DictCursor)
 c.execute("....")
 c.close()

Calling the `.query...()` methods on the PySqlQuery object automatically makes a MySQLdb Cursor to execute your query via MySQLdb, and then closes the cursor before control is returned to your code. This cursor is always created as a DictCursor, and is accessible after calling `<PySqlQueryObj>.query()` via `<PySqlQueryObj>.record`.

6. The fact that I had to do this much explaining means this package needs some work! What is the development status of the project? I see a lot of '2008's in the code, so I'm guessing ... not much.

I might fork off this project (under a different name, since the API's would have to be incompatible), if anyone else is interested, just let me know.

Eric

Question information

Language:
English Edit question
Status:
Answered
For:
PySQLPool Edit question
Assignee:
Nick Verbeck Edit question
Last query:
2011-02-13
Last reply:
2011-03-14
Nick Verbeck (nerdynick) said : #1

So in reading this it make me a little curious. Did the docs not help out at all, or did you just not manage to find them. If not they are located at http://packages.python.org/PySQLPool.

If you check the commit logs (Google Code) this is still an active project. Currently I'm looking at migrating everything away from Launchpad/Google Code to Github. As I find it a lot easier for others to contribute to. As well as it allows for better forking. The other reason is I don't find the Launchpad interface intuitive enough and I think people just get lost in it.

As far as the function naming goes I to have a blueprint/story slated for the next release to redue a lot of that. Including an adaptor to allow for older implementations to still be able to use it.

So there are a few corrections to your list.

1) Calling getNewPool() does actually return an object that you can operate on. Granted in most situations no one ever needs to use them. You can see this with the terminatePool() , commitPool(), cleanupPool().

2) getNewQuery() doesn't create a new pool per say. The core of PySQLPool uses the Borg pattern to manage all the connections and allow for multi threaded support without needing to pass around objects. The PySQLQuery object that is returned from getNewQuery() is more a class to help manage everything around a query or set of queries. It makes the calls to the pool layer to get an open connection. Handles parsing & building escaped queries. As well as providing helper functions to fetch needed information about the last preformed queries. In future release it will also have everything to assist in transaction based querying.

If you are willing to help out I do have a list of Blueprints that have been mapped out. That might solve a lot of the problems you are having as well as provide a large set of new features. Fill free to shoot me any patches you may have as well. I'll get them into the release/code base asap. I'll also look towards adding you as a committer after a few patches.

Eric Heller (eheller) said : #2

Yes, I did find the documentation, but as I said it still left me pretty baffled to be perfectly honest.

I'm sorry if you took this the wrong way, I'm not trying to make any digs at you. For what the package is, it works fine, once you understand how to use it correctly. So, I'm not critizing the code per say, just the way it's organized and the confusion it induces.

I didn't look at the commit logs on google code, sorry! There's stuff everywhere. Launchpad, google code - maybe pick one? Or as you said, put the project on GitHub. GitHub is great!

1) Calling getNewPool() does actually return an object that you can operate on.

Yes, I see that you're right. I didn't realize the PySqlPool class keeps a static dictionary of information, and that creating an instance of the class more or less acts as an interface to the class's static information.

That's still a little confusing though, becuase it's not really a "new" Pool, it's a new object that lets you interface with /the/ Pool.

2. Right, again, I didn't read deep enough into the code. I saw the line `self.Pool = PySQLPool()` in the __init__ method of PySqlQuery and assumed this was creating a new Pool. But (as above) as see that it's not a new pool per say, just an object that interfaces with the pool.

I'm glad there are plans for better transaction support in the future on the 'Query' layer, instead of on the Pool layer. This has made me create a pretty ugly work-around.

3. I'd be willing to help out if the direction of the project, mapped out in your blueprints, point in the same general direction as my own needs. Otherwise, it makes more sense to spend my time either writing my own pooling layer from scratch, or adapting your code for a better fit for me.

Anyway, thanks for the work.

Eric

Eric Heller (eheller) said : #3

Oh, I forgot to mention a couple of small things.

1. You have access to the escape_string() function from MySQLdb on the PySqlQuery object, but I really need to get at real_escape_string(), which is dependent on the connection to the MySQL server (eg, if `conn` is a MySQLdb Connection, then mysql_real_escape_string() is equivalent to `conn.escape_string()`. The mysql_real_escape_string() function is dependent on the character encoding used by the server. There needs to be some easier way to get at this. For now I've had to hack the PySqlQuery object, call _GetConnection manually, access self.conn.escape_string(), and then let the connection go back to the pool. Too hacky.

2. It should be noted that in the PySQLConnectionManager, when you make a call to TestConnection(), the code executes the following to determine if the connection is 'alive':

cursor = self.connection.cursor(MySQLdb.cursors.DictCursor)
cursor.execute('select current_user')

However, if you HAVE lost connection, calling cursor.execute(..) has the potential to Block for a very, very long time (forever, as far as I know) until the connection is re-established, /depending/ on how and when the connection was lost. Furthermore, I don't think a call to .execute() has any sort of timeout in MySQLdb. It may not raise an exception either.

An example when this occurs is if you lose network connectivity altogether. The call
cursor.execute('select current_user') will just continue to block until network connectivity is restored.

This is probably not an issue for most people, and probably even then, not very often, but it needs a graceful solution. Unfortunately, the application I'm developing will be permanently sitting behind a flaky wireless network connection, so this is an issue for me :D

There needs to be a way to set a timeout when attempting to test the connection. In the past, I've done this by spawning a separate thread for the "Test Alive" task, and then kill the thread if it doesn't terminate in X seconds, and throw a Timeout exception. I've used this method myself. It may not be the best way to handle the issue, but it does work. Maybe you have some ideas?

3. A similar issue is cleanuping up dead connections. It seems that, for now, this will only happen by explicitly calling cleanupPool() or when you try to create a new PySqlQuery, and there are already maxActiveConnections, it may cull one of those dead connections so that the query gets a new, live connection.

Again, this behaviour can have blocking problems for me (as above).

One option would be to introduce a separate Reaper thread that automatically tests and cleans up the dead connections.

That's all for now.

Nick Verbeck (nerdynick) said : #4

If you've got some suggestions on where and what in the docs I should cover more. I'd really like to hear them. I will also look at finding a way of integrating a lot of the note from this thread in there as well.

1) I do have a patch that is suppose to resolve the mysql_real_escape_string() issue. I just haven't managed to get around to testing it and everything else that was provided within the patch to commit it and release a minor release. The patch allows the call of escape_string() on a PySQLQuery object. It does exactly what your hack around is doing.

2) I actually didn't know about this. I will have to find away around determining if a connection is alive other then attempting to run a query. I would like to avoid running another thread to test this. Since it could lead to slower processing times.

3) I am actually working on planning out a way of using a Reaper thread to handle this.

Launchpad Janitor (janitor) said : #5

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Nick Verbeck (nerdynick) said : #6

You will be happy to know I have finally finished getting the Github base repo up and running. I'm still in the middle of trying to move everything over to it, but all code has been moved. Along with all open branches. Issue and blueprints will aether move into Github or into a Jira haven't full decided on that yet.

URL: https://github.com/nerdynick/PySQLPool

Nick Verbeck (nerdynick) said : #7

Updating to Answered so its tracked in the upfront search results.

Can you help with this problem?

Provide an answer of your own, or ask Eric Heller for more information if necessary.

To post a message you must log in.