Correct implementation of pooling

Asked by kris van der Merwe

The six lines of code on how to use PySqlPool confused me.
Im also new to python, so i dont understand how persistence works in python.
Im using mod_python under Apache (I suspect this may be the typical target audience for PySqlPool).

The initial call to PySql in your example is:
  connection = PySQLPool.getNewConnection(host = 'localhost', username='user', password='pass', schema='mysql')

I suspect the variable naming in the example may be confusing in that connection is not an actual Mysql connection as returned by MySQLdb.connect

Question 1)
Is connection in the above call the instance of the connection pool?
How do i ensure that the instance of the pool persists in a (mod_python) deployment?
(maybe i wouldnt have this question if i knew python implementations)

Question 2)
How do i obtain a connection like MySQLdb.connect from the connection pool?
From your example:
  query = PySQLPool.getNewQuery(connection = connection)
  query does not follow the standard MySQLdb patterns.
Is there a reason why one should not be able to use the standard MySQLdb patterns?

Question information

Language:
English Edit question
Status:
Solved
For:
PySQLPool Edit question
Assignee:
No assignee Edit question
Solved by:
kris van der Merwe
Solved:
Last query:
Last reply:
Revision history for this message
Nikoleta Verbeck (nerdynick) said :
#1

This might help out a bit.

PySQLPool.getNewConnection() returns an instance of PySQLConnection. PySQLConnection is a simple Command Pattern based class to store a standard connection information and provide some internal functions for the internals of PySQLPool to get information that it needs.

I can't remember when I implemented it but getNewConnection()/PySQLConnection does take all the standard connection information that MySQLdb.connect does. It is actually preferred to use the same connection variables that MySQLdb uses as in the future I wish to phase out the old connection information.

As a trick as well you can dynamically pass in your connection information from a dictionary. i.e.

connectionInfo = {'user':'user', 'passwd':'password'}
connection = PySQLPool.getNewConnection(*[], **connectionInfo)

To answer your second question. You don't need to worry about the actual connections. PySQLQuery (Returned from PySQLPool.getNewQuery()) manages the actual connections from the inter-workings and executes your query(s).

As a note when using mod_python/mod_wsgi you will wont to close out your connections at the end each request or at the end of each apache threads life. As you will start to see network port locks. As MySQL will timeout your connection but your apache threads may not ever find out about this for a while and will not close the connection from the web server's side. Hope that makes since.

Hope this helps. If not feel free to ask for some clarification on any of it.

Revision history for this message
kris van der Merwe (krisvdm) said :
#2

Thanks for the response.

From your note on apache threads i assume one would have to call something like PySQLPool.returnConnection after using a query object?

I may not have not been very clear in my questions.
Question 1)
When a user request a webpage.py, webpage.py instansiates variables and possibly objects.
I assume these variables disappear once the code in the request has been processed.
There is no use in creating a database pool if it has to be re-created for each request as this will result in even slower performance.
So how does one ensure that the database pool (python object/instance) persists between calls (eg in mod_python)?

Question 2)
Ok, i understand - I suspect one can maintain better control over the pool by not exposing connections, thereby not allowing standard patterns. The advantage of using standard patterns is that one do not have to change the application code when you change the implementation in a new version of PySQLPool. Standard patterns also mean developers only have to learn and remember one way of interacting with MySQL databases.

Thanks

Revision history for this message
Nikoleta Verbeck (nerdynick) said :
#3

Your actually going to want to call PySQLPool.terminatePool() at the end of your request.

1) The performance loss vs gain depends on what your doing with your web framework. If your only ever making 1 query per request the a connection pool is not for you, but if your making a set of queries per request then a connection pool is perfect for your use. As the goal of a pool is to hold and manage all your connection without your code needing to pass the connection reference all over the place. Now with that said PySQLPool can manage connections between each request per apache thread running as mod_python/mod_wsgi actual maintain a running instance of your framework in each thread. Now PySQLPool cannot manage connections across each thread. So for each thread you will have a separate connection. So if you wish to you do not need to close each connection at the end of each request, but as with my experience trying this. You will start to have malock problems with your network ports. This problem doesn't so much come about from PySQLPool but with MySQL closing its inactive connections from its side, but without something listening on the web servers side for this close connection call. The server doesn't know that it needs to remove the malock on that network port. The only solutions I have found to date for solving this is to close all connections at the end of each request or calling a cleanup (PySQLPool.cleanupPool) at the start of each request. Where closing all is the preferred as it guarantees on high active servers that no port is ever wasted. I am looking at possible ideas of adding a threaded layer to PySQLPool to watch all connections for inactivity or seeing if their is something in MySQLdb that I can hook into for callbacks on connection close requests. Hope this all makes some since as it took me a few days to figure out that problem on one of my high active server clusters.

2) What standard patterns are you talking about? I do not know of any standard patterns for connection pools or even MySQL connection classes. If you can point me to some I may be able to in-act a layer to PySQLPool to allow for this.

ps. Sorry for the delayed response been a busy week.

Revision history for this message
kris van der Merwe (krisvdm) said :
#4

Thanks Nick

Thats was enlightening. I find python a bit baffling (the indentation is part of the expression approach irritates me a bit). Im starting to realise that once a module variable is loaded in a mod_python thread, it is always available. Im still wondering when a variable persists and when not though ....

1) I was wondering if this would address Mysql disconnects:
     try:
           connection.ping() # I saw this is a way to test Mysql connection
                                              # I assume the ping would be fast?
           ....
     except MySQLdb.OperationalError: # loss of connection
           ....
The exception above may not cover an Mysql disconnect, but maybe it does?

2) When I looked up the Mysql connect sequence, I found that the standard "query=connection.cursor()" and query=PySQLPool.getNewQuery behaved differently (return different objects). Im new to the mysql API so i may be missing something ...

Kind regards