Stado - General question about parallel queries

Asked by Cesar Leon

Hi, I'm about to try stado to do some tests with parallel queries involving a big table that I hold in a single PostgreSQL Standard Plus server (v9.1.3), the problem is that many clients had built many queries using this table, the question is: Can I use stado to increase performance and avoid the users lose their queries? (some queries are very complex). If not, which will be the best alternative to increase the performance without impact to the clients?

Thanks in advance.
César.

Example of a complex query, How stado will resolve a query like that if the table is partitioned in many nodes?:

select distinct
       case when ticketnbr ~ '^[0-9]{13}-[0-9]+' then
  left(ticketnbr, 13-length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')))
  || lpad((substring(ticketnbr, length(ticketnbr)-length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)'))*2, length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')))::integer
  +generate_series(0, substring(ticketnbr from '^[0-9]{13}-([0-9]+)')::integer - substring(ticketnbr, length(ticketnbr)-length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)'))*2, length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')))::integer))::varchar(30),
  length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')), '0')
 else
  case when ticketnbr ~ '^[0-9]{13}[0-9]\/[0-9]+-[0-9]+$' then
   substring(ticketnbr, 1, 13-length(substring(ticketnbr from '^[0-9]{13}[0-9]\/[0-9]+-([0-9]+)$')))
   || lpad((substring(ticketnbr from '^[0-9]{13}[0-9]\/([0-9]+)-[0-9]+$')::integer
   +generate_series(0, substring(ticketnbr from '^[0-9]{13}[0-9]\/[0-9]+-([0-9]+)$')::integer-substring(ticketnbr from '^[0-9]{13}[0-9]\/([0-9]+)-[0-9]+$')::integer))::varchar(30),
   length(substring(ticketnbr from '^[0-9]{13}[0-9]\/[0-9]+-([0-9]+)$')), '0')
  else
   left(ticketnbr, 13)
  end
 end as tkts_nmr_ticket,
 first_value(ptt.cdg_tipo) over (partition by rsda.loaddt, rsda.ticketnbr, rsda.pnrlocatorid, rsda.pnrcreatedate order by ptt.orden) as tkts_tipo_emision,
 rsda.pnrlocatorid as rvas_cdg_pnr,
 rsda.pnrcreatedate as rvas_fch_reserva,
 rsda.loaddt as tkts_fch_inicio
from res.ressuspensedocarrangement rsda
inner join aux.prioridad_tipo_tkts ptt
on ptt.cdg_tipo=coalesce(ssrcode, case when resarractioncode='TE' then 'TKNE' else case when resarractioncode='TK' then 'TKNM' else 'TKNM' end end)
left join aux.gpnr_reservas_err gre
on gre.loaddt=rsda.loaddt and gre.pnrlocatorid=rsda.pnrlocatorid and gre.pnrcreatedate=rsda.pnrcreatedate
where
(
 (rsda.loaddt in ('2011-11-03', '2011-11-04') and rsda.pnrcreatedate between '2011-11-02' and '2011-11-03')
 or
 (rsda.loaddt in ('2011-11-05') and rsda.pnrcreatedate>='2011-11-02')
)
and intrapnrsetnbr=0 and ticketnbr is not null
and resarractioncode in ('TA', 'TE', 'TK', 'TM', 'TR', 'TV') and gre.loaddt is null
and rsda.loaddt='2011-11-03'

Question information

Language:
English Edit question
Status:
Solved
For:
Stado Edit question
Assignee:
No assignee Edit question
Solved by:
Jim Mlodgenski
Solved:
Last query:
Last reply:
Revision history for this message
Jim Mlodgenski (jim-cirrusql) said :
#1

Unfortunatly, Stado does not support the PostgreSQL window function yet so the part of your query "first_value(ptt.cdg_tipo) over (partition" will not work. This is something we're considering adding in the near future.

Revision history for this message
Cesar Leon (cesar-leon) said :
#2

Hi Jim, thanks very much for your quick reply. I understand that stado is still being developed and window function is on the TODO list, but I mean, In the case that the table is partitioned among several databases. What will happen when the user run this query (or any query unsupported by stado)? Stado will return an error or it will join all the data in one node and then it execute the query? (or something like that)

Best regards.
César.

Revision history for this message
Best Jim Mlodgenski (jim-cirrusql) said :
#3

If you run a query similar to yours, Stado will push down all of the joins out to the individual nodes and run the query in parallel across all of the nodes in your cluster. The efficiency depends on how you define your schema. In Stado, you have the ability to define how your tables are partitioned across your cluster. Take a look at the planning guide to get a better understanding.
http://www.cirrusql.com/community/stado/doc/plan_guide#schema

Revision history for this message
Cesar Leon (cesar-leon) said :
#4

Ok Jim, thanks very much.