Stado

Stado - General question about parallel queries

Asked by Cesar Leon on 2012-03-21

Hi, I'm about to try stado to do some tests with parallel queries involving a big table that I hold in a single PostgreSQL Standard Plus server (v9.1.3), the problem is that many clients had built many queries using this table, the question is: Can I use stado to increase performance and avoid the users lose their queries? (some queries are very complex). If not, which will be the best alternative to increase the performance without impact to the clients?

Thanks in advance.
César.

Example of a complex query, How stado will resolve a query like that if the table is partitioned in many nodes?:

select distinct
       case when ticketnbr ~ '^[0-9]{13}-[0-9]+' then
  left(ticketnbr, 13-length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')))
  || lpad((substring(ticketnbr, length(ticketnbr)-length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)'))*2, length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')))::integer
  +generate_series(0, substring(ticketnbr from '^[0-9]{13}-([0-9]+)')::integer - substring(ticketnbr, length(ticketnbr)-length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)'))*2, length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')))::integer))::varchar(30),
  length(substring(ticketnbr from '^[0-9]{13}-([0-9]+)')), '0')
else
  case when ticketnbr ~ '^[0-9]{13}[0-9]\/[0-9]+-[0-9]+$' then
   substring(ticketnbr, 1, 13-length(substring(ticketnbr from '^[0-9]{13}[0-9]\/[0-9]+-([0-9]+)$')))
   || lpad((substring(ticketnbr from '^[0-9]{13}[0-9]\/([0-9]+)-[0-9]+$')::integer
   +generate_series(0, substring(ticketnbr from '^[0-9]{13}[0-9]\/[0-9]+-([0-9]+)$')::integer-substring(ticketnbr from '^[0-9]{13}[0-9]\/([0-9]+)-[0-9]+$')::integer))::varchar(30),
   length(substring(ticketnbr from '^[0-9]{13}[0-9]\/[0-9]+-([0-9]+)$')), '0')
  else
   left(ticketnbr, 13)
  end
end as tkts_nmr_ticket,
first_value(ptt.cdg_tipo) over (partition by rsda.loaddt, rsda.ticketnbr, rsda.pnrlocatorid, rsda.pnrcreatedate order by ptt.orden) as tkts_tipo_emision,
rsda.pnrlocatorid as rvas_cdg_pnr,
rsda.pnrcreatedate as rvas_fch_reserva,
rsda.loaddt as tkts_fch_inicio
from res.ressuspensedocarrangement rsda
inner join aux.prioridad_tipo_tkts ptt
on ptt.cdg_tipo=coalesce(ssrcode, case when resarractioncode='TE' then 'TKNE' else case when resarractioncode='TK' then 'TKNM' else 'TKNM' end end)
left join aux.gpnr_reservas_err gre
on gre.loaddt=rsda.loaddt and gre.pnrlocatorid=rsda.pnrlocatorid and gre.pnrcreatedate=rsda.pnrcreatedate
where
(
(rsda.loaddt in ('2011-11-03', '2011-11-04') and rsda.pnrcreatedate between '2011-11-02' and '2011-11-03')
or
(rsda.loaddt in ('2011-11-05') and rsda.pnrcreatedate>='2011-11-02')
)
and intrapnrsetnbr=0 and ticketnbr is not null
and resarractioncode in ('TA', 'TE', 'TK', 'TM', 'TR', 'TV') and gre.loaddt is null
and rsda.loaddt='2011-11-03'

Question information

Language:: English Edit question

Status:: Solved

For:: Stado Edit question

Assignee:: No assignee Edit question

Solved by:: Jim Mlodgenski

Solved:: 2012-03-22

Last query:: 2012-03-22

Last reply:: 2012-03-22

Link existing bug

Revision history for this message

Jim Mlodgenski (jim-cirrusql) said on 2012-03-21:

Unfortunatly, Stado does not support the PostgreSQL window function yet so the part of your query "first_value(ptt.cdg_tipo) over (partition" will not work. This is something we're considering adding in the near future.

Revision history for this message

Cesar Leon (cesar-leon) said on 2012-03-21:

Hi Jim, thanks very much for your quick reply. I understand that stado is still being developed and window function is on the TODO list, but I mean, In the case that the table is partitioned among several databases. What will happen when the user run this query (or any query unsupported by stado)? Stado will return an error or it will join all the data in one node and then it execute the query? (or something like that)

Best regards.
César.

Revision history for this message

Jim Mlodgenski (jim-cirrusql) said on 2012-03-22:

If you run a query similar to yours, Stado will push down all of the joins out to the individual nodes and run the query in parallel across all of the nodes in your cluster. The efficiency depends on how you define your schema. In Stado, you have the ability to define how your tables are partitioned across your cluster. Take a look at the planning guide to get a better understanding.
http://www.cirrusql.com/community/stado/doc/plan_guide#schema

Revision history for this message

Cesar Leon (cesar-leon) said on 2012-03-22:

Ok Jim, thanks very much.

To post a message you must log in.

Ask a question

Edit question

Stado

Stado - General question about parallel queries

Question information

Related bugs

Related FAQ:

Subscribers