mysql - Filtering Large DB record grouped by a column -


transactionhistory table:

    create table `transactionhistory` (       `id` varchar(200) not null,       `transactiontype` varchar(200) default null,       `startdate` bigint(20) default null,       `completiondate` bigint(20) default null,       `userid` varchar(200) default null,       `status` varchar(200) default null,       `error_code` varchar(200) default null,       `transactionumber` varchar(200) default null,       primary key (`id`),       key `transactionumber_index` (`transactionumber`)     ) engine=innodb default charset=latin1; 

user table:

    create table `user` (       `userid` varchar(200) not null,       `name` varchar(200) default null,       primary key (`userid`),       key `userid_index` (`userid`)     ) engine=innodb default charset=latin1; 

scenario:

  • group transactionhistory transactionumber
    • if groupsize == 1,
      • display value in transactiontype, startdate, completiondate, status, error_code
    • if groupsize > 1
      • display '' transactiontype
      • display min startdate, , max startdate
      • for status , error_code
        • display status = success, error_code = '0' if status in group = success,
        • display status = failed, error_code = '99' if status in group = failed,
        • display status = warning, error_code = '-1' if mixed
    • display name of username (if transaction has userid)

i came query:

    select tx.id,          case when count(*) = 1 transactiontype else '' end transactiontype,         case when count(*) = 1 status else (              case when count(case when status = 'success' 1 end) = 0 'failed'              when count(case when status = 'failed' 1 end) = 0 'success'              else 'warning' end) end status,         case when count(*) = 1 error_code else (              case when count(case when status = 'success' 1 end) = 0 '99'              when count(case when status = 'failed' 1 end) = 0 '0'              else '-1' end) end status         max(completiondate) completiondate,          min(startdate) startdate,         a.userid, a.name,         transactionumber     transactionhistory tx left join user on tx.userid = a.userid      group transactionumber     limit 0, 20 //pagination 

however if need add filtering, query takes long complete. read faster put filter before group instead of having, cannot filter status , error_code correctly warning , -1 values present after group by

    having status = 'warning' 

also if need count total number of grouped entries, takes long.

my explain shows following

    select_type: simple     table: tx     type:     possible_keys: null     key_len: null     ref: null     rows: 1140654     extra: using temporary; using filesort      select_type: simple     table: e     type: eq_ref     possible_keys: primary,id_index     key_len: 202     ref: db.tx.userid     rows: 1     extra: using    

count(case when status = 'success' 1 end) 

can shortened

sum(status = 'success') 

these must written in order, , executed in order: where, group by, having. correctly observed having can't turned where.

also if need count total number of grouped entries, takes long.

i don't know mean -- using count(*) several times.

is transactionumber in 1:1 relationship id? if not, group by invalid.

you don't have order by, (technically), limit ill-defined.

run explain select ... see how optimizer performing query.

here's technique might -- deferring join. first, strip mention of user out of query. then, make select subquery in:

select z.id,        z.transactiontype,        ...        a.userid, a.name,        z.transactionumber ( select id,                if(count(*) = 1, transactiontype, '') transactiontype,               ...            transactionhistory            group transactionumber            order transactionumber            limit 0, 20      ) z left join user on z.userid = a.userid  

that way, join occur 20 times, not once per row in transactionhistory.

edit

without where clause, optimizer index helps group by. if order by identical group by, can both group by , order by simultaneously. if differ, order by becomes separate sort step.

an order by mixed directions (such startdate desc, transactiontype asc) can never use index. destined need tmp table , sort. using startdate desc, transactiontype desc (both desc) work better, without changing semantics much.

if optimizer cannot use index both group by , order by, must gather all rows , sort them before applying limit.

with 1140654 rows, want try hard have query , index let optimizer make way through order by -- needs @ 20 rows, not 1140654. my pagination blog goes of that.

explain may "using temporary, using filesort". group by and/or order by. however, hides case needs two sorts, 1 group by, 1 order by. explain format=json make clear when multiple sorts needed.

still, "filesort" in not evil. real performance killer needing work 1140654 rows instead of 20.


Comments