Select with outliers

From DB Optimizer
Jump to: navigation, search

Tuning - select with outlier (oracle)

from Richard Foote's example

   drop table hist_test;
   create table hist_test (
            id1 number, 
            id2 number, 
            id3 number);
   insert into hist_test (id1, id2, id3) 
                       select rownum, 
                       mod(rownum,10)+1, 
                       100 
          from dual connect by level <= 1000000;
   update hist_test set id1=1000000000 where id1=1000000;
   alter table hist_test add primary ky(id1);
   exec DBMS_STATS.GATHER_TABLE_STATS (null, 'HIST_TEST');


 Database doesn't  know that there is only one extreme  outlier  thus the database thinks that 
            select * from hist_test where id1 > 1000000;
 will return the majority of rows based on the bad assumption that the values are evenly distributed between
               1 to 1,000,000,000 
 thus the optimizer thinks the predicate 
              id1  > X
  will return  
                100% * ( 1 -  (  X / total_number_rows )  = 100% * ( 1 - ( 1,000,000/  1,000,000,000 ) = 99.9 % of the rows
               

and the optimizer then thinks it's cheaper to do a FULL TABLE SCAN , but that's based on the false assumption of "Even Distribution"


The actual distribution is all grouped together except for one outlier, so in our case the distribution looks like


The actual result of

            select * from hist_test where id1 > 1000000;

only returns 1 row out of 1,000,000 which is

             100% * 1/1000000 or 0.0001%  of the rows

And the optimizer should have accessed the data via an INDEX RANGE SCAN based on correct statistics from histogram