Tuning - select with outlier (oracle)

from Richard Foote's example

   drop table hist_test;

   create table hist_test (
            id1 number, 
            id2 number, 
            id3 number);

   insert into hist_test (id1, id2, id3) 
                       select rownum, 
                       mod(rownum,10)+1, 
                       100 
          from dual connect by level <= 1000000;

   update hist_test set id1=1000000000 where id1=1000000;

   alter table hist_test add primary ky(id1);

   exec DBMS_STATS.GATHER_TABLE_STATS (null, 'HIST_TEST');

 Database doesn't  know that there is only one extreme  outlier  thus the database thinks that

            select * from hist_test where id1 > 1000000;

 will return the majority of rows based on the bad assumption that the values are evenly distributed between

               1 to 1,000,000,000

 thus the optimizer thinks the predicate

              id1  > X

  will return

                100% * ( 1 -  (  X / total_number_rows )  = 100% * ( 1 - ( 1,000,000/  1,000,000,000 ) = 99.9 % of the rows

and the optimizer then thinks it's cheaper to do a FULL TABLE SCAN , but that's based on the false assumption of "Even Distribution"

The actual distribution is all grouped together except for one outlier, so in our case the distribution looks like

The actual result of

            select * from hist_test where id1 > 1000000;

only returns 1 row out of 1,000,000 which is

             100% * 1/1000000 or 0.0001%  of the rows

And the optimizer should have accessed the data via an INDEX RANGE SCAN based on correct statistics from histogram

Select with outliers

Tuning - select with outlier (oracle)

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools