Select with outliers
Tuning - select with outlier (oracle)
from Richard Foote's example
drop table hist_test;
create table hist_test ( id1 number, id2 number, id3 number);
insert into hist_test (id1, id2, id3) select rownum, mod(rownum,10)+1, 100 from dual connect by level <= 1000000;
update hist_test set id1=1000000000 where id1=1000000;
alter table hist_test add primary ky(id1);
exec DBMS_STATS.GATHER_TABLE_STATS (null, 'HIST_TEST');
Database doesn't know that there is only one extreme outlier thus the database thinks that
select * from hist_test where id1 > 1000000;
will return the majority of rows based on the bad assumption that the values are evenly distributed between
1 to 1,000,000,000
thus the optimizer thinks the predicate
id1 > X
will return
100% * ( 1 - ( X / total_number_rows ) = 100% * ( 1 - ( 1,000,000/ 1,000,000,000 ) = 99.9 % of the rows
and the optimizer then thinks it's cheaper to do a FULL TABLE SCAN , but that's based on the false assumption of "Even Distribution"
The actual distribution is all grouped together except for one outlier, so in our case the distribution looks like
The actual result of
select * from hist_test where id1 > 1000000;
only returns 1 row out of 1,000,000 which is
100% * 1/1000000 or 0.0001% of the rows
And the optimizer should have accessed the data via an INDEX RANGE SCAN based on correct statistics from histogram