# Select with outliers

# Tuning - select with outlier (oracle)

from Richard Foote's example

drop table hist_test;

create table hist_test ( id1 number, id2 number, id3 number);

insert into hist_test (id1, id2, id3) select rownum, mod(rownum,10)+1, 100 from dual connect by level <= 1000000;

update hist_test set id1=1000000000 where id1=1000000;

alter table hist_test add primary ky(id1);

exec DBMS_STATS.GATHER_TABLE_STATS (null, 'HIST_TEST');

Database doesn't know that there is only one extreme outlier thus the database thinks that

select * from hist_test where id1 > 1000000;

will return the majority of rows based on the bad assumption that the values are evenly distributed between

1 to 1,000,000,000

thus the optimizer thinks the predicate

id1 > X

will return

100% * ( 1 - ( X / total_number_rows ) = 100% * ( 1 - ( 1,000,000/ 1,000,000,000 ) = 99.9 % of the rows

and the optimizer then thinks it's cheaper to do a FULL TABLE SCAN , but that's based on the false assumption of "Even Distribution"

The actual distribution is all grouped together except for one outlier, so in our case the distribution looks like

The actual result of

select * from hist_test where id1 > 1000000;

only returns 1 row out of 1,000,000 which is

100% * 1/1000000 or 0.0001% of the rows

And the optimizer should have accessed the data via an INDEX RANGE SCAN based on correct statistics from histogram