Understand METHOD OPT Param for Stats in Oracle

In Oracle Statistics about database objects like Table, Index etc. is Core component for Cost based Optimizer for better Plan Estimation.
It is the basic ingredient that decides different SQL Plan component (Access path / Join Order / Cardinality / Join Method and more..) in parsing phase.

Oracle provided DBMS_STATS.* Utility package to Gather and manage database object statistics.

One of the Core and important parameter for stats gathering package is “METHOD_OPT”, but it is one of misunderstood parameter for stats gathering.
Most of Folks think it is used only to Gather Histogram in oracle, i.e. Capture Skewness of Column Data but that’s not entirely True.

Actually, it is used to Gather below related stats.
1. Base statistics about a column for e.g. Number of distinct, low value, high value, number of nulls etc.
2. Histogram on column
3. Extended Statistics.

It’s important to understand it also gather Base Statistics column, which is important for cardinality estimations.

Let’s start first by understanding its Syntax

Option 1 : FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
Option 2 : FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]…]

[size_clause] — Decide Buckets and Whether Histogram need to be Gathered or Not.

IF [size_clause] == 1
THEN
Only Gather Base Statistics and "NO" Histogram.

IF [size_clause] > 1 OR AUTO
THEN
Gather Base Statistics and histogram with number of buckets specified.

Let walk through an example to understand it better.

DROP TABLE Method_Opt_Test1;

CREATE TABLE Method_Opt_Test1 AS
(
SELECT ROWNUM Pk_Col1,
TRUNC(DBMS_RANDOM.VALUE(1,10)) Skew1,
TRUNC(DBMS_RANDOM.VALUE(1,100)) Skew2,
TRUNC(DBMS_RANDOM.VALUE(1,1000)) Skew3,
TRUNC(DBMS_RANDOM.VALUE(1,10000)) Skew4,
TRUNC(DBMS_RANDOM.VALUE(1,10000)) Skew5
FROM DUAL CONNECT BY ROWNUM < 100000
);

When “METHOD_OPT” param is not passed for STATS Gathering using DBMS_STATS.* Package, Oracle picks Default PARAM Set.
Default Option can be checked using below Utility Functions.

select DBMS_STATS.GET_PARAM('METHOD_OPT') from dual;
Output : FOR ALL COLUMNS SIZE AUTO

AUTO : Gather Base statistics for all column and histogram on columns capture in Column Usage View “SYS.COL_USAGE$

1 : FOR ALL COLUMNS SIZE 1
Gather Base Stats for all Table column and no Histogram creation for same.

BEGIN
DBMS_STATS.GATHER_TABLE_STATS(USER, 'METHOD_OPT_TEST', ESTIMATE_PERCENT =>100,CASCADE=> TRUE, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1' );
END;


Column LOW_VAL Format A20
Column HIGH_VAL Format A20
Column Column_Name Format A20
SELECT
COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS, LAST_ANALYZED,
Raw_To_Num(Low_Value) Low_Val, -- For "Raw_To_Num" Function please check below note.
Raw_To_Num(HIGH_VALUE) HIGH_VAL
FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME='METHOD_OPT_TEST1';

Please Note :
LOW_VALUE and HIGH_VALUE implies Lowest and Highest Column value in the table , it is in Raw format.
We can convert into normal text using oracle supplied utility procedure “DBMS_STATS. CONVERT_RAW_VALUE”.

Let Create a Function RAW_TO_Num to transform RAW to NUMBER

Create Or Replace Function Raw_To_Num(I_Raw Raw)
return NUMBER
As
m_n NUMBER;
Begin
dbms_stats.convert_raw_value(i_raw,m_n); -- Overload Procedure Exists with m_n Argument of three different datatype NUMBER/DATE/VARCHAR2
Return M_N;
End;

SQL OUTPUT :

COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED LOW_VAL HIGH_VAL
------------ ------------ --------- ----------- ------------- --------- --------
PK_COL1 99999 NONE 1 12-04-15 1 99999
SKEW1 9 NONE 1 12-04-15 1 9
SKEW2 99 NONE 1 12-04-15 1 99
SKEW3 999 NONE 1 12-04-15 1 999
SKEW4 9999 NONE 1 12-04-15 1 9999
SKEW5 9999 NONE 1 12-04-15 1 9999

Base Statistics like NUM_DISTINCT, LOW_VAL, HIGH_VAL , NUM_NULLS and more were Gathered and Capture in Stats Dictionary tables with no Histogram.

Before moving forward to another example lets delete the Stats gather on table.

BEGIN
DBMS_STATS.DELETE_TABLE_STATS(USER, 'METHOD_OPT_TEST1');
END;

Example 2 : Gather Base Stats with Histogram on Specific columns.

In Most of DB Environment, DBA or Performance Analyst like to Control the Columns on which Histogram need to be Gathered instead of relying on AUTO (COL_USAGE$) Options.
In Such Scenario it is important to understand to segregate column base on what statistics need to be gathered only base stats or base stats with histogram.

In METHOD_OPT Param we require to Specify Both
1. Columns on Which Histogram need to be Gather and
2. Columns with Histogram.

Suppose we want to gather histogram only for columns SKEW1/SKEW2/SKEW3.

The appropriate method to do so will be,

METHOD_OPT => ‘FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 256 SKEW1 SKEW2 SKEW3’

Initial Clause, “FOR ALL COLUMNS SIZE 1” Specify Gather Base Statistics on all Column.
Subsequent clause, “FOR COLUMNS SIZE 256 SKEW1 SKEW2 SKEW3” Specify columns on which histogram need to be gathered.

Begin
Dbms_Stats.Gather_Table_Stats(User, 'METHOD_OPT_TEST1', Estimate_Percent =>100,Cascade=> True, Method_Opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 256 SKEW1 SKEW2 SKEW3' );
END;

Stats SQL output :

COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED LOW_VAL HIGH_VAL
----------- ------------ --------------- ----------- ------------- ------- ----------
SKEW5 9999 NONE 1 12-04-15 1 9999
SKEW4 9998 NONE 1 12-04-15 1 9999
SKEW3 999 HEIGHT BALANCED 256 12-04-15 1 999
SKEW2 99 FREQUENCY 99 12-04-15 1 99
SKEW1 9 FREQUENCY 9 12-04-15 1 9
PK_COL1 99999 NONE 1 12-04-15 1 99999

On Specified Columns SKEW1 SKEW2 SKEW3 along with base statistics, histogram was gathered.
Based on Bucket size specified and Number of distinct values of columns, specific histogram type is created.

Specifying in such ways, we control both columns with histogram and without histogram.

3. METHOD_OPT => “FOR ALL INDEXED COLUMNS” implication.
Now one of the most important considerations, Why we Should not Use “FOR ALL INDEXED COLUMNS” option for METHOD_OPT.

As of now we understand that METHOD_OPT is not only use for Gathering Histogram but also Base Statistics
Let understand the implication of FOR ALL INDEXED COLUMNS and thinking it will only impact creation of histogram and no underlying base statistics.

When we specify such options we are telling,
Gather only “BASE” Statistics and Histogram for Columns which are indexed. So by Specifying such options we are completely ignoring or not gathering base Stats on Non Indexed Columns.
So SQL which is having Predicate as Non indexed Columns will be impacted.

Let run through an example.

–begin with deleting stats
Begin
DBMS_STATS.DELETE_TABLE_STATS(User, 'METHOD_OPT_TEST1');
end;

Create index on columns Pk_Col1,Skew4,Skew5

CREATE INDEX IDX_PK_COL ON METHOD_OPT_TEST1(PK_COL1);
CREATE INDEX IDX_SKEW4 ON METHOD_OPT_TEST1(SKEW4);
CREATE INDEX IDX_SKEW5 ON METHOD_OPT_TEST1(SKEW5);

Now lets Gather Stats with METHOD_OPT => ‘FOR ALL INDEXED COLUMNS’

BEGIN
DBMS_STATS.GATHER_TABLE_STATS(USER, 'METHOD_OPT_TEST1', ESTIMATE_PERCENT =>100,CASCADE=> TRUE, METHOD_OPT=> 'FOR ALL INDEXED COLUMNS' );
END;

Now lets check columns stats on underlying table.

Column LOW_VAL Format A20
Column HIGH_VAL Format A20
Column Column_Name Format A20
SELECT COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS, LAST_ANALYZED,
Raw_To_Num(Low_Value) Low_Val,
Raw_To_Num(HIGH_VALUE) HIGH_VAL
FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME='METHOD_OPT_TEST1';

COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED LOW_VAL HIGH_VAL
----------- ------------ --------------- ----------- ------------- ------- ---------
SKEW5 9999 HEIGHT BALANCED 75 12-04-15 1 9999
SKEW4 9998 HEIGHT BALANCED 75 12-04-15 1 9999
PK_COL1 99999 HEIGHT BALANCED 75 12-04-15 1 99999

Observations :
1. Base Stats are present only for Indexed Column and missing for non indexed columns.
2. Histogram is created on a Unique column value “PK_COL1”, which is unnecessary as columns has unique distribution.
NUM_ROWS =: NUM_DISTINCT
3. Cardinality Estimate for Non indexed columns will be far offset from actual rows.

Let take an example,

Select /*+ GATHER_PLAN_STATISTICS */ count(*) from METHOD_OPT_TEST1 where SKEW3 = 250;

Output :
COUNT(*)
----------
106

select * from table(dbms_xplan.display_cursor(FORMAT=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------
SQL_ID bvmdabj2h5adb, child number 0
-------------------------------------
Select /*+ GATHER_PLAN_STATISTICS */ count(*) from METHOD_OPT_TEST1
where SKEW3 = 250

Plan hash value: 4260537176
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 389 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 389 |
|* 2 | TABLE ACCESS FULL| METHOD_OPT_TEST1 | 1 | 1000 | 106 |00:00:00.01 | 389 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("SKEW3"=250)

Due to no presence of Stats, Estimate is 10X offset from actual rows.
AS no stats are present, estimate is base on default 1% of NUM_ROWS, 1% of 99999 := 1000 Approx.

Such Scenario can have severe impact on SQL performance due to no presence of Stats and should be avoided.

4. Maximum Bytes limit for Histogram.

In one of tuning cases, Though dataset was Skew but still Histogram was not functional as required.
On Further analysis we found it was due to Byte limit set on columns data for histogram calculation, bytes limit is the maximum bytes consider for gathering histogram on column data.

Lets understand it better with an Example.

DROP TABLE METHOD_OPT_TEST2;

Create Table Method_Opt_Test2 As
Select Rownum Pk_Col , Decode(Mod(Rownum,20),0,Lpad('1',65,'A'),Lpad('2',65,'A')) Skew1
from dual connect by rownum < 100000;

SELECT SKEW1 ,LENGTHB(SKEW1), COUNT(*)
FROM METHOD_OPT_TEST2
GROUP BY SKEW1,LENGTHB(SKEW1) ORDER BY 3 DESC;

SKEW1 LENGTHB(SKEW1) COUNT(*)
----------------------------------------------------------------- -------------- ----------
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2 65 95000
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1 65 4999

Column “SKEW1” is Skew, but initial 64 bytes are same.

Now lets create histogram on same.


BEGIN
DBMS_STATS.GATHER_TABLE_STATS(USER, 'METHOD_OPT_TEST2', ESTIMATE_PERCENT =>100,CASCADE=> TRUE, METHOD_OPT=> 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 75 SKEW1' );
END;

Stats output for column SKEW1 :

COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------
SKEW1 2 FREQUENCY 1 13-04-15

LOW_VAL
----------------------------------------------------------------
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

HIGH_VAL
----------------------------------------------------------------
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Observations :
1. Histogram is created but only with 1 Bucket, though NUM_DISTINCT are 2.
2. Low and high value for column “SKEW1” is same “AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA”
3. Only First 64 Bytes was consider for histogram.

Please note :
Bytes limit for histogram has been upgraded to 64 Bytes in Oracle 12c.
Pre 12c it was set to 32 Bytes.

4.
Though we have histogram for such dataset, but it will be for no use.

Select /*+ GATHER_PLAN_STATISTICS */ count(*) from METHOD_OPT_TEST2
where SKEW1 = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2';

Output :
COUNT(*)
----------
4999


select * from table(dbms_xplan.display_cursor(FORMAT=>'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID a7r7wuqx61gg5, child number 0
-------------------------------------
Select /*+ GATHER_PLAN_STATISTICS */ count(*) from METHOD_OPT_TEST2
where SKEW1 = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1' ;

Plan hash value: 1003180425
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.02 | 1057 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.02 | 1057 |
|* 2 | TABLE ACCESS FULL| METHOD_OPT_TEST2 | 1 | 99999 | 4999 |00:00:00.02 | 1057 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("SKEW1"='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1')

Estimated Rows is far offset from actual, though histogram is present.

If Initial set of character upto bytes limit (64 in 12c, 32 in 11g)for a column data is same and its values only varies post that, histogram on such columns will not be effective.

All Example are test on Oracle 12c version.

About Deepak Mahto

Having 8+ Years of relevant/exciting/fruitful/challenging/learning in Oracle Technology. Currently working as Oracle Performance Consultant in an MNC. Hold Expertise in Performance Engineering, SQL Tuning, Database Tuning, SQL, PLSQL Development/Design Considerations and Oracle Internals. I enjoy reading on Oracle internals, posting some of my finding or observations and playing football.
This entry was posted in Oracle Internal and tagged , , , , . Bookmark the permalink.

4 Responses to Understand METHOD OPT Param for Stats in Oracle

  1. Pingback: Histogram on Function based Index hidden column. | Oracle Insight and Performance concepts

  2. siddharth samant says:

    Nice article Deepak

    Like

  3. sagar says:

    Thanks Deepak !!this would be very informative and useful post.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s