Thursday, November 8, 2007

OPT_PARAM

I had no intention of getting into the depth of optimizer_index_caching but since you brought up this topic -

Let me start of with a worldly example..........

Though CBO is a very powerful algorithm it cannot differentiate types of SQL statements, the speed of the disks and the load on the CPUs etc. All these parameters influence the execution plan of a statement. For example, the best execution plan at 9 in the morning for a 2 CPU server may be quite different at 3 in the evening when the system is near threshold.

Given this circumstance here are a few parameters that can be tweaked for better performance ....
optimizer_index_cost_adj: This allows the optimizer to guage whether FTS is faster OR index scans. A value of 100 makes them equal, a lower value favors the index over the FTS
optimizer_index_caching: lets the CBO know how much index has been cached in the RAM data buffer cache. To evaluate a good value - allocate anywhere between 4k - 32k buffer on RAM, create a tb space, rebuild the index under consideration into this tb space. This way we forced an index onto separate tablespace and index buffer

get the value for the db cache size from v$parameter, subtract it with blocks from dba/user/all_segments - this would give an idea of how much index would be present in the index buffer
optimizer_max_permutations: A very straightforward parameter - forces CBO to pick a table join order when a query has multiple tables after # of permutations. This parameter is a hidden parameter starting 10g, hence prefixed with _


..........To be continued..........

vikram kulkarni wrote:
Hey
Sorry this is a long post from me, but i thought it can be shared:

/*+ opt_param('optimizer_index_caching' , 90) */
where does this 90 come from? How does one know what is the correct value of the optimizer_index_caching?
The default value for the optimizer_index_caching parameter is 0 - which gives the highest preference to hash joins and sort merge joins.Setting optimizer_index_caching to a higher percentage makes nested loop joins look less expensive to the optimizer, which will be more likely to pick nested loop joins over hash or sort merge joins.setting values close to 100 encourages the use of indexes over tablescans.

The cost of executing a nested loop join where an index is used to access the inner table is highly dependent on the caching of that index in the buffer cache. The amount of index caching depends on factors, such as the load on the system and the block access patterns of different users, that the optimizer cannot predict.

Set the optimizer_index_caching to something in the region of the "buffer cache hit ratio." (You have to make your own choice about whether this should be the figure derived from the default pool, keep pool, or both).

Resetting this parameter can be very dangerous if you are not using stored outlines because it could change the execution plans for thousands of SQL statements. Also, because the cost-based optimizer will generally only invoke sort merge joins when there are no indexes on the joined tables, this parameter has the most effect on the invocation of hash joins.

lets take an example:

Execution Plan ----------------------------------------------------------0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=25 Card=20 Bytes=1160)1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=25 Card=20 Bytes=1160)2 1 INDEX (RANGE SCAN) OF 'T1_I1' (NON-UNIQUE) (Cost=5 Card=20)

In this query, the optimizer chooses an index and reports a cost of 25, with a cardinality of20. We know that the cardinality happens to be a good estimate, but where does the cost comefrom? The first part of the cost comes from line 2 (Cost = 5) for visiting the index, and the nextpart (Cost = 25, an increment of 20) comes from line 1 for visiting the table.
We have the formula for the cost of an index-driven access path ascost =blevel +ceiling(leaf_blocks * effective index selectivity) +ceiling(clustering_factor * effective table selectivity)
so as the parameter optimizer_index_caching is set to a non-zero value, the effective cost of the blevel component of the formula seems to be halved.This makes the cost of the index scan lesser and hence optimiser will be fooled into taking that execution plan.

Note:optimizer_index_caching is used to adjust the cost calculation for index blocks of the inner table innested loops and for the index blocks used during in-list iterators. It is not used in the calculation of costs for simple index unique scans or range scans into a single table.

I think in your next message you need to touch on optimizer_index_cost_adj

----- Original Message ----From: Kiran Kulkarni To: kiran_bk@yahoo.comSent: Wednesday, November 7, 2007 10:05:17 AMSubject: Today's topic - opt_param
I have seen that everytime something like this - be it chalk talk, newsletter etc. after the initial euphoria the exercise eventually peters out. This time I do not want that to happen - due to my super busy schedule, I am changing the frequency to weekly..

As you may know, Oracle provides several parameters that can adjust the behavior of the CBO to make it more like rule-based optimization.

One such parameter is optimizer_index_caching - this parameter is a %ge parameter with valid values between 0 & 100. This parameter lets you adjust the behavior of the cost-based optimizer to select nested loop joins more often or less often. The cost of executing a nested loop join where an index is used to access the inner table is highly dependent on the caching of that index in the buffer cache. Setting optimizer_index_caching to a higher percentage makes nested loop joins look less expensive to the optimizer, which will be more likely to pick nested loop joins over hash or sort merge joins.

The default value for the optimizer_index_caching parameter is 0, which gives the highest preference to hash joins and sort merge joins. Resetting this parameter can be very dangerous if you are not using stored outlines because it could change the execution plans for thousands of SQL statements.

Oracle documentation says "OPTIMIZER_INDEX_CACHING favors using selective indexes. That is, if you use a relatively low value for this parameter, the optimizer effectively models the caches of all non-leaf index blocks. In this case, the optimizer bases the cost of using this index primarily on the basis of its selectivity. Thus, by setting OPTIMIZER_INDEX_CACHING to a low value, you achieve the desired modeling of the index caching without over using possibly undesirable indexes that have poor selectivity."

Ok - back to the topic.....

Sometimes we face bad SQL where we need some session level parameters to be modified. e.g. the optimizer_index_caching.
Using Alter System can impact some other sql's executed within the session, which is not desired. So what we need is a method to modify session level parameter for the specific sql only. With 10g Rel 2 we have the possibility of doing so using the hint opt_param.
Using this hint, session level parameter can be modified locally for that sql only. e.g. /*+ opt_param('optimizer_index_caching' , 90) */

This is documented in (Metalink Note 377333.1), please refer for usage.

As usual comments/suggestions welcome... goto rush to a meeting...

-Kiran.

ps: Happy Diwali to you all - happy bursting..

Effective data slices

Ok. I have not recieved any "Remove me from the list" email. I presume that you read my email &

- said good to know topic
- said let me save this for future reading
- ignored it
- deleted, w/o opening it

In one of the application that I built after quitting Oracle was Archive & Purge tool. This tool picks data from source, performs ETL (if user has defined) & inserts into the target, purges the records from the source. W/o getting too much into this tool (btw. I have 3 customers in various stages of implementation of this tool) I would like to highlight one major issue that I faced - building effective data slice that can be used as Payload for my workers.. I rummaged a lot, for guidelines on the same & happened to stumble upon an article by Tom Kyte.

This query gives you ranges of ROWID's that can be used based on chunks of data (num# of slices)

Here it goes......

http://asktom.oracle.com/pls/asktom/f?p=100:11:4288147952189703::::P11_QUESTION_ID:10498431232211

It took me a while to figure out what the query was actually doing.

However, I could not use this query in one of our implemtations as it just hung the system

- Query ran forever even after the entire schema was analyzed
- The customer had 35G of rollback - no issues with that
- Undo_retention of 36000
- We were trying to run this query on a table that had 127M records
- table size - 18G

I wrote a custom logic to build data slice on PK column after ensuring the table is analyzed & could get them up n running.

If any of you have a better way of building DS pl. free to share it. Any insights on why the query by Tom Kyte failed is also appreciated.

-Kiran.

ANYDATA

I am trying to put forth some interesting topics every alternate day - if you want to share some topic with me pl. feel free to do so.

Today's topic - usage of anydata. While working on my ETL tool, I had a constant need of type casting data but with anydata I am able to store my intermediate results in stg table.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3099475696866

Note: if you do not want me to include you in this list, I will do the needful.

cheers
Kiran.