Friday, March 30, 2012

order of columns in composite index

I'm know that for composite indexes, the recommendation is always to specify
the most selective column first, but can someone please explain why this
makes such a huge difference in perfomance' I have a table where column A
only has 1 unique value and column B is basically unique among all rows. A
query that has A and B in the WHERE clause takes ALOT longer if my composite
index was created with (A, B) instead of (B, A). Thanks for any help.
BobBy the way, I did verify that in both cases, my index is being used, since I
know that index statistics are gathered based on the first column.
"Bob Gabor" <rjg@.mindspring.com> wrote in message
news:Yclkf.8556$N45.2454@.newsread1.news.atl.earthlink.net...
> I'm know that for composite indexes, the recommendation is always to
> specify the most selective column first, but can someone please explain
> why this makes such a huge difference in perfomance' I have a table
> where column A only has 1 unique value and column B is basically unique
> among all rows. A query that has A and B in the WHERE clause takes ALOT
> longer if my composite index was created with (A, B) instead of (B, A).
> Thanks for any help.
> Bob
>|||It depends on the query. Rules like the one you quote are just
general guidelines. For example, if one of the columns is used
in a range or LIKE comparison, it may be best to index that column
first regardless of selectivity. Depending on the query and data,
the two-column statistics may be more or less accurate predictors
of row count for the index in one order than in the other, also.
If you look at the query plans in more detail, you may be able
to see whether the faster solution is resulting in a better plan that
the other ordering can't allow, or if the faster solution is a result
of better row count estimates.
Steve Kass
Drew University
Bob Gabor wrote:

>I'm know that for composite indexes, the recommendation is always to specif
y
>the most selective column first, but can someone please explain why this
>makes such a huge difference in perfomance' I have a table where column A
>only has 1 unique value and column B is basically unique among all rows. A
>query that has A and B in the WHERE clause takes ALOT longer if my composit
e
>index was created with (A, B) instead of (B, A). Thanks for any help.
>Bob
>
>|||>> is always to specify
the most selective column first, but can someone please explain why
this
makes such a huge difference in perfomance' <<
there is only one hard and fast rule in our trade:
there are no hard and fast rules in database programming.
;)
For instance, if you frequently join on some column, putting it first
frequently speeds up joins.|||It does depend on many things, but I can give you some insight into why this
might be a problem (though I can not say it is necessarily a problem for
your application).
If you have a 2-column index with a non-selective leading column and a very
selective secondary column, it can cause additional I/O when compared to a
query run over an index with the columns defined in the opposite order
(selective column first). If the query does a s on the first column and
then later columns, this could be less efficient.
SELECT col1, col2 FROM Table WHERE col1=4 and col3 > 5;
I will point out that it can vary from database engine to database engine.
It can vary on the predicates being used. It can vary based on the mix of
queries and the hardware. In short, it really does depend. However, it is
generally good to index selective fields since the cost of searching and the
cost of maintaining these indexes in updates is less than non-selective
columns.
Another reason to potentially pick a more selective leading index column,
all other factors being equal, is that SQL Server builds histograms on the
leading column. If it is very unselective, this can make the process of
cardinality estimation more difficult for the optimizer. This could cause
errors that lead to less than optimal plans being picked in some cases.
I hope that this gives you some insights into the internals to understand
why it might matter.
Thanks,
Conor Cunningham
SQL Server Query Optimization Development Lead
"Alexander Kuznetsov" <AK_TIREDOFSPAM@.hotmail.COM> wrote in message
news:1133653242.800468.130470@.g47g2000cwa.googlegroups.com...
> the most selective column first, but can someone please explain why
> this
> makes such a huge difference in perfomance' <<
> there is only one hard and fast rule in our trade:
> there are no hard and fast rules in database programming.
> ;)
> For instance, if you frequently join on some column, putting it first
> frequently speeds up joins.
>

No comments:

Post a Comment