I got a reply from Microsoft which says that technique 2 (INSERT SELECT into
table with IDENTITY) is in fact guaranteed, while all the others aren't.
Cheers,
--
BG, SQL Server MVP
www.SolidQualityLearning.com
"Questar" <Questar@.newsgroup.nospam> wrote in message
news:EDC4A261-24F6-4891-9EBA-DE081E31B810@.microsoft.com...
> Thanks for the follow-up!
> "Itzik Ben-Gan" wrote:
>Interesting, I will definitely keep it in mind. Thanks for all your
efforts. It would definitely be worth an INFO article in Microsoft's
knowledge base...
Gert-Jan
Itzik Ben-Gan wrote:
> Well, it seems like your persistency was worthwhile!
> I got a reply from Microsoft which says that technique 2 (INSERT SELECT in
to
> table with IDENTITY) is in fact guaranteed, while all the others aren't.
> Cheers,
> --
> BG, SQL Server MVP
> www.SolidQualityLearning.com
> "Questar" <Questar@.newsgroup.nospam> wrote in message
> news:EDC4A261-24F6-4891-9EBA-DE081E31B810@.microsoft.com...|||I completely agree. I know I'm planning to write one. ;-)
BG, SQL Server MVP
www.SolidQualityLearning.com
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:41F56391.55685A2D@.toomuchspamalready.nl...
> Interesting, I will definitely keep it in mind. Thanks for all your
> efforts. It would definitely be worth an INFO article in Microsoft's
> knowledge base...
> Gert-Jan
>
> Itzik Ben-Gan wrote:|||Thank you very much for your role in finding an answer!
"Itzik Ben-Gan" wrote:
> Well, it seems like your persistency was worthwhile!
> I got a reply from Microsoft which says that technique 2 (INSERT SELECT in
to
> table with IDENTITY) is in fact guaranteed, while all the others aren't.
> Cheers,
> --
> BG, SQL Server MVP
> www.SolidQualityLearning.com
>
> "Questar" <Questar@.newsgroup.nospam> wrote in message
> news:EDC4A261-24F6-4891-9EBA-DE081E31B810@.microsoft.com...
>
>|||I just ran across this thread and I wanted to share my experiences.
Where I work, we are often performing very large data-mining type of
queries. Often on the order of 100's of millions of rows. Using INSERT
statement for result sets are much to slow. So we normally use SELECT
INTO based table creation.
Since our processing often requires the creation of an ordered identity
column to process the data we have had to try to find solutions that
will allow for SELECT INTO using the IDENTITY function with and ORDER BY
clause. We tried using MAXDOP 1 as a hint to fix the problem and found
that unreliable.
Then we tried using derived tables using an ORDER BY clause with a TOP
100 PERCENT call. This seemed to often fix the problem but not reliably.
Our final solution that seems to work for us is to use derived table
with MAXDOP 1.
Here is an example:
set nocount on
use tempdb
go
drop table ta, tb, #t, #t2
go
create table ta(a int not null)
insert into ta values(3)
insert into ta values(1)
insert into ta values(2)
create table tb(b int not null primary key)
insert into tb values(4)
insert into tb values(1)
insert into tb values(3)
insert into tb values(2)
select b, identity(int, 1, 1) as rn
into #t
from ta join tb on a = b
order by b
select * from #t order by b
b rn
-- --
1 2
2 3
3 1
select b,
identity(int, 1, 1) as rn
into #t2
from (Select top 100 percent
*
from ta
inner join tb
on a = b
order by b) lu
order by b
option (maxdop 1)
select * from #t2 order by b
b rn
-- --
1 1
2 2
3 3
If you try this you will most likely get the same answer with or without
using the MAXDOP hint. I have found that when joining multiple tables
that have clustered indexes require the MAXDOP hint
*** Sent via Developersdex http://www.examnotes.net ***
Don't just participate in USENET...get rewarded for it!|||Any of these results are undefined so there is some level of risk if
you rely on these behaviours in a production environment. It may appear
to work most of the time, it may even work ALL of the time, but it
could still break under a future schema change, new product version,
service pack or hotfix. You have to decide if the benefit you derive
from this method is worth the risk of any possible future impact. Since
the risk is very hard to quantify I would tend to avoid it unless the
potential impact was quite small.
Here's what Microsoft says:
"Even though this is not guaranteed, there are scenarios where you
might see the IDENTITY function generating identity values in the same
order as defined by the ORDER BY column. This is purely coincidental
and should not be considered the expected behavior."
http://support.microsoft.com/defaul...kb;en-us;273586
--
David Portas
SQL Server MVP
--|||I would agree with completely on this issue. I often find myself running
different tests over and over each time a service pack or hotfix is
installed. Unfortunately SQL SERVER's behavior is sometimes
unpredictable when trying to execute certain types of queries. This is
just one of those types.
Regarding the queries I posted earlier I am including the showplan_all
report.
select b, identity(int, 1, 1) as rn
into #t
from ta join tb on a = b
order by b
StmtText
---
---
--
|--Table Insert(OBJECT:([#t]), SET:([#t].[rn]=[Expr1005],
[#t].[b]=[tb].[b]))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1005]=setidentity([E
xpr1004],
-7, 0, '#t')))
|--Sort(ORDER BY:([tb].[b] ASC))
|--Compute
Scalar(DEFINE:([Expr1004]=getidentity(-7, 0, '#t')))
|--Nested Loops(Inner Join, OUTER
REFERENCES:([ta].[a]))
|--Table
Scan(OBJECT:([tempdb].[dbo].[ta]))
|--Clustered Index
S

SEEK:([tb].[b]=[ta].[a]) ORDERED FORWARD)
---
select b,
identity(int, 1, 1) as rn
into #t2
from (Select top 100 percent
*
from ta
inner join tb
on a = b
order by b) lu
order by b
option (maxdop 1)
StmtText
---
---
--
|--Table Insert(OBJECT:([#t2]), SET:([#t2].[rn]=[Expr1005],
[#t2].[b]=[tb].[b]))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1005]=setidentity([E
xpr1004],
-7, 0, '#t2')))
|--Compute Scalar(DEFINE:([Expr1004]=getidentity(-7, 0,
'#t2')))
|--Sort(ORDER BY:([tb].[b] ASC))
|--Nested Loops(Inner Join, OUTER
REFERENCES:([ta].[a]))
|--Table
Scan(OBJECT:([tempdb].[dbo].[ta]))
|--Clustered Index
S

SEEK:([tb].[b]=[ta].[a]) ORDERED FORWARD)
As you can see in the second query, the getidentity function is called
after the sort. Unlike the first query where the getidentity is called
prior to the sort.
In the end, if we can find a repeatable process that saves us hours over
using logged transactions, we will. As it is now, I have processes that
run for 10+ days. So save even a few percent in processing time is very
important.
*** Sent via Developersdex http://www.examnotes.net ***
Don't just participate in USENET...get rewarded for it!|||I would agree with completely on this issue. I often find myself running
different tests over and over each time a service pack or hotfix is
installed. Unfortunately SQL SERVER's behavior is sometimes
unpredictable when trying to execute certain types of queries. This is
just one of those types.
Regarding the queries I posted earlier I am including the showplan_all
report.
select b, identity(int, 1, 1) as rn
into #t
from ta join tb on a = b
order by b
StmtText
---
---
--
|--Table Insert(OBJECT:([#t]), SET:([#t].[rn]=[Expr1005],
[#t].[b]=[tb].[b]))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1005]=setidentity([E
xpr1004],
-7, 0, '#t')))
|--Sort(ORDER BY:([tb].[b] ASC))
|--Compute
Scalar(DEFINE:([Expr1004]=getidentity(-7, 0, '#t')))
|--Nested Loops(Inner Join, OUTER
REFERENCES:([ta].[a]))
|--Table
Scan(OBJECT:([tempdb].[dbo].[ta]))
|--Clustered Index
S

SEEK:([tb].[b]=[ta].[a]) ORDERED FORWARD)
---
select b,
identity(int, 1, 1) as rn
into #t2
from (Select top 100 percent
*
from ta
inner join tb
on a = b
order by b) lu
order by b
option (maxdop 1)
StmtText
---
---
--
|--Table Insert(OBJECT:([#t2]), SET:([#t2].[rn]=[Expr1005],
[#t2].[b]=[tb].[b]))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1005]=setidentity([E
xpr1004],
-7, 0, '#t2')))
|--Compute Scalar(DEFINE:([Expr1004]=getidentity(-7, 0,
'#t2')))
|--Sort(ORDER BY:([tb].[b] ASC))
|--Nested Loops(Inner Join, OUTER
REFERENCES:([ta].[a]))
|--Table
Scan(OBJECT:([tempdb].[dbo].[ta]))
|--Clustered Index
S

SEEK:([tb].[b]=[ta].[a]) ORDERED FORWARD)
As you can see in the second query, the getidentity function is called
after the sort. Unlike the first query where the getidentity is called
prior to the sort.
In the end, if we can find a repeatable process that saves us hours over
using logged transactions, we will. As it is now, I have processes that
run for 10+ days. So save even a few percent in processing time is very
important.
*** Sent via Developersdex http://www.examnotes.net ***
Don't just participate in USENET...get rewarded for it!
No comments:
Post a Comment