We have a sliding window scenario where every day we add a new day and trim
off an old day from our partition function and scheme.
I read somewhere that in this scenario it is better to keep Partition1 empty
and always SWITCH out Partition2 and then MERGE Partition 1. Instead of
SWITCHing out Partition1 and then MERGEing Partition 1. Reason being that you
will incurr additional I/O if you do it this way.
Before I test this, can anyone confirm this. We have 30 million row
partitions and can't afford any I/O while we MERGE Partition1. I am of the
opinion that provided Partition1 is empty, the MERGE will always be a
metadata operation only.
-- switch out Partition1 then MERGE
boundary_id row_count
1 30000000
2 30000000
3 30000000
4 30000000
5 30000000
6 30000000
7 30000000
-- or switch out partition2 then MERGE
boundary_id row_count
1 0
2 30000000
3 30000000
4 30000000
5 30000000
6 30000000
7 30000000
which is less I/O? I dont want to be moving data around on disk.
--
-- cranfield, DBA> I read somewhere that in this scenario it is better to keep Partition1
> empty
> and always SWITCH out Partition2 and then MERGE Partition 1. Instead of
> SWITCHing out Partition1 and then MERGEing Partition 1. Reason being that
> you
> will incurr additional I/O if you do it this way.
The partition that includes the specified boundary must be empty in order to
avoid data movement during MERGE.
> -- switch out Partition1 then MERGE
> boundary_id row_count
> 1 30000000
> 2 30000000
> 3 30000000
> 4 30000000
> 5 30000000
> 6 30000000
> 7 30000000
No data movement will be needed if the function is RANGE LEFT (inclusive).
Data will need to be moved if RANGE RIGHT because the first boundary
includes data in the second partition.
> -- or switch out partition2 then MERGE
> boundary_id row_count
> 1 0
> 2 30000000
> 3 30000000
> 4 30000000
> 5 30000000
> 6 30000000
> 7 30000000
No data movement will be needed because both partitions 1 and 2 will be
empty during the MERGE. No data movement is needed in this case regardless
of LEFT or RIGHT.
Regarding general sliding window approaches, one usually wants to keep all
values for a given date in the same partition. The 2 basic techniques to
accomplish an efficient datetime based sliding window:
Method 1:
Specify RANGE LEFT with boundary values that includes the highest possible
datetime value (e.g. '2008-02-29T23:59:59.997'). To remove old data, switch
out partition 1 and MERGE the first partition.
Method 2:
Specify RANGE RIGHT with boundary values of the next date (e.g.
'2008-03-01T00:00:00'). To remove old data, switch out partition 2 and
MERGE the first partition. Partition 1 is always empty with this approach.
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Cranfield" <alan_cranfield@.msn.co.za> wrote in message
news:A75BDFAA-9492-455A-92BE-12F4FFBCCED9@.microsoft.com...
> We have a sliding window scenario where every day we add a new day and
> trim
> off an old day from our partition function and scheme.
> I read somewhere that in this scenario it is better to keep Partition1
> empty
> and always SWITCH out Partition2 and then MERGE Partition 1. Instead of
> SWITCHing out Partition1 and then MERGEing Partition 1. Reason being that
> you
> will incurr additional I/O if you do it this way.
> Before I test this, can anyone confirm this. We have 30 million row
> partitions and can't afford any I/O while we MERGE Partition1. I am of the
> opinion that provided Partition1 is empty, the MERGE will always be a
> metadata operation only.
> -- switch out Partition1 then MERGE
> boundary_id row_count
> 1 30000000
> 2 30000000
> 3 30000000
> 4 30000000
> 5 30000000
> 6 30000000
> 7 30000000
> -- or switch out partition2 then MERGE
> boundary_id row_count
> 1 0
> 2 30000000
> 3 30000000
> 4 30000000
> 5 30000000
> 6 30000000
> 7 30000000
> which is less I/O? I dont want to be moving data around on disk.
> --
> -- cranfield, DBA|||Hi Dan
Thankls for that. We've used RANGE RIGHT as we have the concept of a
Trading Day which is all data up to 21h30. Anything that comes in after that
goes to the next partition. In addition we have binary date representation as
our partition key. We will make sure we keep Partition1 empty and SWITCH OUT
Partition2 and then MERGE.
--e.g.
CREATE PARTITION FUNCTION [MyFunc](binary(16))
AS RANGE RIGHT FOR VALUES (
0x47966058000000000000000000000000,
0x4797B1D8000000000000000000000000,
0x47990358000000000000000000000000,
0x479A54D8000000000000000000000000,
0x479BA658000000000000000000000000,
0x479CF7D8000000000000000000000000,
0x479E4958000000000000000000000000,
0x479F9AD8000000000000000000000000,
0x47A0EC58000000000000000000000000,
0x47A23DD8000000000000000000000000,
0x47A38F58000000000000000000000000,
0x47A4E0D8000000000000000000000000,
0x47A63258000000000000000000000000,
0x47A783D8000000000000000000000000,
0x47A8D558000000000000000000000000,
0x47AA26D8000000000000000000000000,
0x47AB7858000000000000000000000000
)
-- equates to:
2008-01-22 21:30:00.000
2008-01-23 21:30:00.000
2008-01-24 21:30:00.000
2008-01-25 21:30:00.000
2008-01-26 21:30:00.000
2008-01-27 21:30:00.000
2008-01-28 21:30:00.000
2008-01-29 21:30:00.000
2008-01-30 21:30:00.000
2008-01-31 21:30:00.000
2008-02-01 21:30:00.000
2008-02-02 21:30:00.000
2008-02-03 21:30:00.000
2008-02-04 21:30:00.000
2008-02-05 21:30:00.000
-- cranfield, DBA
"Dan Guzman" wrote:
> > I read somewhere that in this scenario it is better to keep Partition1
> > empty
> > and always SWITCH out Partition2 and then MERGE Partition 1. Instead of
> > SWITCHing out Partition1 and then MERGEing Partition 1. Reason being that
> > you
> > will incurr additional I/O if you do it this way.
> The partition that includes the specified boundary must be empty in order to
> avoid data movement during MERGE.
> > -- switch out Partition1 then MERGE
> > boundary_id row_count
> > 1 30000000
> > 2 30000000
> > 3 30000000
> > 4 30000000
> > 5 30000000
> > 6 30000000
> > 7 30000000
> No data movement will be needed if the function is RANGE LEFT (inclusive).
> Data will need to be moved if RANGE RIGHT because the first boundary
> includes data in the second partition.
> > -- or switch out partition2 then MERGE
> > boundary_id row_count
> > 1 0
> > 2 30000000
> > 3 30000000
> > 4 30000000
> > 5 30000000
> > 6 30000000
> > 7 30000000
> No data movement will be needed because both partitions 1 and 2 will be
> empty during the MERGE. No data movement is needed in this case regardless
> of LEFT or RIGHT.
> Regarding general sliding window approaches, one usually wants to keep all
> values for a given date in the same partition. The 2 basic techniques to
> accomplish an efficient datetime based sliding window:
> Method 1:
> Specify RANGE LEFT with boundary values that includes the highest possible
> datetime value (e.g. '2008-02-29T23:59:59.997'). To remove old data, switch
> out partition 1 and MERGE the first partition.
> Method 2:
> Specify RANGE RIGHT with boundary values of the next date (e.g.
> '2008-03-01T00:00:00'). To remove old data, switch out partition 2 and
> MERGE the first partition. Partition 1 is always empty with this approach.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Cranfield" <alan_cranfield@.msn.co.za> wrote in message
> news:A75BDFAA-9492-455A-92BE-12F4FFBCCED9@.microsoft.com...
> > We have a sliding window scenario where every day we add a new day and
> > trim
> > off an old day from our partition function and scheme.
> >
> > I read somewhere that in this scenario it is better to keep Partition1
> > empty
> > and always SWITCH out Partition2 and then MERGE Partition 1. Instead of
> > SWITCHing out Partition1 and then MERGEing Partition 1. Reason being that
> > you
> > will incurr additional I/O if you do it this way.
> >
> > Before I test this, can anyone confirm this. We have 30 million row
> > partitions and can't afford any I/O while we MERGE Partition1. I am of the
> > opinion that provided Partition1 is empty, the MERGE will always be a
> > metadata operation only.
> >
> > -- switch out Partition1 then MERGE
> > boundary_id row_count
> > 1 30000000
> > 2 30000000
> > 3 30000000
> > 4 30000000
> > 5 30000000
> > 6 30000000
> > 7 30000000
> >
> > -- or switch out partition2 then MERGE
> > boundary_id row_count
> > 1 0
> > 2 30000000
> > 3 30000000
> > 4 30000000
> > 5 30000000
> > 6 30000000
> > 7 30000000
> >
> > which is less I/O? I dont want to be moving data around on disk.
> > --
> > -- cranfield, DBA
>|||> We will make sure we keep Partition1 empty and SWITCH OUT
> Partition2 and then MERGE.
It looks like you are in good shape then. I'm glad I was able to help out
out.
--
Dan Guzman
SQL Server MVP
"Cranfield" <alan_cranfield@.msn.co.za> wrote in message
news:34841B67-89A1-4AF0-8069-EE03F709B923@.microsoft.com...
> Hi Dan
> Thankls for that. We've used RANGE RIGHT as we have the concept of a
> Trading Day which is all data up to 21h30. Anything that comes in after
> that
> goes to the next partition. In addition we have binary date representation
> as
> our partition key. We will make sure we keep Partition1 empty and SWITCH
> OUT
> Partition2 and then MERGE.
> --e.g.
> CREATE PARTITION FUNCTION [MyFunc](binary(16))
> AS RANGE RIGHT FOR VALUES (
> 0x47966058000000000000000000000000,
> 0x4797B1D8000000000000000000000000,
> 0x47990358000000000000000000000000,
> 0x479A54D8000000000000000000000000,
> 0x479BA658000000000000000000000000,
> 0x479CF7D8000000000000000000000000,
> 0x479E4958000000000000000000000000,
> 0x479F9AD8000000000000000000000000,
> 0x47A0EC58000000000000000000000000,
> 0x47A23DD8000000000000000000000000,
> 0x47A38F58000000000000000000000000,
> 0x47A4E0D8000000000000000000000000,
> 0x47A63258000000000000000000000000,
> 0x47A783D8000000000000000000000000,
> 0x47A8D558000000000000000000000000,
> 0x47AA26D8000000000000000000000000,
> 0x47AB7858000000000000000000000000
> )
>
> -- equates to:
> 2008-01-22 21:30:00.000
> 2008-01-23 21:30:00.000
> 2008-01-24 21:30:00.000
> 2008-01-25 21:30:00.000
> 2008-01-26 21:30:00.000
> 2008-01-27 21:30:00.000
> 2008-01-28 21:30:00.000
> 2008-01-29 21:30:00.000
> 2008-01-30 21:30:00.000
> 2008-01-31 21:30:00.000
> 2008-02-01 21:30:00.000
> 2008-02-02 21:30:00.000
> 2008-02-03 21:30:00.000
> 2008-02-04 21:30:00.000
> 2008-02-05 21:30:00.000
>
> --
> -- cranfield, DBA
>
> "Dan Guzman" wrote:
>> > I read somewhere that in this scenario it is better to keep Partition1
>> > empty
>> > and always SWITCH out Partition2 and then MERGE Partition 1. Instead of
>> > SWITCHing out Partition1 and then MERGEing Partition 1. Reason being
>> > that
>> > you
>> > will incurr additional I/O if you do it this way.
>> The partition that includes the specified boundary must be empty in order
>> to
>> avoid data movement during MERGE.
>> > -- switch out Partition1 then MERGE
>> > boundary_id row_count
>> > 1 30000000
>> > 2 30000000
>> > 3 30000000
>> > 4 30000000
>> > 5 30000000
>> > 6 30000000
>> > 7 30000000
>> No data movement will be needed if the function is RANGE LEFT
>> (inclusive).
>> Data will need to be moved if RANGE RIGHT because the first boundary
>> includes data in the second partition.
>> > -- or switch out partition2 then MERGE
>> > boundary_id row_count
>> > 1 0
>> > 2 30000000
>> > 3 30000000
>> > 4 30000000
>> > 5 30000000
>> > 6 30000000
>> > 7 30000000
>> No data movement will be needed because both partitions 1 and 2 will be
>> empty during the MERGE. No data movement is needed in this case
>> regardless
>> of LEFT or RIGHT.
>> Regarding general sliding window approaches, one usually wants to keep
>> all
>> values for a given date in the same partition. The 2 basic techniques to
>> accomplish an efficient datetime based sliding window:
>> Method 1:
>> Specify RANGE LEFT with boundary values that includes the highest
>> possible
>> datetime value (e.g. '2008-02-29T23:59:59.997'). To remove old data,
>> switch
>> out partition 1 and MERGE the first partition.
>> Method 2:
>> Specify RANGE RIGHT with boundary values of the next date (e.g.
>> '2008-03-01T00:00:00'). To remove old data, switch out partition 2 and
>> MERGE the first partition. Partition 1 is always empty with this
>> approach.
>> --
>> Hope this helps.
>> Dan Guzman
>> SQL Server MVP
>> "Cranfield" <alan_cranfield@.msn.co.za> wrote in message
>> news:A75BDFAA-9492-455A-92BE-12F4FFBCCED9@.microsoft.com...
>> > We have a sliding window scenario where every day we add a new day and
>> > trim
>> > off an old day from our partition function and scheme.
>> >
>> > I read somewhere that in this scenario it is better to keep Partition1
>> > empty
>> > and always SWITCH out Partition2 and then MERGE Partition 1. Instead of
>> > SWITCHing out Partition1 and then MERGEing Partition 1. Reason being
>> > that
>> > you
>> > will incurr additional I/O if you do it this way.
>> >
>> > Before I test this, can anyone confirm this. We have 30 million row
>> > partitions and can't afford any I/O while we MERGE Partition1. I am of
>> > the
>> > opinion that provided Partition1 is empty, the MERGE will always be a
>> > metadata operation only.
>> >
>> > -- switch out Partition1 then MERGE
>> > boundary_id row_count
>> > 1 30000000
>> > 2 30000000
>> > 3 30000000
>> > 4 30000000
>> > 5 30000000
>> > 6 30000000
>> > 7 30000000
>> >
>> > -- or switch out partition2 then MERGE
>> > boundary_id row_count
>> > 1 0
>> > 2 30000000
>> > 3 30000000
>> > 4 30000000
>> > 5 30000000
>> > 6 30000000
>> > 7 30000000
>> >
>> > which is less I/O? I dont want to be moving data around on disk.
>> > --
>> > -- cranfield, DBA
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment