1
00:00:00,000 --> 00:00:02,000
Data Loading Options.

2
00:00:02,000 --> 00:00:05,000
So earlier in this series, we looked at how to load

3
00:00:05,000 --> 00:00:06,000
dimensions.

4
00:00:06,000 --> 00:00:08,000
We saw how to work with slowly changing dimensions to

5
00:00:08,000 --> 00:00:11,000
maintain historical accuracy, and we saw that loading

6
00:00:11,000 --> 00:00:14,000
dimensions wasn't bad, as far as performance goes, because

7
00:00:14,000 --> 00:00:15,000
dimensions are generally small.

8
00:00:15,000 --> 00:00:18,000
This Nugget, we're going to focus on fact tables.

9
00:00:18,000 --> 00:00:20,000
Fact tables are generally huge.

10
00:00:20,000 --> 00:00:22,000
So we've got a whole new set of challenges to deal with.

11
00:00:22,000 --> 00:00:25,000
How do we load an entire set--

12
00:00:25,000 --> 00:00:28,000
a full data load, as we call it-- into our fact table?

13
00:00:28,000 --> 00:00:31,000
How do we deal with incremental loads-- loads where

14
00:00:31,000 --> 00:00:33,000
we only want to track changed data and new data

15
00:00:33,000 --> 00:00:35,000
since the last load?

16
00:00:35,000 --> 00:00:36,000
How do we deal with those challenges, and the big one,

17
00:00:36,000 --> 00:00:38,000
how do we deal with performance?

18
00:00:38,000 --> 00:00:40,000
Especially when we're dealing with an extremely large data

19
00:00:40,000 --> 00:00:42,000
warehouse, loads could take a day.

20
00:00:42,000 --> 00:00:44,000
So we obviously don't want to bring down

21
00:00:44,000 --> 00:00:45,000
production for a day.

22
00:00:45,000 --> 00:00:48,000
So I'm going to show you a really nice design pattern and

23
00:00:48,000 --> 00:00:51,000
a couple of tips and tricks for dealing with performance,

24
00:00:51,000 --> 00:00:52,000
and that's going to come in the form of something called

25
00:00:52,000 --> 00:00:55,000
partition switching, which is a really, really cool feature

26
00:00:55,000 --> 00:00:58,000
that was introduced in SQL Server 2005, and we'll get a

27
00:00:58,000 --> 00:01:00,000
good demo on here so you can see how it works.

28
00:01:00,000 --> 00:01:03,000
So we'll start with a whiteboard on what data

29
00:01:03,000 --> 00:01:04,000
loading is all about.

30
00:01:04,000 --> 00:01:07,000
So we'll talk about the full load or the initial load,

31
00:01:07,000 --> 00:01:09,000
we'll talk about incremental loads, what those entail, and

32
00:01:09,000 --> 00:01:11,000
there's a lot of different strategies for doing an

33
00:01:11,000 --> 00:01:12,000
incremental load.

34
00:01:12,000 --> 00:01:15,000
Much of it is going to depend on your source data.

35
00:01:15,000 --> 00:01:17,000
So we'll take a look at what some of those variables are.

36
00:01:17,000 --> 00:01:19,000
Then we'll also talk about partition switching.

37
00:01:19,000 --> 00:01:21,000
So we'll kind of get a good overview of everything that

38
00:01:21,000 --> 00:01:24,000
this Nugget is going to cover on paper, and then we'll jump

39
00:01:24,000 --> 00:01:27,000
into the Virtual Nugget Lab and put it all to use here.

40
00:01:27,000 --> 00:01:28,000
We'll start with incremental and full loads.

41
00:01:28,000 --> 00:01:30,000
I'll show you how to do a full load, we'll create a full load

42
00:01:30,000 --> 00:01:33,000
package, we'll create an incremental load package, one

43
00:01:33,000 --> 00:01:38,000
that uses Dynamic SQL and one that uses CDC--

44
00:01:38,000 --> 00:01:40,000
change data capture, a feature that was introduced in SQL

45
00:01:40,000 --> 00:01:44,000
Server 2008 that we can use to help us

46
00:01:44,000 --> 00:01:45,000
track all these changes.

47
00:01:45,000 --> 00:01:48,000
And I'll get you familiar with the SSIS components, the CDC

48
00:01:48,000 --> 00:01:50,000
control task, the CDC source.

49
00:01:50,000 --> 00:01:52,000
We'll look at the CDC splitter.

50
00:01:52,000 --> 00:01:55,000
So I'll give you a good rundown on a package that uses

51
00:01:55,000 --> 00:01:56,000
all of the CDC functionality.

52
00:01:56,000 --> 00:01:58,000
It's pretty cool stuff and pretty easy to use.

53
00:01:58,000 --> 00:02:00,000
And finally, as I mentioned, we'll look

54
00:02:00,000 --> 00:02:01,000
at partition switching.

55
00:02:01,000 --> 00:02:04,000
A great way to speed up our data warehousing loads.

56
00:02:04,000 --> 00:02:06,000
And by the way, this isn't something you'll need to know

57
00:02:06,000 --> 00:02:09,000
in great detail for the 70-463 exam, but it's certainly

58
00:02:09,000 --> 00:02:11,000
something you'll want to be aware of, especially if you do

59
00:02:11,000 --> 00:02:12,000
this in the real world.

60
00:02:12,000 --> 00:02:15,000
And by the way, I remember when SQL Server 2008 came out,

61
00:02:15,000 --> 00:02:18,000
Microsoft put out a pretty cool article boasting that

62
00:02:18,000 --> 00:02:22,000
they loaded a terabyte of data in 30

63
00:02:22,000 --> 00:02:26,000
minutes using SSIS and partition switching.

64
00:02:26,000 --> 00:02:28,000
So it's kind of funny, because we're going to do the exact

65
00:02:28,000 --> 00:02:31,000
same thing that they did, just on a much smaller scale.

66
00:02:31,000 --> 00:02:33,000
But you'll still get to see a good demonstration on how to

67
00:02:33,000 --> 00:02:35,000
do partition switching, because it's

68
00:02:35,000 --> 00:02:36,000
awesome stuff here.

69
00:02:36,000 --> 00:02:39,000
It really is as good as it gets as far as loading data as

70
00:02:39,000 --> 00:02:42,000
fast as possible in SQL Server.

71
00:02:42,000 --> 00:02:45,000
Let's start here with the data loading overview, and I want

72
00:02:45,000 --> 00:02:47,000
to get you familiar with what we're going to accomplish in

73
00:02:47,000 --> 00:02:49,000
this Nugget and what we're going to work with, as far as

74
00:02:49,000 --> 00:02:52,000
our data warehouse and our OLTP databases go.

75
00:02:52,000 --> 00:02:55,000
So, in the Adventure Works Data Warehouse, we have this

76
00:02:55,000 --> 00:02:58,000
data mart that deals with the internet sales.

77
00:02:58,000 --> 00:03:03,000
So we're going to work with the FactInternetSales table,

78
00:03:03,000 --> 00:03:07,000
and this data is modeled after many tables in the Adventure

79
00:03:07,000 --> 00:03:08,000
Works database.

80
00:03:08,000 --> 00:03:11,000
Most of them are going to sit inside of the sales schema.

81
00:03:11,000 --> 00:03:13,000
Sales order header is where most of our data is

82
00:03:13,000 --> 00:03:13,000
going to come from.

83
00:03:13,000 --> 00:03:16,000
And there's a lot of other linkages that go along with this.

84
00:03:16,000 --> 00:03:17,000
We'll take a look at it when we get into our demo and show

85
00:03:17,000 --> 00:03:19,000
you the big honking query that pulls all

86
00:03:19,000 --> 00:03:22,000
this data together, and that's the query we're going to use

87
00:03:22,000 --> 00:03:26,000
to push that data into our FactInternetSales table.

88
00:03:26,000 --> 00:03:28,000
So we're going to start this process from the beginning.

89
00:03:28,000 --> 00:03:30,000
I want to show you this from scratch, that way you can get

90
00:03:30,000 --> 00:03:32,000
an idea here of how you would load a fact table.

91
00:03:32,000 --> 00:03:34,000
So, the first thing we're going to do is we're going to

92
00:03:34,000 --> 00:03:34,000
empty out this fact table.

93
00:03:34,000 --> 00:03:37,000
We're going to say this is the very first time we're loading

94
00:03:37,000 --> 00:03:38,000
this fact table.

95
00:03:38,000 --> 00:03:38,000
How do we do it?

96
00:03:38,000 --> 00:03:42,000
And we're going to do it by creating a full load, an SSIS

97
00:03:42,000 --> 00:03:47,000
package that is responsible for initially loading our fact

98
00:03:47,000 --> 00:03:49,000
table, and it's good to have that package around in case we

99
00:03:49,000 --> 00:03:51,000
ever need to do a full load again.

100
00:03:51,000 --> 00:03:55,000
From there, we'll learn how to build an incremental SSIS

101
00:03:55,000 --> 00:03:56,000
package, one that does incremental loads.

102
00:03:56,000 --> 00:04:01,000
Incremental means anything that has changed or been added to

103
00:04:01,000 --> 00:04:04,000
our source since our last load.

104
00:04:04,000 --> 00:04:05,000
How do we figure that out?

105
00:04:05,000 --> 00:04:08,000
That's really going to depend on your source data.

106
00:04:08,000 --> 00:04:10,000
For instance, see this modified date down here?

107
00:04:10,000 --> 00:04:11,000
That's a good thing.

108
00:04:11,000 --> 00:04:13,000
If you have a modified date in your source, it's going to

109
00:04:13,000 --> 00:04:17,000
make things a little bit easier, because we know what's

110
00:04:17,000 --> 00:04:18,000
been changed when.

111
00:04:18,000 --> 00:04:21,000
Now, even easier, if we have that same modified date over

112
00:04:21,000 --> 00:04:25,000
in our fact table, because now, we can create a package

113
00:04:25,000 --> 00:04:27,000
whose first step is a lookup into our

114
00:04:27,000 --> 00:04:29,000
FactInternetSales table.

115
00:04:29,000 --> 00:04:30,000
We can just run a simple aggregate query.

116
00:04:30,000 --> 00:04:31,000
We don't have to actually do a lookup.

117
00:04:31,000 --> 00:04:35,000
And by the way, lookups are bad against your fact tables

118
00:04:35,000 --> 00:04:37,000
in SSIS, and I'll talk a little bit more about that and

119
00:04:37,000 --> 00:04:39,000
why we want to do set-based operations here shortly.

120
00:04:39,000 --> 00:04:43,000
But in this scenario, we could run a quick aggregate to give

121
00:04:43,000 --> 00:04:46,000
us a max on the modified date, which will be the starting

122
00:04:46,000 --> 00:04:49,000
point for our query that hits our source.

123
00:04:49,000 --> 00:04:51,000
We'll do anything after that date, and that will give us

124
00:04:51,000 --> 00:04:53,000
all our changed records.

125
00:04:53,000 --> 00:04:56,000
Finding all the new records is easy, because we can do the

126
00:04:56,000 --> 00:04:56,000
same thing.

127
00:04:56,000 --> 00:04:59,000
We can get a max on, say, our product key, and that's going

128
00:04:59,000 --> 00:05:03,000
to be the beginning, we'll do greater than that key against

129
00:05:03,000 --> 00:05:05,000
our source, and that will give us all of our new records.

130
00:05:05,000 --> 00:05:08,000
So if you have modified dates in both places, it makes

131
00:05:08,000 --> 00:05:12,000
things very easy for finding incremental and new data.

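The high-water-mark idea described above can be sketched in T-SQL like this. This is an illustrative sketch, not the Nugget's actual package code: it assumes the fact table carries the same ModifiedDate as the source, plus the source's SalesOrderID business key, as the discussion supposes.

```sql
-- Grab the high-water marks from the fact table...
DECLARE @LastModified datetime, @LastKey int;

SELECT @LastModified = MAX(ModifiedDate),
       @LastKey      = MAX(SalesOrderID)   -- illustrative key column
FROM dbo.FactInternetSales;

-- ...then pull only changed and new rows from the source.
-- (In practice this would be the full fact query with all its joins.)
SELECT soh.*
FROM Sales.SalesOrderHeader AS soh
WHERE soh.ModifiedDate > @LastModified
   OR soh.SalesOrderID > @LastKey;
```
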
132
00:05:12,000 --> 00:05:15,000
What if we didn't have it inside of our destination,

133
00:05:15,000 --> 00:05:17,000
inside of our fact table?

134
00:05:17,000 --> 00:05:21,000
Well in that case, we could create our own table.

135
00:05:21,000 --> 00:05:23,000
Maybe we just stuff this inside of our audit

136
00:05:23,000 --> 00:05:28,000
information, so every time the package runs, it'll log the ID

137
00:05:28,000 --> 00:05:33,000
of the run along with the last

138
00:05:33,000 --> 00:05:33,000
modified date.

139
00:05:33,000 --> 00:05:35,000
So instead of that being stored in here,

140
00:05:35,000 --> 00:05:36,000
it's stored in here.

141
00:05:36,000 --> 00:05:39,000
And every time we ran this, then we would grab the last

142
00:05:39,000 --> 00:05:43,000
modified date out of this table to use inside of our

143
00:05:43,000 --> 00:05:45,000
package as the starting point.

144
00:05:45,000 --> 00:05:47,000
So there's ways around that.

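That audit-table approach might look something like this sketch (all of these names are illustrative, not from the actual packages):

```sql
-- A simple load-audit table: one row per package run.
CREATE TABLE dbo.LoadAudit
(
    RunID            int IDENTITY(1, 1) PRIMARY KEY,
    PackageName      sysname  NOT NULL,
    LastModifiedDate datetime NOT NULL,
    RanAt            datetime NOT NULL DEFAULT (GETDATE())
);

-- At the start of a run, read the previous high-water mark:
DECLARE @StartFrom datetime;
SELECT @StartFrom = MAX(LastModifiedDate)
FROM dbo.LoadAudit
WHERE PackageName = N'IncrementalLoad';

-- After a successful load, record the new high-water mark:
INSERT INTO dbo.LoadAudit (PackageName, LastModifiedDate)
SELECT N'IncrementalLoad', MAX(ModifiedDate)
FROM Sales.SalesOrderHeader;
```
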
145
00:05:47,000 --> 00:05:49,000
Now, the tricky part comes as what if we didn't have a

146
00:05:49,000 --> 00:05:52,000
modified date in our source or our destination?

147
00:05:52,000 --> 00:05:54,000
Now how do we figure out what records have changed?

148
00:05:54,000 --> 00:05:58,000
Well, we have something called CDC, or change data capture.

149
00:05:58,000 --> 00:06:01,000
This was introduced in SQL Server 2008.

150
00:06:01,000 --> 00:06:02,000
It's really nice.

151
00:06:02,000 --> 00:06:05,000
It's really just SQL Server's way of tracking inserts,

152
00:06:05,000 --> 00:06:08,000
updates, and deletes against a table.

153
00:06:08,000 --> 00:06:10,000
Once you enable CDC on a database, it installs a bunch

154
00:06:10,000 --> 00:06:14,000
of system tables and system objects that SQL Server will

155
00:06:14,000 --> 00:06:17,000
then use to track any change data.

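Enabling CDC itself is just two system stored procedures. Here's a sketch against the Sales.SalesOrderHeader source we've been discussing -- run it in the source database, with SQL Server Agent running so the capture job can do its work:

```sql
-- Enable CDC at the database level; this creates the cdc schema
-- and the system tables and objects mentioned above.
EXEC sys.sp_cdc_enable_db;

-- Then enable it per table; SQL Server starts tracking
-- inserts, updates, and deletes against this table.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'Sales',
    @source_name   = N'SalesOrderHeader',
    @role_name     = NULL;  -- NULL = no gating role required to read change data
```
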
156
00:06:17,000 --> 00:06:20,000
And we have CDC tasks and transformations that we can

157
00:06:20,000 --> 00:06:24,000
use in SSIS to work with this changed data.

158
00:06:24,000 --> 00:06:26,000
And it's actually kind of neat, because in the data

159
00:06:26,000 --> 00:06:29,000
flow, we have a CDC source to hook into and grab all that

160
00:06:29,000 --> 00:06:30,000
changed information.

161
00:06:30,000 --> 00:06:33,000
We also have a CDC splitter, which will take all that changed

162
00:06:33,000 --> 00:06:35,000
data and throw it down its own data flow.

163
00:06:35,000 --> 00:06:37,000
So you have an insert data flow, an update data flow, and

164
00:06:37,000 --> 00:06:40,000
a delete data flow that you can do whatever you want with

165
00:06:40,000 --> 00:06:42,000
on those flows.

166
00:06:42,000 --> 00:06:45,000
Another important aspect to loading up fact tables is

167
00:06:45,000 --> 00:06:46,000
performance.

168
00:06:46,000 --> 00:06:48,000
There's a couple of things we can do on the performance

169
00:06:48,000 --> 00:06:49,000
side. Partition switching is going to be the big one-- I'll

170
00:06:49,000 --> 00:06:53,000
talk about it in a second, but also set-based operations.

171
00:06:53,000 --> 00:06:58,000
Set-based operations really just means using SQL-- inserts

172
00:06:58,000 --> 00:07:02,000
with selects and joins to take data from a source and quickly

173
00:07:02,000 --> 00:07:04,000
put it into a destination.

174
00:07:04,000 --> 00:07:07,000
It's all done in SQL, so it's not row-by-row processing

175
00:07:07,000 --> 00:07:09,000
like we do inside of our data flow.

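As a sketch, the set-based move from staging into the fact table is a single INSERT ... SELECT. This assumes, as discussed later in the Nugget, that the staging table is an exact structural clone of the fact table:

```sql
-- One statement moves everything as a set -- the engine handles it
-- in bulk rather than row by row like an SSIS data flow would.
INSERT INTO dbo.FactInternetSales
SELECT *
FROM dbo.FactInternetSales_Staging;
```
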
176
00:07:09,000 --> 00:07:11,000
And if you remember this, a few Nuggets ago, we talked

177
00:07:11,000 --> 00:07:12,000
about how to load up a data warehouse.

178
00:07:12,000 --> 00:07:15,000
Say we have our source database, we have a staging

179
00:07:15,000 --> 00:07:18,000
database, and then we have a production database.

180
00:07:18,000 --> 00:07:21,000
So we want to load up our staging using SSIS in whatever

181
00:07:21,000 --> 00:07:22,000
means possible.

182
00:07:22,000 --> 00:07:24,000
We can use lookups here and such, because we're not

183
00:07:24,000 --> 00:07:26,000
hitting our production database.

184
00:07:26,000 --> 00:07:27,000
We're loading it into a staging area, so we can take

185
00:07:27,000 --> 00:07:29,000
all the time we need.

186
00:07:29,000 --> 00:07:32,000
And then, from our staging to production is where we want to

187
00:07:32,000 --> 00:07:35,000
use a set-based operation, because when we're going into

188
00:07:35,000 --> 00:07:38,000
production, we want to load it as fast as possible.

189
00:07:38,000 --> 00:07:42,000
We want to impact production as little as possible.

190
00:07:42,000 --> 00:07:45,000
So I'll show you how to do exactly this first.

191
00:07:45,000 --> 00:07:48,000
We will create a full and incremental load that uses all

192
00:07:48,000 --> 00:07:50,000
of these scenarios and loads up our

193
00:07:50,000 --> 00:07:53,000
staging data using SSIS.

194
00:07:53,000 --> 00:07:56,000
Then, we'll take staging, we'll push it into production

195
00:07:56,000 --> 00:07:59,000
using a set-based operation.

196
00:07:59,000 --> 00:08:02,000
We could even take that a step further if we incorporate

197
00:08:02,000 --> 00:08:05,000
partition switching into our strategy, because what this

198
00:08:05,000 --> 00:08:07,000
allows us to do is take data from staging, put it into

199
00:08:07,000 --> 00:08:08,000
production.

200
00:08:08,000 --> 00:08:13,000
Rather than doing a set-based operation, we just simply do a

201
00:08:13,000 --> 00:08:13,000
partition switch.

202
00:08:13,000 --> 00:08:16,000
Now, in order for that to work, these need to be inside

203
00:08:16,000 --> 00:08:18,000
of the same database.

204
00:08:18,000 --> 00:08:19,000
And that's not a problem.

205
00:08:19,000 --> 00:08:20,000
We'll put our staging tables right inside of

206
00:08:20,000 --> 00:08:21,000
our production database.

207
00:08:21,000 --> 00:08:25,000
We'll call it FactInternetSales_Staging,

208
00:08:25,000 --> 00:08:28,000
we'll load that up, and then, in order to get our data from

209
00:08:28,000 --> 00:08:31,000
staging to our production table, we'll just simply do a

210
00:08:31,000 --> 00:08:34,000
partition switch, which is an instantaneous operation

211
00:08:34,000 --> 00:08:36,000
because all you're doing is switching the data pointers

212
00:08:36,000 --> 00:08:40,000
from one partition to another partition.

213
00:08:40,000 --> 00:08:41,000
The best way to understand partition

214
00:08:41,000 --> 00:08:43,000
switching is with an example.

215
00:08:43,000 --> 00:08:45,000
So let's start at the beginning here.

216
00:08:45,000 --> 00:08:48,000
Let's say that we wanted to partition our

217
00:08:48,000 --> 00:08:49,000
FactInternetSales table.

218
00:08:49,000 --> 00:08:53,000
The first thing we would do is create a partition function.

219
00:08:53,000 --> 00:08:54,000
And let's say we wanted to split it up by year based on

220
00:08:54,000 --> 00:08:55,000
the order date.

221
00:08:55,000 --> 00:08:57,000
So we've got an order date inside of this table, so we

222
00:08:57,000 --> 00:09:00,000
create a partition function on that order date column, and we

223
00:09:00,000 --> 00:09:01,000
specify our ranges.

224
00:09:01,000 --> 00:09:06,000
So let's say that this is our 2005 data in one file, 2006

225
00:09:06,000 --> 00:09:09,000
data in another, and 2007 in yet another.

226
00:09:09,000 --> 00:09:12,000
So our partition function contains a range for each one

227
00:09:12,000 --> 00:09:13,000
of these years.

228
00:09:13,000 --> 00:09:16,000
Then we create our partition scheme tied to the partition

229
00:09:16,000 --> 00:09:19,000
function, and then we recreate our table tied to this

230
00:09:19,000 --> 00:09:20,000
partition scheme.

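Those three steps might look like this in T-SQL. It's a hedged sketch -- the boundary dates, filegroup names, and abbreviated column list are illustrative; RANGE RIGHT means each boundary date starts a new range:

```sql
-- 1) Partition function: one range per year of OrderDate.
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2006-01-01', '2007-01-01');

-- 2) Partition scheme: map each range to its own filegroup.
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate
TO (FG2005, FG2006, FG2007);

-- 3) Recreate the table on the scheme (column list abbreviated).
CREATE TABLE dbo.FactInternetSales
(
    ProductKey  int      NOT NULL,
    OrderDate   datetime NOT NULL,
    SalesAmount money    NOT NULL
)
ON psOrderDate (OrderDate);
```
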
231
00:09:20,000 --> 00:09:24,000
Once we do our initial load, all of that data gets split

232
00:09:24,000 --> 00:09:27,000
up and put inside of its appropriate file.

233
00:09:27,000 --> 00:09:32,000
So now it's 2008, and we have a big data load to do to get

234
00:09:32,000 --> 00:09:34,000
that 2008 data into our fact table.

235
00:09:34,000 --> 00:09:34,000
How do we do it?

236
00:09:34,000 --> 00:09:37,000
The first thing we do is create an empty partition for

237
00:09:37,000 --> 00:09:39,000
our 2008 data.

238
00:09:39,000 --> 00:09:41,000
We expand our partition function to include that

239
00:09:41,000 --> 00:09:44,000
range, and now we create a staging table.

240
00:09:44,000 --> 00:09:47,000
This staging table is the exact same structure.

241
00:09:47,000 --> 00:09:49,000
It is our FactInternetSales table.

242
00:09:49,000 --> 00:09:52,000
We'll just call it FactInternetSales_Staging.

243
00:09:52,000 --> 00:09:54,000
And this is a non-partitioned table.

244
00:09:54,000 --> 00:09:56,000
It's just a regular old table, nothing special about it.

245
00:09:56,000 --> 00:09:59,000
It sits inside of the primary file group, uses the MDF file

246
00:09:59,000 --> 00:10:00,000
to store its data.

247
00:10:00,000 --> 00:10:02,000
So nothing special about this table.

248
00:10:02,000 --> 00:10:04,000
The only requirement here, it needs to have

249
00:10:04,000 --> 00:10:06,000
the same exact structure.

250
00:10:06,000 --> 00:10:09,000
So now, we can use the old fashioned way, use SSIS to

251
00:10:09,000 --> 00:10:11,000
load up this staging table.

252
00:10:11,000 --> 00:10:14,000
But rather than use a set-based operation or even

253
00:10:14,000 --> 00:10:17,000
just SSIS to load the data and push it into production, we

254
00:10:17,000 --> 00:10:19,000
can run an ALTER TABLE ... SWITCH statement.

255
00:10:19,000 --> 00:10:22,000
We would alter our staging table against our

256
00:10:22,000 --> 00:10:28,000
FactInternetSales table, use the SWITCH command, and SQL will just

257
00:10:28,000 --> 00:10:28,000
switch the data pointers.

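The whole 2008 scenario can be sketched in a few statements. Function, scheme, and constraint names here are illustrative; the check constraint is the detail people usually trip over, because SQL Server requires proof that the staging data fits the target range before it will switch:

```sql
-- 1) Make room: point the scheme at a filegroup, then split in the 2008 range.
ALTER PARTITION SCHEME psOrderDate NEXT USED FG2008;
ALTER PARTITION FUNCTION pfOrderDate() SPLIT RANGE ('2008-01-01');

-- 2) The staging table needs a check constraint bounding it to that range.
ALTER TABLE dbo.FactInternetSales_Staging
ADD CONSTRAINT CK_Staging_2008
CHECK (OrderDate >= '2008-01-01' AND OrderDate < '2009-01-01');

-- 3) Load staging with SSIS, then switch: a metadata-only pointer swap.
ALTER TABLE dbo.FactInternetSales_Staging
SWITCH TO dbo.FactInternetSales PARTITION 4;  -- 4 = the new 2008 range here
```
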
258
00:10:28,000 --> 00:10:33,000
So now this file will look loaded, and the staging table's

259
00:10:33,000 --> 00:10:35,000
underlying data pages will look empty.

260
00:10:35,000 --> 00:10:36,000
So it literally just switches it around.

261
00:10:36,000 --> 00:10:40,000
It takes one second for SQL Server to do the switch.

262
00:10:40,000 --> 00:10:42,000
And the really cool thing about this is, let's say that

263
00:10:42,000 --> 00:10:45,000
we needed to load up three more years of data

264
00:10:45,000 --> 00:10:46,000
all at the same time.

265
00:10:46,000 --> 00:10:49,000
We can create a package with a parameter in it, say a year

266
00:10:49,000 --> 00:10:54,000
parameter, and then we could execute the package with 2009,

267
00:10:54,000 --> 00:10:56,000
execute it again with 2010, execute it again with 2011.

268
00:10:56,000 --> 00:10:59,000
So now we can do concurrent loading.

269
00:10:59,000 --> 00:11:01,000
We can load all of these at the same time, and when

270
00:11:01,000 --> 00:11:03,000
they're done, we can just do switch, switch, switch,

271
00:11:03,000 --> 00:11:05,000
switch, and that is as good as it gets.

272
00:11:05,000 --> 00:11:06,000
That's really cool.

273
00:11:06,000 --> 00:11:09,000
And that's how Microsoft loaded a terabyte of data in

274
00:11:09,000 --> 00:11:13,000
30 minutes, by doing this concurrent data loading all in

275
00:11:13,000 --> 00:11:17,000
their own staging tables, and then just doing a switch

276
00:11:17,000 --> 00:11:19,000
of each staging table into its own partition.

277
00:11:19,000 --> 00:11:20,000
Really neat stuff.

278
00:11:20,000 --> 00:11:22,000
I'll give you a good demo of this when I get into the

279
00:11:22,000 --> 00:11:24,000
Virtual Nugget Lab, because everybody should know how to

280
00:11:24,000 --> 00:11:26,000
do partition switching.

281
00:11:26,000 --> 00:11:29,000
And the reason you should know how to do this goes

282
00:11:29,000 --> 00:11:30,000
well beyond data warehousing.

283
00:11:30,000 --> 00:11:32,000
You can do this in large SQL Server databases.

284
00:11:32,000 --> 00:11:36,000
But my biggest thing about this is now you don't have to

285
00:11:36,000 --> 00:11:38,000
hit your production databases and your

286
00:11:38,000 --> 00:11:41,000
production tables directly.

287
00:11:41,000 --> 00:11:43,000
Any time I'm hitting a production table directly

288
00:11:43,000 --> 00:11:46,000
using SSIS or any other means, I'm biting my nails the entire

289
00:11:46,000 --> 00:11:48,000
time the process is running, praying something doesn't go

290
00:11:48,000 --> 00:11:52,000
wrong, so I don't have to stay late or use extreme measures

291
00:11:52,000 --> 00:11:54,000
to make sure that I get everything in and it's back to

292
00:11:54,000 --> 00:11:55,000
the way it was.

293
00:11:55,000 --> 00:11:59,000
Now, I can take my sweet time, load it into staging, ensure

294
00:11:59,000 --> 00:12:01,000
that everything looks good, and when I've ensured

295
00:12:01,000 --> 00:12:03,000
everything looks good, I just switch into production and

296
00:12:03,000 --> 00:12:04,000
it's automatically there.

297
00:12:04,000 --> 00:12:05,000
Really cool stuff.

298
00:12:05,000 --> 00:12:08,000
And by the way, you can use this technique to unload data,

299
00:12:08,000 --> 00:12:09,000
delete data, remove data.

300
00:12:09,000 --> 00:12:12,000
You never want to run delete statements against your

301
00:12:12,000 --> 00:12:14,000
production tables, especially large ones.

302
00:12:14,000 --> 00:12:16,000
That will cause all kinds of chaos-- locking and blocking

303
00:12:16,000 --> 00:12:19,000
problems in your production database.

304
00:12:19,000 --> 00:12:22,000
Rather than do that, if you need to remove data, say our

305
00:12:22,000 --> 00:12:25,000
2005 data is old, we can just create a table

306
00:12:25,000 --> 00:12:26,000
and switch the partition.

307
00:12:26,000 --> 00:12:29,000
And all that data will just magically disappear out of

308
00:12:29,000 --> 00:12:30,000
production and end up in our other table.

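Switching out is the same mechanism in reverse. A sketch for retiring that 2005 data -- the archive table name is illustrative, and it must match the fact table's structure and sit on the same filegroup as the partition being switched out:

```sql
-- Switch the 2005 partition out: instantaneous, no DELETE,
-- no locking and blocking in production.
ALTER TABLE dbo.FactInternetSales
SWITCH PARTITION 1 TO dbo.FactInternetSales_Archive2005;

-- Optionally merge away the now-empty range.
ALTER PARTITION FUNCTION pfOrderDate() MERGE RANGE ('2006-01-01');
```
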
309
00:12:30,000 --> 00:12:32,000
So again, great stuff.

310
00:12:32,000 --> 00:12:36,000
Works for loading and unloading data in tables.

311
00:12:36,000 --> 00:12:38,000
All right, why don't we head into the Virtual Nugget Lab

312
00:12:38,000 --> 00:12:40,000
and learn how to do all this wonderful stuff.

313
00:12:40,000 --> 00:12:43,000
We'll learn how to load up our fact tables initially,

314
00:12:43,000 --> 00:12:46,000
incrementally, use set-based operations, and work with

315
00:12:46,000 --> 00:12:48,000
partition switching.

316
00:12:48,000 --> 00:12:50,000
In the Virtual Nugget Lab, I have launched everything we're

317
00:12:50,000 --> 00:12:52,000
going to need in this Nugget.

318
00:12:52,000 --> 00:12:55,000
We have access to our support files directory here for this

319
00:12:55,000 --> 00:12:56,000
Nugget, 12-DataLoading.

320
00:12:56,000 --> 00:12:57,000
We'll be in here a little bit.

321
00:12:57,000 --> 00:12:59,000
When we get into partition switching, we're going to need

322
00:12:59,000 --> 00:13:02,000
somewhere to put those data files, so we have a data

323
00:13:02,000 --> 00:13:03,000
folder ready for us.

324
00:13:03,000 --> 00:13:06,000
In SSIS, obviously, we have a project that we're going to be

325
00:13:06,000 --> 00:13:08,000
working with, 12-DataLoading.

326
00:13:08,000 --> 00:13:12,000
I've pre-loaded our connection managers and some SSIS

327
00:13:12,000 --> 00:13:14,000
packages, one for full, one for incremental, one for

328
00:13:14,000 --> 00:13:16,000
working with CDC.

329
00:13:16,000 --> 00:13:18,000
And when we get into partition switching, we'll use our full

330
00:13:18,000 --> 00:13:21,000
package here to show off partition switching.

331
00:13:21,000 --> 00:13:22,000
And these are mostly empty.

332
00:13:22,000 --> 00:13:24,000
The full and the incremental ones we're

333
00:13:24,000 --> 00:13:25,000
going to do from scratch.

334
00:13:25,000 --> 00:13:28,000
The CDC one I've pre-built, we'll walk through it, and

335
00:13:28,000 --> 00:13:30,000
I'll show you how it works.

336
00:13:30,000 --> 00:13:32,000
We also have a couple of scripts down here to work with

337
00:13:32,000 --> 00:13:34,000
in each one of these areas as well.

338
00:13:34,000 --> 00:13:36,000
And we'll use these scripts from SQL

339
00:13:36,000 --> 00:13:37,000
Server Management Studio.

340
00:13:37,000 --> 00:13:38,000
So I also have that pre-launched with our

341
00:13:38,000 --> 00:13:41,000
connection to our SQL Nugget instance.

342
00:13:41,000 --> 00:13:43,000
So let's start by designing our full package, which is

343
00:13:43,000 --> 00:13:44,000
what we're going to use to initially

344
00:13:44,000 --> 00:13:46,000
load our data warehouse.

345
00:13:46,000 --> 00:13:50,000
And by the way, we're going to use the Adventure Works 2012

346
00:13:50,000 --> 00:13:52,000
database and our production data warehouse, so we're

347
00:13:52,000 --> 00:13:54,000
working with both production databases here.

348
00:13:54,000 --> 00:13:57,000
We're going to use our staging tables and put them inside of

349
00:13:57,000 --> 00:13:58,000
our production database.

350
00:13:58,000 --> 00:14:00,000
We're going to need that anyway for partition

351
00:14:00,000 --> 00:14:02,000
switching, because in order for partition switching to

352
00:14:02,000 --> 00:14:07,000
work, you need your tables to reside in the same file group.

353
00:14:07,000 --> 00:14:08,000
So we're just going to create our staging tables right in

354
00:14:08,000 --> 00:14:09,000
our production database.

355
00:14:09,000 --> 00:14:11,000
It'll make it easier for set-based updates, we won't

356
00:14:11,000 --> 00:14:13,000
have to do cross-database queries and all that

357
00:14:13,000 --> 00:14:14,000
kind of stuff either.

358
00:14:14,000 --> 00:14:16,000
We can just go from our staging table to our

359
00:14:16,000 --> 00:14:17,000
production table.

360
00:14:17,000 --> 00:14:18,000
Pretty straightforward.

361
00:14:18,000 --> 00:14:20,000
So let's head back into our environment here, and the

362
00:14:20,000 --> 00:14:22,000
first thing we're going to need to do is drag an Execute

363
00:14:22,000 --> 00:14:26,000
SQL task into the designer, because we need to create our

364
00:14:26,000 --> 00:14:27,000
staging tables.

365
00:14:27,000 --> 00:14:31,000
So let's give this a name, how about CreateStaging, and

366
00:14:31,000 --> 00:14:32,000
then let's open up our

367
00:14:32,000 --> 00:14:34,000
FactInternetSales_Staging script.

368
00:14:34,000 --> 00:14:36,000
I'm just going to copy this, and this is really just a

369
00:14:36,000 --> 00:14:39,000
clone of our FactInternetSales table with

370
00:14:39,000 --> 00:14:40,000
_Staging appended to it.

371
00:14:40,000 --> 00:14:43,000
So we need the exact same structure, that way it'll make

372
00:14:43,000 --> 00:14:44,000
set-based updates easy.

373
00:14:44,000 --> 00:14:47,000
So we'll do all of our transformations, taking all

374
00:14:47,000 --> 00:14:50,000
that data from our OLTP world, and pushing it into our

375
00:14:50,000 --> 00:14:53,000
staging table, and it'll just be a straight shot into our

376
00:14:53,000 --> 00:14:54,000
production table.

377
00:14:54,000 --> 00:14:56,000
All right, so let's open up our CreateStaging task, let's

378
00:14:56,000 --> 00:14:57,000
assign a connection to it.

379
00:14:57,000 --> 00:15:00,000
We're going to do this inside of our data warehouse, and

380
00:15:00,000 --> 00:15:02,000
let's just paste in our code.

381
00:15:02,000 --> 00:15:03,000
There we go.

382
00:15:03,000 --> 00:15:05,000
So that will create our staging table.

383
00:15:05,000 --> 00:15:08,000
Any time we run this package, if that staging table is

384
00:15:08,000 --> 00:15:11,000
already there, it'll drop it and then recreate it.
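As a rough sketch, a CreateStaging script with that drop-and-recreate behavior looks like this; the column list here is abbreviated and illustrative, while the real script clones every column of FactInternetSales:

```sql
-- Drop the staging table if a previous run left it behind,
-- then recreate it as a structural clone of the fact table.
IF OBJECT_ID('dbo.FactInternetSales_Staging', 'U') IS NOT NULL
    DROP TABLE dbo.FactInternetSales_Staging;

CREATE TABLE dbo.FactInternetSales_Staging
(
    SalesOrderNumber     nvarchar(20) NOT NULL,  -- illustrative subset
    SalesOrderLineNumber tinyint      NOT NULL,  -- of the real columns
    ProductKey           int          NOT NULL,
    CurrencyKey          int          NOT NULL,
    OrderQuantity        smallint     NOT NULL,
    SalesAmount          money        NOT NULL,
    OrderDate            datetime     NULL,
    ShipDate             datetime     NULL
);
```

Matching the fact table column for column is what makes the later set-based INSERT a one-liner.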

385
00:15:11,000 --> 00:15:13,000
And now we need to stage that data.

386
00:15:13,000 --> 00:15:14,000
So let's go back into the SSIS toolbox.

387
00:15:14,000 --> 00:15:18,000
Let's drag a data flow into here, and let's just call this

388
00:15:18,000 --> 00:15:25,000
data flow, how about, Stage FactInternetSales data.

389
00:15:25,000 --> 00:15:28,000
And now let's go in here, this is just going to be a straight

390
00:15:28,000 --> 00:15:30,000
shot from our source to our destination.

391
00:15:30,000 --> 00:15:33,000
Our source is actually going to be a command that is going

392
00:15:33,000 --> 00:15:38,000
to contain our query that takes all the data from all

393
00:15:38,000 --> 00:15:42,000
the tables inside of our OLTP world and models it as the

394
00:15:42,000 --> 00:15:44,000
FactInternetSales table here in our data warehousing world.

395
00:15:44,000 --> 00:15:45,000
So let's grab a source.

396
00:15:45,000 --> 00:15:48,000
How about just an OLE DB source, here we go.

397
00:15:48,000 --> 00:15:49,000
So we'll drag that onto here.

398
00:15:49,000 --> 00:15:51,000
Let's also grab our destination while we're in

399
00:15:51,000 --> 00:15:54,000
here, OLE DB destination, there we go.

400
00:15:54,000 --> 00:15:58,000
And for our source here, we're going to use this OLTP query.

401
00:15:58,000 --> 00:16:01,000
Let me flip over here to management studio just to show

402
00:16:01,000 --> 00:16:02,000
you here so we can run it.

403
00:16:02,000 --> 00:16:03,000
I'm going to open up a file.

404
00:16:03,000 --> 00:16:06,000
We'll just open up that query, same one that's in our SSIS

405
00:16:06,000 --> 00:16:07,000
project, and here it is.

406
00:16:07,000 --> 00:16:09,000
Here's what it looks like.

407
00:16:09,000 --> 00:16:12,000
So again, we're taking a lot of data from many tables here.

408
00:16:12,000 --> 00:16:14,000
You can see we're taking it from sales order header, sales

409
00:16:14,000 --> 00:16:17,000
order detail, product, customer.

410
00:16:17,000 --> 00:16:19,000
We're linking in our dimension, our product

411
00:16:19,000 --> 00:16:23,000
dimension and our currency dimension, to get the keys

412
00:16:23,000 --> 00:16:25,000
that our fact table expects out of there.

413
00:16:25,000 --> 00:16:27,000
We're also hitting the product cost history, currency rate,

414
00:16:27,000 --> 00:16:30,000
and then our human resources employee table there.

415
00:16:30,000 --> 00:16:36,000
So if we just run this right now, this will grab all of the

416
00:16:36,000 --> 00:16:37,000
data because we don't have any filters on it.

417
00:16:37,000 --> 00:16:38,000
This is everything.

418
00:16:38,000 --> 00:16:41,000
We're going to get 60,000 rows loaded into our fact table

419
00:16:41,000 --> 00:16:43,000
when we do our initial load, and this is

420
00:16:43,000 --> 00:16:44,000
all of it right here.

421
00:16:44,000 --> 00:16:47,000
So this is exactly what that fact table looks like.

422
00:16:47,000 --> 00:16:50,000
So that's one key here.

423
00:16:50,000 --> 00:16:54,000
Obviously we're going to need to create a query whose

424
00:16:54,000 --> 00:16:56,000
results are going to match the structure

425
00:16:56,000 --> 00:16:57,000
of our fact table, right?

426
00:16:57,000 --> 00:16:58,000
Pretty straightforward.

427
00:16:58,000 --> 00:17:03,000
So let's copy this, head back into SSIS, open up our source.

428
00:17:03,000 --> 00:17:06,000
We're going to set the source here as the data warehouse,

429
00:17:06,000 --> 00:17:09,000
and then our data access mode here is SQL command, and we're

430
00:17:09,000 --> 00:17:10,000
just going to paste in that query.

431
00:17:10,000 --> 00:17:12,000
Now we're going to hit the columns button that will

432
00:17:12,000 --> 00:17:14,000
generate all of the columns.

433
00:17:14,000 --> 00:17:15,000
There we go.

434
00:17:15,000 --> 00:17:16,000
So let's hit OK.

435
00:17:16,000 --> 00:17:16,000
Source looks good.

436
00:17:16,000 --> 00:17:18,000
Let's drag this down to our destination

437
00:17:18,000 --> 00:17:19,000
and do the same thing.

438
00:17:19,000 --> 00:17:22,000
So our destination is going to be our data warehouse, and

439
00:17:22,000 --> 00:17:26,000
then our table is going to be our staging table.

440
00:17:26,000 --> 00:17:27,000
But guess what?

441
00:17:27,000 --> 00:17:28,000
It doesn't exist yet.

442
00:17:28,000 --> 00:17:29,000
So you know what we're going to do?

443
00:17:29,000 --> 00:17:30,000
We're going to create it.

444
00:17:30,000 --> 00:17:32,000
Let's head back to the control flow, right click on Create

445
00:17:32,000 --> 00:17:34,000
Staging, hit execute.

446
00:17:34,000 --> 00:17:35,000
Now it's created.

447
00:17:35,000 --> 00:17:38,000
So now let's head back into our data flow, into our

448
00:17:38,000 --> 00:17:41,000
destination, change our connection once again here to

449
00:17:41,000 --> 00:17:45,000
our data warehouse, and now we should see our _Staging table.

450
00:17:45,000 --> 00:17:47,000
There it is.

451
00:17:47,000 --> 00:17:49,000
We'll map it, everything will line up, since the column

452
00:17:49,000 --> 00:17:52,000
names are the same as the output of the query, and we're

453
00:17:52,000 --> 00:17:52,000
good to go.

454
00:17:52,000 --> 00:17:56,000
So staging the data from our OLTP database into our staging

455
00:17:56,000 --> 00:17:57,000
table looks good.

456
00:17:57,000 --> 00:17:58,000
Let's head back to our control flow.

457
00:17:58,000 --> 00:18:01,000
The last thing we need to do is push this data from our

458
00:18:01,000 --> 00:18:04,000
staging table into our production table using a

459
00:18:04,000 --> 00:18:05,000
set-based operation.

460
00:18:05,000 --> 00:18:08,000
So let's head back into the SSIS toolbox, let's drag an

461
00:18:08,000 --> 00:18:10,000
Execute SQL Task onto here.

462
00:18:10,000 --> 00:18:16,000
Let's just call this one Push Staging to Production.

463
00:18:16,000 --> 00:18:17,000
There we go.

464
00:18:17,000 --> 00:18:20,000
Let's open it up, assign a connection to it.

465
00:18:20,000 --> 00:18:22,000
It's all going to be done inside of our data warehouse.

466
00:18:22,000 --> 00:18:24,000
And this is as easy as it gets, really.

467
00:18:24,000 --> 00:18:26,000
Here's the perfect set-based operation.

468
00:18:26,000 --> 00:18:35,000
Insert, we want to go into dbo.FactInternetSales.

469
00:18:35,000 --> 00:18:35,000
That's production, right?

470
00:18:35,000 --> 00:18:38,000
We want to insert into that table.

471
00:18:38,000 --> 00:18:44,000
Let me just finish this by doing a select star from our staging

472
00:18:44,000 --> 00:18:47,000
table, and because these structures line up, that's all

473
00:18:47,000 --> 00:18:48,000
we need to do.
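Put together, those two lines of code amount to this:

```sql
-- Set-based full load: one statement moves every staged row into
-- production, because the two tables share the exact same structure.
INSERT INTO dbo.FactInternetSales
SELECT * FROM dbo.FactInternetSales_Staging;
```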

474
00:18:48,000 --> 00:18:50,000
So this is a set-based operation because we're not

475
00:18:50,000 --> 00:18:51,000
doing row by row.

476
00:18:51,000 --> 00:18:54,000
If we were to load up this fact table by creating a data

477
00:18:54,000 --> 00:18:58,000
flow and sending the flow, that flow would insert it row

478
00:18:58,000 --> 00:19:01,000
by row into production, very slow process.

479
00:19:01,000 --> 00:19:04,000
This way, using set-based, it just grabs all the data out of

480
00:19:04,000 --> 00:19:07,000
our staging table and throws it all into production.

481
00:19:07,000 --> 00:19:09,000
And it's pretty easy there, as far as coding

482
00:19:09,000 --> 00:19:10,000
goes as well, right?

483
00:19:10,000 --> 00:19:11,000
Look at that, two lines of code.

484
00:19:11,000 --> 00:19:12,000
So we're good.

485
00:19:12,000 --> 00:19:13,000
Let's hit OK.

486
00:19:13,000 --> 00:19:18,000
Now we can just connect everything together, and our

487
00:19:18,000 --> 00:19:20,000
full package is done.

488
00:19:20,000 --> 00:19:22,000
Now let's pretend this really is an initial load, and right

489
00:19:22,000 --> 00:19:25,000
now we have data inside of our FactInternetSales table.

490
00:19:25,000 --> 00:19:27,000
So let's delete all the data.

491
00:19:27,000 --> 00:19:29,000
Now, we removed all the constraints, so we should just

492
00:19:29,000 --> 00:19:33,000
be able to do a truncate table FactInternetSales, highlight

493
00:19:33,000 --> 00:19:36,000
that, hit F5 to execute it, and now, if we grab everything

494
00:19:36,000 --> 00:19:40,000
here from our fact table, it's empty.

495
00:19:40,000 --> 00:19:43,000
So our production table is empty, we just created our

496
00:19:43,000 --> 00:19:44,000
staging table so that's empty.

497
00:19:44,000 --> 00:19:47,000
In fact, here, let's just refresh our tables,

498
00:19:47,000 --> 00:19:48,000
and here they are.

499
00:19:48,000 --> 00:19:52,000
So production empty, staging empty.

500
00:19:52,000 --> 00:19:54,000
This package will load up staging, and then do a

501
00:19:54,000 --> 00:19:57,000
set-based operation to take that staging data and push it

502
00:19:57,000 --> 00:19:58,000
into production.

503
00:19:58,000 --> 00:20:00,000
Let's run it and see what happens here, and

504
00:20:00,000 --> 00:20:00,000
see how long it takes.

505
00:20:00,000 --> 00:20:04,000
Shouldn't take long at all, and we do have 60,000 rows, so

506
00:20:04,000 --> 00:20:06,000
that's a good amount, but there you go.

507
00:20:06,000 --> 00:20:07,000
Not too bad, right?

508
00:20:07,000 --> 00:20:10,000
Now let's take a look here.

509
00:20:10,000 --> 00:20:12,000
And look at that, we have data.

510
00:20:12,000 --> 00:20:14,000
We have all 60,000 rows loaded into production.

511
00:20:14,000 --> 00:20:16,000
So that's how we can do an initial load.

512
00:20:16,000 --> 00:20:19,000
One of the easiest ways to do it, anyway.

513
00:20:19,000 --> 00:20:21,000
Load up the staging, and then do a set-based operation to

514
00:20:21,000 --> 00:20:22,000
push it in.

515
00:20:22,000 --> 00:20:25,000
So with the initial load in, now we need to develop our

516
00:20:25,000 --> 00:20:30,000
incremental package so all future runs only grab new and

517
00:20:30,000 --> 00:20:32,000
changed data, rather than running the

518
00:20:32,000 --> 00:20:33,000
full load every time.

519
00:20:33,000 --> 00:20:35,000
So again, small data warehouses, that might work,

520
00:20:35,000 --> 00:20:38,000
but extremely large ones, we need to reduce the amount of

521
00:20:38,000 --> 00:20:40,000
time that these loads take.

522
00:20:40,000 --> 00:20:43,000
So first, we're going to copy this task here, because we're

523
00:20:43,000 --> 00:20:46,000
going to need the same exact create staging task over in

524
00:20:46,000 --> 00:20:47,000
our incremental package.

525
00:20:47,000 --> 00:20:48,000
Then we'll load up our incremental package.

526
00:20:48,000 --> 00:20:51,000
I'm just going to paste this right in here, there we go.

527
00:20:51,000 --> 00:20:54,000
And so we already have our create staging task done.

528
00:20:54,000 --> 00:20:57,000
So the next step is finding new records.

529
00:20:57,000 --> 00:21:01,000
How do we find which records are new in our source?

530
00:21:01,000 --> 00:21:04,000
Well, what we can do is we can write a query, an aggregate,

531
00:21:04,000 --> 00:21:08,000
that gives us the highest fact key out of our

532
00:21:08,000 --> 00:21:10,000
destination, out of our production data.

533
00:21:10,000 --> 00:21:12,000
And then we can store that in a variable, and we can use

534
00:21:12,000 --> 00:21:16,000
Dynamic SQL to pull that data from the source,

535
00:21:16,000 --> 00:21:17,000
based on that variable.

536
00:21:17,000 --> 00:21:18,000
Let's do it.

537
00:21:18,000 --> 00:21:22,000
So let's grab another Execute SQL task, drag

538
00:21:22,000 --> 00:21:24,000
that right down here.

539
00:21:24,000 --> 00:21:32,000
Let's just call this one, how about, Get MAXKey from

540
00:21:32,000 --> 00:21:33,000
Production.

541
00:21:33,000 --> 00:21:36,000
Something like that, that looks good.

542
00:21:36,000 --> 00:21:40,000
Connect these together, let's open this up, let's tie it to

543
00:21:40,000 --> 00:21:42,000
our dw, our data warehouse here.

544
00:21:42,000 --> 00:21:45,000
And then our SQL statement is going to be-- this should be a

545
00:21:45,000 --> 00:21:47,000
pretty straightforward SQL query.

546
00:21:47,000 --> 00:21:50,000
We're just going to do a SELECT MAX, and we're going to use

547
00:21:50,000 --> 00:21:54,000
SalesOrderNumber, that is the primary key of the table here,

548
00:21:54,000 --> 00:22:01,000
as our MaxFactKey from our production table, which is

549
00:22:01,000 --> 00:22:05,000
our FactInternetSales.

550
00:22:05,000 --> 00:22:06,000
That's it.

551
00:22:06,000 --> 00:22:10,000
So if you're dealing with a really large fact table and

552
00:22:10,000 --> 00:22:12,000
you do not want any of your queries affecting it

553
00:22:12,000 --> 00:22:16,000
whatsoever, you can use WITH (NOLOCK) in here, and that

554
00:22:16,000 --> 00:22:20,000
will not generate any read or shared locks or anything.

555
00:22:20,000 --> 00:22:23,000
It'll basically allow us to kind of run a ninja query

556
00:22:23,000 --> 00:22:26,000
against the table so we don't affect anything in production.

557
00:22:26,000 --> 00:22:28,000
So that's a good idea to do that against large tables.
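The statement sketched here combines both ideas; MaxFactKey is just the alias used in the demo:

```sql
-- Highest key already loaded into production. WITH (NOLOCK) reads
-- without taking shared locks (a dirty read), so the query neither
-- blocks nor is blocked by production activity on a large table.
SELECT MAX(SalesOrderNumber) AS MaxFactKey
FROM dbo.FactInternetSales WITH (NOLOCK);
```

Because NOLOCK is a dirty read, it only belongs on queries like this one where an approximate answer is acceptable.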

558
00:22:28,000 --> 00:22:30,000
So that's it.

559
00:22:30,000 --> 00:22:34,000
We'll hit OK, and now we want to actually use that value

560
00:22:34,000 --> 00:22:37,000
that that query returns here in our package.

561
00:22:37,000 --> 00:22:39,000
So we're going to go up to the Result Set property on the

562
00:22:39,000 --> 00:22:43,000
Execute SQL task, drop this down, choose single row, head

563
00:22:43,000 --> 00:22:45,000
into Result Set, and add one.

564
00:22:45,000 --> 00:22:47,000
We'll just call this zero, here.

565
00:22:47,000 --> 00:22:49,000
And then we need to specify a new variable to

566
00:22:49,000 --> 00:22:50,000
put the value in.

567
00:22:50,000 --> 00:22:54,000
So let's call this value here our MAXKey, and everything

568
00:22:54,000 --> 00:22:54,000
else looks good.

569
00:22:54,000 --> 00:22:57,000
String variable, you can choose the container that it's

570
00:22:57,000 --> 00:22:58,000
associated with.

571
00:22:58,000 --> 00:23:00,000
We'll leave it at the package level, here.

572
00:23:00,000 --> 00:23:01,000
We're going to need it in other tasks.

573
00:23:01,000 --> 00:23:03,000
And we'll hit OK.

574
00:23:03,000 --> 00:23:05,000
We hit OK here, and we're good.

575
00:23:05,000 --> 00:23:08,000
So now that we have a way to identify new records and we

576
00:23:08,000 --> 00:23:11,000
can use that in our data flow, along with a little dynamic

577
00:23:11,000 --> 00:23:16,000
SQL, to get everything from our source after that key,

578
00:23:16,000 --> 00:23:19,000
what about modified records, changed records?

579
00:23:19,000 --> 00:23:22,000
Now, with this particular design, it's going to be a

580
00:23:22,000 --> 00:23:24,000
little challenging doing it this way, because if you're

581
00:23:24,000 --> 00:23:27,000
look in here in our FactInternetSales table, we

582
00:23:27,000 --> 00:23:29,000
don't have a modified date in our destination, in our

583
00:23:29,000 --> 00:23:30,000
production data.

584
00:23:30,000 --> 00:23:33,000
So how could we do it in this scenario?

585
00:23:33,000 --> 00:23:35,000
Well that's where CDC will come into play, and that's

586
00:23:35,000 --> 00:23:37,000
exactly what I'll show you how to do with CDC.

587
00:23:37,000 --> 00:23:40,000
But for now, for this specific incremental package, we're

588
00:23:40,000 --> 00:23:41,000
just going to create a parameter.

589
00:23:41,000 --> 00:23:43,000
So we're going to kind of do this the static way.

590
00:23:43,000 --> 00:23:47,000
Let's call this parameter, how about, BeginModifiedDate.

591
00:23:47,000 --> 00:23:50,000
Make this a string, and just put the first of the year in

592
00:23:50,000 --> 00:23:52,000
here, so anything modified after the first of the year.

593
00:23:52,000 --> 00:23:55,000
When we add some records in there to test this, we'll make

594
00:23:55,000 --> 00:23:57,000
sure that they have a modified date after this date so we'll

595
00:23:57,000 --> 00:23:59,000
see them come in.

596
00:23:59,000 --> 00:24:02,000
So now the next step is getting a data flow in that

597
00:24:02,000 --> 00:24:06,000
makes use of our variable and our parameter to get the new

598
00:24:06,000 --> 00:24:07,000
and changed data.

599
00:24:07,000 --> 00:24:09,000
So what I'm going to do is I'm going to head back to our full

600
00:24:09,000 --> 00:24:13,000
package, I'm just going to copy onto the clipboard our

601
00:24:13,000 --> 00:24:14,000
data flow, because it's going to be the

602
00:24:14,000 --> 00:24:15,000
same exact data flow.

603
00:24:15,000 --> 00:24:19,000
The difference is we just need to parameterize that query in

604
00:24:19,000 --> 00:24:20,000
the source, all right?

605
00:24:20,000 --> 00:24:22,000
So let's connect things together, let's go into the

606
00:24:22,000 --> 00:24:25,000
data flow, let's go into our source,

607
00:24:25,000 --> 00:24:26,000
scroll down to the bottom.

608
00:24:26,000 --> 00:24:28,000
So our data flow grabs everything, we're going to add

609
00:24:28,000 --> 00:24:29,000
a where clause here.

610
00:24:29,000 --> 00:24:38,000
We're going to say WHERE soh.SalesOrderNumber

611
00:24:38,000 --> 00:24:40,000
is greater than question mark.

612
00:24:40,000 --> 00:24:42,000
We're going to use this parameters button to fill in

613
00:24:42,000 --> 00:24:43,000
that question mark with a variable.

614
00:24:43,000 --> 00:24:45,000
So that's all new data.

615
00:24:45,000 --> 00:24:48,000
That will get us all our new data, and we also want to say

616
00:24:48,000 --> 00:24:56,000
then, or soh.ModifiedDate greater than or equal to a

617
00:24:56,000 --> 00:24:57,000
question mark.
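The tail of the source query ends up looking roughly like this, where each ? is an OLE DB parameter placeholder to be mapped in a moment, and soh is assumed to be the SalesOrderHeader alias used elsewhere in the query:

```sql
-- Appended to the end of the incremental source query:
WHERE soh.SalesOrderNumber > ?    -- new rows, past the max key in production
   OR soh.ModifiedDate >= ?       -- changed rows since BeginModifiedDate
```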

618
00:24:57,000 --> 00:24:59,000
That's going to run off of the parameter that we just created.

619
00:24:59,000 --> 00:25:00,000
So now let's hit this parameters button.

620
00:25:00,000 --> 00:25:07,000
The first parameter is going to be our MAXKey that we did a

621
00:25:07,000 --> 00:25:09,000
look up against production, and our second parameter is on

622
00:25:09,000 --> 00:25:12,000
modified date, and we will use our package parameter that we

623
00:25:12,000 --> 00:25:13,000
just created for that.

624
00:25:13,000 --> 00:25:14,000
There we go.

625
00:25:14,000 --> 00:25:16,000
So the last thing we need to do, once again, is push this

626
00:25:16,000 --> 00:25:21,000
data from our staging table to our production table by doing

627
00:25:21,000 --> 00:25:22,000
a set-based operation.

628
00:25:22,000 --> 00:25:24,000
So let's head back into the control flow, let's head into

629
00:25:24,000 --> 00:25:28,000
the SSIS toolbox, and drag an Execute SQL task down here.

630
00:25:28,000 --> 00:25:30,000
And we'll give this one the same name as we did in our

631
00:25:30,000 --> 00:25:35,000
previous package, Push Staging to Production.

632
00:25:35,000 --> 00:25:37,000
And now let's connect these tasks together.

633
00:25:37,000 --> 00:25:39,000
Let's go into our Execute SQL task.

634
00:25:39,000 --> 00:25:42,000
Let's choose our connection as our data warehouse, and then,

635
00:25:42,000 --> 00:25:43,000
as far as a SQL statement goes, I'm just going to paste

636
00:25:43,000 --> 00:25:45,000
this in here, we'll go through it.

637
00:25:45,000 --> 00:25:47,000
Two statements, one for the insert,

638
00:25:47,000 --> 00:25:49,000
and one for the update.

639
00:25:49,000 --> 00:25:52,000
And the thing is here, that we're going from staging to

640
00:25:52,000 --> 00:25:54,000
production, two identical tables.

641
00:25:54,000 --> 00:25:58,000
So for inserting records, this is easy.

642
00:25:58,000 --> 00:26:00,000
Just like we did when we did our full package.

643
00:26:00,000 --> 00:26:03,000
The only difference is we have a WHERE clause on so we can

644
00:26:03,000 --> 00:26:06,000
identify those new records inside of our staging table.

645
00:26:06,000 --> 00:26:10,000
To identify updated records, we do not have a modified date

646
00:26:10,000 --> 00:26:12,000
inside of our staging or our production table, that was

647
00:26:12,000 --> 00:26:13,000
purely in the source.

648
00:26:13,000 --> 00:26:16,000
But we can still identify which records are modified

649
00:26:16,000 --> 00:26:20,000
records in our staging area, because we can use anything

650
00:26:20,000 --> 00:26:22,000
that's prior to the max sales order number.

651
00:26:22,000 --> 00:26:25,000
Remember, anything after that max key is a new record,

652
00:26:25,000 --> 00:26:27,000
so anything prior to it then, in our staging

653
00:26:27,000 --> 00:26:30,000
table, is a modified record.

654
00:26:30,000 --> 00:26:33,000
And I'm just going to update three fields in here.

655
00:26:33,000 --> 00:26:36,000
We're setting the production fields equal to the staging fields.

656
00:26:36,000 --> 00:26:38,000
Obviously, in a real world, we'd want to do all of the

657
00:26:38,000 --> 00:26:41,000
fields, but for now, just for demonstration purposes, we'll

658
00:26:41,000 --> 00:26:41,000
just do these dates.
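A sketch of those two statements, with each ? later mapped to the same MAXKey value; the join columns and the three updated date fields are assumptions based on the demo, not the full column list:

```sql
-- Rows beyond the max key in production are new: insert them.
INSERT INTO dbo.FactInternetSales
SELECT * FROM dbo.FactInternetSales_Staging
WHERE SalesOrderNumber > ?;

-- Rows at or before the max key were pulled in by ModifiedDate,
-- so they are changed rows: update production from staging.
UPDATE prod
SET prod.OrderDate = stg.OrderDate,   -- demo updates a few dates only;
    prod.DueDate   = stg.DueDate,     -- a real load would set every
    prod.ShipDate  = stg.ShipDate     -- non-key column
FROM dbo.FactInternetSales AS prod
JOIN dbo.FactInternetSales_Staging AS stg
    ON  prod.SalesOrderNumber     = stg.SalesOrderNumber
    AND prod.SalesOrderLineNumber = stg.SalesOrderLineNumber
WHERE stg.SalesOrderNumber <= ?;
```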

659
00:26:41,000 --> 00:26:43,000
And we'll make sure that we change these dates and the

660
00:26:43,000 --> 00:26:46,000
modified date in the source data when we do our testing,

661
00:26:46,000 --> 00:26:50,000
so we see both new records and modified records flow into our

662
00:26:50,000 --> 00:26:51,000
production table.

663
00:26:51,000 --> 00:26:51,000
So this looks good.

664
00:26:51,000 --> 00:26:53,000
The only thing we need to do here is we need to wire in

665
00:26:53,000 --> 00:26:55,000
these parameters into our Execute SQL task.

666
00:26:55,000 --> 00:26:57,000
So we can go right into our parameter

667
00:26:57,000 --> 00:26:59,000
mapping, we can hit add.

668
00:26:59,000 --> 00:27:01,000
We'll need to hit it twice.

669
00:27:01,000 --> 00:27:02,000
First parameter-- actually they're both going to be the

670
00:27:02,000 --> 00:27:05,000
same parameter, they're both going to be User::MAXKey,

671
00:27:05,000 --> 00:27:07,000
so let's specify those.

672
00:27:07,000 --> 00:27:11,000
And then they're both input, these are actually character

673
00:27:11,000 --> 00:27:14,000
data, so we'll choose VARCHAR here.

674
00:27:14,000 --> 00:27:17,000
Sales order number is a character.

675
00:27:17,000 --> 00:27:19,000
And then we want to do zero for the parameter name, and

676
00:27:19,000 --> 00:27:20,000
one for the second parameter name.

677
00:27:20,000 --> 00:27:23,000
That lines them up, and that's it.

678
00:27:23,000 --> 00:27:24,000
Let's test this package out.

679
00:27:24,000 --> 00:27:26,000
So let's head back into Management Studio.

680
00:27:26,000 --> 00:27:28,000
I have an insert statement here, a couple of them.

681
00:27:28,000 --> 00:27:29,000
One of them will add a row to the sales order header and the

682
00:27:29,000 --> 00:27:30,000
sales order details.

683
00:27:30,000 --> 00:27:32,000
Since these are inner-joined, we may have to make sure that

684
00:27:32,000 --> 00:27:33,000
we have a detail record.

685
00:27:33,000 --> 00:27:38,000
So if we execute this against Adventure Works, our OLTP

686
00:27:38,000 --> 00:27:40,000
database, that'll add a new record.

687
00:27:40,000 --> 00:27:42,000
Now, let's also get an updated record in here.

688
00:27:42,000 --> 00:27:46,000
Let's go right into tables here in Adventure Works-- not

689
00:27:46,000 --> 00:27:48,000
Data Warehouse, Adventure Works, there we go.

690
00:27:48,000 --> 00:27:52,000
Let's head into our sales order header table down here.

691
00:27:52,000 --> 00:27:53,000
There it is.

692
00:27:53,000 --> 00:27:56,000
Let's just right click on it and choose Edit Top 200 Rows,

693
00:27:56,000 --> 00:28:00,000
and let's change an order date, or how

694
00:28:00,000 --> 00:28:00,000
about a ship date.

695
00:28:00,000 --> 00:28:03,000
Let's say the ship date has changed here.

696
00:28:03,000 --> 00:28:11,000
Let's just do 2013-01-01, and we have to change modified

697
00:28:11,000 --> 00:28:14,000
date too, since that's what we're running on.

698
00:28:14,000 --> 00:28:17,000
So let's go back, and there it is.

699
00:28:17,000 --> 00:28:21,000
This one, let's do 2013-02-01.

700
00:28:21,000 --> 00:28:25,000
All right, we have changed data, we have new data.

701
00:28:25,000 --> 00:28:27,000
Let's head back into the development environment.

702
00:28:27,000 --> 00:28:29,000
And something really neat that I want to show you here is if

703
00:28:29,000 --> 00:28:33,000
we head into our data flow and we double click on the actual

704
00:28:33,000 --> 00:28:36,000
line itself, we can head down to data viewer and enable our

705
00:28:36,000 --> 00:28:37,000
data viewer.

706
00:28:37,000 --> 00:28:39,000
This is our window into the data flow.

707
00:28:39,000 --> 00:28:41,000
Any time we execute a package from the development

708
00:28:41,000 --> 00:28:44,000
environment and it hits this flow, any time it has a

709
00:28:44,000 --> 00:28:46,000
magnifying glass on it, it's going to pause right there and

710
00:28:46,000 --> 00:28:48,000
open up a window that'll show us all of

711
00:28:48,000 --> 00:28:50,000
the data in the flow.

712
00:28:50,000 --> 00:28:52,000
Really handy for troubleshooting and doing

713
00:28:52,000 --> 00:28:53,000
stuff like this.

714
00:28:53,000 --> 00:28:55,000
So this allows us to see those two changed records.

715
00:28:55,000 --> 00:28:58,000
Let's hit run here and see if we get two changed records.

716
00:28:58,000 --> 00:29:01,000
And you see, it'll pause right there until you

717
00:29:01,000 --> 00:29:01,000
hit the play button.

718
00:29:01,000 --> 00:29:05,000
Now we only have one record, and it looks like a new one.

719
00:29:05,000 --> 00:29:08,000
I know the highest sale order number before we did that

720
00:29:08,000 --> 00:29:10,000
insert was 123 as the last three digits.

721
00:29:10,000 --> 00:29:14,000
So 124 tells me this is the new record making it through,

722
00:29:14,000 --> 00:29:18,000
but our modified record didn't make it over.

723
00:29:18,000 --> 00:29:20,000
So let's stop this and find out why.

724
00:29:20,000 --> 00:29:23,000
Let's head back into Management Studio, and I think

725
00:29:23,000 --> 00:29:23,000
I know why.

726
00:29:23,000 --> 00:29:26,000
If we scroll over to the left here, remember, this is

727
00:29:26,000 --> 00:29:27,000
FactInternetSales.

728
00:29:27,000 --> 00:29:28,000
Look at this.

729
00:29:28,000 --> 00:29:31,000
We should have put a filter on, because the online order

730
00:29:31,000 --> 00:29:32,000
flag is false.

731
00:29:32,000 --> 00:29:35,000
So let's modify a record that is actually an

732
00:29:35,000 --> 00:29:36,000
online order record.

733
00:29:36,000 --> 00:29:38,000
Let's just put a WHERE clause here in our SQL statement,

734
00:29:38,000 --> 00:29:42,000
WHERE OnlineOrderFlag equals 1.

735
00:29:42,000 --> 00:29:45,000
We'll hit execute, so these are all online orders.

736
00:29:45,000 --> 00:29:47,000
This is the kind of data we should be modifying.

737
00:29:47,000 --> 00:29:50,000
So let's just head over here and change our modified date.

738
00:29:50,000 --> 00:29:55,000
We'll change it to today's date here, so this year's

739
00:29:55,000 --> 00:29:59,000
date, 02-01, that looks good.

740
00:29:59,000 --> 00:30:03,000
Now if we flip back and run our package, it should pause

741
00:30:03,000 --> 00:30:04,000
on our data flow, and look at that.

742
00:30:04,000 --> 00:30:05,000
We have two records.

743
00:30:05,000 --> 00:30:07,000
So here's the one that we just changed.

744
00:30:07,000 --> 00:30:11,000
Now if we hit run, it's going to send them over.

745
00:30:11,000 --> 00:30:11,000
Awesome.

746
00:30:11,000 --> 00:30:16,000
So we've got those two new records over into production

747
00:30:16,000 --> 00:30:18,000
from our set-based operation, and that's how we can do

748
00:30:18,000 --> 00:30:22,000
incremental packages using straight Transact-SQL.

749
00:30:22,000 --> 00:30:25,000
Now, this would be better solved with CDC, the change

750
00:30:25,000 --> 00:30:29,000
data capture, because we wouldn't have that modified date issue

751
00:30:29,000 --> 00:30:33,000
where we would have to control it through a parameter.

752
00:30:33,000 --> 00:30:35,000
Because if we were to run this package again, that same

753
00:30:35,000 --> 00:30:38,000
record would keep coming over, because it has a modified date

754
00:30:38,000 --> 00:30:39,000
greater than the date that we stored

755
00:30:39,000 --> 00:30:41,000
inside of our parameter.

756
00:30:41,000 --> 00:30:44,000
So again, we could do this easily if we did create our

757
00:30:44,000 --> 00:30:47,000
own intermediate table that tracked every time we ran what

758
00:30:47,000 --> 00:30:48,000
our last modified date was.

759
00:30:48,000 --> 00:30:49,000
We could do that.

760
00:30:49,000 --> 00:30:53,000
But CDC can do all that for us.

761
00:30:53,000 --> 00:30:53,000
Let's see how this stuff works.

762
00:30:53,000 --> 00:30:55,000
So what we're going to do is we're going to open up the

763
00:30:55,000 --> 00:30:57,000
CDC.SQL script over here in Management Studio.

764
00:30:57,000 --> 00:31:01,000
So we'll just do a File, Open File, and here

765
00:31:01,000 --> 00:31:02,000
is our CDC.SQL script.

766
00:31:02,000 --> 00:31:05,000
If we open this up, the first thing we need to do is enable

767
00:31:05,000 --> 00:31:07,000
it on the database.

768
00:31:07,000 --> 00:31:11,000
So what we can do is switch to Adventure Works 2012, and then

769
00:31:11,000 --> 00:31:17,000
we will run the first EXEC statement, sp_cdc_enable_db.

770
00:31:17,000 --> 00:31:21,000
This will turn it on, get all the objects installed for CDC

771
00:31:21,000 --> 00:31:23,000
tracking inside of the Adventure Works database.

772
00:31:23,000 --> 00:31:24,000
So let's do that first.

773
00:31:24,000 --> 00:31:25,000
We'll just highlight those lines of code.

774
00:31:25,000 --> 00:31:27,000
It'll switch this to Adventure Works, it'll

775
00:31:27,000 --> 00:31:30,000
enable it in the database.

776
00:31:30,000 --> 00:31:31,000
Now we want to enable it in the table.

777
00:31:31,000 --> 00:31:34,000
And by the way, what that did, if we open up Adventure Works

778
00:31:34,000 --> 00:31:37,000
2012 here, tables, and let's go right into system tables,

779
00:31:37,000 --> 00:31:38,000
check it out.

780
00:31:38,000 --> 00:31:39,000
This is what it did.

781
00:31:39,000 --> 00:31:41,000
It added a bunch of system tables that SQL will use to

782
00:31:41,000 --> 00:31:44,000
track all of the changed data.

783
00:31:44,000 --> 00:31:47,000
So now, we want to enable it for our sales

784
00:31:47,000 --> 00:31:48,000
order header table.

785
00:31:48,000 --> 00:31:50,000
So we need to pass in the source schema sales, the table

786
00:31:50,000 --> 00:31:53,000
name, and then optionally, you can assign it a role here.
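
That EXEC is the standard sys.sp_cdc_enable_table call; a sketch of what the script runs, with the schema and table from the demo and @role_name left NULL:

```sql
-- Enable CDC on one table; a NULL role means no gating role
-- is required to read the change data.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'Sales',
    @source_name   = N'SalesOrderHeader',
    @role_name     = NULL;
```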

787
00:31:53,000 --> 00:31:56,000
So let's hit execute, that'll enable it on a table, take a

788
00:31:56,000 --> 00:32:00,000
little bit to set everything up, and we'll be good to go.

789
00:32:00,000 --> 00:32:02,000
And that's really all there is to it.

790
00:32:02,000 --> 00:32:06,000
So now CDC is fully enabled and running on the database.

791
00:32:06,000 --> 00:32:10,000
Any time we make any changes into the sales order header

792
00:32:10,000 --> 00:32:13,000
table, if we alter a row, add a row, whatever we do to that

793
00:32:13,000 --> 00:32:17,000
table, SQL is tracking it underneath the hood in those

794
00:32:17,000 --> 00:32:19,000
system tables that we just looked at.

795
00:32:19,000 --> 00:32:21,000
And if you ever want to view CDC information for a

796
00:32:21,000 --> 00:32:25,000
database, you can run the sp_cdc_help_change_data_capture

797
00:32:25,000 --> 00:32:28,000
system stored procedure here, if we can execute, here it is.

798
00:32:28,000 --> 00:32:32,000
This will show us all the tables that are CDC enabled.

799
00:32:32,000 --> 00:32:34,000
And by the way, this is important if you ever want to

800
00:32:34,000 --> 00:32:37,000
remove CDC, because see this capture instance here,

801
00:32:37,000 --> 00:32:39,000
Sales_SalesOrderHeader?

802
00:32:39,000 --> 00:32:42,000
You need that if you ever want to disable it for a table.

803
00:32:42,000 --> 00:32:45,000
Same thing though, schema, name of the table, and then

804
00:32:45,000 --> 00:32:46,000
the instance of the table.

805
00:32:46,000 --> 00:32:48,000
If we were to run this, it would

806
00:32:48,000 --> 00:32:51,000
disable CDC for the table.
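
Those two steps — listing capture instances and disabling one — look roughly like this; the capture instance name is the one shown in the demo:

```sql
-- List every CDC-enabled table and its capture instance.
EXEC sys.sp_cdc_help_change_data_capture;

-- Disable CDC for one table, identified by its capture instance.
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'Sales',
    @source_name      = N'SalesOrderHeader',
    @capture_instance = N'Sales_SalesOrderHeader';
```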

807
00:32:51,000 --> 00:32:53,000
The last thing we need to do for prep here is create

808
00:32:53,000 --> 00:32:54,000
another staging table.

809
00:32:54,000 --> 00:32:58,000
This one we're going to call cdcstaging, this way we don't

810
00:32:58,000 --> 00:33:01,000
step on the previous demo that we did, kind of creating our

811
00:33:01,000 --> 00:33:03,000
own staging area for the CDC demo.

812
00:33:03,000 --> 00:33:05,000
And rather than model the entire FactInternetSales

813
00:33:05,000 --> 00:33:08,000
table, we're just going to take the sales order ID out,

814
00:33:08,000 --> 00:33:11,000
and with CDC's help, we'll be able to identify what

815
00:33:11,000 --> 00:33:12,000
operation we did.

816
00:33:12,000 --> 00:33:15,000
So this will make it very easy for us to do a set-based

817
00:33:15,000 --> 00:33:19,000
update using this operation column, along with

818
00:33:19,000 --> 00:33:20,000
the sales order ID.

819
00:33:20,000 --> 00:33:21,000
So let's do this.

820
00:33:21,000 --> 00:33:25,000
Let's switch over here to our Adventure Works DW 2012,

821
00:33:25,000 --> 00:33:27,000
highlight our create table statement, hit execute, and

822
00:33:27,000 --> 00:33:28,000
there we go.

823
00:33:28,000 --> 00:33:31,000
And by the way, if we flip back here into SQL Server data

824
00:33:31,000 --> 00:33:34,000
tools, that's exactly the same code I have here inside of our

825
00:33:34,000 --> 00:33:34,000
create staging.

826
00:33:34,000 --> 00:33:37,000
So that will get recreated every time we run the package.

827
00:33:37,000 --> 00:33:39,000
So with CDC enabled at the database level, we have a

828
00:33:39,000 --> 00:33:42,000
table enabled for CDC, and we have our staging

829
00:33:42,000 --> 00:33:43,000
environments set up.

830
00:33:43,000 --> 00:33:47,000
The last step is to create an SSIS package that controls all

831
00:33:47,000 --> 00:33:51,000
that data, pushes the new and modified data into our

832
00:33:51,000 --> 00:33:53,000
destination, our data warehouse.

833
00:33:53,000 --> 00:33:56,000
In order to do this, we need to help out CDC

834
00:33:56,000 --> 00:33:57,000
a little bit here.

835
00:33:57,000 --> 00:34:00,000
And we do this through what's known as the CDC control task.

836
00:34:00,000 --> 00:34:04,000
Inside of other tasks, you'll see here the CDC control task.

837
00:34:04,000 --> 00:34:06,000
I have two of them on the design surface.

838
00:34:06,000 --> 00:34:07,000
Let me enable both of them.

839
00:34:07,000 --> 00:34:09,000
We have our CDC start, and at the very bottom, we

840
00:34:09,000 --> 00:34:12,000
have our CDC end.

841
00:34:12,000 --> 00:34:17,000
So the CDC control task is there to tell CDC what to push

842
00:34:17,000 --> 00:34:20,000
over and what it already pushed over, and it's going to

843
00:34:20,000 --> 00:34:22,000
do this by managing state.

844
00:34:22,000 --> 00:34:25,000
And let's open up our CDC start task.

845
00:34:25,000 --> 00:34:29,000
So this is all done through an ADO.NET connection manager.

846
00:34:29,000 --> 00:34:33,000
That's why I added an ADO.NET package connection manager

847
00:34:33,000 --> 00:34:34,000
right down here.

848
00:34:34,000 --> 00:34:36,000
We have project connection managers, these are both OLE

849
00:34:36,000 --> 00:34:38,000
DBs, so we couldn't use these, they're not even available on

850
00:34:38,000 --> 00:34:39,000
the drop down.

851
00:34:39,000 --> 00:34:42,000
So that's why we created our own, called AW CDC that points

852
00:34:42,000 --> 00:34:46,000
to Adventure Works 2012, where we have CDC enabled.

853
00:34:46,000 --> 00:34:48,000
The next thing we need to do here is we need to specify the

854
00:34:48,000 --> 00:34:50,000
control operation.

855
00:34:50,000 --> 00:34:53,000
CDC expects you to do an initial load, so that's why we

856
00:34:53,000 --> 00:34:55,000
have a package that does both here.

857
00:34:55,000 --> 00:34:57,000
We're going to use this package first as an initial

858
00:34:57,000 --> 00:35:00,000
loader, then we're going to switch these control tasks to

859
00:35:00,000 --> 00:35:03,000
work on incremental changes only.

860
00:35:03,000 --> 00:35:05,000
But you do need to do an initial load first so CDC

861
00:35:05,000 --> 00:35:07,000
knows that you did one, and it can mark that you've done an

862
00:35:07,000 --> 00:35:10,000
initial load, so it knows where to start for the

863
00:35:10,000 --> 00:35:12,000
modified data and new records.

864
00:35:12,000 --> 00:35:14,000
So that's why, during your initial load, you're going to

865
00:35:14,000 --> 00:35:18,000
set up the control task first as a mark initial load start.

866
00:35:18,000 --> 00:35:20,000
You also need to specify a variable.

867
00:35:20,000 --> 00:35:23,000
This is so CDC can take all this state that it's going to

868
00:35:23,000 --> 00:35:26,000
store using a table down below, and use it across

869
00:35:26,000 --> 00:35:28,000
package executions.

870
00:35:28,000 --> 00:35:29,000
So if you don't have a variable in here, just hit

871
00:35:29,000 --> 00:35:32,000
new, give it a name, and there you go.

872
00:35:32,000 --> 00:35:34,000
Now you have a variable for our package

873
00:35:34,000 --> 00:35:35,000
to track CDC state.

874
00:35:35,000 --> 00:35:41,000
Then also for, again, cross-package execution, where

875
00:35:41,000 --> 00:35:44,000
is this variable data going to go when the package ends?

876
00:35:44,000 --> 00:35:47,000
And so you can create a table here to put that data.

877
00:35:47,000 --> 00:35:49,000
So again, specify the connection manager, hit the

878
00:35:49,000 --> 00:35:53,000
New button here, which is what I did, and it will create the

879
00:35:53,000 --> 00:35:55,000
table, create the structure here of what it

880
00:35:55,000 --> 00:35:57,000
needs to store state.

881
00:35:57,000 --> 00:36:00,000
So I did that, I hit OK and Run, and it created that

882
00:36:00,000 --> 00:36:02,000
staging table, and you can also give it a name here.

883
00:36:02,000 --> 00:36:05,000
And then once you hit OK, you're good to go.

884
00:36:05,000 --> 00:36:07,000
And then you'll just need to go down to the very end of the

885
00:36:07,000 --> 00:36:10,000
package, set up another CDC control.

886
00:36:10,000 --> 00:36:11,000
The only difference here--

887
00:36:11,000 --> 00:36:13,000
you'll want to point to the same variables and same tables

888
00:36:13,000 --> 00:36:15,000
and such-- is mark the initial load end.

889
00:36:15,000 --> 00:36:18,000
So the only difference between these two control

890
00:36:18,000 --> 00:36:22,000
tasks is that one marks the start and one marks the end.

891
00:36:22,000 --> 00:36:23,000
So we're ready to go for our initial load.

892
00:36:23,000 --> 00:36:27,000
Let's remove some of this incremental stuff out of here,

893
00:36:27,000 --> 00:36:32,000
and wire up everything to our initial load data flow.

894
00:36:32,000 --> 00:36:33,000
There we go.

895
00:36:33,000 --> 00:36:35,000
And if we enable this data flow and head into it, you'll

896
00:36:35,000 --> 00:36:37,000
notice we're not doing anything special.

897
00:36:37,000 --> 00:36:40,000
We're simply using an OLE DB source pointed at our CDC

898
00:36:40,000 --> 00:36:43,000
enabled table, and pushing all that data using an OLE DB

899
00:36:43,000 --> 00:36:45,000
destination to our staging table.

900
00:36:45,000 --> 00:36:47,000
We're not doing anything CDC special in here.

901
00:36:47,000 --> 00:36:50,000
We're not using the CDC source or any CDC transformations.

902
00:36:50,000 --> 00:36:54,000
That's because your initial load is really just all about

903
00:36:54,000 --> 00:36:57,000
setting up the state for CDC using these control tasks.

904
00:36:57,000 --> 00:37:00,000
That way it's going to take all of this initial load

905
00:37:00,000 --> 00:37:04,000
information, store it into its CDC table so it knows where

906
00:37:04,000 --> 00:37:05,000
our starting point is.

907
00:37:05,000 --> 00:37:07,000
And that way, future incremental loads, it will

908
00:37:07,000 --> 00:37:11,000
only send over the data that is changed.

909
00:37:11,000 --> 00:37:12,000
So let's see how to do the incremental now.

910
00:37:12,000 --> 00:37:14,000
Let's hook everything back into our

911
00:37:14,000 --> 00:37:20,000
incremental data flow here.

912
00:37:20,000 --> 00:37:21,000
There we go.

913
00:37:21,000 --> 00:37:23,000
And let's enable everything, so we'll enable our

914
00:37:23,000 --> 00:37:26,000
incremental load, we'll enable our set-based updates,

915
00:37:26,000 --> 00:37:29,000
inserts, and deletes.

916
00:37:29,000 --> 00:37:32,000
And the only thing we need to do here in control flow is

917
00:37:32,000 --> 00:37:34,000
head into our CDC start, and get it off of

918
00:37:34,000 --> 00:37:35,000
mark initial load.

919
00:37:35,000 --> 00:37:39,000
Now we want to choose get processing range, hit OK, and

920
00:37:39,000 --> 00:37:43,000
our CDC end, we want to hit Mark processed range.

921
00:37:43,000 --> 00:37:46,000
And by the way, this other one, mark CDC start, is if you

922
00:37:46,000 --> 00:37:48,000
want to control where it goes.

923
00:37:48,000 --> 00:37:50,000
Then you can have a little manual control

924
00:37:50,000 --> 00:37:52,000
over where it starts.

925
00:37:52,000 --> 00:37:56,000
And SQL Server controls it all through LSNs, which is what

926
00:37:56,000 --> 00:38:00,000
the transaction log uses to mark transactions.

927
00:38:00,000 --> 00:38:04,000
So let's go ahead and choose Mark processed range so the get

928
00:38:04,000 --> 00:38:07,000
starts it, the mark ends it, and now the only thing we need

929
00:38:07,000 --> 00:38:10,000
to do is go into our data flow, and this time we use a

930
00:38:10,000 --> 00:38:12,000
CDC source.

931
00:38:12,000 --> 00:38:14,000
So you open up your CDC source to find your connection

932
00:38:14,000 --> 00:38:18,000
manager point at your CDC enabled table, the instance,

933
00:38:18,000 --> 00:38:22,000
choose your processing mode: All with old values.

934
00:38:22,000 --> 00:38:25,000
So what happens here is SQL Server tracks not only

935
00:38:25,000 --> 00:38:28,000
inserts, updates, and deletes, but it tracks the before

936
00:38:28,000 --> 00:38:31,000
update and the after update, which is a lot like what

937
00:38:31,000 --> 00:38:32,000
triggers do.

938
00:38:32,000 --> 00:38:36,000
So all with old values also gives you access to the data

939
00:38:36,000 --> 00:38:40,000
how it looked before the change, so current values and

940
00:38:40,000 --> 00:38:41,000
previous values kind of thing.

941
00:38:41,000 --> 00:38:43,000
So we're going to choose net here to give us anything that

942
00:38:43,000 --> 00:38:46,000
has changed since the last load, inserts, or updates, or

943
00:38:46,000 --> 00:38:49,000
deletes, and then you choose the variable that holds the

944
00:38:49,000 --> 00:38:50,000
information.

945
00:38:50,000 --> 00:38:52,000
And we check out the columns, you can see the output is

946
00:38:52,000 --> 00:38:56,000
everything from our table, our CDC table, plus CDC adds its

947
00:38:56,000 --> 00:38:58,000
own extra columns in here to work with.

948
00:38:58,000 --> 00:39:00,000
And the one we're really going to be concerned with here is

949
00:39:00,000 --> 00:39:02,000
the __$operation column.

950
00:39:02,000 --> 00:39:04,000
This is nice because this will allow us to identify what

951
00:39:04,000 --> 00:39:06,000
operation occurred on the record.

952
00:39:06,000 --> 00:39:09,000
Is it an insert, was it a delete, was it a before

953
00:39:09,000 --> 00:39:10,000
update, an after update?

954
00:39:10,000 --> 00:39:11,000
That kind of stuff.

955
00:39:11,000 --> 00:39:14,000
So if we hit OK here, let's head down to the CDC splitter.

956
00:39:14,000 --> 00:39:17,000
Now we don't really need the CDC splitter in this case,

957
00:39:17,000 --> 00:39:20,000
because in our staging table, we're tracking what

958
00:39:20,000 --> 00:39:21,000
operation it was.

959
00:39:21,000 --> 00:39:23,000
But I wanted to show it to you here just so you can get an

960
00:39:23,000 --> 00:39:25,000
idea of what it does, and it's really simple.

961
00:39:25,000 --> 00:39:27,000
There's really no configurable properties on the splitter.

962
00:39:27,000 --> 00:39:31,000
You just drag your source to the splitter and that splits

963
00:39:31,000 --> 00:39:34,000
the output for us, and now we can identify through our data

964
00:39:34,000 --> 00:39:38,000
flow what was inserted, what was updated, what was deleted.

965
00:39:38,000 --> 00:39:41,000
So we could do transformations on specific ones, or just go

966
00:39:41,000 --> 00:39:43,000
straight into our tables if we need to.

967
00:39:43,000 --> 00:39:45,000
So that's pretty neat.

968
00:39:45,000 --> 00:39:47,000
But all of these destinations do the same thing.

969
00:39:47,000 --> 00:39:50,000
They just load up the sales order ID and

970
00:39:50,000 --> 00:39:53,000
the operation into our staging tables.

971
00:39:53,000 --> 00:39:54,000
So they all do the same exact thing.

972
00:39:54,000 --> 00:39:58,000
Again, we could just omit the splitter and do a source right

973
00:39:58,000 --> 00:40:00,000
into one destination because, again, we store the operation.

974
00:40:00,000 --> 00:40:01,000
But that's it.

975
00:40:01,000 --> 00:40:05,000
Now, once all that data is loaded into staging, we can do

976
00:40:05,000 --> 00:40:08,000
our set-based insert, updates, and deletes.

977
00:40:08,000 --> 00:40:12,000
So we have an insert which, really, all I'm doing here is

978
00:40:12,000 --> 00:40:14,000
now we're molding what the

979
00:40:14,000 --> 00:40:17,000
FactInternetSales table is expecting.

980
00:40:17,000 --> 00:40:19,000
Because we're just running a big honking select statement,

981
00:40:19,000 --> 00:40:22,000
the same query we ran over when we talked about our

982
00:40:22,000 --> 00:40:25,000
incremental package previously, this OLTP query.

983
00:40:25,000 --> 00:40:29,000
It's just that we're also joining to it our CDC staging

984
00:40:29,000 --> 00:40:30,000
on the sale order ID.

985
00:40:30,000 --> 00:40:33,000
And we're putting a WHERE clause where operation column

986
00:40:33,000 --> 00:40:37,000
in that staging table equals 2, which is inserts.

987
00:40:37,000 --> 00:40:41,000
So that will do a set-based insert against our production

988
00:40:41,000 --> 00:40:45,000
table, using our staging table where the operation is 2.

989
00:40:45,000 --> 00:40:49,000
If we look at our updates SQL task, we're doing the same

990
00:40:49,000 --> 00:40:51,000
thing we did with our previous incremental package, just

991
00:40:51,000 --> 00:40:54,000
updating those three columns.

992
00:40:54,000 --> 00:40:56,000
Only here's the important part--

993
00:40:56,000 --> 00:40:59,000
where the operation equals 4, this is after updates, where 3

994
00:40:59,000 --> 00:41:00,000
is before updates.

995
00:41:00,000 --> 00:41:03,000
So again, CDC is great for auditing, giving you that

996
00:41:03,000 --> 00:41:05,000
snapshot before and after look of data.

997
00:41:05,000 --> 00:41:09,000
And so 4 is after updates, 3 is before updates, and then

998
00:41:09,000 --> 00:41:12,000
finally, 1 is deletes.
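
Put together, the set-based statements keyed on those operation codes look roughly like this — a sketch only; the join key and the delete/update shape are illustrative, not the demo's exact script:

```sql
-- CDC operation codes staged alongside each sales order ID:
--   1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image)

-- Deletes: remove fact rows whose source rows were deleted.
DELETE f
FROM dbo.FactInternetSales AS f
INNER JOIN dbo.cdcstaging AS s
        ON s.SalesOrderID = f.SalesOrderID   -- illustrative join key
WHERE s.operation = 1;

-- Updates: apply the after image (operation = 4) with an UPDATE ... SET
-- on just the changed columns, using the same join pattern.
```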

999
00:41:12,000 --> 00:41:15,000
So you can see where the operation equals 1, it will

1000
00:41:15,000 --> 00:41:17,000
identify all the deleted records here

1001
00:41:17,000 --> 00:41:18,000
in our staging table.

1002
00:41:18,000 --> 00:41:19,000
So the only thing left, really, to do

1003
00:41:19,000 --> 00:41:21,000
here is test it out.

1004
00:41:21,000 --> 00:41:22,000
So we need to modify some data.

1005
00:41:22,000 --> 00:41:25,000
Let's flip back over to SQL Server Management Studio, and

1006
00:41:25,000 --> 00:41:27,000
I opened up our sales order header table once again.

1007
00:41:27,000 --> 00:41:29,000
This time I remembered to add our online

1008
00:41:29,000 --> 00:41:30,000
order flag equals 1.

1009
00:41:30,000 --> 00:41:32,000
So now, if we change some data in here-- and

1010
00:41:32,000 --> 00:41:33,000
we can change anything.

1011
00:41:33,000 --> 00:41:35,000
We don't have to change just what we were looking at

1012
00:41:35,000 --> 00:41:39,000
previously, like the modified date, because CDC's enabled on

1013
00:41:39,000 --> 00:41:40,000
the entire table.

1014
00:41:40,000 --> 00:41:43,000
So let's just pick a record here, we'll enter test, and

1015
00:41:43,000 --> 00:41:44,000
there we go.

1016
00:41:44,000 --> 00:41:46,000
Now something else really neat, and we need to refresh

1017
00:41:46,000 --> 00:41:50,000
our tables here, but if you wanted to see and take a peek

1018
00:41:50,000 --> 00:41:53,000
inside of CDC, you could expand system tables.

1019
00:41:53,000 --> 00:41:55,000
And when we enabled it at the database level, it

1020
00:41:55,000 --> 00:41:58,000
added all these tables here.

1021
00:41:58,000 --> 00:42:00,000
But enabling it at the table level added a

1022
00:42:00,000 --> 00:42:03,000
Sales_SalesOrderHeader_CT table.

1023
00:42:03,000 --> 00:42:07,000
This is where SQL Server and CDC work together to track all

1024
00:42:07,000 --> 00:42:07,000
that information.
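
Peeking at that change table directly is just a select against the cdc schema:

```sql
-- Raw change rows; __$operation marks what happened to each row.
SELECT TOP (1000) *
FROM cdc.Sales_SalesOrderHeader_CT;
```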

1025
00:42:07,000 --> 00:42:09,000
Because if we select top 1,000 rows, check it out.

1026
00:42:09,000 --> 00:42:12,000
Here's all the data for just that one change that we made,

1027
00:42:12,000 --> 00:42:15,000
because it tracks befores and afters.

1028
00:42:15,000 --> 00:42:17,000
And here's your operation, before, after, before, after.

1029
00:42:17,000 --> 00:42:20,000
Even though we only changed one, it's in here many times,

1030
00:42:20,000 --> 00:42:22,000
because it has revisions on it.

1031
00:42:22,000 --> 00:42:26,000
So if we close out of this, we head into SSIS, and we run our

1032
00:42:26,000 --> 00:42:30,000
package, we should get one row, and look at that, we do.

1033
00:42:30,000 --> 00:42:33,000
We have one row from our source, heading down our

1034
00:42:33,000 --> 00:42:34,000
splitter, heading down the update

1035
00:42:34,000 --> 00:42:36,000
path, and getting staged.

1036
00:42:36,000 --> 00:42:39,000
And then we don't obviously have any new records or

1037
00:42:39,000 --> 00:42:41,000
deleted records, so those won't find anything, but our

1038
00:42:41,000 --> 00:42:45,000
set-based update will have updated those fields inside of

1039
00:42:45,000 --> 00:42:46,000
our production database.

1040
00:42:46,000 --> 00:42:47,000
So that's how CDC works.

1041
00:42:47,000 --> 00:42:48,000
It's pretty straightforward.

1042
00:42:48,000 --> 00:42:49,000
There's really not a lot to it--

1043
00:42:49,000 --> 00:42:52,000
SQL Server does all the hard work, we just have to know and

1044
00:42:52,000 --> 00:42:55,000
remember to set up our start and our end, because these

1045
00:42:55,000 --> 00:42:59,000
control tasks just help CDC and SQL Server figure out,

1046
00:42:59,000 --> 00:43:02,000
through state, where it's at, and where it will go in the

1047
00:43:02,000 --> 00:43:05,000
future when it needs to send new and changed and deleted

1048
00:43:05,000 --> 00:43:08,000
records into our destinations.

1049
00:43:08,000 --> 00:43:10,000
So CDC can be a great alternative if you have no way

1050
00:43:10,000 --> 00:43:14,000
to identify changed information in

1051
00:43:14,000 --> 00:43:16,000
your SQL Server sources.

1052
00:43:16,000 --> 00:43:17,000
The last thing I want to show you here is how to do

1053
00:43:17,000 --> 00:43:20,000
partition switching, and we're going to do this just like we

1054
00:43:20,000 --> 00:43:22,000
did in our white board, where we're going to take our

1055
00:43:22,000 --> 00:43:25,000
FactInternetSales table, we're going to splice it up on

1056
00:43:25,000 --> 00:43:27,000
partitions by year.

1057
00:43:27,000 --> 00:43:30,000
And we're going to do this by using our full package.

1058
00:43:30,000 --> 00:43:32,000
So we're going to recreate our production table so we'll have

1059
00:43:32,000 --> 00:43:36,000
nothing in it, our staging table will have nothing in it,

1060
00:43:36,000 --> 00:43:38,000
and so this is a perfect thing, because we're going to

1061
00:43:38,000 --> 00:43:40,000
take all the data, push it into staging, and then do the

1062
00:43:40,000 --> 00:43:42,000
switch to get it in production, rather than a

1063
00:43:42,000 --> 00:43:43,000
set-based update.

1064
00:43:43,000 --> 00:43:46,000
So we can disable our set-based update, which is a

1065
00:43:46,000 --> 00:43:49,000
last step here in our full package.

1066
00:43:49,000 --> 00:43:51,000
We could just put our switch statement right in here in our

1067
00:43:51,000 --> 00:43:53,000
Execute SQL Task, but I want to do this manually so you can

1068
00:43:53,000 --> 00:43:55,000
see the process work.

1069
00:43:55,000 --> 00:43:57,000
So let's head back into Management Studio.

1070
00:43:57,000 --> 00:43:58,000
We're going to open up the query here

1071
00:43:58,000 --> 00:44:01,000
called partition switching.

1072
00:44:01,000 --> 00:44:03,000
And so what this query does is the first thing we're going to

1073
00:44:03,000 --> 00:44:07,000
do is we need to get some secondary data files in to

1074
00:44:07,000 --> 00:44:09,000
store each year's worth of data.

1075
00:44:09,000 --> 00:44:12,000
So we have four ALTER DATABASE statements that will

1076
00:44:12,000 --> 00:44:13,000
do just that.
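
Each of the four statements follows the same shape — a sketch for one year, with the filegroup, file name, and path all illustrative:

```sql
-- Add a filegroup and a 100 MB secondary data file for one year's partition.
ALTER DATABASE AdventureWorksDW2012 ADD FILEGROUP FG2008;

ALTER DATABASE AdventureWorksDW2012
ADD FILE (
    NAME     = N'FG2008_data',
    FILENAME = N'C:\12-DataLoading\Data\FG2008_data.ndf',
    SIZE     = 100MB
) TO FILEGROUP FG2008;
```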

1077
00:44:13,000 --> 00:44:16,000
So let's highlight all this code, hit execute.

1078
00:44:16,000 --> 00:44:18,000
That'll take a little bit, because it's creating 100 MB

1079
00:44:18,000 --> 00:44:20,000
files, but where we're putting them is right here.

1080
00:44:20,000 --> 00:44:24,000
I created a data directory here in the 12-DataLoading

1081
00:44:24,000 --> 00:44:26,000
directory for this Nugget, and you can see that's creating

1082
00:44:26,000 --> 00:44:30,000
all of them and allocating all that space for all four files.

1083
00:44:30,000 --> 00:44:31,000
And look at that, we're good to go.

1084
00:44:31,000 --> 00:44:33,000
Now we need to set up our partition stuff, and we need

1085
00:44:33,000 --> 00:44:35,000
to switch our database here, because we're going to go

1086
00:44:35,000 --> 00:44:36,000
against production.

1087
00:44:36,000 --> 00:44:37,000
There we go.

1088
00:44:37,000 --> 00:44:40,000
And so we need to change to Adventure Works DW 2012, we

1089
00:44:40,000 --> 00:44:43,000
need to create our partition function, which specifies our

1090
00:44:43,000 --> 00:44:46,000
ranges for each year, and then we need to create a partition

1091
00:44:46,000 --> 00:44:50,000
scheme tied to that function, so that way, when we create

1092
00:44:50,000 --> 00:44:54,000
tables, rather than creating them on our file

1093
00:44:54,000 --> 00:44:56,000
group, we're going to create them on this partition scheme,

1094
00:44:56,000 --> 00:44:58,000
which is tied to this function.
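
A sketch of that yearly function and scheme on the int OrderDateKey — names, boundary values, and filegroups are illustrative, not the demo's exact script:

```sql
-- Yearly RANGE RIGHT function: 3 boundaries yield 4 partitions.
CREATE PARTITION FUNCTION pfOrderYear (int)
AS RANGE RIGHT FOR VALUES (20060101, 20070101, 20080101);

-- Scheme maps each partition to its own filegroup.
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear
TO (FG2005, FG2006, FG2007, FG2008);
```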

1095
00:44:58,000 --> 00:44:59,000
So let's do that.

1096
00:44:59,000 --> 00:45:02,000
We'll hit execute here, that will create our function,

1097
00:45:02,000 --> 00:45:04,000
create our scheme tied to that function.

1098
00:45:04,000 --> 00:45:06,000
Now we just need to drop our table.

1099
00:45:06,000 --> 00:45:08,000
We need to drop our table, because we're going to

1100
00:45:08,000 --> 00:45:12,000
recreate it on our partition scheme.

1101
00:45:12,000 --> 00:45:15,000
And we're doing this all on order date key, by the way.

1102
00:45:15,000 --> 00:45:18,000
So order date key is what will get passed through the

1103
00:45:18,000 --> 00:45:21,000
partition function in order to splice up that data across all

1104
00:45:21,000 --> 00:45:23,000
those files by year.

1105
00:45:23,000 --> 00:45:26,000
So let's highlight all of this code and drop our

1106
00:45:26,000 --> 00:45:27,000
current fact table.

1107
00:45:27,000 --> 00:45:29,000
There we go.

1108
00:45:29,000 --> 00:45:35,000
And now we can simply recreate our FactInternetSales table on

1109
00:45:35,000 --> 00:45:36,000
that partition scheme.

1110
00:45:36,000 --> 00:45:37,000
So let's do it.

1111
00:45:37,000 --> 00:45:39,000
Let's create the production table.

1112
00:45:39,000 --> 00:45:42,000
All right, now let's head down here and we can create our

1113
00:45:42,000 --> 00:45:44,000
staging table as well here.

1114
00:45:44,000 --> 00:45:47,000
I'm doing this just so we can compare apples to apples, but

1115
00:45:47,000 --> 00:45:51,000
this staging table does not have to be on a partition.

1116
00:45:51,000 --> 00:45:53,000
We can simply have a non-partition table, and then

1117
00:45:53,000 --> 00:45:56,000
when we do the switch, we just wouldn't specify a partition

1118
00:45:56,000 --> 00:45:57,000
for the staging table.

1119
00:45:57,000 --> 00:45:59,000
But we're going to create one on there anyway.

1120
00:45:59,000 --> 00:46:02,000
We'll hit OK, hit execute there, and we're good to go.

1121
00:46:02,000 --> 00:46:05,000
So the last thing to do now is do the switch.

1122
00:46:05,000 --> 00:46:08,000
But we need to load data into our staging table to do that.

1123
00:46:08,000 --> 00:46:10,000
But literally this is it for the switch statement.

1124
00:46:10,000 --> 00:46:14,000
We're switching partition four of our staging table with

1125
00:46:14,000 --> 00:46:16,000
partition four of our internet sales table, which is going to

1126
00:46:16,000 --> 00:46:18,000
be our 2008 data.
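
The whole switch really is one statement — a sketch, with the staging table name illustrative:

```sql
-- Swap partition 4 (the 2008 slice) of staging into partition 4 of production.
ALTER TABLE dbo.cdcstaging_FactInternetSales
SWITCH PARTITION 4 TO dbo.FactInternetSales PARTITION 4;
```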

1127
00:46:18,000 --> 00:46:20,000
So let's head into SSIS.

1128
00:46:20,000 --> 00:46:22,000
Let's just disable our create staging, too.

1129
00:46:22,000 --> 00:46:23,000
We no longer need that.

1130
00:46:23,000 --> 00:46:26,000
We just simply need to stage the data.

1131
00:46:26,000 --> 00:46:27,000
And what we're going to do is go into our source, and we're

1132
00:46:27,000 --> 00:46:29,000
going to say where the year-- put a little

1133
00:46:29,000 --> 00:46:31,000
WHERE clause in here--

1134
00:46:31,000 --> 00:46:37,000
WHERE YEAR(OrderDate) = 2008.

1135
00:46:37,000 --> 00:46:40,000
And that's an integer, so we don't need to

1136
00:46:40,000 --> 00:46:41,000
put quotes around it there.

1137
00:46:41,000 --> 00:46:42,000
That's it.

1138
00:46:42,000 --> 00:46:45,000
So this will only grab 2008 data out of our source.
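
The filtered source query amounts to something like this (column list abbreviated to a star for the sketch):

```sql
-- Pull only 2008 orders from the OLTP source.
SELECT soh.*
FROM Sales.SalesOrderHeader AS soh
WHERE YEAR(soh.OrderDate) = 2008;  -- integer comparison, no quotes needed
```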

1139
00:46:45,000 --> 00:46:46,000
That's all we should see in here.

1140
00:46:46,000 --> 00:46:48,000
If I hit the preview button, there it is.

1141
00:46:48,000 --> 00:46:50,000
All 2008 data.

1142
00:46:50,000 --> 00:46:52,000
So now we can simply execute this package, this will load

1143
00:46:52,000 --> 00:46:55,000
up our staging table with only 2008 data

1144
00:46:55,000 --> 00:46:58,000
inside of the 2008 partition.

1145
00:46:58,000 --> 00:47:01,000
So if we just check here, let's make sure that both

1146
00:47:01,000 --> 00:47:03,000
tables-- our staging table should have data in it, there

1147
00:47:03,000 --> 00:47:07,000
it is and our production table has no data in it.

1148
00:47:07,000 --> 00:47:09,000
So now we can run this really simple switch statement.

1149
00:47:09,000 --> 00:47:11,000
We're going to alter our staging table, switch the

1150
00:47:11,000 --> 00:47:15,000
partition four from our staging to the partition four

1151
00:47:15,000 --> 00:47:16,000
of our production.

1152
00:47:16,000 --> 00:47:17,000
Ready for this?

1153
00:47:17,000 --> 00:47:19,000
Boom, done, one second.

1154
00:47:19,000 --> 00:47:20,000
They should be reversed.

1155
00:47:20,000 --> 00:47:24,000
We should have data in our production table and no data

1156
00:47:24,000 --> 00:47:26,000
inside of our staging table.

1157
00:47:26,000 --> 00:47:26,000
Pretty cool, huh?

1158
00:47:26,000 --> 00:47:29,000
That's partition switching at its finest.

1159
00:47:29,000 --> 00:47:33,000
Just to reiterate, if your staging table is not built on

1160
00:47:33,000 --> 00:47:36,000
a partition-- which, most of the time, it won't be-- then you

1161
00:47:36,000 --> 00:47:37,000
can just omit this part right here.

1162
00:47:37,000 --> 00:47:41,000
So you can do an ALTER TABLE staging, SWITCH TO production,

1163
00:47:41,000 --> 00:47:43,000
and then which partition in production you're

1164
00:47:43,000 --> 00:47:44,000
switching it into.

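For the common case of a non-partitioned staging table, the source partition clause is simply dropped. A sketch with the same assumed table names:

```sql
-- Staging table is not partitioned, so no source PARTITION clause;
-- you only name the target partition it switches into.
ALTER TABLE dbo.FactSales_Staging
    SWITCH TO dbo.FactSales PARTITION 4;
```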
1165
00:47:44,000 --> 00:47:46,000
So there's partition switching.

1166
00:47:46,000 --> 00:47:50,000
Just wonderful stuff. Even doing set-based updates and

1167
00:47:50,000 --> 00:47:54,000
operations against production tables can cause problems in

1168
00:47:54,000 --> 00:47:56,000
extremely large data warehouses.

1169
00:47:56,000 --> 00:47:59,000
And who wouldn't want to do a one second metadata switch

1170
00:47:59,000 --> 00:48:00,000
from staging to production?

1171
00:48:00,000 --> 00:48:03,000
We can spend all the time in the world loading up staging,

1172
00:48:03,000 --> 00:48:05,000
and when we want to put it in production,

1173
00:48:05,000 --> 00:48:06,000
boom, just switch it.

1174
00:48:06,000 --> 00:48:08,000
Easy as that.

1175
00:48:08,000 --> 00:48:10,000
In this CBT Nugget, we learned how to load fact tables, and

1176
00:48:10,000 --> 00:48:13,000
saw the data loading options available to us.

1177
00:48:13,000 --> 00:48:15,000
We started with an overview, saw that we need a full

1178
00:48:15,000 --> 00:48:17,000
package for an initial load.

1179
00:48:17,000 --> 00:48:19,000
We need an incremental package to deal with changes, and we saw

1180
00:48:19,000 --> 00:48:22,000
some of the challenges that go along with incremental loads,

1181
00:48:22,000 --> 00:48:25,000
and some of the technologies we could use to do so.

1182
00:48:25,000 --> 00:48:27,000
We then jumped into the Virtual Nugget Lab and we got

1183
00:48:27,000 --> 00:48:29,000
familiar with how to first do a full load.

1184
00:48:29,000 --> 00:48:32,000
We looked at how to do an incremental load, both using

1185
00:48:32,000 --> 00:48:36,000
SQL and using CDC, the change data capture

1186
00:48:36,000 --> 00:48:38,000
features of SQL Server.

1187
00:48:38,000 --> 00:48:41,000
At the end here, I showed you how to do partition switching

1188
00:48:41,000 --> 00:48:43,000
to quickly load up your production tables from your

1189
00:48:43,000 --> 00:48:44,000
staging tables.

1190
00:48:44,000 --> 00:48:46,000
I hope this has been informative for you, and I

1191
00:48:46,000 --> 00:48:47,000
thank you for viewing.
