1
1

00:00:00,180  -->  00:00:01,500
<v ->All data and information</v>
2

2

00:00:01,500  -->  00:00:03,820
has a life cycle associated with it.
3

3

00:00:03,820  -->  00:00:06,520
The data life cycle is the entire period of time
4

4

00:00:06,520  -->  00:00:08,910
the data exists within your systems.
5

5

00:00:08,910  -->  00:00:11,930
Data goes through six main stages throughout its life cycle.
6

6

00:00:11,930  -->  00:00:15,580
From creation, to usage, to sharing, to storage,
7

7

00:00:15,580  -->  00:00:17,730
to archival, to destruction.
8

8

00:00:17,730  -->  00:00:19,830
First, we have data creation.
9

9

00:00:19,830  -->  00:00:21,600
Data can be created in your system
10

10

00:00:21,600  -->  00:00:24,510
whenever it's acquired, entered, or captured.
11

11

00:00:24,510  -->  00:00:26,840
Data acquisition occurs when existing data
12

12

00:00:26,840  -->  00:00:28,510
that's produced outside of your system
13

13

00:00:28,510  -->  00:00:31,060
is imported automatically into your system.
14

14

00:00:31,060  -->  00:00:34,090
For example, if I create an email and send it to you,
15

15

00:00:34,090  -->  00:00:35,960
your system has acquired that data
16

16

00:00:35,960  -->  00:00:38,780
and that data has begun its life cycle within your systems.
17

17

00:00:38,780  -->  00:00:40,960
Data entry is going to occur when information
18

18

00:00:40,960  -->  00:00:42,820
is manually typed in your system
19

19

00:00:42,820  -->  00:00:44,960
by personnel within your organization.
20

20

00:00:44,960  -->  00:00:47,290
For example, if you open up a Word document
21

21

00:00:47,290  -->  00:00:49,500
and you start taking notes while watching this lesson,
22

22

00:00:49,500  -->  00:00:51,820
you're going to be performing data entry.
23

23

00:00:51,820  -->  00:00:55,060
Now data capture occurs when data is generated by a device
24

24

00:00:55,060  -->  00:00:56,710
used in your organization.
25

25

00:00:56,710  -->  00:00:58,830
For example, if your routers and switches
26

26

00:00:58,830  -->  00:01:00,780
are constantly generating log files,
27

27

00:01:00,780  -->  00:01:03,200
those are a form of data capture.
28

28

00:01:03,200  -->  00:01:05,320
Second, we have data use.
29

29

00:01:05,320  -->  00:01:07,460
Now, data use is the phase of the life cycle
30

30

00:01:07,460  -->  00:01:09,690
where data is put to work to achieve some purpose
31

31

00:01:09,690  -->  00:01:11,240
within your organization.
32

32

00:01:11,240  -->  00:01:13,730
If you're viewing, processing, modifying,
33

33

00:01:13,730  -->  00:01:17,120
or saving the data, you are currently performing data use.
34

34

00:01:17,120  -->  00:01:19,474
Every time a critical piece of data is opened and accessed,
35

35

00:01:19,474  -->  00:01:22,330
there should be an audit trail that maintains a log
36

36

00:01:22,330  -->  00:01:24,580
of who accessed the data and when.
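To make the audit trail idea concrete, here is a minimal Python sketch of an append-only access log. The names AccessEvent and AuditTrail, and the sample users and files, are made up for illustration; a real system would write to tamper-resistant storage rather than keep the log in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One audit record: who touched which data, how, and when."""
    user: str
    resource: str
    action: str
    timestamp: datetime


class AuditTrail:
    """Append-only, in-memory log of data access (illustrative only)."""

    def __init__(self):
        self._events = []

    def record(self, user, resource, action):
        # Timestamps are stored in UTC so entries from different systems compare cleanly.
        event = AccessEvent(user, resource, action, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def events_for(self, resource):
        # Return every recorded access to one resource, oldest first.
        return [e for e in self._events if e.resource == resource]


trail = AuditTrail()
trail.record("alice", "payroll.xlsx", "open")
trail.record("bob", "payroll.xlsx", "modify")
trail.record("alice", "notes.docx", "open")

for event in trail.events_for("payroll.xlsx"):
    print(event.timestamp.isoformat(), event.user, event.action)
```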
37

37

00:01:24,580  -->  00:01:26,640
Third, we have data sharing.
38

38

00:01:26,640  -->  00:01:28,420
Now, data sharing occurs when a user
39

39

00:01:28,420  -->  00:01:30,500
makes the data available to somebody else
40

40

00:01:30,500  -->  00:01:32,290
outside of the organization.
41

41

00:01:32,290  -->  00:01:34,760
For example, when I began recording this video,
42

42

00:01:34,760  -->  00:01:36,780
only my staff had access to this video
43

43

00:01:36,780  -->  00:01:38,890
so we could create this course and all the subtitles
44

44

00:01:38,890  -->  00:01:40,480
for this particular lesson.
45

45

00:01:40,480  -->  00:01:42,700
But once we were at a point where we want you
46

46

00:01:42,700  -->  00:01:44,110
to be able to see this video,
47

47

00:01:44,110  -->  00:01:46,490
we had to share it with other organizations and people
48

48

00:01:46,490  -->  00:01:48,230
outside of Dion Training.
49

49

00:01:48,230  -->  00:01:50,150
When data is shared, it's important that you
50

50

00:01:50,150  -->  00:01:51,810
put the right protections in place
51

51

00:01:51,810  -->  00:01:54,400
based on who should be able to access the data being shared
52

52

00:01:54,400  -->  00:01:56,870
and where that data should be shared to.
53

53

00:01:56,870  -->  00:01:58,900
Fourth, we have data storage.
54

54

00:01:58,900  -->  00:02:00,640
Now, data storage occurs when the data
55

55

00:02:00,640  -->  00:02:02,670
is not being actively used.
56

56

00:02:02,670  -->  00:02:05,540
Every piece of data needs to be stored for later retrieval,
57

57

00:02:05,540  -->  00:02:07,610
processing, use or transfer,
58

58

00:02:07,610  -->  00:02:09,930
but while it isn't actively being used,
59

59

00:02:09,930  -->  00:02:12,080
it's going to have to be stored someplace.
60

60

00:02:12,080  -->  00:02:14,510
Now the data may be stored as a digital file,
61

61

00:02:14,510  -->  00:02:16,730
such as a Word document, or as a single item
62

62

00:02:16,730  -->  00:02:18,100
within a larger database,
63

63

00:02:18,100  -->  00:02:19,250
depending on the type of data
64

64

00:02:19,250  -->  00:02:21,120
and the protections it requires.
65

65

00:02:21,120  -->  00:02:22,410
Data that is going to be stored
66

66

00:02:22,410  -->  00:02:23,780
is going to be placed into an area
67

67

00:02:23,780  -->  00:02:27,300
that is instantly accessible when needed by your users.
68

68

00:02:27,300  -->  00:02:29,350
Fifth, we have data archival.
69

69

00:02:29,350  -->  00:02:32,320
Now data archival is the copying of data to an environment
70

70

00:02:32,320  -->  00:02:34,340
where it's going to be stored in case it's going to be needed
71

71

00:02:34,340  -->  00:02:37,020
in an active production environment again later on.
72

72

00:02:37,020  -->  00:02:40,040
For example, your organization might conduct nightly backups
73

73

00:02:40,040  -->  00:02:42,500
of all of your servers and put that onto a backup tape
74

74

00:02:42,500  -->  00:02:44,490
or a cloud-based glacial server.
75

75

00:02:44,490  -->  00:02:47,580
In that case, the data won't be instantly available anymore,
76

76

00:02:47,580  -->  00:02:51,200
but your organization can recover to it and restore from it
77

77

00:02:51,200  -->  00:02:53,600
if they need to, taking those from the archives
78

78

00:02:53,600  -->  00:02:55,610
and putting them back onto your production servers
79

79

00:02:55,610  -->  00:02:58,580
in the case of an emergency or an investigation.
80

80

00:02:58,580  -->  00:03:00,760
Sixth, we have data destruction.
81

81

00:03:00,760  -->  00:03:04,630
At some point, the data you've created, used, shared, stored,
82

82

00:03:04,630  -->  00:03:07,600
and archived is going to be no longer valuable to you.
83

83

00:03:07,600  -->  00:03:10,200
At that point, it's going to be time to destroy the data
84

84

00:03:10,200  -->  00:03:12,310
and bring it to an end of its useful life.
85

85

00:03:12,310  -->  00:03:15,340
After all, we can't keep all of our data indefinitely
86

86

00:03:15,340  -->  00:03:17,230
because we're going to end up running out of storage space,
87

87

00:03:17,230  -->  00:03:20,090
or it's going to simply cost us too much to buy more storage
88

88

00:03:20,090  -->  00:03:23,180
space for all of that data that has no useful purpose.
89

89

00:03:23,180  -->  00:03:24,830
This destruction could be as simple
90

90

00:03:24,830  -->  00:03:26,640
as running a delete command on a server,
91

91

00:03:26,640  -->  00:03:29,020
or it could be overwriting that area of a hard disk
92

92

00:03:29,020  -->  00:03:32,250
with zeros, or you could physically destroy a tape backup
93

93

00:03:32,250  -->  00:03:33,870
by shredding that tape.
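As a rough illustration of the "overwrite with zeros" option, here is a small Python sketch that zero-fills a file before deleting it. The function names zero_fill and destroy are hypothetical. This is only a sketch: on SSDs, copy-on-write file systems, or journaling file systems, an in-place overwrite may not reach every physical copy of the data.

```python
import os
import tempfile


def zero_fill(path, chunk_size=64 * 1024):
    """Overwrite every byte of the file at path with zeros. Returns the byte count."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        # Push the zeros out of Python's buffer and ask the OS to write them to disk.
        f.flush()
        os.fsync(f.fileno())
    return size


def destroy(path):
    """Zero-fill the file, then remove its directory entry."""
    zero_fill(path)
    os.remove(path)


# Demonstration on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "old_report.txt")
    with open(demo_path, "wb") as f:
        f.write(b"no longer valuable")
    destroy(demo_path)
    print(os.path.exists(demo_path))
```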
94

94

00:03:33,870  -->  00:03:36,090
The exact method here isn't really important,
95

95

00:03:36,090  -->  00:03:37,910
but the concept is that that data
96

96

00:03:37,910  -->  00:03:39,310
has to go through a life cycle
97

97

00:03:39,310  -->  00:03:40,950
and that's what we're concerned with here.
98

98

00:03:40,950  -->  00:03:43,430
Remember, all data moves through this life cycle
99

99

00:03:43,430  -->  00:03:46,600
from creation to use, to sharing, to storage,
100

100

00:03:46,600  -->  00:03:48,700
to archiving, to destruction.
101

101

00:03:48,700  -->  00:03:51,040
Now that we understand the basic data life cycle,
102

102

00:03:51,040  -->  00:03:53,870
we need to discuss the concept of a data inventory.
103

103

00:03:53,870  -->  00:03:56,580
Now, a data inventory serves as a single source of truth
104

104

00:03:56,580  -->  00:03:58,150
within your organization.
105

105

00:03:58,150  -->  00:04:00,210
A data inventory is going to be used to provide
106

106

00:04:00,210  -->  00:04:02,650
instant insight into all the sources of data
107

107

00:04:02,650  -->  00:04:04,510
that an organization has access to,
108

108

00:04:04,510  -->  00:04:06,910
what information is being collected by these sources,
109

109

00:04:06,910  -->  00:04:08,440
where that data is being stored,
110

110

00:04:08,440  -->  00:04:10,920
and what will ultimately happen to that data.
111

111

00:04:10,920  -->  00:04:13,160
This is also referred to as a data mapping
112

112

00:04:13,160  -->  00:04:14,750
in some organizations.
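A data inventory can be pictured as a simple table with one row per data source. The Python sketch below is one possible shape for it; the InventoryEntry fields and every sample row are illustrative assumptions, not a real inventory.

```python
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    source: str        # where the data comes from
    collected: list    # what information that source collects
    location: str      # where the data is stored
    disposition: str   # what will ultimately happen to the data


# Hypothetical rows, for illustration only.
inventory = [
    InventoryEntry("shared drive", ["project files"],
                   "on-premises file server", "archive, then destroy"),
    InventoryEntry("accounting software", ["invoices", "customer balances"],
                   "SaaS provider", "archive, then destroy"),
    InventoryEntry("learning management system", ["student progress"],
                   "SaaS provider", "destroy when no longer needed"),
]


def locations(entries):
    """Every distinct place where data lives, sorted for stable output."""
    return sorted({e.location for e in entries})


def sources_at(entries, location):
    """Which data sources keep their data at the given location."""
    return [e.source for e in entries if e.location == location]


print(locations(inventory))
print(sources_at(inventory, "SaaS provider"))
```

Once every source is listed this way, questions like "what do we hold outside our own servers?" become a simple lookup.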
113

113

00:04:14,750  -->  00:04:17,390
So why is it important to conduct a data inventory
114

114

00:04:17,390  -->  00:04:18,760
or data mapping?
115

115

00:04:18,760  -->  00:04:21,050
Well, if we're going to be responsible for protecting
116

116

00:04:21,050  -->  00:04:22,410
our organization's data,
117

117

00:04:22,410  -->  00:04:24,020
then it's really important that we understand
118

118

00:04:24,020  -->  00:04:26,330
exactly where all that data is located.
119

119

00:04:26,330  -->  00:04:27,630
Now this may sound easy,
120

120

00:04:27,630  -->  00:04:30,030
but these days it's actually quite challenging
121

121

00:04:30,030  -->  00:04:32,800
because we have data located all over the place.
122

122

00:04:32,800  -->  00:04:34,700
Do you have data on your company's shared drive
123

123

00:04:34,700  -->  00:04:35,840
and email servers?
124

124

00:04:35,840  -->  00:04:38,020
Well, most likely you do,
125

125

00:04:38,020  -->  00:04:39,670
and you probably have full control over those servers,
126

126

00:04:39,670  -->  00:04:42,400
but there's a lot more of your data out there as well.
127

127

00:04:42,400  -->  00:04:45,150
In my own company, we have data in our accounting software
128

128

00:04:45,150  -->  00:04:47,090
and our credit card processing software.
129

129

00:04:47,090  -->  00:04:49,930
Both of these are software-as-a-service solutions.
130

130

00:04:49,930  -->  00:04:51,400
Now we also have some of our data
131

131

00:04:51,400  -->  00:04:52,880
in our learning management system
132

132

00:04:52,880  -->  00:04:54,130
and other parts of our data
133

133

00:04:54,130  -->  00:04:56,350
in our customer relationship management system.
134

134

00:04:56,350  -->  00:05:00,390
We use tools like Slack, Office 365, and Google Workspace,
135

135

00:05:00,390  -->  00:05:02,350
and all of these have our data too.
136

136

00:05:02,350  -->  00:05:04,240
So I only just scratched the surface here,
137

137

00:05:04,240  -->  00:05:06,210
but I've already listed out nine different places
138

138

00:05:06,210  -->  00:05:09,210
where our data resides and we're a really small company.
139

139

00:05:09,210  -->  00:05:11,850
This is why conducting a data inventory or data mapping
140

140

00:05:11,850  -->  00:05:14,040
is truly important here because once you know
141

141

00:05:14,040  -->  00:05:15,240
where all your data is,
142

142

00:05:15,240  -->  00:05:17,340
you can then begin to determine how you're going to secure
143

143

00:05:17,340  -->  00:05:19,550
that data and protect that data across
144

144

00:05:19,550  -->  00:05:21,220
all of these disparate storage arrays
145

145

00:05:21,220  -->  00:05:22,780
that you've now created.
146

146

00:05:22,780  -->  00:05:24,980
Now, once you've identified all of this data,
147

147

00:05:24,980  -->  00:05:26,810
you need to figure out how to ensure its integrity
148

148

00:05:26,810  -->  00:05:28,350
is also being maintained.
149

149

00:05:28,350  -->  00:05:30,950
This is known as data integrity management.
150

150

00:05:30,950  -->  00:05:33,730
Now data integrity is all about protecting data
151

151

00:05:33,730  -->  00:05:37,140
against improper maintenance, modification or alteration.
152

152

00:05:37,140  -->  00:05:39,630
And it also includes data authenticity.
153

153

00:05:39,630  -->  00:05:42,080
Integrity has to do with the accuracy of information,
154

154

00:05:42,080  -->  00:05:44,800
including its authenticity and trustworthiness.
155

155

00:05:44,800  -->  00:05:47,180
Now information with low integrity concerns
156

156

00:05:47,180  -->  00:05:49,220
may be considered unimportant to your business
157

157

00:05:49,220  -->  00:05:51,740
because it doesn't have a precise operational function,
158

158

00:05:51,740  -->  00:05:53,350
and therefore it's not necessary
159

159

00:05:53,350  -->  00:05:55,400
to vigorously check that for errors.
160

160

00:05:55,400  -->  00:05:57,650
Information with high integrity concerns though
161

161

00:05:57,650  -->  00:06:00,520
is considered to be crucial and critical to your functions,
162

162

00:06:00,520  -->  00:06:03,180
and therefore it must be accurate in order to prevent
163

163

00:06:03,180  -->  00:06:06,130
negative impacts to your organization's activities.
164

164

00:06:06,130  -->  00:06:07,460
For example, if you're dealing
165

165

00:06:07,460  -->  00:06:08,810
with your accounting software,
166

166

00:06:08,810  -->  00:06:11,560
you likely need to ensure it has a high level of integrity
167

167

00:06:11,560  -->  00:06:13,650
because you don't want to have a customer's balance
168

168

00:06:13,650  -->  00:06:15,530
saying that they owe you $10,000
169

169

00:06:15,530  -->  00:06:17,500
when they only owe you $1,000.
170

170

00:06:17,500  -->  00:06:20,240
That would be a big problem, and it would be due to a lack of integrity
171

171

00:06:20,240  -->  00:06:22,150
because something is changing those numbers.
172

172

00:06:22,150  -->  00:06:24,780
Therefore you want to build out your data protection plans
173

173

00:06:24,780  -->  00:06:26,960
for your accounting systems and implement things
174

174

00:06:26,960  -->  00:06:29,070
like journaling and hashing of your data
175

175

00:06:29,070  -->  00:06:32,320
to ensure the integrity remains intact at all times.
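Here is a minimal sketch of how hashing can flag a changed record, using SHA-256 from Python's standard hashlib module. The record layout and the function names fingerprint and is_unmodified are assumptions made for this example. Note that a plain hash only detects changes; it does not stop someone who can also replace the stored hash.

```python
import hashlib
import hmac


def fingerprint(record: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(record).hexdigest()


def is_unmodified(record: bytes, expected_digest: str) -> bool:
    """True if the record still hashes to the digest stored earlier."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(fingerprint(record), expected_digest)


# Store the digest when the record is written...
original = b"customer=42;balance=1000.00"
stored_digest = fingerprint(original)

# ...and check it again whenever the record is read back.
tampered = b"customer=42;balance=10000.00"
print(is_unmodified(original, stored_digest))
print(is_unmodified(tampered, stored_digest))
```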
176

176

00:06:32,320  -->  00:06:34,420
Conversely, if you're dealing with some kind of data
177

177

00:06:34,420  -->  00:06:36,130
that doesn't require high integrity,
178

178

00:06:36,130  -->  00:06:37,580
you might choose not to implement
179

179

00:06:37,580  -->  00:06:39,280
these more expensive controls.
180

180

00:06:39,280  -->  00:06:41,430
This is ultimately a decision that's going to be made
181

181

00:06:41,430  -->  00:06:43,790
using your risk management and considering the cost
182

182

00:06:43,790  -->  00:06:46,510
versus the benefits of adding these additional controls
183

183

00:06:46,510  -->  00:06:49,010
to each of your data processing systems.
184

184

00:06:49,010  -->  00:06:51,260
Finally, we need to discuss data storage
185

185

00:06:51,260  -->  00:06:52,840
a bit more in depth here.
186

186

00:06:52,840  -->  00:06:54,470
By far, the most common place
187

187

00:06:54,470  -->  00:06:56,920
we're going to store our data to is a RAID.
188

188

00:06:56,920  -->  00:06:59,710
A redundant array of inexpensive disks, or RAID,
189

189

00:06:59,710  -->  00:07:01,930
is a hard drive technology that allows data
190

190

00:07:01,930  -->  00:07:03,710
to be written to a logical partition
191

191

00:07:03,710  -->  00:07:04,543
that's going to be spread
192

192

00:07:04,543  -->  00:07:06,640
across multiple physical disk drives.
193

193

00:07:06,640  -->  00:07:09,650
This ensures that even if a single disk drive in the array
194

194

00:07:09,650  -->  00:07:12,900
fails, that data is still going to be available by restoring
195

195

00:07:12,900  -->  00:07:14,170
it from the RAID itself,
196

196

00:07:14,170  -->  00:07:16,400
instead of having to restore it from a tape backup.
197

197

00:07:16,400  -->  00:07:18,960
Now, there are four main types of RAID arrays
198

198

00:07:18,960  -->  00:07:20,510
that you should be familiar with.
199

199

00:07:20,510  -->  00:07:23,250
RAID zero, which is referred to as disk striping.
200

200

00:07:23,250  -->  00:07:25,700
RAID one, which is called disk mirroring.
201

201

00:07:25,700  -->  00:07:28,720
RAID three, which is called byte-level data striping
202

202

00:07:28,720  -->  00:07:30,050
with dedicated parity.
203

203

00:07:30,050  -->  00:07:32,730
And RAID five, which is block-level data striping
204

204

00:07:32,730  -->  00:07:34,290
with distributed parity.
205

205

00:07:34,290  -->  00:07:36,500
With a RAID zero, or disk striping,
206

206

00:07:36,500  -->  00:07:39,610
this is going to involve a minimum of two physical disks.
207

207

00:07:39,610  -->  00:07:41,400
In this configuration, half of the data
208

208

00:07:41,400  -->  00:07:43,390
is stored on one of the physical drives,
209

209

00:07:43,390  -->  00:07:45,920
while the other half is stored on the other drive.
210

210

00:07:45,920  -->  00:07:48,060
This increases the responsiveness and the delivery
211

211

00:07:48,060  -->  00:07:50,090
of the data stored on this kind of RAID,
212

212

00:07:50,090  -->  00:07:52,890
but there is no added redundancy to this data.
213

213

00:07:52,890  -->  00:07:54,900
If either of these physical drives fails,
214

214

00:07:54,900  -->  00:07:57,010
all of the data is going to be lost.
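The striping idea can be sketched in a few lines of Python. This is a toy model, not how a RAID controller is implemented: the 4-byte stripe size and the function names stripe and unstripe are assumptions chosen to keep the example readable.

```python
def stripe(data: bytes, disks: int = 2, stripe_size: int = 4):
    """Split data into stripe_size chunks and deal them out to the disks in turn."""
    images = [bytearray() for _ in range(disks)]
    for i in range(0, len(data), stripe_size):
        images[(i // stripe_size) % disks] += data[i:i + stripe_size]
    return [bytes(image) for image in images]


def unstripe(images, stripe_size: int = 4) -> bytes:
    """Reassemble the original data by reading one chunk from each disk in turn."""
    offsets = [0] * len(images)
    result = bytearray()
    d = 0
    while offsets[d] < len(images[d]):
        result += images[d][offsets[d]:offsets[d] + stripe_size]
        offsets[d] += stripe_size
        d = (d + 1) % len(images)
    return bytes(result)


disk_images = stripe(b"ABCDEFGHIJ")
print(disk_images)
print(unstripe(disk_images))
```

Each disk holds only every other chunk, so neither disk on its own can reproduce the data. That is why losing one drive loses everything.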
215

215

00:07:57,010  -->  00:07:58,730
Now, if we want to have some redundancy,
216

216

00:07:58,730  -->  00:08:00,400
we can move to a RAID one.
217

217

00:08:00,400  -->  00:08:02,180
RAID one or disk mirroring,
218

218

00:08:02,180  -->  00:08:03,810
places the importance of redundancy
219

219

00:08:03,810  -->  00:08:05,840
over speed in this array.
220

220

00:08:05,840  -->  00:08:07,420
In this type of configuration,
221

221

00:08:07,420  -->  00:08:09,590
you need to have at least two physical disks,
222

222

00:08:09,590  -->  00:08:11,290
and you're going to have a copy of the data written
223

223

00:08:11,290  -->  00:08:13,600
to both disks at the same time.
224

224

00:08:13,600  -->  00:08:15,970
This provides an always ready and available backup
225

225

00:08:15,970  -->  00:08:18,700
in case either of those individual drives fails.
226

226

00:08:18,700  -->  00:08:20,900
The next type we have is known as a RAID three
227

227

00:08:20,900  -->  00:08:24,230
or byte-level data striping with a dedicated parity drive,
228

228

00:08:24,230  -->  00:08:26,750
and this uses a minimum of three disks.
229

229

00:08:26,750  -->  00:08:28,320
In this type of configuration,
230

230

00:08:28,320  -->  00:08:30,630
a portion of your data is placed in the first drive
231

231

00:08:30,630  -->  00:08:33,090
and another portion is placed on the second drive.
232

232

00:08:33,090  -->  00:08:36,610
Then we use a mathematical algorithm to calculate a parity value
233

233

00:08:36,610  -->  00:08:38,590
that's going to be stored on the third drive.
234

234

00:08:38,590  -->  00:08:40,050
If a single drive fails,
235

235

00:08:40,050  -->  00:08:42,400
then the parity can be used to recalculate the values
236

236

00:08:42,400  -->  00:08:44,350
that were stored on the drive that failed
237

237

00:08:44,350  -->  00:08:47,170
once we put in a new drive and we rebuild the array.
238

238

00:08:47,170  -->  00:08:49,840
This allows the array to rebuild itself very quickly and provide
239

239

00:08:49,840  -->  00:08:52,430
data to our users in no time at all.
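The lesson doesn't name the mathematical algorithm, but the usual choice for RAID parity is a bytewise XOR, and that is what this small Python sketch assumes. The drive contents and the function name xor_bytes are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Two data drives and one dedicated parity drive.
drive1 = b"DATA"
drive2 = b"1234"
parity_drive = xor_bytes(drive1, drive2)

# Suppose drive2 fails: XOR the surviving data with the parity to get it back.
rebuilt_drive2 = xor_bytes(drive1, parity_drive)
print(rebuilt_drive2)

# The same trick recovers drive1 if that is the one that fails instead.
rebuilt_drive1 = xor_bytes(drive2, parity_drive)
print(rebuilt_drive1)
```

This works because XOR undoes itself: anything XORed with the same value twice comes back unchanged.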
240

240

00:08:52,430  -->  00:08:54,300
Next, we have a RAID five.
241

241

00:08:54,300  -->  00:08:56,910
Now RAID five is the most commonly used RAID.
242

242

00:08:56,910  -->  00:08:59,190
It is known as block-level data striping
243

243

00:08:59,190  -->  00:09:00,910
with distributed parity.
244

244

00:09:00,910  -->  00:09:04,310
In this array, a minimum of three drives is also required.
245

245

00:09:04,310  -->  00:09:06,040
When the data is stored on this array,
246

246

00:09:06,040  -->  00:09:08,280
a piece of the data is placed on each of the drives
247

247

00:09:08,280  -->  00:09:11,130
and the parity is also stored on those drives.
248

248

00:09:11,130  -->  00:09:12,590
Instead of reserving a single drive
249

249

00:09:12,590  -->  00:09:14,080
for all the parity storage,
250

250

00:09:14,080  -->  00:09:16,420
we're going to have data and parity equally distributed
251

251

00:09:16,420  -->  00:09:18,370
across all three drives.
252

252

00:09:18,370  -->  00:09:20,300
This type of array is very popular
253

253

00:09:20,300  -->  00:09:22,320
because we can replace any single drive
254

254

00:09:22,320  -->  00:09:23,920
without having to shut down the server
255

255

00:09:23,920  -->  00:09:25,810
and this allows operations to continue
256

256

00:09:25,810  -->  00:09:27,950
while we're rebuilding a failed drive.
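To show what "distributed parity" means, here is a toy Python model of a three-drive RAID five. It assumes XOR parity, equal-sized blocks, and a block count that is a multiple of two. The rotation order and the names raid5_write and rebuild_disk are illustrative choices, since real controllers differ in layout.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR across any number of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_write(blocks, disks=3):
    """Lay blocks out in stripes, rotating the parity block across the disks.

    Assumes len(blocks) is a multiple of (disks - 1).
    """
    per_stripe = disks - 1
    array = [[] for _ in range(disks)]
    for s in range(0, len(blocks), per_stripe):
        stripe_data = blocks[s:s + per_stripe]
        # Parity moves one disk to the left on each new stripe.
        parity_disk = (disks - 1 - s // per_stripe) % disks
        parity = xor_blocks(stripe_data)
        data_iter = iter(stripe_data)
        for d in range(disks):
            array[d].append(parity if d == parity_disk else next(data_iter))
    return array


def rebuild_disk(array, failed):
    """Recompute every block of the failed disk from the surviving disks."""
    survivors = [disk for i, disk in enumerate(array) if i != failed]
    return [xor_blocks(stripe) for stripe in zip(*survivors)]


array = raid5_write([b"AA", b"BB", b"CC", b"DD", b"EE", b"FF"])
for number, disk in enumerate(array):
    print(number, disk)
print(rebuild_disk(array, failed=1) == array[1])
```

Because every drive holds a mix of data and parity, any single drive can be rebuilt from the other two, whichever one fails.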
257

257

00:09:27,950  -->  00:09:29,190
Now RAIDs can be implemented
258

258

00:09:29,190  -->  00:09:31,590
using either software or hardware.
259

259

00:09:31,590  -->  00:09:33,820
Now it's cheaper to use a software-based solution,
260

260

00:09:33,820  -->  00:09:36,530
but hardware-based solutions will operate faster
261

261

00:09:36,530  -->  00:09:38,140
for most environments.
262

262

00:09:38,140  -->  00:09:40,230
Another storage option we have is known
263

263

00:09:40,230  -->  00:09:42,630
as storage area networks or SANs
264

264

00:09:42,630  -->  00:09:45,710
and these are very common in our larger enterprise networks.
265

265

00:09:45,710  -->  00:09:47,920
A SAN provides high capacity storage
266

266

00:09:47,920  -->  00:09:49,490
by connecting storage devices,
267

267

00:09:49,490  -->  00:09:51,410
using a high-speed private network
268

268

00:09:51,410  -->  00:09:52,640
that is going to be interconnected
269

269

00:09:52,640  -->  00:09:54,640
by storage specific switches.
270

270

00:09:54,640  -->  00:09:55,960
This is usually going to be handled
271

271

00:09:55,960  -->  00:09:57,610
by a Fibre Channel network.
272

272

00:09:57,610  -->  00:10:00,010
SANs are going to be great for their scalability
273

273

00:10:00,010  -->  00:10:01,280
and high availability,
274

274

00:10:01,280  -->  00:10:04,320
but they're quite expensive to produce and to procure
275

275

00:10:04,320  -->  00:10:05,920
and they require a high level of skill
276

276

00:10:05,920  -->  00:10:07,440
to maintain these things.
277

277

00:10:07,440  -->  00:10:08,320
These days,
278

278

00:10:08,320  -->  00:10:11,140
a lot of our data is also going to be stored in the cloud.
279

279

00:10:11,140  -->  00:10:12,780
This can be inside of a database,
280

280

00:10:12,780  -->  00:10:15,510
block-level storage, or a binary large object
281

281

00:10:15,510  -->  00:10:16,770
known as a blob.
282

282

00:10:16,770  -->  00:10:18,840
Regardless of where we end up storing our data,
283

283

00:10:18,840  -->  00:10:20,580
it's always important for us to have a backup
284

284

00:10:20,580  -->  00:10:23,560
and recovery plan for that data because all forms of storage
285

285

00:10:23,560  -->  00:10:26,243
are subject to outages and data loss eventually.

