1
00:00:00,003 --> 00:00:01,001
- [Lecturer] Welcome to this lesson

2
00:00:01,001 --> 00:00:04,001
on the Cloud Data Lifecycle, Data Dispersion

3
00:00:04,001 --> 00:00:07,002
and Data Flows.

4
00:00:07,002 --> 00:00:08,004
Since we've already been over this,

5
00:00:08,004 --> 00:00:09,005
the focus of this lesson

6
00:00:09,005 --> 00:00:11,006
will be to discuss different

7
00:00:11,006 --> 00:00:13,006
security considerations and controls

8
00:00:13,006 --> 00:00:17,001
that we're going to apply in each phase.

9
00:00:17,001 --> 00:00:21,007
We'll also briefly discuss data dispersion and data flows.

10
00:00:21,007 --> 00:00:24,000
As a reminder, the cloud secure data lifecycle

11
00:00:24,000 --> 00:00:25,008
is made up of six steps.

12
00:00:25,008 --> 00:00:30,000
These steps are create, store, use, share, archive,

13
00:00:30,000 --> 00:00:31,002
and destroy,

14
00:00:31,002 --> 00:00:33,000
and I absolutely recommend memorizing

15
00:00:33,000 --> 00:00:37,001
the order of these steps before going into your CCSP exam.

16
00:00:37,001 --> 00:00:38,003
The first step is create,

17
00:00:38,003 --> 00:00:41,002
and this is where the initial generation or modification

18
00:00:41,002 --> 00:00:45,000
of digital information takes place.

19
00:00:45,000 --> 00:00:47,007
Some activities that we might take during this phase

20
00:00:47,007 --> 00:00:49,006
are going to be data classification,

21
00:00:49,006 --> 00:00:51,008
and this refers to the process

22
00:00:51,008 --> 00:00:56,003
of assigning a security label to the data.

23
00:00:56,003 --> 00:00:59,002
For example, some labels could be public, confidential,

24
00:00:59,002 --> 00:01:02,001
secret or top secret based on both the content

25
00:01:02,001 --> 00:01:04,000
and the purpose of the data.

26
00:01:04,000 --> 00:01:08,004
This is also an important step for conducting input validation

27
00:01:08,004 --> 00:01:10,003
so that as we're collecting the data,

28
00:01:10,003 --> 00:01:12,005
we're ensuring that it meets certain criteria

29
00:01:12,005 --> 00:01:14,004
to prevent malicious input.

30
00:01:14,004 --> 00:01:18,007
This is primarily going to apply to web interfaces

31
00:01:18,007 --> 00:01:20,007
where users are inputting data

32
00:01:20,007 --> 00:01:23,001
that is eventually going to make it somewhere

33
00:01:23,001 --> 00:01:25,002
into our application.
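
As a rough illustration of input validation at the create phase, here is a minimal Python sketch that checks web form input against an allow-list before it reaches the application. The field names (`username`, `age`) and the rules are hypothetical assumptions chosen for the example, not something the lesson prescribes.

```python
import re

# Hypothetical allow-list rule for a sign-up form; the field names and
# the pattern are illustrative assumptions, not taken from the lesson.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,32}")


def validate_signup(form: dict) -> list:
    """Return a list of validation errors (empty if the input is acceptable)."""
    errors = []

    username = form.get("username", "")
    if not isinstance(username, str) or not USERNAME_RE.fullmatch(username):
        errors.append("username must be 3-32 letters, digits, or underscores")

    age = form.get("age", "")
    age_ok = (
        isinstance(age, str)
        and age.isascii()
        and age.isdigit()
        and 0 < int(age) < 150
    )
    if not age_ok:
        errors.append("age must be a whole number between 1 and 149")

    return errors
```

Rejecting anything that does not match the expected shape tends to be safer than trying to list every malicious pattern.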

34
00:01:25,002 --> 00:01:29,005
Step two of the data lifecycle is storing data,

35
00:01:29,005 --> 00:01:32,008
and some key requirements are going to be setting up

36
00:01:32,008 --> 00:01:35,004
that initial encryption, both in transit

37
00:01:35,004 --> 00:01:38,001
and at rest for the data.

38
00:01:38,001 --> 00:01:40,005
In this phase, we also may select

39
00:01:40,005 --> 00:01:41,008
different storage solutions

40
00:01:41,008 --> 00:01:43,006
depending on the sensitivity of the data,

41
00:01:43,006 --> 00:01:45,009
which was identified during the create phase,

42
00:01:45,009 --> 00:01:48,000
and subsequently send the data

43
00:01:48,000 --> 00:01:49,005
to the appropriate storage solution

44
00:01:49,005 --> 00:01:51,002
based on that classification.

45
00:01:51,002 --> 00:01:54,000
This is also the phase where we will determine

46
00:01:54,000 --> 00:01:57,005
and potentially apply the data retention policies,

47
00:01:57,005 --> 00:01:59,001
which is determining how long

48
00:01:59,001 --> 00:02:01,009
the data should be stored based on, once again,

49
00:02:01,009 --> 00:02:05,002
the classification and the purpose of the data.

50
00:02:05,002 --> 00:02:08,005
Step three is where we use the data,

51
00:02:08,005 --> 00:02:11,007
and similarly, it's important during this step

52
00:02:11,007 --> 00:02:14,003
that we have robust encryption

53
00:02:14,003 --> 00:02:16,001
and that we're effectively identifying

54
00:02:16,001 --> 00:02:20,001
and enforcing access controls for both people

55
00:02:20,001 --> 00:02:23,002
and systems that might be accessing this data,

56
00:02:23,002 --> 00:02:25,002
and then auditing the activities

57
00:02:25,002 --> 00:02:28,006
that take place when they do access it.

58
00:02:28,006 --> 00:02:31,009
Another useful thing we can do during the use phase is,

59
00:02:31,009 --> 00:02:34,006
depending on exactly how and why we're using it,

60
00:02:34,006 --> 00:02:37,000
apply data masking,

61
00:02:37,000 --> 00:02:38,007
which is the practice of displaying

62
00:02:38,007 --> 00:02:40,005
only a portion of the sensitive data

63
00:02:40,005 --> 00:02:44,009
and removing the fields which may be considered sensitive

64
00:02:44,009 --> 00:02:47,003
if they're not actively required for the task

65
00:02:47,003 --> 00:02:49,001
being accomplished.
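
To make data masking concrete, here is a minimal Python sketch. It shows only the last digits of a card number and drops fields the task does not need. The record layout and the `card_number` field name are hypothetical assumptions for illustration.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Show only the last `visible` digits; replace the rest with '*'."""
    digits = [c for c in card_number if c.isdigit()]
    hidden = max(len(digits) - visible, 0)
    return "*" * hidden + "".join(digits[hidden:])


def mask_record(record: dict, needed_fields: set) -> dict:
    """Keep only the fields the task needs, masking the card number if present."""
    masked = {k: v for k, v in record.items() if k in needed_fields}
    if "card_number" in masked:
        masked["card_number"] = mask_card_number(masked["card_number"])
    return masked
```

A support agent's view, for example, might call `mask_record(record, {"name", "card_number"})` so that other sensitive fields never reach the screen.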

66
00:02:49,001 --> 00:02:51,001
The next step is sharing,

67
00:02:51,001 --> 00:02:52,008
where we will be granting data access

68
00:02:52,008 --> 00:02:55,004
to other users or entities.

69
00:02:55,004 --> 00:02:57,007
The most important aspects of this phase

70
00:02:57,007 --> 00:02:59,005
are going to be making sure that we're effectively

71
00:02:59,005 --> 00:03:03,003
authenticating and authorizing users and systems

72
00:03:03,003 --> 00:03:05,005
that are sharing the data,

73
00:03:05,005 --> 00:03:07,007
as well as applying encryption in transit

74
00:03:07,007 --> 00:03:10,000
for the data moving between systems,

75
00:03:10,000 --> 00:03:12,008
using protocols like SFTP and HTTPS

76
00:03:12,008 --> 00:03:15,003
to securely share the data.

77
00:03:15,003 --> 00:03:18,000
This might also be a phase where we determine

78
00:03:18,000 --> 00:03:20,005
and apply data usage agreements,

79
00:03:20,005 --> 00:03:24,000
such as requiring third parties to agree to terms

80
00:03:24,000 --> 00:03:26,003
about how the data is going to be used.

81
00:03:26,003 --> 00:03:28,004
The next step is archiving,

82
00:03:28,004 --> 00:03:30,007
and this is where we are storing data long-term

83
00:03:30,007 --> 00:03:33,003
for compliance or historical purposes.

84
00:03:33,003 --> 00:03:35,006
Similar to that initial store phase,

85
00:03:35,006 --> 00:03:37,002
one activity that will take place here

86
00:03:37,002 --> 00:03:40,000
is going to be determining what our solution is going to be

87
00:03:40,000 --> 00:03:42,008
for this long-term storage.

88
00:03:42,008 --> 00:03:45,007
And cold storage is a common term you might hear

89
00:03:45,007 --> 00:03:47,008
for referring to data solutions

90
00:03:47,008 --> 00:03:50,007
that are expressly designed for archival data.

91
00:03:50,007 --> 00:03:52,004
Some other activities we can do here

92
00:03:52,004 --> 00:03:55,002
are going to be conducting regular data integrity checks

93
00:03:55,002 --> 00:03:59,001
to validate that nothing has changed about the data.
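
One simple way to run an integrity check is to record a cryptographic digest when the data is archived and compare it later. The Python sketch below assumes SHA-256 and assumes the recorded digest is kept somewhere separate from the archive. Both are illustrative choices, not requirements stated in the lesson.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at archive time."""
    return hmac.compare_digest(sha256_digest(data), recorded_digest)
```

If `is_unchanged` returns `False`, the archived copy no longer matches what was originally stored and should be investigated or restored from another copy.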

94
00:03:59,001 --> 00:04:00,005
And then a special point I want to make

95
00:04:00,005 --> 00:04:03,003
is about key management.

96
00:04:03,003 --> 00:04:04,005
We've discussed throughout the course

97
00:04:04,005 --> 00:04:06,000
how important cryptography

98
00:04:06,000 --> 00:04:09,001
and, in particular, key management are in the cloud

99
00:04:09,001 --> 00:04:11,002
and in general. Specifically, when it comes

100
00:04:11,002 --> 00:04:14,003
to archiving data, you should always consider

101
00:04:14,003 --> 00:04:16,008
how you are going to manage the key material

102
00:04:16,008 --> 00:04:19,003
that's used to encrypt your data

103
00:04:19,003 --> 00:04:21,001
and ensure that the lifecycle assigned

104
00:04:21,001 --> 00:04:24,004
for the key is at least as long as the lifecycle

105
00:04:24,004 --> 00:04:26,002
of the data you are archiving.

106
00:04:26,002 --> 00:04:28,008
In fact, a key aspect of key management

107
00:04:28,008 --> 00:04:31,004
is tracking the existence of that key material

108
00:04:31,004 --> 00:04:35,002
through each of these phases of the data lifecycle.
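
The rule that a key should live at least as long as the data it protects can be checked mechanically. This Python sketch assumes a hypothetical inventory in which each archive records a key ID and a retention end date, and each key records an expiry date. A real key management service would expose this information in its own way.

```python
from datetime import date


def archives_at_risk(archives: dict, key_expiry: dict) -> list:
    """List archives whose key is missing or expires before retention ends.

    archives:   archive name -> (key id, date the data must be kept until)
    key_expiry: key id -> date the key material expires
    """
    at_risk = []
    for name, (key_id, retain_until) in archives.items():
        expires = key_expiry.get(key_id)
        if expires is None or expires < retain_until:
            at_risk.append(name)
    return sorted(at_risk)
```

Any archive this returns would become unreadable before its retention period ends unless the key's lifecycle is extended.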

109
00:04:35,002 --> 00:04:38,002
Okay, and the last phase is going to be destroy.

110
00:04:38,002 --> 00:04:40,007
This is where we are securely erasing our data

111
00:04:40,007 --> 00:04:43,000
once it's no longer needed.

112
00:04:43,000 --> 00:04:46,000
The goals of this phase are to ensure a permanent deletion

113
00:04:46,000 --> 00:04:51,000
of the data in a way that meets security requirements,

114
00:04:51,000 --> 00:04:53,003
and once again, it's very unlikely in a cloud environment

115
00:04:53,003 --> 00:04:56,006
that you will gain physical access to destroy disks

116
00:04:56,006 --> 00:04:58,008
where your data is written.

117
00:04:58,008 --> 00:05:01,005
So, we can use mechanisms such as crypto shredding

118
00:05:01,005 --> 00:05:03,007
to ensure that our data is unreadable,

119
00:05:03,007 --> 00:05:06,006
even if it's still written to a disk.

120
00:05:06,006 --> 00:05:09,002
Next up, let's talk about data dispersion.

121
00:05:09,002 --> 00:05:12,008
This is the concept of breaking up data into smaller chunks

122
00:05:12,008 --> 00:05:15,003
for diversified storage.

123
00:05:15,003 --> 00:05:17,001
So, you can imagine if this puzzle right here

124
00:05:17,001 --> 00:05:19,005
is a coherent set of data,

125
00:05:19,005 --> 00:05:23,000
then we're breaking it up into distinct portions

126
00:05:23,000 --> 00:05:26,003
and storing it on different drives.

127
00:05:26,003 --> 00:05:27,006
And compared to on-premises,

128
00:05:27,006 --> 00:05:31,004
this is very similar to the concept of RAID striping,

129
00:05:31,004 --> 00:05:35,007
but it's tailored specifically for cloud environments.

130
00:05:35,007 --> 00:05:36,006
Along with this,

131
00:05:36,006 --> 00:05:41,000
there's also something called erasure coding,

132
00:05:41,000 --> 00:05:43,006
and this is comparable to RAID's parity bit,

133
00:05:43,006 --> 00:05:47,003
where if there's a failure of a disk that causes us

134
00:05:47,003 --> 00:05:51,002
to lose some portion of our original data,

135
00:05:51,002 --> 00:05:54,001
mathematical calculations can be applied

136
00:05:54,001 --> 00:05:56,008
to the segments that are retained,

137
00:05:56,008 --> 00:06:00,001
which allows us to recover the lost data.

138
00:06:00,001 --> 00:06:01,008
Of course, this is not the algorithm applied,

139
00:06:01,008 --> 00:06:05,003
but it's meant to represent the application

140
00:06:05,003 --> 00:06:08,004
of complex mathematical calculations.
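
To give a feel for the recovery idea, here is a minimal Python sketch using the simplest possible scheme: a single XOR parity chunk, in the spirit of RAID parity. This is an illustrative assumption. Production erasure coding typically uses stronger codes such as Reed-Solomon that can survive more than one lost chunk.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, n: int) -> list:
    """Split data into n equal-size chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover_missing(chunks: list) -> list:
    """Rebuild the single chunk marked None by XOR-ing all surviving chunks."""
    missing = chunks.index(None)
    survivors = [c for c in chunks if c is not None]
    restored = list(chunks)
    restored[missing] = reduce(xor_bytes, survivors)
    return restored
```

Each chunk would be stored on a different drive. If any one of them is lost, XOR-ing the survivors reproduces it, because every byte of the parity chunk is the XOR of the corresponding bytes in the data chunks.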

141
00:06:08,004 --> 00:06:11,002
Some benefits of this setup are going to be enhancing

142
00:06:11,002 --> 00:06:15,006
overall data availability as well as increased security

143
00:06:15,006 --> 00:06:17,009
since no complete set of usable data

144
00:06:17,009 --> 00:06:20,004
is available on a single device.

145
00:06:20,004 --> 00:06:24,000
However, it can become a complication

146
00:06:24,000 --> 00:06:27,001
for compliance considerations,

147
00:06:27,001 --> 00:06:30,003
especially in countries that require sensitive data

148
00:06:30,003 --> 00:06:33,004
to be retained within the country.

149
00:06:33,004 --> 00:06:36,002
There's also the potential for some added latency

150
00:06:36,002 --> 00:06:38,005
with data dispersion.

151
00:06:38,005 --> 00:06:41,009
Okay, finally, we'll talk about data flows.

152
00:06:41,009 --> 00:06:44,004
A data flow is a mapping of data movement

153
00:06:44,004 --> 00:06:47,003
in modular application architectures.

154
00:06:47,003 --> 00:06:50,001
The cloud has allowed us to build advanced systems

155
00:06:50,001 --> 00:06:52,004
that are modular in design,

156
00:06:52,004 --> 00:06:54,005
which offer things like faster development,

157
00:06:54,005 --> 00:06:58,007
faster deployment, and independently scalable features.

158
00:06:58,007 --> 00:07:01,008
However, sometimes these new modern designs

159
00:07:01,008 --> 00:07:05,001
have led to reduced visibility.

160
00:07:05,001 --> 00:07:07,000
In some cases, organizations can struggle

161
00:07:07,000 --> 00:07:09,004
to pinpoint exact data locations,

162
00:07:09,004 --> 00:07:13,005
routes, and open ports and services of systems.

163
00:07:13,005 --> 00:07:16,005
So, one of the things that we can build to help regain

164
00:07:16,005 --> 00:07:17,009
some of that system visibility

165
00:07:17,009 --> 00:07:20,007
is going to be a data flow diagram.

166
00:07:20,007 --> 00:07:22,002
And this is a very simple example,

167
00:07:22,002 --> 00:07:27,003
but in a data flow diagram, the purpose is to focus on

168
00:07:27,003 --> 00:07:30,001
the direction and content of data that is flowing

169
00:07:30,001 --> 00:07:33,002
through our independent systems, where maybe in this case,

170
00:07:33,002 --> 00:07:37,003
a set of users is accessing a front-end web application,

171
00:07:37,003 --> 00:07:40,002
and then that web application is communicating via an API

172
00:07:40,002 --> 00:07:42,009
to a back-end database.

173
00:07:42,009 --> 00:07:46,004
In this case, a data flow diagram should focus on

174
00:07:46,004 --> 00:07:48,008
which systems are communicating with other systems

175
00:07:48,008 --> 00:07:50,001
and what the content is

176
00:07:50,001 --> 00:07:53,008
and should be of those communications.
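
A data flow diagram can also be kept as data, which makes it possible to compare what is documented with what is actually observed. This Python sketch encodes the example above. The system names, protocol labels, and content descriptions are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str
    protocol: str
    content: str


# The documented flows from the example: users reach the web application,
# which talks to the back-end database through an API.
DOCUMENTED_FLOWS = {
    Flow("users", "web-app", "HTTPS", "form input and page content"),
    Flow("web-app", "database", "API over HTTPS", "queries and results"),
}


def undocumented(observed: set) -> set:
    """Return (source, destination) pairs seen in practice but absent from the diagram."""
    allowed = {(f.source, f.destination) for f in DOCUMENTED_FLOWS}
    return observed - allowed
```

Anything returned by `undocumented` is either a gap in the diagram or a connection that should not exist, and both are worth a look.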

177
00:07:53,008 --> 00:07:56,003
In summary, in this lesson we talked about the cloud data

178
00:07:56,003 --> 00:07:57,009
lifecycle and some security considerations

179
00:07:57,009 --> 00:08:00,005
and controls that we might apply during each phase

180
00:08:00,005 --> 00:08:03,003
of the cloud data lifecycle.

181
00:08:03,003 --> 00:08:05,006
We then talked about the concept of data dispersion

182
00:08:05,006 --> 00:08:07,004
as well as data flows

183
00:08:07,004 --> 00:08:10,003
and the importance of building data flow diagrams

184
00:08:10,003 --> 00:08:12,000
alongside our architectural diagrams.


