1
00:00:00,211 --> 00:00:10,211
[MUSIC]

2
00:00:18,521 --> 00:00:20,921
Today, we'll continue our discussion of
data

3
00:00:20,921 --> 00:00:24,600
validation in the life cycle inventory
analysis step.

4
00:00:24,600 --> 00:00:26,630
If you recall, a few lectures ago, we

5
00:00:26,630 --> 00:00:31,470
discussed analysis-based estimation
methods for addressing data gaps.

6
00:00:31,470 --> 00:00:33,240
I'd like to stress again, that these
methods

7
00:00:33,240 --> 00:00:36,600
can also be used as part of data
validation.

8
00:00:36,600 --> 00:00:40,350
More specifically, these estimation
methods can provide independent

9
00:00:40,350 --> 00:00:43,690
reference points against which to test
whether inventory

10
00:00:43,690 --> 00:00:45,550
data makes sense.

11
00:00:45,550 --> 00:00:49,070
For example, I used an air emissions
factor for CO2

12
00:00:49,070 --> 00:00:53,290
emissions from diesel fuel combustion to
check if the CO2 emissions

13
00:00:53,290 --> 00:00:57,190
estimated from my black box unit process
model for light trucks

14
00:00:57,190 --> 00:01:00,220
were consistent with the amount of diesel
fuel in the inventory.

15
00:01:01,240 --> 00:01:04,810
Additional ways of validating data include
checking to ensure unit

16
00:01:04,810 --> 00:01:08,980
process inventory conserves both mass and
energy between its inputs and

17
00:01:08,980 --> 00:01:13,130
outputs, checking your inventory data for
anomalies or gaps,

18
00:01:13,130 --> 00:01:17,200
by comparing them to similar inventories,
or taking process measurements.

19
00:01:18,400 --> 00:01:21,190
Today, we'll focus on data quality
assessment, which

20
00:01:21,190 --> 00:01:23,900
is the process of ensuring our life cycle
inventory

21
00:01:23,900 --> 00:01:26,820
data have met the data quality
requirements that

22
00:01:26,820 --> 00:01:30,720
we specify during the scope definition of
our study.

23
00:01:30,720 --> 00:01:34,150
We'll use my plastic bag LCA model as an
example when discussing

24
00:01:34,150 --> 00:01:34,730
this concept.

25
00:01:36,160 --> 00:01:38,760
Note that it is good practice to evaluate
data quality

26
00:01:38,760 --> 00:01:42,590
throughout a life cycle inventory, as
we've done by documenting the

27
00:01:42,590 --> 00:01:47,620
time-related, geographical, and technology
coverage for each unit process, as

28
00:01:47,620 --> 00:01:49,740
well as important explanatory notes,

29
00:01:49,740 --> 00:01:52,230
describing any allocation and aggregation
decisions.

30
00:01:53,590 --> 00:01:56,760
Thus, the goal of data quality assessment
should not

31
00:01:56,760 --> 00:01:59,300
be to alert you to major quality issues
after

32
00:01:59,300 --> 00:02:01,130
you've completed your entire inventory.

33
00:02:01,130 --> 00:02:05,440
If that happens, it means you probably
weren't paying close enough attention

34
00:02:05,440 --> 00:02:09,830
to your own data quality requirements when
you were compiling your inventory data.

35
00:02:09,830 --> 00:02:14,810
To the contrary, the data quality
assessment really serves two purposes.

36
00:02:14,810 --> 00:02:19,330
First, it forces one to think about whether
data requirements were met in a

37
00:02:19,330 --> 00:02:21,540
consistent manner for the inventory as a

38
00:02:21,540 --> 00:02:25,220
whole, and with respect to key data
characteristics.

39
00:02:25,220 --> 00:02:29,880
Second, it forces one to document a
structured and intuitive summary of data

40
00:02:29,880 --> 00:02:32,560
quality for the study's audience, which
aids

41
00:02:32,560 --> 00:02:34,570
in their interpretation of the study's
results.

42
00:02:35,940 --> 00:02:38,060
There is no standard method for data
quality

43
00:02:38,060 --> 00:02:41,340
assessment in LCA; rather, different
methods have been

44
00:02:41,340 --> 00:02:43,640
proposed by different authors, some of
which I

45
00:02:43,640 --> 00:02:46,150
will refer you to in the lecture notes.

46
00:02:46,150 --> 00:02:50,440
I'd like to use the data pedigree matrix
proposed by Weidema and Wesnæs in 1996,

47
00:02:50,440 --> 00:02:55,220
because it is easy to apply and intuitive
to interpret.

48
00:02:55,220 --> 00:02:57,390
This data pedigree matrix has five data

49
00:02:57,390 --> 00:03:00,900
quality indicators, which I'll discuss one
by one.

50
00:03:00,900 --> 00:03:04,670
The reliability indicator relates to the
sources, acquisition

51
00:03:04,670 --> 00:03:08,340
methods and verification procedures used
to obtain the data.

52
00:03:09,530 --> 00:03:14,060
The completeness indicator relates to the
statistical properties of the data.

53
00:03:14,060 --> 00:03:16,000
How representative is the sample?

54
00:03:16,000 --> 00:03:18,620
Does the sample include a sufficient
number of data?

55
00:03:18,620 --> 00:03:21,480
And is the period adequate to even out
normal fluctuations?

56
00:03:22,566 --> 00:03:26,720
Now, the next three data quality indicators
should look very familiar to you,

57
00:03:26,720 --> 00:03:28,560
because we specified them in our data

58
00:03:28,560 --> 00:03:31,735
quality requirements in the scope
definition step.

59
00:03:31,735 --> 00:03:35,880
We've also been tracking them carefully
for each unit process inventory.

60
00:03:35,880 --> 00:03:39,190
The temporal correlation indicator
represents the correlation

61
00:03:39,190 --> 00:03:41,150
between the stated analysis years of the

62
00:03:41,150 --> 00:03:45,370
study and the years of the obtained
inventory data.

63
00:03:45,370 --> 00:03:49,180
Recall that I specified that, if possible,
my inventory data for

64
00:03:49,180 --> 00:03:52,580
the plastic bag life cycle should be from
the last five years.

65
00:03:54,130 --> 00:03:56,940
The geographical correlation indicator
represents the

66
00:03:56,940 --> 00:03:59,540
correlation between the stated
geographical focus

67
00:03:59,540 --> 00:04:01,380
of a study and the geographical

68
00:04:01,380 --> 00:04:04,610
characteristics of the obtained inventory
data.

69
00:04:04,610 --> 00:04:06,170
Recall that I specified

70
00:04:06,170 --> 00:04:08,870
the United States as the geographical
focus of my

71
00:04:08,870 --> 00:04:14,960
plastic bag life cycle. And, finally, the
technological correlation indicator.

72
00:04:14,960 --> 00:04:16,870
This represents the correlation between
the

73
00:04:16,870 --> 00:04:20,530
technologies and/or technology mix
specified for a

74
00:04:20,530 --> 00:04:23,630
study and the technologies and/or
technology

75
00:04:23,630 --> 00:04:27,430
mix represented by the obtained inventory
data.

76
00:04:27,430 --> 00:04:31,510
For my plastic bag life cycle model, I
specified that a US average

77
00:04:31,510 --> 00:04:35,010
technology mix for my target time period
would be acceptable.

78
00:04:36,760 --> 00:04:39,610
For each of these five data quality
indicators, we

79
00:04:39,610 --> 00:04:41,770
assign a score for our study's data on a

80
00:04:41,770 --> 00:04:44,820
scale of one to five with one indicating
the

81
00:04:44,820 --> 00:04:49,550
highest data quality and five indicating
the lowest data quality.

82
00:04:49,550 --> 00:04:53,560
The resulting data pedigree matrix looks
like this, which allows

83
00:04:53,560 --> 00:04:56,760
one to see at a glance how well the study
performs with

84
00:04:56,760 --> 00:05:01,510
respect to each data quality indicator,
and with respect to data quality overall.

85
00:05:03,090 --> 00:05:05,410
What you're seeing here, is the data
pedigree table

86
00:05:05,410 --> 00:05:08,850
that I completed for my plastic bag LCA
study.

87
00:05:08,850 --> 00:05:11,020
So, how did I determine what numerical
score

88
00:05:11,020 --> 00:05:14,120
to give my study for each data quality
indicator?

89
00:05:14,120 --> 00:05:19,310
Well, the data pedigree matrix comes with
guidelines for how to assign scores.

90
00:05:19,310 --> 00:05:22,470
For example, when scoring temporal
correlation,

91
00:05:22,470 --> 00:05:26,650
the guidance states that I should assign a
1 if my data are less than 3

92
00:05:26,650 --> 00:05:31,675
years old, or a 2 if my data are less than
6 years old, and so on.

93
00:05:31,675 --> 00:05:36,740
Let's take a look at the scoring guidance
for geographical correlation which

94
00:05:36,740 --> 00:05:39,830
states that I should assign a one if my
data are from the

95
00:05:39,830 --> 00:05:43,980
area under study. At the other end of the
spectrum, I should assign

96
00:05:43,980 --> 00:05:47,510
a five if the data are from an unknown
area or an area

97
00:05:47,510 --> 00:05:50,390
with completely different production
conditions.

98
00:05:50,390 --> 00:05:52,180
Since all of the inventory data in my

99
00:05:52,180 --> 00:05:55,340
plastic bag LCA model are from the United
States,

100
00:05:55,340 --> 00:05:56,740
I have assigned a score of one

101
00:05:56,740 --> 00:05:59,970
in my data quality assessment for
geographical correlation.

102
00:06:01,010 --> 00:06:03,290
Note also that I provided an explanation
for

103
00:06:03,290 --> 00:06:06,860
my chosen score for each data quality
indicator.

104
00:06:06,860 --> 00:06:10,220
This gives the audience some insight into
my rationale.

105
00:06:10,220 --> 00:06:12,990
Now they may or may not agree with my
rationale,

106
00:06:12,990 --> 00:06:15,520
but clearly documenting it is critical for

107
00:06:15,520 --> 00:06:18,200
ensuring my study is transparent in every
respect.

108
00:06:19,870 --> 00:06:23,300
We don't have time to go through the
guidelines for each category.

109
00:06:23,300 --> 00:06:26,440
So I've provided the full scoring guidance
table in the lecture notes.

110
00:06:27,530 --> 00:06:29,720
You'll use the same scoring guidance for

111
00:06:29,720 --> 00:06:32,340
assessing data quality in your bottled
soda LCA.

112
00:06:32,340 --> 00:06:35,670
As I mentioned earlier, in addition to

113
00:06:35,670 --> 00:06:38,960
providing important documentation for your
study's audience,

114
00:06:38,960 --> 00:06:42,640
the data quality assessment also forces us
to take a step back

115
00:06:42,640 --> 00:06:46,990
and carefully review just how well we did
with respect to data quality.

116
00:06:46,990 --> 00:06:49,810
While hopefully you won't identify major
data issues at

117
00:06:49,810 --> 00:06:53,760
this point, you might often find that, in
retrospect, the

118
00:06:53,760 --> 00:06:57,090
data you compiled did not perfectly
satisfy the data quality

119
00:06:57,090 --> 00:07:00,650
requirements that you laid out during the
scope definition step.

120
00:07:00,650 --> 00:07:04,240
If this happens, what should you do?
Well,

121
00:07:04,240 --> 00:07:06,990
one option is to go back and try to
compile better

122
00:07:06,990 --> 00:07:10,920
data to strengthen quality with respect to
one or more indicators.

123
00:07:10,920 --> 00:07:13,430
But, this isn't always possible.

124
00:07:13,430 --> 00:07:15,510
Another option is to go back and adjust

125
00:07:15,510 --> 00:07:18,090
your goal and scope definition so that
your stated

126
00:07:18,090 --> 00:07:21,570
application, purpose, and audience are
still valid, in

127
00:07:21,570 --> 00:07:25,760
light of your data quality limitations.
Whatever the action,

128
00:07:25,760 --> 00:07:29,730
hopefully, you can now see the value in
data quality assessment, to the analyst

129
00:07:29,730 --> 00:07:32,720
and to the audience.
I'll see you next time.


