1
00:00:07,289 --> 00:00:11,888
>> Because adding potential confounding
variables to a statistical model can

2
00:00:11,888 --> 00:00:16,418
help us to gain a deeper understanding
of the relationship between variables or

3
00:00:16,418 --> 00:00:19,330
lead us to rethink an association,

4
00:00:19,330 --> 00:00:24,092
it's important to learn about statistical
tools that will allow us to examine

5
00:00:24,092 --> 00:00:28,274
multiple variables simultaneously,
that is, look at more than two or

6
00:00:28,274 --> 00:00:30,381
three variables at the same time.

7
00:00:30,381 --> 00:00:35,238
The general purpose of multivariate
modeling techniques, such as multiple

8
00:00:35,238 --> 00:00:40,022
regression and logistic regression, is
to learn more about the relationship

9
00:00:40,022 --> 00:00:44,372
between several explanatory variables and
one response variable.

10
00:00:44,372 --> 00:00:48,527
>> These regression procedures
are very widely used in research.

11
00:00:48,527 --> 00:00:52,787
In general, they allow us to ask and
hopefully answer the question,

12
00:00:52,787 --> 00:00:54,703
what is the best predictor of?

13
00:00:54,703 --> 00:00:59,405
And does variable A or variable B
confound the relationship between my

14
00:00:59,405 --> 00:01:03,560
explanatory variable of interest and
my response variable?

15
00:01:05,990 --> 00:01:09,180
>> For example, educational researchers

16
00:01:09,180 --> 00:01:12,610
might want to learn about the best
predictors of success in high school.

17
00:01:13,968 --> 00:01:18,140
Sociologists may want to find out which
of the multiple social indicators

18
00:01:18,140 --> 00:01:19,500
best predict whether or

19
00:01:19,500 --> 00:01:22,619
not a new immigrant group will adapt
to their new country of residence.

20
00:01:23,700 --> 00:01:27,640
Biologists may want to find out which
factors, such as temperature or

21
00:01:27,640 --> 00:01:32,090
barometric pressure or humidity best
predict caterpillar reproduction.

22
00:01:33,440 --> 00:01:37,650
So how can multivariate models help
us to evaluate the presence or

23
00:01:37,650 --> 00:01:40,069
absence of confounding or
lurking variables?

24
00:01:41,690 --> 00:01:46,677
Since the difficulty arises because
of the lurking variable's values being

25
00:01:46,677 --> 00:01:51,264
tied in with those of the explanatory
variable, one way to attempt to

26
00:01:51,264 --> 00:01:55,853
unravel the true nature of
the relationship between explanatory and

27
00:01:55,853 --> 00:02:00,941
response variables is to separate out
the effects of the lurking variable.

28
00:02:00,941 --> 00:02:04,685
>> You may have already identified
a significant relationship between

29
00:02:04,685 --> 00:02:07,082
your explanatory and response variables.

30
00:02:07,082 --> 00:02:11,726
You now want to think about whether this
is a real relationship or if instead,

31
00:02:11,726 --> 00:02:15,742
the relationship is confounded by one or
more lurking variables.

32
00:02:15,742 --> 00:02:21,631
>> For example, here's a graphical
association between birth order and

33
00:02:21,631 --> 00:02:27,141
number of cases of Down syndrome
per 100,000 live births.

34
00:02:27,141 --> 00:02:31,668
As you can see, it looks like a linear
association where the first born

35
00:02:31,668 --> 00:02:35,749
in a family has the lowest
likelihood of having Down syndrome.

36
00:02:37,350 --> 00:02:40,407
With later birth order up
to a fifth born child,

37
00:02:40,407 --> 00:02:44,241
there's increased risk of
being born with Down syndrome.

38
00:02:44,241 --> 00:02:47,887
This is a statistically
significant association when

39
00:02:47,887 --> 00:02:51,692
analyzed via a chi-square test
of independence with birth

40
00:02:51,692 --> 00:02:56,050
order as the categorical explanatory
variable and the presence or

41
00:02:56,050 --> 00:03:01,060
absence of Down syndrome as the
two-level categorical response variable.
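
A chi-square test of independence like the one described here could be sketched as follows. This is a minimal illustration, not the analysis behind the lecture's graph: the counts in `observed` are hypothetical, and the helper names `chi_square_independence` and `chi2_sf_even_dof` are invented for this example.

```python
import math

# Hypothetical counts, NOT the data from the lecture.
# Rows are birth orders 1 to 5; columns are (Down syndrome present, absent).
observed = [
    [30, 99970],
    [40, 99960],
    [55, 99945],
    [70, 99930],
    [90, 99910],
]

def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf_even_dof(x, dof):
    """Upper-tail p-value; this closed form is valid only for even dof."""
    assert dof % 2 == 0, "closed form requires an even number of degrees of freedom"
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

stat, dof = chi_square_independence(observed)
p_value = chi2_sf_even_dof(stat, dof)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.2g}")
```

With five birth-order categories and a two-level response, the test has 4 degrees of freedom, so a statistic above roughly 9.49 is significant at the 0.05 level. In practice a statistics library would normally be used instead of the hand-written p-value helper.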

42
00:03:03,197 --> 00:03:06,381
Another statistically
significant relationship is

43
00:03:06,381 --> 00:03:09,988
the association between maternal
age at a child's birth and

44
00:03:09,988 --> 00:03:13,326
the likelihood that the child
will have Down syndrome.

45
00:03:13,326 --> 00:03:19,326
You can see here that babies of younger
women up to about the age of 29 or

46
00:03:19,326 --> 00:03:23,739
30 to 34 have really low
rates of Down syndrome.

47
00:03:23,739 --> 00:03:30,229
Among mothers aged 35 to 39 and older,
you see the rates are clearly higher.

48
00:03:32,225 --> 00:03:35,611
>> Remember, in the case of
a confounding variable, the observed

49
00:03:35,611 --> 00:03:39,814
association with the response variable
should be attributed to the confounder

50
00:03:39,814 --> 00:03:42,860
rather than the explanatory variable.

51
00:03:42,860 --> 00:03:47,450
We test for confounders by including
these third variables or fourth or

52
00:03:47,450 --> 00:03:51,730
fifth in our statistical models that
may explain the association of interest.
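
One way to write down this idea, assuming a logistic regression with a two-level response, is to compare a model without the candidate confounder to a model that includes it. The symbols here are assumptions for illustration: Y is the response, X is the explanatory variable of interest, and Z is the possible confounder.

```latex
\begin{align}
\text{Model 1:}\quad & \operatorname{logit} P(Y = 1) = \alpha_0 + \alpha_1 X \\
\text{Model 2:}\quad & \operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 X + \beta_2 Z
\end{align}
```

If the estimate of \alpha_1 is clearly different from zero but the estimate of \beta_1 is close to zero once Z is in the model, that pattern is consistent with Z confounding the association between X and Y.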

53
00:03:53,550 --> 00:03:58,594
>> In these examples, it's possible
that the association between a child's

54
00:03:58,594 --> 00:04:03,657
birth order and risk for Down syndrome
could be confounded by maternal age.

55
00:04:03,657 --> 00:04:07,963
Alternatively, the association
between maternal age and

56
00:04:07,963 --> 00:04:13,342
Down syndrome might be confounded by
birth order or both birth order and

57
00:04:13,342 --> 00:04:17,917
maternal age might independently
predict the likelihood of

58
00:04:17,917 --> 00:04:22,782
a diagnosis of Down syndrome
after controlling for each other.

59
00:04:22,782 --> 00:04:28,264
Here's a graph that answers this question
by showing that maternal age confounds

60
00:04:28,264 --> 00:04:33,349
the relationship between birth rank and
Down syndrome and that it's really

61
00:04:33,349 --> 00:04:38,690
maternal age, rather than birth rank,
that's associated with Down syndrome.

62
00:04:41,452 --> 00:04:46,351
Here, you see birth order
along the horizontal axis.

63
00:04:46,351 --> 00:04:50,647
The maternal age groups
are along the z-axis.

64
00:04:50,647 --> 00:04:57,369
Then on the y-axis, we have cases of
Down syndrome per 100,000 live births.

65
00:04:57,369 --> 00:05:03,079
If we look across birth order
separately for each maternal age group,

66
00:05:03,079 --> 00:05:09,871
we see that there really is no difference
in rates of Down syndrome by birth order.

67
00:05:09,871 --> 00:05:14,997
In other words, once we control for
the age of the mother, that is, examine

68
00:05:14,997 --> 00:05:20,123
the rates of Down syndrome across
birth order but one maternal age group

69
00:05:20,123 --> 00:05:25,614
at a time, there's no association
between birth order and Down syndrome.
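
Looking at one maternal age group at a time can be sketched with a small calculation. All numbers below are hypothetical and were chosen only to reproduce the pattern described, not taken from the lecture's graph: the rate within each age group is set to be identical for every birth order, while later-born children are assumed to have older mothers.

```python
# Hypothetical numbers chosen only to illustrate the idea; not the lecture's data.
# Assumed cases per 100,000 births in each maternal age group,
# deliberately the same for every birth order.
rate_per_100k = {"under 30": 50, "30-34": 100, "35 and over": 400}

# Assumed births by birth order and maternal age group:
# later-born children tend to have older mothers.
births = {
    1: {"under 30": 80000, "30-34": 15000, "35 and over": 5000},
    3: {"under 30": 50000, "30-34": 30000, "35 and over": 20000},
    5: {"under 30": 20000, "30-34": 40000, "35 and over": 40000},
}

def cases(order, age):
    """Number of cases implied by the assumed rate for one cell."""
    return births[order][age] * rate_per_100k[age] / 100_000

def crude_rate(order):
    """Cases per 100,000 births for one birth order, ignoring maternal age."""
    total_cases = sum(cases(order, age) for age in births[order])
    total_births = sum(births[order].values())
    return 100_000 * total_cases / total_births

def stratum_rate(order, age):
    """Cases per 100,000 births for one birth order within one age group."""
    return 100_000 * cases(order, age) / births[order][age]

crude = {order: crude_rate(order) for order in births}
stratified = {
    age: [stratum_rate(order, age) for order in births] for age in rate_per_100k
}

print("Ignoring maternal age:", crude)
print("Within each maternal age group:", stratified)
```

Ignoring maternal age, the rate rises with birth order. Within each maternal age group it is flat, which is the signature of maternal age confounding the birth order association in this made-up example.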

70
00:05:25,614 --> 00:05:28,540
[MUSIC]

71
00:05:28,540 --> 00:05:32,708
If we look at rates of Down
syndrome across maternal age for

72
00:05:32,708 --> 00:05:38,675
each individual birth order, we see an
upward trend as maternal age increases.

73
00:05:39,895 --> 00:05:42,093
So if you look across these colors,

74
00:05:42,093 --> 00:05:46,272
this is a great graphical representation
where we see that it isn't

75
00:05:46,272 --> 00:05:50,838
birth order that is associated with
Down syndrome, it's maternal age.

76
00:05:50,838 --> 00:05:54,127
In other words,
once we control for birth order,

77
00:05:54,127 --> 00:05:58,918
there's still an association between
maternal age and Down syndrome.

78
00:05:58,918 --> 00:06:02,916
Birth order does not confound
the relationship between maternal age and

79
00:06:02,916 --> 00:06:03,840
Down syndrome.

80
00:06:03,840 --> 00:06:08,380
The relationship holds even after
controlling for birth order.
