1
00:00:01,210 --> 00:00:04,920
Let's kick off our exploratory data analysis.

2
00:00:07,390 --> 00:00:09,280
Exploratory data analysis.

3
00:00:09,630 --> 00:00:10,880
And we are going to do two things.

4
00:00:11,190 --> 00:00:13,560
So the first one is that we're going to

5
00:00:13,570 --> 00:00:18,500
compute the correlation between carat and price.

6
00:00:19,210 --> 00:00:21,980
And that will be super simple.

7
00:00:22,110 --> 00:00:32,640
So data frame, we get the carat and then
we get the price and we do dot corr and

8
00:00:32,650 --> 00:00:33,520
here you go.
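A minimal sketch of the correlation step described above. The DataFrame name `df` and the five sample rows are assumptions made up for illustration (the video's actual dataset is not reproduced here); only the lowercase column names `carat` and `price` come from the recording.

```python
import pandas as pd

# Hypothetical sample data standing in for the diamonds dataset.
df = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.5],
    "price": [500, 1200, 2200, 4500, 9000],
})

# Pearson correlation between the two columns (Series.corr defaults to Pearson).
r = df.carat.corr(df.price)
print(round(r, 2))
```

The correlation is symmetric, so `df.price.corr(df.carat)` gives the same number; the value alone says nothing about which variable drives the other.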

9
00:00:34,350 --> 00:00:37,460
We see that the correlation is 0.94

10
00:00:37,470 --> 00:00:39,320
which is huge, right?

11
00:00:40,150 --> 00:00:42,440
Usually there can be an issue when you

12
00:00:42,450 --> 00:00:47,740
have such high correlations because you
cannot really distinguish between both

13
00:00:47,750 --> 00:00:53,340
because there's something called mutual
causality bias which is when A leads to B

14
00:00:53,350 --> 00:00:54,340
and B leads to A.

15
00:00:55,170 --> 00:00:57,460
However, in this case I think this is

16
00:00:57,470 --> 00:00:59,040
just one direction.

17
00:00:59,270 --> 00:01:02,380
I think the carat leads to the price and

18
00:01:02,390 --> 00:01:07,100
the price does not really lead to the
carat because it doesn't make sense that

19
00:01:07,110 --> 00:01:12,220
a higher price makes the carat
increase, because you cannot really make

20
00:01:12,230 --> 00:01:13,300
it happen like that.

21
00:01:13,570 --> 00:01:16,060
But if you have a bigger carat then you

22
00:01:16,070 --> 00:01:17,360
have a higher price.

23
00:01:17,950 --> 00:01:21,600
Or, to put it another way, you need

24
00:01:21,610 --> 00:01:23,560
to start somewhere, right?

25
00:01:23,630 --> 00:01:25,660
And I think you need to start with having

26
00:01:25,670 --> 00:01:32,260
a bigger carat that will then lead to a
higher price because the stone happened

27
00:01:32,270 --> 00:01:33,540
first, not the price.

28
00:01:34,650 --> 00:01:37,160
Therefore, the conclusion is that our

29
00:01:37,170 --> 00:01:43,040
variables are very much tied together
which is good from a predictability perspective.

30
00:01:43,690 --> 00:01:50,020
Let me isolate x and y because this will
be easier as we go through the practice videos.

31
00:01:50,270 --> 00:01:58,800
Isolate x and y. So the y equals our data
frame dot price and then the x equals

32
00:01:58,810 --> 00:02:00,960
data frame dot carat.

33
00:02:01,690 --> 00:02:02,420
Here we go.

34
00:02:02,770 --> 00:02:03,400
Control enter.

35
00:02:04,090 --> 00:02:06,240
And I'm getting something.

36
00:02:06,510 --> 00:02:08,980
Object has no attribute price.

37
00:02:09,770 --> 00:02:13,220
That's because it is with a lowercase p

38
00:02:13,230 --> 00:02:14,720
and a lowercase c.

39
00:02:15,830 --> 00:02:18,680
Control enter and here we go.
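A short sketch of isolating x and y, including the kind of attribute error mentioned above. The DataFrame `df` and its rows are hypothetical stand-ins; the point is only that column access by attribute is case-sensitive, so `df.Price` fails when the column is named `price`.

```python
import pandas as pd

# Hypothetical sample data; column names are lowercase, as in the video.
df = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.5],
    "price": [500, 1200, 2200, 4500, 9000],
})

# Attribute access is case-sensitive: "Price" is not a column here.
try:
    df.Price
except AttributeError as err:
    error_message = str(err)
    print(error_message)

# Using the lowercase names works.
y = df.price
x = df.carat
```

Bracket access such as `df["price"]` behaves the same way with respect to case, and it also works for column names that are not valid Python identifiers.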

40
00:02:19,250 --> 00:02:21,620
Now we can do our scatter plot.

41
00:02:22,790 --> 00:02:28,740
In its simplest form scatter plot plt dot

42
00:02:28,750 --> 00:02:36,600
scatter and then we include x and y and
then plt dot show.

43
00:02:36,910 --> 00:02:43,100
So this is the simplest form and I can
add as well that we are going to

44
00:02:43,110 --> 00:02:47,060
definitely add something to it.

45
00:02:47,430 --> 00:02:51,040
Okay, so it is a lowercase y that I need

46
00:02:51,050 --> 00:02:51,780
to include here.

47
00:02:52,490 --> 00:02:55,780
So this is the simplest form of the

48
00:02:55,790 --> 00:02:57,000
scatter plot.

49
00:02:57,230 --> 00:03:00,820
But let's add a bit to it, right?

50
00:03:00,910 --> 00:03:12,060
So we can add plt dot xlabel and then
this is the carat, plt dot ylabel and

51
00:03:12,070 --> 00:03:20,640
this is the price and then we can add a
title plt dot title and then this would

52
00:03:20,650 --> 00:03:27,060
be the relationship of carat and price.

53
00:03:28,770 --> 00:03:29,520
And here we go.

54
00:03:29,910 --> 00:03:32,220
I have something here.

55
00:03:32,470 --> 00:03:36,240
It should be a dot and here we go.

56
00:03:36,290 --> 00:03:40,420
So here I'm customizing the graph.

57
00:03:41,250 --> 00:03:42,500
Here we go.

58
00:03:43,410 --> 00:03:46,440
Control enter and here we go.
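The customized scatter plot built up above could look roughly like this. The data is again a hypothetical stand-in for the video's dataset, and the label and title strings are paraphrased from what is said in the recording rather than copied from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for the diamonds dataset.
df = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.5],
    "price": [500, 1200, 2200, 4500, 9000],
})
x = df.carat
y = df.price

# Simplest form: plt.scatter(x, y) followed by plt.show().
plt.scatter(x, y)

# Customization: axis labels and a title.
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Relationship of carat and price")

ax = plt.gca()  # keep a handle on the axes for later inspection
plt.show()
```

Dropping the three customization calls leaves the simplest form of the plot; the labels only change how the figure reads, not the points drawn.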

59
00:03:46,550 --> 00:03:49,640
So you can see that it's pretty linear,

60
00:03:49,970 --> 00:03:52,740
the relationship, so it really goes in
one direction.

61
00:03:53,190 --> 00:03:57,860
You can see that when carat equals one,
there are a few

62
00:03:57,870 --> 00:04:03,980
outliers, and the more carats you have, it
kind of explodes, which would possibly

63
00:04:03,990 --> 00:04:09,340
mean that we actually need a few of the
other variables like color, clarity, or

64
00:04:09,350 --> 00:04:15,800
certification to help us explain these
more extreme values where it is exploding.

65
00:04:16,870 --> 00:04:23,760
But one conclusion is that it is pretty
linear and this is good to see because it

66
00:04:23,770 --> 00:04:28,980
helps us again to say that we can use a
linear regression to predict and it also

67
00:04:28,990 --> 00:04:29,700
makes sense, right?

68
00:04:29,790 --> 00:04:32,200
So the bigger the carat, where carat is

69
00:04:32,210 --> 00:04:36,300
the weight of the diamond, then the higher
the price.

70
00:04:36,750 --> 00:04:38,360
So this is also connected.

71
00:04:38,930 --> 00:04:42,100
We don't see an inverse relationship somehow.

72
00:04:43,310 --> 00:04:44,800
And this is it.

73
00:04:45,090 --> 00:04:47,720
This is a very simple exploratory data

74
00:04:47,730 --> 00:04:53,000
analysis and in the next video we're
going to focus on the linear regression.

75
00:04:53,490 --> 00:04:54,860
Until then, have fun.



