1
00:00:00,170 --> 00:00:05,689
Hello and welcome to this new section, which is data analysis using Python.

2
00:00:05,689 --> 00:00:10,190
So in this section we will learn how to perform data analysis on a set of data.

3
00:00:10,190 --> 00:00:16,340
So this is an intro lecture where we will learn what data analysis is and why Python is actually used

4
00:00:16,340 --> 00:00:18,140
to perform data analysis.

5
00:00:19,550 --> 00:00:23,960
So let's first go ahead and understand what data analysis actually is.

6
00:00:23,990 --> 00:00:30,860
So the simple definition of data analysis is analyzing data from multiple sources to extract meaningful

7
00:00:30,860 --> 00:00:31,820
information.

8
00:00:31,850 --> 00:00:37,490
So in data analysis, what we do is identify a data source that we have to analyze.

9
00:00:37,490 --> 00:00:43,010
That is, let's say you have a website or an e-commerce website which is going to generate tons of data

10
00:00:43,010 --> 00:00:44,350
on a daily basis.

11
00:00:44,360 --> 00:00:50,270
So if you want to analyze that data, you simply extract data from that source, you perform data analysis

12
00:00:50,270 --> 00:00:54,320
on it using a programming language like Python.

13
00:00:54,320 --> 00:01:00,140
And then after doing data analysis, you get some meaningful information which you could use to make

14
00:01:00,140 --> 00:01:01,400
better decisions.

15
00:01:01,400 --> 00:01:05,330
So that is the most basic and simplest definition of data analysis.

16
00:01:05,330 --> 00:01:08,090
And obviously we could go into more complicated stuff.

17
00:01:08,090 --> 00:01:10,880
But for now this is what data analysis is.

18
00:01:11,900 --> 00:01:15,930
Now let's learn why data analysis is required.

19
00:01:15,950 --> 00:01:20,670
So the first and foremost reason is that it helps us make better decisions.

20
00:01:20,670 --> 00:01:27,780
So a lot of organizations, like, let's say, the e-commerce website Amazon, have to make business decisions

21
00:01:27,780 --> 00:01:29,160
on a daily basis.

22
00:01:29,160 --> 00:01:33,630
So these organizations actually rely upon the data which they get.

23
00:01:33,630 --> 00:01:39,090
So for example, if you have an e-commerce website like Amazon, there are going to be tons of users

24
00:01:39,090 --> 00:01:41,370
visiting your website on a daily basis.

25
00:01:41,370 --> 00:01:45,870
And what they are going to do is generate a large amount of raw data.

26
00:01:45,900 --> 00:01:51,780
Now this raw data can actually be analyzed and information could be drawn out from that data and that

27
00:01:51,780 --> 00:01:57,090
helps Amazon or any e-commerce website for that matter, to make better decisions.

28
00:01:57,330 --> 00:02:01,290
And the next most important thing is the best use of available data.

29
00:02:01,290 --> 00:02:07,500
Now, not only Amazon, but any data-intensive organization is going to produce a large amount of data

30
00:02:07,500 --> 00:02:12,240
in the back end, and throwing away that data does not make any sense.

31
00:02:12,240 --> 00:02:18,330
And that's why this data is actually analyzed so that some conclusions could be drawn from that data

32
00:02:18,330 --> 00:02:21,240
which could be more valuable to the organization.

33
00:02:21,240 --> 00:02:25,800
So this is the main purpose of data analysis and there could be a lot more.

34
00:02:25,800 --> 00:02:32,730
But the basic purpose of data analysis, or the basic significance of data analysis, is to help make

35
00:02:32,730 --> 00:02:38,430
better decisions for organizations as well as to make the best use of the available data.

36
00:02:38,850 --> 00:02:41,220
So why is data analysis the future?

37
00:02:41,220 --> 00:02:46,500
So what I've done here is that I've laid out the basic setup of an e-commerce company, or you could

38
00:02:46,500 --> 00:02:51,570
say an e-commerce website, and we will learn how data is used by this organization.

39
00:02:51,570 --> 00:02:56,520
So as you could see, we have the e-commerce website at the center, and on the left hand side you could

40
00:02:56,520 --> 00:02:57,480
see a user.

41
00:02:57,480 --> 00:03:03,660
So what this user actually does is that he visits the e-commerce website and he basically browses through

42
00:03:03,660 --> 00:03:05,490
or goes through the e-commerce website.

43
00:03:05,490 --> 00:03:11,880
He might view certain products, he might buy certain products and he might carry out a bunch of activities

44
00:03:11,880 --> 00:03:13,590
on the e-commerce website.

45
00:03:13,590 --> 00:03:19,650
So what the e-commerce website actually does is that it generates raw user data that is nothing but

46
00:03:19,650 --> 00:03:21,360
the user's browsing history.

47
00:03:21,360 --> 00:03:24,420
It also stores the products viewed by the user.

48
00:03:24,420 --> 00:03:29,700
It also stores a bunch of things like which particular product the user bought and everything like that.

49
00:03:29,700 --> 00:03:35,310
So every time a user visits an e-commerce website, raw data is generated at the back end.

50
00:03:35,310 --> 00:03:39,300
And just imagine this thing with a big site like Amazon.

51
00:03:39,300 --> 00:03:45,600
So there are going to be a million users visiting Amazon per day who generate raw data.

52
00:03:45,600 --> 00:03:51,270
And with this raw data, what we could do is that we could perform data analysis on this raw data and

53
00:03:51,270 --> 00:03:56,220
we could find certain things like which product is being sold in which category, what are actually

54
00:03:56,220 --> 00:04:01,320
the most viewed products by the user, which are the trending products, which products the users are

55
00:04:01,320 --> 00:04:03,450
not buying and everything like that.

56
00:04:03,450 --> 00:04:05,610
So you could use that information.

57
00:04:05,610 --> 00:04:12,000
That is, let's assume that from the data analysis we came to know that Apple products are being sold

58
00:04:12,000 --> 00:04:14,220
most in the mobile category.

59
00:04:14,220 --> 00:04:20,190
So this information can be used to make decisions like: since the users are buying Apple products in the

60
00:04:20,190 --> 00:04:26,310
mobile category, let's place that product at the top and let's place the other products at the bottom.

61
00:04:26,460 --> 00:04:34,290
So in this way, raw data can be converted into useful information using data analysis and later

62
00:04:34,290 --> 00:04:38,280
that information can be used to make logical conclusions.

63
00:04:38,280 --> 00:04:43,020
And depending upon those conclusions, we could optimize our e-commerce website.
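To make this concrete, here is a minimal sketch of that kind of analysis in plain Python, using only the standard library. Everything in it is assumed for illustration: the sample events, the field names (`user`, `action`, `category`, `product`) and the helper functions `top_product_per_category` and `most_viewed_products` are hypothetical, and a real back end would log far more than this.

```python
from collections import Counter, defaultdict

# Hypothetical raw events, as an e-commerce back end might log them.
# All field names and values are invented for illustration.
raw_events = [
    {"user": "u1", "action": "view", "category": "mobile", "product": "Phone A"},
    {"user": "u1", "action": "buy", "category": "mobile", "product": "Phone A"},
    {"user": "u2", "action": "view", "category": "mobile", "product": "Phone B"},
    {"user": "u2", "action": "buy", "category": "mobile", "product": "Phone A"},
    {"user": "u3", "action": "view", "category": "laptop", "product": "Laptop X"},
    {"user": "u3", "action": "buy", "category": "laptop", "product": "Laptop X"},
    {"user": "u4", "action": "buy", "category": "mobile", "product": "Phone B"},
    {"user": "u4", "action": "view", "category": "mobile", "product": "Phone A"},
]


def top_product_per_category(events):
    """Return {category: best-selling product}, counting only 'buy' events."""
    sales = defaultdict(Counter)
    for event in events:
        if event["action"] == "buy":
            sales[event["category"]][event["product"]] += 1
    # most_common(1) gives a one-item list of (product, count) pairs.
    return {cat: counts.most_common(1)[0][0] for cat, counts in sales.items()}


def most_viewed_products(events, n=3):
    """Return up to n most viewed products as (product, view_count) pairs."""
    views = Counter(e["product"] for e in events if e["action"] == "view")
    return views.most_common(n)


best_sellers = top_product_per_category(raw_events)
most_viewed = most_viewed_products(raw_events)
print(best_sellers)
print(most_viewed)
```

The `best_sellers` result is the piece of information that could drive a decision like the one described here, such as listing the best-selling product of a category first.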

64
00:04:43,050 --> 00:04:48,900
Now, this was one of the simplest examples of data analysis, and obviously I have oversimplified

65
00:04:48,900 --> 00:04:49,560
things here.

66
00:04:49,590 --> 00:04:52,290
A lot more things actually happen in the back end.

67
00:04:52,290 --> 00:04:57,330
But this is what generally happens when you perform data analysis on the raw data.

68
00:04:57,330 --> 00:05:02,190
So now the next thing which we will learn is why Python is used for data analysis.

69
00:05:02,190 --> 00:05:05,550
So actually there are a lot of options to perform data analysis.

70
00:05:05,550 --> 00:05:10,050
You could use tons of tools or other languages as well.

71
00:05:10,050 --> 00:05:16,080
But the reason for using Python for data analysis is that it's faster than many of the alternatives.

72
00:05:16,110 --> 00:05:18,930
Now, when you talk about data, one obvious program

73
00:05:18,930 --> 00:05:20,930
which comes to mind is Excel.

74
00:05:20,940 --> 00:05:27,810
And Excel is actually good at analyzing data, but Excel is only suitable when the amount of data is very small.

75
00:05:27,960 --> 00:05:31,290
So let's say if you have 100 rows of data, that's okay.

76
00:05:31,290 --> 00:05:37,890
You could analyze that with Excel, but when you have data from a million users, like Amazon or any other e-commerce

77
00:05:37,890 --> 00:05:44,280
website for that matter, then using Excel makes no sense because it's obviously too slow and there

78
00:05:44,280 --> 00:05:49,230
are not a lot of tools in Excel which you could use to perform meaningful data analysis.

79
00:05:49,230 --> 00:05:53,730
So Python is comparatively much faster than Excel or other alternatives.

80
00:05:53,730 --> 00:05:58,050
And that's why Python is actually used to perform data analysis.

81
00:05:58,140 --> 00:06:03,420
The next most important thing about Python is that it has simple syntax.

82
00:06:03,420 --> 00:06:09,480
Now, even if someone is not a programmer or if he's not from a computer science background, even

83
00:06:09,480 --> 00:06:13,260
then he could learn Python because it has a simpler syntax.

84
00:06:13,260 --> 00:06:18,240
And that's another reason why most people prefer Python for performing data analysis.

85
00:06:18,240 --> 00:06:24,690
And the last and most important reason for using Python for data analysis is that it has multiple

86
00:06:24,690 --> 00:06:26,580
libraries for data analysis.

87
00:06:26,580 --> 00:06:33,330
So Python actually has a bunch of tools or a bunch of options for data analysis like NumPy, pandas

88
00:06:33,330 --> 00:06:34,500
and everything like that.

89
00:06:34,500 --> 00:06:39,330
So we are actually going to learn about NumPy and pandas in the upcoming lectures as well.

90
00:06:39,330 --> 00:06:44,340
So these are the main reasons why we mostly choose Python for data analysis.

91
00:06:44,340 --> 00:06:46,290
So that's it for this lecture.

92
00:06:46,290 --> 00:06:52,380
And hopefully you guys were able to understand what data analysis is, how data analysis is performed

93
00:06:52,380 --> 00:06:58,770
and what the significance of data analysis is, and also why Python is chosen as the best language

94
00:06:58,770 --> 00:07:00,600
for analyzing data.

95
00:07:00,600 --> 00:07:05,040
So thank you very much for watching and I'll see you guys next time.

96
00:07:05,040 --> 00:07:05,880
Thank you.


