1
00:00:00,360 --> 00:00:05,820
Hello and welcome to this new section which is data analysis using Python.

2
00:00:06,030 --> 00:00:10,320
So in this section we will learn how to perform data analysis on a set of data.

3
00:00:10,440 --> 00:00:16,440
So this is an intro lecture wherein we will learn what data analysis is and why Python is actually used

4
00:00:16,440 --> 00:00:18,190
to perform data analysis.

5
00:00:19,920 --> 00:00:24,080
So let's first go ahead and understand what data analysis actually is.

6
00:00:24,210 --> 00:00:30,960
So the simple definition of data analysis is analyzing data from multiple sources to extract meaningful

7
00:00:30,960 --> 00:00:32,280
information.

8
00:00:32,280 --> 00:00:37,920
So in data analysis, what we do is that we have some data source which we have to analyze, and that

9
00:00:37,920 --> 00:00:43,290
is, let's say, you have a website or an e-commerce website which is going to generate tons of data on

10
00:00:43,290 --> 00:00:44,640
a daily basis.

11
00:00:44,670 --> 00:00:50,370
So if you want to analyze that data, you simply extract data from that source, you perform data analysis

12
00:00:50,370 --> 00:00:54,550
on it using some programming language like Python.

13
00:00:54,660 --> 00:01:00,210
And then, after doing the data analysis, you get some meaningful information which you could use to make

14
00:01:00,210 --> 00:01:01,740
better decisions.

15
00:01:01,770 --> 00:01:05,480
So that is the basic and simplest definition of data analysis.

16
00:01:05,489 --> 00:01:08,250
And obviously we could go into more complicated stuff.

17
00:01:08,340 --> 00:01:12,020
But for now, this is what data analysis is.

18
00:01:12,300 --> 00:01:16,230
Now let's learn why data analysis is required.

19
00:01:16,260 --> 00:01:20,950
So the first and foremost reason is that it helps us make better decisions.

20
00:01:20,970 --> 00:01:27,810
So a lot of organizations, like, let's say, the e-commerce website Amazon, have to make business decisions

21
00:01:27,840 --> 00:01:29,210
on a daily basis.

22
00:01:29,280 --> 00:01:33,720
So these organizations actually rely upon the data which they get.

23
00:01:33,720 --> 00:01:39,570
So, for example, if you have an e-commerce website like Amazon, there are going to be a ton of users visiting

24
00:01:39,570 --> 00:01:44,550
your website on a daily basis, and what they are going to do is that they are going to generate a large

25
00:01:44,550 --> 00:01:46,200
amount of raw data.

26
00:01:46,200 --> 00:01:52,590
Now this data can actually be analyzed and information could be drawn from it, and that helps Amazon,

27
00:01:52,620 --> 00:01:57,660
or any e-commerce website for that matter to make better decisions.

28
00:01:57,660 --> 00:02:01,350
And the next most important thing is making the best use of available data.

29
00:02:01,430 --> 00:02:07,740
Now not only Amazon but any data-intensive organization is going to produce a large amount of data in

30
00:02:07,740 --> 00:02:12,300
the backend and throwing away that data does not make any sense.

31
00:02:12,330 --> 00:02:18,360
And that's why this data is actually analyzed, so that some conclusions could be drawn from that data

32
00:02:18,390 --> 00:02:21,240
which could be more valuable to the organization.

33
00:02:21,240 --> 00:02:25,840
So this is the main purpose of data analysis and there would be a lot more.

34
00:02:25,860 --> 00:02:32,730
But the basic purpose of data analysis, or the basic significance of data analysis, is to help make

35
00:02:32,730 --> 00:02:38,640
better decisions for organizations as well as to make the best use of the available data.

36
00:02:39,150 --> 00:02:41,220
So why is data analysis the future?

37
00:02:41,340 --> 00:02:46,730
So what I've done here is that I've laid out the basic setup of an e-commerce company, or you could say

38
00:02:46,840 --> 00:02:51,840
e-commerce website, and we will learn how data is used by this organization.

39
00:02:51,840 --> 00:02:56,520
So as you can see, we have the e-commerce website at the center, and on the left-hand side you could

40
00:02:56,520 --> 00:02:57,520
see the user.

41
00:02:57,600 --> 00:03:03,660
So what this user actually does is that he visits the e-commerce website and he basically browses through

42
00:03:03,690 --> 00:03:09,810
or goes through the e-commerce website. He might view certain products, he might buy certain products, and

43
00:03:09,840 --> 00:03:13,790
he might carry out a bunch of activities on the e-commerce website.

44
00:03:13,800 --> 00:03:19,470
So what the e-commerce website actually does is that it generates raw user data, which is nothing

45
00:03:19,470 --> 00:03:21,500
but the user's browsing history.

46
00:03:21,510 --> 00:03:24,580
It also stores the products viewed by the user.

47
00:03:24,600 --> 00:03:29,880
It also stores a bunch of things, like which particular product the user bought and everything like that.

48
00:03:29,880 --> 00:03:35,400
So every time a user visits an e-commerce website, raw data is generated at the back end.

49
00:03:35,460 --> 00:03:39,510
And just imagine this thing with a big site like Amazon.

50
00:03:39,510 --> 00:03:45,710
So there are going to be a million users visiting Amazon per day, who generate raw data.

51
00:03:45,840 --> 00:03:51,900
And with this, what we could do is that we could perform data analysis on the raw data, and we would find

52
00:03:51,900 --> 00:03:57,560
certain things, like which product is being sold in which category, what are actually the most viewed products

53
00:03:57,570 --> 00:04:03,000
by the users, which are the trending products, which products the users are not buying, and everything like

54
00:04:03,000 --> 00:04:03,550
that.

55
00:04:03,570 --> 00:04:10,320
So you could use that information. That is, let's assume that from the data analysis we came to know that

56
00:04:10,350 --> 00:04:17,089
Apple products are being sold the most in the mobile category. So this information can be used to make decisions

57
00:04:17,100 --> 00:04:21,339
like: OK, as the users are buying Apple products in the mobile category,

58
00:04:21,450 --> 00:04:26,760
let's place that product at the top and let's place the other products at the bottom.

59
00:04:26,760 --> 00:04:34,290
So in such a way, raw data can be converted into useful information using data analysis, and later

60
00:04:34,380 --> 00:04:38,400
that information can be used to make logical conclusions.

61
00:04:38,400 --> 00:04:43,210
And depending upon those conclusions, we could optimize the e-commerce website.

62
00:04:43,290 --> 00:04:49,230
Now this was one of the simplest examples of data analysis, and obviously I've oversimplified things

63
00:04:49,230 --> 00:04:49,570
here.

64
00:04:49,620 --> 00:04:55,410
A lot more things actually happen in the back end, but this is what generally happens when you perform

65
00:04:55,410 --> 00:04:57,370
data analysis on the raw data.

66
00:04:57,540 --> 00:05:02,400
So now the next thing which we will learn is why we use Python for data analysis.

67
00:05:02,430 --> 00:05:07,920
So actually there are a lot of options to perform data analysis. You could use tons of tools or tons

68
00:05:07,920 --> 00:05:10,140
of other languages as well.

69
00:05:10,320 --> 00:05:16,310
But the reason for using Python for data analysis is that it's faster compared to other alternatives.

70
00:05:16,320 --> 00:05:22,430
Now when you talk about data, one obvious program that comes to mind is Excel, and Excel is actually good

71
00:05:22,430 --> 00:05:28,010
at analyzing data, but Excel is only suited when the amount of data is relatively small.

72
00:05:28,310 --> 00:05:33,920
So let's say you have a hundred rows of data: that's OK, you could analyze that with Excel. But when you

73
00:05:33,920 --> 00:05:39,360
have data from a million users, like Amazon or any other e-commerce website for that matter,

74
00:05:39,380 --> 00:05:45,970
then using Excel makes no sense, because it's obviously too slow and there are not a lot of tools in Excel

75
00:05:46,090 --> 00:05:49,450
which you could use to perform meaningful data analysis.

76
00:05:49,460 --> 00:05:53,860
So Python is much faster compared to Excel or the other alternatives.

77
00:05:53,900 --> 00:05:58,580
And that's why Python is actually used to perform data analysis.

78
00:05:58,580 --> 00:06:03,550
The next most important thing about Python is that it has a simple syntax.

79
00:06:03,620 --> 00:06:09,560
Now, even if someone is not a programmer, or if he's not from a computer science background, even

80
00:06:09,560 --> 00:06:13,670
then he could learn Python because it has a simpler syntax.

81
00:06:13,670 --> 00:06:19,810
And that's another reason why most people prefer Python for performing data analysis. And the most important

82
00:06:19,820 --> 00:06:26,820
and the last reason for using Python for data analysis is that it has multiple libraries for it.

83
00:06:26,840 --> 00:06:33,620
So Python actually has a bunch of tools or a bunch of options for data analysis, like NumPy, pandas, and

84
00:06:33,620 --> 00:06:34,520
everything like that.

85
00:06:34,520 --> 00:06:39,650
So we're actually going to learn about NumPy and pandas in the upcoming lectures as well.

86
00:06:39,650 --> 00:06:44,370
So this is the main reason why we choose Python mostly for data analysis.

87
00:06:44,660 --> 00:06:50,660
So that's it for this lecture, and hopefully you guys now understand what data analysis is, how

88
00:06:50,660 --> 00:06:57,650
data analysis is performed, what the significance of data analysis is, and also why Python is chosen

89
00:06:57,650 --> 00:07:00,920
as the best language for analyzing data.

90
00:07:00,920 --> 00:07:05,010
So thank you very much for watching and I'll see you guys next time.

91
00:07:05,150 --> 00:07:05,650
Thank you.


