1
00:00:00,610 --> 00:00:06,700
Welcome back, students. In this lecture, we are going to talk about how to import data into R.

2
00:00:07,300 --> 00:00:16,120
So specifically, we are going to talk about how to import three of the most used data formats into R:

3
00:00:16,150 --> 00:00:26,690
CSV files, which are the easiest to import into R, Excel files, and text files.

4
00:00:27,520 --> 00:00:28,700
So let's get started.

5
00:00:29,140 --> 00:00:36,700
The first thing to do is to set the working directory where your data is.

6
00:00:37,000 --> 00:00:42,690
And this is what we are doing with the setwd() function, which you are already familiar with.

7
00:00:42,700 --> 00:00:50,290
So let's run this code, and after I run this code, I can actually import my data.
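
As a rough sketch of this step: the lecture does not show the actual folder path, so `tempdir()` is used below only as a stand-in for the folder that holds your data. The names `data_dir` and `old_dir` are illustrative assumptions.

```r
# Hypothetical data folder: tempdir() is only a stand-in so the sketch runs anywhere.
data_dir <- tempdir()

old_dir <- setwd(data_dir)  # setwd() returns the previous working directory invisibly
getwd()                     # confirm where R will now look for files

setwd(old_dir)              # restore the original directory when done
```

In your own script you would replace `tempdir()` with the path to your data folder.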

8
00:00:50,830 --> 00:00:56,380
The easiest way to import my data... and my data is here.

9
00:00:56,380 --> 00:01:07,210
It's an Excel file, a very simple one, with thirty-three rows and two columns showing the number

10
00:01:07,210 --> 00:01:12,200
of people and their average gym training time per week.

11
00:01:12,760 --> 00:01:20,260
So this is our data, and I have just duplicated it in three formats, which are CSV, Excel,

12
00:01:20,290 --> 00:01:26,000
and text, just for the purpose of training.

13
00:01:26,650 --> 00:01:35,510
So the easiest way would be just to read in the CSV file and to specify that the header is TRUE.

14
00:01:35,590 --> 00:01:43,360
So R basically would then know that the first row is actually text.

15
00:01:44,780 --> 00:01:57,290
Let's execute this piece of code, and with this one we have read in our data. head() shows us the first

16
00:01:57,530 --> 00:02:01,380
rows, a preview of our data,

17
00:02:01,400 --> 00:02:06,860
let's put it this way, and the summary() function gives summary statistics of our data.
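
A minimal sketch of these three calls. The lecture's file is not available here, so the block first writes a small hypothetical CSV to a temporary path. The column names `people` and `hours_per_week` and the values are assumptions for illustration, not the lecture's data.

```r
# Hypothetical stand-in for the lecture's CSV file (column names are assumptions).
csv_path <- tempfile(fileext = ".csv")
gym <- data.frame(people = c(12, 8, 15, 5),
                  hours_per_week = c(2.5, 4.0, 1.5, 6.0))
write.csv(gym, csv_path, row.names = FALSE)

# header = TRUE tells R that the first row holds column names, not data.
example1 <- read.csv(csv_path, header = TRUE)

head(example1)     # first rows (up to six) as a quick preview
summary(example1)  # min, quartiles, mean and max for each numeric column
```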

18
00:02:07,280 --> 00:02:15,890
So this was a pretty simple example, but sometimes our data comes in a format not as nicely structured

19
00:02:15,890 --> 00:02:16,570
as here.

20
00:02:16,580 --> 00:02:26,870
So in case your data is not separated into columns but separated by commas or semicolons, then

21
00:02:27,140 --> 00:02:32,160
there is an option to specify this in the arguments.

22
00:02:32,180 --> 00:02:36,770
Here I have just provided a piece of code for that case.

23
00:02:36,770 --> 00:02:44,810
If you have a different separator, not a tab but commas or semicolons, then you basically have to

24
00:02:44,810 --> 00:02:48,310
just specify the arguments in the

25
00:02:48,480 --> 00:02:51,170
read.csv() function.
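
The code shown on screen is not part of this transcript, so the following is only a sketch of how a separator can be passed, using a hypothetical semicolon-separated file written to a temporary path. The file contents and the name `example_semi` are assumptions.

```r
# Hypothetical semicolon-separated file, written here only so the sketch is runnable.
semi_path <- tempfile(fileext = ".csv")
writeLines(c("people;hours_per_week", "12;2.5", "8;4.0"), semi_path)

# sep names the character that separates the fields on each line.
example_semi <- read.csv(semi_path, header = TRUE, sep = ";")
str(example_semi)
```

For files that use semicolons as separators and commas as decimal marks, base R also provides `read.csv2()`, which has those two settings as defaults.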

26
00:02:52,040 --> 00:03:03,380
The next example is how to read in the text file, and we do it with a function called read.table(), for

27
00:03:04,340 --> 00:03:05,370
table data.

28
00:03:05,990 --> 00:03:11,000
And here, I mean, the example

29
00:03:11,000 --> 00:03:12,280
data is the same.

30
00:03:12,290 --> 00:03:23,120
I just copy-pasted it into the text file, and we also do it with the header specified as TRUE. In case you

31
00:03:23,120 --> 00:03:27,930
need more arguments for the function, for example to have a different separator,

32
00:03:27,970 --> 00:03:35,180
you can always look at the arguments that are possible to specify in this function.

33
00:03:35,540 --> 00:03:41,330
So let's run this piece of code, and we see that we were successful in reading

34
00:03:41,330 --> 00:03:43,670
the text file into R as well.
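
A minimal sketch of reading a text file, assuming a tab-separated file. The file is written to a temporary path here so the code runs on its own, and its contents and the name `example2` are hypothetical.

```r
# Hypothetical tab-separated text file standing in for the lecture's text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("people\thours_per_week", "12\t2.5", "8\t4.0"), txt_path)

# Unlike read.csv(), read.table() defaults to header = FALSE, so it is set explicitly.
example2 <- read.table(txt_path, header = TRUE, sep = "\t")
head(example2)

# args() lists the arguments the function accepts; ?read.table opens the help page.
args(read.table)
```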

35
00:03:45,410 --> 00:03:54,530
By the way, you can always see your data here. So this example one, if you click on it, then the

36
00:03:54,530 --> 00:04:01,520
data appears in a separate tab, and you can also view it directly in R.
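
Clicking an object in the RStudio Environment pane has a code equivalent, sketched below with a small hypothetical data frame named `example1`. The `interactive()` guard is an addition so the snippet also runs in a non-interactive session, where no viewer is available.

```r
# Hypothetical data frame standing in for the imported data.
example1 <- data.frame(people = c(12, 8), hours_per_week = c(2.5, 4.0))

# View() opens the data in a separate viewer tab; it needs an interactive session.
if (interactive()) {
  View(example1)
}

# Printing the object shows it directly in the console.
print(example1)
```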

37
00:04:05,430 --> 00:04:16,610
Now let's read in Excel data. In order to do this, we need a new package, which is called readxl.

38
00:04:16,980 --> 00:04:26,520
So in case you haven't installed it, please uncomment this line and install this package.

39
00:04:26,880 --> 00:04:37,170
Since I did install it, I am just going to load this package, and basically here is the function which

40
00:04:37,170 --> 00:04:37,710
we need.

41
00:04:37,710 --> 00:04:39,990
It's called read_excel().

42
00:04:40,380 --> 00:04:43,230
And here I don't specify any arguments.

43
00:04:43,230 --> 00:04:51,390
But similarly to the previous examples, you can look up this function and see what the arguments are,

44
00:04:51,390 --> 00:05:01,700
and depending on your data, it might be useful to specify some arguments.

45
00:05:01,980 --> 00:05:03,600
So the result is the same.

46
00:05:03,900 --> 00:05:13,680
So in this way, we have very quickly got an overview of how to read data into RStudio.
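
A sketch of the Excel step, assuming the readxl package is installed. The lecture's workbook is not available here, so an example workbook that ships with readxl stands in for it. The object name `example3` is an assumption.

```r
# install.packages("readxl")  # uncomment and run once if the package is not installed
library(readxl)

# readxl ships example workbooks; this one stands in for the lecture's Excel file.
xlsx_path <- readxl_example("datasets.xlsx")

excel_sheets(xlsx_path)            # names of the sheets in the workbook
example3 <- read_excel(xlsx_path)  # reads the first sheet by default
head(example3)
```

Arguments such as `sheet` or `skip` can be passed to `read_excel()` when the data is not on the first sheet or does not start in the first row.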


