1
00:00:00,510 --> 00:00:07,140
So let's move forward and visualize the data we have. Here I'm visualizing some of the traffic signs

2
00:00:07,140 --> 00:00:07,680
from the data.

3
00:00:09,250 --> 00:00:11,100
And let's see how it looks.

4
00:00:14,460 --> 00:00:16,890
So here are some of the images from the dataset.

5
00:00:17,810 --> 00:00:23,860
And as you can see here, the dimensions are not the same for all the images.

6
00:00:23,880 --> 00:00:28,220
Since the images have different sizes, we have to make them equal.

7
00:00:28,370 --> 00:00:35,420
So we will find the mean of these image dimensions. To compute it, we use the fact that we

8
00:00:35,420 --> 00:00:40,060
already know there are forty-three classes in this traffic sign dataset.

9
00:00:40,430 --> 00:00:43,090
So I'm running a for loop from zero up to 43,

10
00:00:43,100 --> 00:00:44,270
that is, one iteration for each class, 0 through 42.

11
00:00:44,990 --> 00:00:48,040
And I'll take each image and find out its dimensions,

12
00:00:48,090 --> 00:00:51,140
dimension one and dimension two, and store them.

13
00:00:52,840 --> 00:00:57,730
Let's run this and store the dimensions of these images so we can find their mean.

14
00:00:58,090 --> 00:01:01,150
It has run, so we have stored the dimensions.

15
00:01:01,570 --> 00:01:05,670
Next, we will find the mean of these dimensions and print them out.

16
00:01:07,790 --> 00:01:16,430
So here we can see that 50 by 50 is the average for all the images. We can see that the mean of

17
00:01:16,430 --> 00:01:20,720
dimension one is 50.32 and the mean of dimension two is 50.3.

18
00:01:20,990 --> 00:01:23,840
So we can conclude that 50 by 50 is the average.
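
The mean-dimension step can be sketched as below. This is a minimal illustration, not the lecture's code: it works on in-memory arrays rather than reading the 43 class folders from disk, and the helper name `mean_dimensions` and the four synthetic image sizes are hypothetical.

```python
import numpy as np

def mean_dimensions(images):
    """Mean height ("dimension one") and mean width ("dimension two") of H x W x C arrays."""
    heights = [img.shape[0] for img in images]
    widths = [img.shape[1] for img in images]
    return float(np.mean(heights)), float(np.mean(widths))

# Synthetic stand-ins of varying size; in the lecture the images come from
# looping over the 43 class folders on disk (loading code not shown here).
sample = [np.zeros((h, w, 3), dtype=np.uint8)
          for h, w in [(48, 52), (50, 50), (53, 49), (49, 51)]]

h_mean, w_mean = mean_dimensions(sample)
print(f"Dimension 1 mean: {h_mean:.2f}, Dimension 2 mean: {w_mean:.2f}")
# Dimension 1 mean: 50.00, Dimension 2 mean: 50.50
```

Rounding means such as 50.32 and 50.3 to the nearest integer is what gives the 50 by 50 target size used from here on.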

19
00:01:24,350 --> 00:01:28,340
Now, we will convert all these images into the shape 50 by 50.

20
00:01:30,330 --> 00:01:35,510
For that, I'm using images, an empty list, and also label IDs, another empty list.

21
00:01:37,400 --> 00:01:45,230
We reshape all these images and also store their label IDs. For this, we are running a loop

22
00:01:45,230 --> 00:01:49,960
here over the forty-three classes, storing all the images and resizing them.

23
00:01:49,970 --> 00:01:51,890
And also their labels.

24
00:01:52,940 --> 00:01:59,970
Let's run this and resize these images into the desired dimensions,

25
00:01:59,990 --> 00:02:03,040
that is, 50 by 50, and also store their label IDs.
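
A minimal sketch of the resize-and-label loop, assuming the images for each class are already loaded into a dictionary keyed by class id (a hypothetical structure; the lecture reads them from folders). The nearest-neighbour `resize_nearest` helper is a NumPy stand-in so the sketch runs without an imaging library; the lecture's actual resize call is not shown in this section, and libraries such as Pillow or OpenCV would typically use smoother interpolation.

```python
import numpy as np

TARGET = 50  # side length picked from the ~50 x 50 mean dimensions

def resize_nearest(img, size=TARGET):
    """Nearest-neighbour resize of an H x W x C array to size x size x C."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]

def build_dataset(images_by_class, num_classes=43):
    """images_by_class: hypothetical dict mapping class id -> list of H x W x C arrays."""
    images, label_id = [], []
    for class_id in range(num_classes):  # classes 0..42
        for img in images_by_class.get(class_id, []):
            images.append(resize_nearest(img))  # every image becomes 50 x 50 x 3
            label_id.append(class_id)           # label stored in the same order
    return images, label_id

demo = {0: [np.zeros((48, 52, 3), dtype=np.uint8)],
        5: [np.ones((60, 45, 3), dtype=np.uint8)]}
images, label_id = build_dataset(demo)
print(len(images), images[0].shape, label_id)  # 2 (50, 50, 3) [0, 5]
```

Appending the image and its label in the same iteration keeps the two lists aligned, which the later split and one-hot steps rely on.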

26
00:02:06,200 --> 00:02:12,890
It's done, so the next step is to convert these images into a NumPy array and also to normalize them.

27
00:02:12,890 --> 00:02:19,070
In order to normalize an image, we will divide it by 255, because the value of each pixel

28
00:02:19,070 --> 00:02:20,700
ranges from zero to 255.

29
00:02:20,930 --> 00:02:25,730
So we will divide each image by 255 to normalize it, and to convert the images into an array we will

30
00:02:25,730 --> 00:02:26,940
simply use NumPy's array function.

31
00:02:26,960 --> 00:02:32,960
So let's run this, convert the images into NumPy arrays, and normalize them.
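
The conversion and normalization can be sketched as follows, assuming all images are already 50 x 50 x 3 `uint8` arrays. Casting to `float32` is an assumption of this sketch, not something stated in the lecture.

```python
import numpy as np

def to_normalized_array(images):
    """Stack equally sized uint8 images into one array and scale pixels to [0, 1]."""
    arr = np.array(images, dtype=np.float32)  # shape: (N, 50, 50, 3)
    return arr / 255.0                        # 0..255 -> 0.0..1.0

batch = [np.full((50, 50, 3), 255, dtype=np.uint8),
         np.zeros((50, 50, 3), dtype=np.uint8)]
x = to_normalized_array(batch)
print(x.shape, x.min(), x.max())  # (2, 50, 50, 3) 0.0 1.0
```

On the design choice: dividing a `uint8` array by 255 without a cast yields `float64`. For 39,209 images of 50 x 50 x 3 that is about 2.35 GB, versus about 1.18 GB in `float32`.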

32
00:02:35,540 --> 00:02:36,230
So it's done.

33
00:02:38,050 --> 00:02:42,760
Similarly, we'll store the label ID list in a NumPy array and look at its shape.

34
00:02:45,210 --> 00:02:52,260
So here we can observe that there are 39,209 labels. Similarly, we'll print the shape

35
00:02:52,260 --> 00:02:53,130
of the images.

36
00:02:55,330 --> 00:03:02,490
So here we can see that there are 39,209 images with a shape of 50 by 50 by 3.

37
00:03:03,010 --> 00:03:07,900
Here, 3 is the number of channels, which means these are color images in RGB format.
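
A small sketch of how the printed shape can be read, assuming a channels-last `(N, H, W, C)` layout. For the arrays in this lecture, that would be N = 39,209, H = W = 50 and C = 3. The helper `describe` is hypothetical. Three channels alone do not tell you the channel order: Pillow loads images as RGB, while OpenCV's `imread` returns BGR.

```python
import numpy as np

def describe(images, labels):
    """Summarize an image batch stored channels-last as (N, H, W, C)."""
    n, h, w, c = images.shape
    assert len(labels) == n, "every image needs exactly one label"
    kind = {1: "greyscale", 3: "color"}.get(c, "other")
    return f"{n} images of {h}x{w} with {c} channels ({kind})"

images = np.zeros((4, 50, 50, 3), dtype=np.float32)
labels = np.array([0, 1, 2, 42])
print(describe(images, labels))  # 4 images of 50x50 with 3 channels (color)
```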

38
00:03:07,930 --> 00:03:09,220
Let's move forward now.

39
00:03:09,220 --> 00:03:13,900
We'll visualize the number of images in each class to find out whether the data is balanced.

40
00:03:13,960 --> 00:03:19,740
So here we can see that there are around twenty-two hundred images in each class.

41
00:03:19,750 --> 00:03:23,380
So we can say that the data is not imbalanced; it is balanced.

42
00:03:23,390 --> 00:03:25,190
We do not need to balance the data.

43
00:03:25,210 --> 00:03:26,500
The data is already balanced.
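
Reading counts off a bar chart can hide small classes, so a numeric check is a useful complement. With 39,209 labels over 43 classes, the mean is about 912 images per class. If every class held about 2,200 images, the total would be about 94,600. So it is worth printing the per-class minimum and maximum. The sketch below uses `np.bincount`. The helper names and the toy labels are hypothetical and do not reproduce the real class counts.

```python
import numpy as np

def class_counts(label_id, num_classes=43):
    """Number of images per class; index i holds the count for class i."""
    return np.bincount(np.asarray(label_id), minlength=num_classes)

def imbalance_ratio(counts):
    """Largest class size over smallest non-empty class size (1.0 = perfectly balanced)."""
    nonzero = counts[counts > 0]
    return nonzero.max() / nonzero.min()

labels = [0] * 20 + [1] * 40 + [2] * 40  # toy labels, not the real dataset
counts = class_counts(labels, num_classes=3)
print(counts, imbalance_ratio(counts))  # [20 40 40] 2.0
```

On the full label array, `class_counts(label_id).min()` and `.max()` give the range that the bar chart shows visually.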

44
00:03:27,510 --> 00:03:30,210
Now we'll split the data using train_test_split

45
00:03:33,060 --> 00:03:38,040
into 80 percent training data and 20 percent validation data.
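
The lecture uses scikit-learn's `train_test_split`, which with `test_size=0.2` shuffles the data and holds out 20 percent. The sketch below reproduces that idea in plain NumPy so it runs without scikit-learn. The function name `split_80_20` and the seed are hypothetical. The returned order mirrors `train_test_split`: train images, validation images, train labels, validation labels.

```python
import numpy as np

def split_80_20(x, y, val_fraction=0.2, seed=42):
    """Shuffle, then hold out val_fraction of the samples for validation."""
    x, y = np.asarray(x), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(x))  # same shuffle for x and y
    n_val = int(round(len(x) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

x = np.arange(100).reshape(100, 1)
y = np.arange(100) % 43
x_train, x_val, y_train, y_val = split_80_20(x, y)
print(len(x_train), len(x_val))  # 80 20
```

If the class counts turn out to be uneven, `train_test_split` also accepts `stratify=label_id`, which keeps each class's proportion the same in both splits.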

46
00:03:40,980 --> 00:03:48,600
Similarly, we'll convert the labels, that is, which class each image belongs

47
00:03:48,600 --> 00:03:52,570
to, into one-hot encoded vectors using the to_categorical function.

48
00:03:53,220 --> 00:03:53,910
So why is one-hot

49
00:03:53,910 --> 00:03:58,910
encoding important? Because if we don't apply one-hot encoding, the model may treat the class numbers as ordered values and prioritize some classes.

50
00:03:58,920 --> 00:04:03,530
So to avoid this prioritization, we apply one-hot encoding. So we'll apply

51
00:04:03,540 --> 00:04:09,300
one-hot encoding to the target variable of the training and the validation data, using

52
00:04:09,300 --> 00:04:10,500
the to_categorical function.
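
One-hot encoding turns each integer label into a 43-long vector with a single 1, so no class id is numerically "larger" than another. The NumPy sketch below should match what `keras.utils.to_categorical(labels, num_classes=43)` returns for 1-D integer labels, including its default `float32` dtype. It assumes, without this section showing it, that the model ends in a 43-unit softmax trained with categorical cross-entropy, which expects targets in this shape.

```python
import numpy as np

def one_hot(labels, num_classes=43):
    """Row i is all zeros except a 1.0 at column labels[i]."""
    labels = np.asarray(labels, dtype=int)
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

y = one_hot([0, 2, 42])
print(y.shape, y[1, :4])  # (3, 43) [0. 0. 1. 0.]
```

`y.argmax(axis=1)` recovers the original class ids, which is handy when turning predictions back into labels.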

