1
00:00:00,001 --> 00:00:07,008
(upbeat music)

2
00:00:07,008 --> 00:00:10,006
- [Instructor] Evolutionary fuzzing is the state of the art.

3
00:00:10,006 --> 00:00:13,007
It's an application of genetic algorithms

4
00:00:13,007 --> 00:00:17,009
to make smarter and better and faster fuzzers.

5
00:00:17,009 --> 00:00:19,009
To understand how it works,

6
00:00:19,009 --> 00:00:22,002
let's talk about genetic algorithms

7
00:00:22,002 --> 00:00:25,005
and how they model evolution.

8
00:00:25,005 --> 00:00:26,007
In a genetic algorithm,

9
00:00:26,007 --> 00:00:29,001
we have something called a population,

10
00:00:29,001 --> 00:00:30,009
which consists of individuals.

11
00:00:30,009 --> 00:00:34,008
In our case, individuals are fuzzing inputs.

12
00:00:34,008 --> 00:00:38,003
These individuals have a fitness function,

13
00:00:38,003 --> 00:00:40,006
which is a measure of how good they are.

14
00:00:40,006 --> 00:00:43,009
For fuzzing, it might be some metric we are using

15
00:00:43,009 --> 00:00:46,007
to assess how good an input is.

16
00:00:46,007 --> 00:00:48,007
So we might say that an input is good

17
00:00:48,007 --> 00:00:50,006
when it causes a crash,

18
00:00:50,006 --> 00:00:54,009
or we can say that it's good when it covers the most code,

19
00:00:54,009 --> 00:00:57,002
or it's good when it covers the most new code,

20
00:00:57,002 --> 00:00:59,005
or some other variation.

21
00:00:59,005 --> 00:01:03,008
The less fit individuals get called out, they die,

22
00:01:03,008 --> 00:01:06,007
whereas the more fit ones remain.

23
00:01:06,007 --> 00:01:08,003
In fuzzing language just means

24
00:01:08,003 --> 00:01:10,008
that inputs that we are not interested in,

25
00:01:10,008 --> 00:01:14,006
that have low fitness, we're not going to be reusing.

26
00:01:14,006 --> 00:01:17,003
These individuals that do remain,

27
00:01:17,003 --> 00:01:19,009
that survive because of high fitness,

28
00:01:19,009 --> 00:01:21,008
then reproduce.

29
00:01:21,008 --> 00:01:24,006
So you pick pairs of individuals,

30
00:01:24,006 --> 00:01:29,007
and those will reproduce to create a hybrid,

31
00:01:29,007 --> 00:01:32,007
or genetically speaking,

32
00:01:32,007 --> 00:01:36,000
they'll be combination of their genes.

33
00:01:36,000 --> 00:01:40,008
For fuzzing, we might take our two inputs

34
00:01:40,008 --> 00:01:44,002
that scored high on our metric

35
00:01:44,002 --> 00:01:47,004
and then combine them somehow.

36
00:01:47,004 --> 00:01:52,000
For instance, we might take the initial segment of one

37
00:01:52,000 --> 00:01:54,009
and replace the initial segment of the other.

38
00:01:54,009 --> 00:01:56,009
And this will create a new input,

39
00:01:56,009 --> 00:02:00,001
so an offspring in the language of biology.

40
00:02:00,001 --> 00:02:02,002
In addition, so that you're not just shuffling

41
00:02:02,002 --> 00:02:05,008
the same data over and over,

42
00:02:05,008 --> 00:02:08,009
biology introduces mutations.

43
00:02:08,009 --> 00:02:10,008
So mutations, you have all your genes,

44
00:02:10,008 --> 00:02:13,008
but then with some small probability,

45
00:02:13,008 --> 00:02:19,000
something random happens and it introduces new genes

46
00:02:19,000 --> 00:02:21,006
and new structures.

47
00:02:21,006 --> 00:02:26,000
And this allows your gene pool to expand

48
00:02:26,000 --> 00:02:28,004
to other possible individuals

49
00:02:28,004 --> 00:02:33,000
that were not already present in the gene pool.

50
00:02:33,000 --> 00:02:37,005
For fuzzing, we might take our input

51
00:02:37,005 --> 00:02:39,005
and then perform some mutation,

52
00:02:39,005 --> 00:02:44,004
and that will create a new input, which we will then retain.

53
00:02:44,004 --> 00:02:46,000
And this basically describes

54
00:02:46,000 --> 00:02:49,000
the main ideas of genetic algorithms.

55
00:02:49,000 --> 00:02:51,006
There should be some fitness function,

56
00:02:51,006 --> 00:02:55,008
there should be some method for recombination,

57
00:02:55,008 --> 00:02:58,000
there should be a method for mutation,

58
00:02:58,000 --> 00:02:59,002
and there should be a phase

59
00:02:59,002 --> 00:03:03,006
where the population gets called out.

60
00:03:03,006 --> 00:03:07,000
So note that this is the method used for AFL,

61
00:03:07,000 --> 00:03:08,009
state-of-the-art fuzzer.

62
00:03:08,009 --> 00:03:11,003
So now let's see how we can use it ourselves,

63
00:03:11,003 --> 00:03:13,001
how this works.

64
00:03:13,001 --> 00:03:14,005
So I'll be using the same function

65
00:03:14,005 --> 00:03:18,004
we're working with earlier, cgi_decoder.

66
00:03:18,004 --> 00:03:21,005
And here's all the tracing functionality.

67
00:03:21,005 --> 00:03:24,004
Now, I'm going to define a class

68
00:03:24,004 --> 00:03:28,002
of an individual, some member of the population.

69
00:03:28,002 --> 00:03:32,003
And an individual is going to have some value.

70
00:03:32,003 --> 00:03:35,002
So when I'm fuzzing the cgi_decoder,

71
00:03:35,002 --> 00:03:37,004
it's going to be some string.

72
00:03:37,004 --> 00:03:39,007
That's the value.

73
00:03:39,007 --> 00:03:43,007
And I'm also going to record its coverage.

74
00:03:43,007 --> 00:03:46,001
I'm going to use the coverage for fitness.

75
00:03:46,001 --> 00:03:50,004
And it's going to have a numerical score for its fitness.

76
00:03:50,004 --> 00:03:54,006
So let's say that I'm initializing it.

77
00:03:54,006 --> 00:03:58,005
I give it a value,

78
00:03:58,005 --> 00:04:00,009
which will be some string.

79
00:04:00,009 --> 00:04:04,001
And then I try to compute the coverage

80
00:04:04,001 --> 00:04:06,000
using our functions from before.

81
00:04:06,000 --> 00:04:12,001
And the reason it's a try and not a guaranteed success

82
00:04:12,001 --> 00:04:15,007
is that it could be actually an input

83
00:04:15,007 --> 00:04:19,003
that causes the cgi_decoder to crash,

84
00:04:19,003 --> 00:04:23,007
which is what we're looking for.

85
00:04:23,007 --> 00:04:27,000
But in case that happens,

86
00:04:27,000 --> 00:04:34,001
then I will, instead of recording the coverage and lines,

87
00:04:34,001 --> 00:04:38,004
I'll record what the type of bug is,

88
00:04:38,004 --> 00:04:42,000
print it out for our own reference,

89
00:04:42,000 --> 00:04:47,001
and I'll be recording all the coverage in a dictionary.

90
00:04:47,001 --> 00:04:50,000
And the reason for that is that I want to make sure

91
00:04:50,000 --> 00:04:55,003
that I'm keeping count of the coverage

92
00:04:55,003 --> 00:04:59,004
of how many times each type of coverage takes place.

93
00:04:59,004 --> 00:05:01,008
And I'll be using that in my fitness function.

94
00:05:01,008 --> 00:05:09,005
So if for instance, lines 1, 2, 3 get covered many times,

95
00:05:09,005 --> 00:05:11,009
I want to make sure that these inputs

96
00:05:11,009 --> 00:05:14,007
don't get repeated too much.

97
00:05:14,007 --> 00:05:18,003
So I'll be using that in my fitness function.

98
00:05:18,003 --> 00:05:22,003
And initially, fitness will be just none.

99
00:05:22,003 --> 00:05:25,001
So I'll discuss this stuff in more detail,

100
00:05:25,001 --> 00:05:27,001
but for now, just bear with me.

101
00:05:27,001 --> 00:05:30,002
Another function I want is just to be able to print,

102
00:05:30,002 --> 00:05:32,003
to be able to see the values I mentioned,

103
00:05:32,003 --> 00:05:35,007
the value, coverage, and fitness.

104
00:05:35,007 --> 00:05:39,008
And then I want to be able to compute the fitness.

105
00:05:39,008 --> 00:05:42,005
And the formula I've decided on

106
00:05:42,005 --> 00:05:46,000
makes the most sense for the application here

107
00:05:46,000 --> 00:05:48,008
is for it to be equal to the reciprocal

108
00:05:48,008 --> 00:05:54,001
of the number of times that the coverage is covered.

109
00:05:54,001 --> 00:05:58,004
So again, if lines 1, 2, 3 get covered many times

110
00:05:58,004 --> 00:06:01,004
together in the same pattern,

111
00:06:01,004 --> 00:06:04,006
then I want to make sure that this type of input

112
00:06:04,006 --> 00:06:06,008
receives a low fitness

113
00:06:06,008 --> 00:06:10,004
so that it is less likely to go on to the next generation

114
00:06:10,004 --> 00:06:12,008
because I don't want it to be repeated.

115
00:06:12,008 --> 00:06:18,006
On the other hand, coverages that rarely take place

116
00:06:18,006 --> 00:06:21,001
will have high fitness, which is what I want,

117
00:06:21,001 --> 00:06:24,004
so they get explored further.

118
00:06:24,004 --> 00:06:26,003
So this is my individual class,

119
00:06:26,003 --> 00:06:28,008
which will make up my population.

120
00:06:28,008 --> 00:06:32,000
Now, I'll be wanting to keep track of the important samples,

121
00:06:32,000 --> 00:06:36,009
so those are samples that produce new coverages.

122
00:06:36,009 --> 00:06:41,002
And then the covered dict will be keeping track

123
00:06:41,002 --> 00:06:44,004
of how many times each coverage was covered,

124
00:06:44,004 --> 00:06:48,001
so that I can use it in my fitness calculation.

125
00:06:48,001 --> 00:06:52,001
I'm going to define a function called add_to_coverage_dict.

126
00:06:52,001 --> 00:06:53,005
And it does what it sounds like,

127
00:06:53,005 --> 00:06:57,009
it takes a value and it simply increments the counter.

128
00:06:57,009 --> 00:07:02,001
And in fact, we are referring to it even earlier.

129
00:07:02,001 --> 00:07:04,008
So when we instantiate an individual,

130
00:07:04,008 --> 00:07:07,000
we'll calculate its coverage,

131
00:07:07,000 --> 00:07:10,002
and then we will add it to the dictionary.

132
00:07:10,002 --> 00:07:11,008
Now, I'm going to define something called

133
00:07:11,008 --> 00:07:14,008
the carrying capacity,

134
00:07:14,008 --> 00:07:17,007
which will determine the size of the population.

135
00:07:17,007 --> 00:07:22,005
Basically, I'll allow the population to grow,

136
00:07:22,005 --> 00:07:26,002
but then I will kill off,

137
00:07:26,002 --> 00:07:30,003
call the lower fitness individuals,

138
00:07:30,003 --> 00:07:33,003
until I'm back down to 100.

139
00:07:33,003 --> 00:07:36,005
So the population always goes back to 100.

140
00:07:36,005 --> 00:07:41,004
Once it goes past 100, it'll go back to 100.

141
00:07:41,004 --> 00:07:44,003
And this allows me to focus the fuzzer

142
00:07:44,003 --> 00:07:46,004
on the most promising inputs

143
00:07:46,004 --> 00:07:49,004
as opposed to a huge number of them

144
00:07:49,004 --> 00:07:51,004
where many are not promising.

145
00:07:51,004 --> 00:07:53,007
I'm going to define an initial seed,

146
00:07:53,007 --> 00:07:57,002
which will be the initial members of my population,

147
00:07:57,002 --> 00:08:04,005
and this will be a couple URLs that seem to be working.

148
00:08:04,005 --> 00:08:08,000
So then my population will consist of the individuals

149
00:08:08,000 --> 00:08:11,000
instantiated on these URLs.

150
00:08:11,000 --> 00:08:12,006
Another function I'll be wanting

151
00:08:12,006 --> 00:08:16,000
is the update_fitness_scores function,

152
00:08:16,000 --> 00:08:18,002
which will allow you to,

153
00:08:18,002 --> 00:08:22,007
which will allow me to update all the fitness scores.

154
00:08:22,007 --> 00:08:25,007
We'll see that in more detail soon enough.

155
00:08:25,007 --> 00:08:27,005
I also want a convenience function

156
00:08:27,005 --> 00:08:30,003
that will just show me information about my population.

157
00:08:30,003 --> 00:08:33,004
This is for debugging purposes

158
00:08:33,004 --> 00:08:36,008
and for explanatory purposes.

159
00:08:36,008 --> 00:08:41,000
And I'll also define a get_fitness_scores function,

160
00:08:41,000 --> 00:08:46,001
which will give me an array of all the fitness scores.

161
00:08:46,001 --> 00:08:49,009
And the reason is that I'll be wanting to sample from it

162
00:08:49,009 --> 00:08:54,001
because the fitnesses, the fitness scores will be used

163
00:08:54,001 --> 00:08:55,005
in a probabilistic manner.

164
00:08:55,005 --> 00:08:58,004
I won't just be taking the highest fitness individuals,

165
00:08:58,004 --> 00:09:01,007
but rather the highest fitness individuals

166
00:09:01,007 --> 00:09:06,004
who'll be more likely to carry on and reproduce.

167
00:09:06,004 --> 00:09:09,003
But there's no guarantee, just like in life.

168
00:09:09,003 --> 00:09:12,008
I'll be defining a function called

169
00:09:12,008 --> 00:09:15,003
sample_pair_for_reproduction.

170
00:09:15,003 --> 00:09:17,005
And it does what it sounds like,

171
00:09:17,005 --> 00:09:22,008
it will sample a pair from my population

172
00:09:22,008 --> 00:09:25,007
based on the fitness.

173
00:09:25,007 --> 00:09:30,006
So first, it's going to get the fitness scores,

174
00:09:30,006 --> 00:09:34,007
then it will choose one individual

175
00:09:34,007 --> 00:09:37,001
based on the probability distribution

176
00:09:37,001 --> 00:09:39,005
from the fitness scores,

177
00:09:39,005 --> 00:09:43,006
and then another individual.

178
00:09:43,006 --> 00:09:46,007
And what we're going to do is pair them up

179
00:09:46,007 --> 00:09:51,001
and perform recombination.

180
00:09:51,001 --> 00:09:55,004
So once have individual1 and individual2,

181
00:09:55,004 --> 00:10:02,004
I'm going to pick a random point in the string.

182
00:10:02,004 --> 00:10:06,008
And then I'll simply swap the first part of one

183
00:10:06,008 --> 00:10:08,006
with the first part of the other,

184
00:10:08,006 --> 00:10:13,005
and I'll get two new offsprings.

185
00:10:13,005 --> 00:10:16,007
We'll see this as an example soon enough.

186
00:10:16,007 --> 00:10:19,007
And let me update the fitness scores.

187
00:10:19,007 --> 00:10:21,008
In other words, initialize them.

188
00:10:21,008 --> 00:10:26,005
And then let's take a look finally at what they look like.

189
00:10:26,005 --> 00:10:31,001
So this URL here gets this code coverage,

190
00:10:31,001 --> 00:10:33,001
and this one has the same code coverage,

191
00:10:33,001 --> 00:10:37,006
and the third one, the same code coverage as well.

192
00:10:37,006 --> 00:10:41,009
And their fitness scores are 33%-ish,

193
00:10:41,009 --> 00:10:44,009
1/3, like it should be.

194
00:10:44,009 --> 00:10:48,000
Now, we are going to undergo,

195
00:10:48,000 --> 00:10:53,004
our population is going to undergo a recombination,

196
00:10:53,004 --> 00:10:55,007
which we'll later on put into a loop.

197
00:10:55,007 --> 00:11:00,005
But for now, basically what we do is take our population,

198
00:11:00,005 --> 00:11:04,001
recombine half using the methods we just defined,

199
00:11:04,001 --> 00:11:08,004
so sample a pair for reproduction,

200
00:11:08,004 --> 00:11:13,008
recombine them, add them to the new population.

201
00:11:13,008 --> 00:11:15,009
And then update the scores,

202
00:11:15,009 --> 00:11:18,004
and there you have a new population.

203
00:11:18,004 --> 00:11:19,008
So let's see what that looks like.

204
00:11:19,008 --> 00:11:22,000
I take my current population,

205
00:11:22,000 --> 00:11:23,008
which you're seeing here,

206
00:11:23,008 --> 00:11:26,007
consisting of three inputs,

207
00:11:26,007 --> 00:11:29,001
and I'm recombining.

208
00:11:29,001 --> 00:11:31,005
And now, let's print it.

209
00:11:31,005 --> 00:11:34,000
And you see there are a couple new individuals.

210
00:11:34,000 --> 00:11:38,000
There are now one, two, three, four, five.

211
00:11:38,000 --> 00:11:42,007
Five members in the population, so two are new.

212
00:11:42,007 --> 00:11:45,004
And they also happen to have the same coverage.

213
00:11:45,004 --> 00:11:48,009
But that's okay.

214
00:11:48,009 --> 00:11:51,000
We'll keep going.

215
00:11:51,000 --> 00:11:53,008
And we can look at them and see,

216
00:11:53,008 --> 00:11:57,004
so in the initial population, we had one question mark here,

217
00:11:57,004 --> 00:12:01,001
now we have two individuals with the question marks.

218
00:12:01,001 --> 00:12:02,007
And they're different.

219
00:12:02,007 --> 00:12:05,004
You see the ending here has this 2g,

220
00:12:05,004 --> 00:12:07,005
and this one does not.

221
00:12:07,005 --> 00:12:10,005
So you can tell these guys were,

222
00:12:10,005 --> 00:12:12,003
some recombination happened there.

223
00:12:12,003 --> 00:12:13,009
And if we look at this a little closer,

224
00:12:13,009 --> 00:12:16,002
we'll be able to tell which ones were combined,

225
00:12:16,002 --> 00:12:18,009
but that's not too important right now.

226
00:12:18,009 --> 00:12:20,009
Next, let's introduce mutation.

227
00:12:20,009 --> 00:12:24,005
So I'll pick a probability of mutation.

228
00:12:24,005 --> 00:12:28,007
Because mutation is a very powerful thing,

229
00:12:28,007 --> 00:12:33,004
you take some individual, regardless of how fit they are,

230
00:12:33,004 --> 00:12:34,009
you're going to mutate them.

231
00:12:34,009 --> 00:12:38,000
And now they can be extremely unfit.

232
00:12:38,000 --> 00:12:39,009
So you can take something really good

233
00:12:39,009 --> 00:12:42,009
and then make it really bad.

234
00:12:42,009 --> 00:12:45,000
For that reason, we got to be careful

235
00:12:45,000 --> 00:12:47,009
about how much mutation we're doing.

236
00:12:47,009 --> 00:12:50,007
But at the same time, we need mutation

237
00:12:50,007 --> 00:12:58,000
so that we don't have too much constraints on the genome.

238
00:12:58,000 --> 00:13:00,006
It allows us to explore more.

239
00:13:00,006 --> 00:13:02,007
So I'm going to recall our functions

240
00:13:02,007 --> 00:13:05,009
from the mutations lesson,

241
00:13:05,009 --> 00:13:10,008
delete, insert, swap, randomize,

242
00:13:10,008 --> 00:13:14,002
and a convenience function here, produce_random_character,

243
00:13:14,002 --> 00:13:18,000
which will simply pick a random character.

244
00:13:18,000 --> 00:13:20,007
And then I'm also going to recall our mutate function,

245
00:13:20,007 --> 00:13:26,008
which chose at random from these types of mutations,

246
00:13:26,008 --> 00:13:30,007
and then repeated this for num_mutations times.

247
00:13:30,007 --> 00:13:34,009
And now we are ready to define the mutation phase

248
00:13:34,009 --> 00:13:39,005
in which we loop over the whole population

249
00:13:39,005 --> 00:13:41,004
looking at each individual.

250
00:13:41,004 --> 00:13:44,005
And we flip a coin, not a fair coin,

251
00:13:44,005 --> 00:13:51,001
but a coin that's biased using this constant.

252
00:13:51,001 --> 00:13:55,002
And if it flips one way,

253
00:13:55,002 --> 00:13:57,009
then we will mutate the individual,

254
00:13:57,009 --> 00:14:01,003
so we'll apply the mutation.

255
00:14:01,003 --> 00:14:04,003
And otherwise, we will not.

256
00:14:04,003 --> 00:14:05,008
And I will update the fitness

257
00:14:05,008 --> 00:14:09,008
because we have changed the population.

258
00:14:09,008 --> 00:14:17,008
So let's apply mutation and take a look at what happens.

259
00:14:17,008 --> 00:14:21,004
So it doesn't look like much has happened,

260
00:14:21,004 --> 00:14:23,004
or at least it's hard to find.

261
00:14:23,004 --> 00:14:24,005
But if you look closer,

262
00:14:24,005 --> 00:14:27,000
you'll see here it used to be google,

263
00:14:27,000 --> 00:14:29,004
now it's oogle.

264
00:14:29,004 --> 00:14:31,005
That's fine. (snickers)

265
00:14:31,005 --> 00:14:33,000
That's not a problem.

266
00:14:33,000 --> 00:14:35,000
But the coverage is still the same.

267
00:14:35,000 --> 00:14:37,000
Now we're going to define the calling phase

268
00:14:37,000 --> 00:14:42,005
in which we kill off individuals that are less fit.

269
00:14:42,005 --> 00:14:46,006
So what we're going to do is look at whether it's at caring,

270
00:14:46,006 --> 00:14:51,001
beyond the caring capacity or not, which we set to be 100.

271
00:14:51,001 --> 00:14:54,003
Then we're going to get the distribution of fitnesses.

272
00:14:54,003 --> 00:14:58,008
We are going to pick up to capital N individuals.

273
00:14:58,008 --> 00:15:03,006
Exactly capital N individuals, according to the fitness.

274
00:15:03,006 --> 00:15:05,006
So now you see that the more fit individuals

275
00:15:05,006 --> 00:15:07,007
are more likely to be retained,

276
00:15:07,007 --> 00:15:11,002
and the less fit ones, less likely to be retained.

277
00:15:11,002 --> 00:15:13,001
And then that's what we do.

278
00:15:13,001 --> 00:15:15,001
In our case, we'll be retaining everyone

279
00:15:15,001 --> 00:15:18,002
because we don't have caring capacity just yet.

280
00:15:18,002 --> 00:15:20,002
So then we run our calling phase,

281
00:15:20,002 --> 00:15:24,006
should be exactly the same because nothing was called.

282
00:15:24,006 --> 00:15:28,003
And what we just did completes one generation

283
00:15:28,003 --> 00:15:31,007
of our algorithm, of our evolution.

284
00:15:31,007 --> 00:15:37,002
So we started off all the way here with the seeds,

285
00:15:37,002 --> 00:15:41,007
and we grew our population, in this case,

286
00:15:41,007 --> 00:15:45,006
to five members from three.

287
00:15:45,006 --> 00:15:50,000
And we added some mutations.

288
00:15:50,000 --> 00:15:53,006
And now we can just repeat this over and over again.

289
00:15:53,006 --> 00:15:56,001
You'll see that the coverages will change

290
00:15:56,001 --> 00:16:00,005
and we'll find bugs, and these things do evolve in time.

291
00:16:00,005 --> 00:16:03,006
So I'm going to set a number of generations,

292
00:16:03,006 --> 00:16:04,008
and I'll be iterating.

293
00:16:04,008 --> 00:16:09,003
So for each generation, go through all these phases.

294
00:16:09,003 --> 00:16:13,000
And let's run this.

295
00:16:13,000 --> 00:16:18,006
So population generation 0, 1, 2, 3, 4, 5.

296
00:16:18,006 --> 00:16:23,007
And then generation 5 already caught some error,

297
00:16:23,007 --> 00:16:25,008
so value error,

298
00:16:25,008 --> 00:16:28,002
with this input here.

299
00:16:28,002 --> 00:16:30,002
Then there's lots of that.

300
00:16:30,002 --> 00:16:33,004
And as you can see, even though there's a ton of it,

301
00:16:33,004 --> 00:16:38,006
eventually, it stops doing as many value errors.

302
00:16:38,006 --> 00:16:44,001
And that's because of the way we gave reciprocal weight

303
00:16:44,001 --> 00:16:46,002
to more frequent coverages.

304
00:16:46,002 --> 00:16:48,003
So the value error coverage

305
00:16:48,003 --> 00:16:50,009
was increasing in frequency in the dictionary,

306
00:16:50,009 --> 00:16:53,003
and since we're taking the reciprocal,

307
00:16:53,003 --> 00:16:54,006
it becomes less and less likely

308
00:16:54,006 --> 00:16:58,001
that these individuals survive.

309
00:16:58,001 --> 00:17:03,000
So then it evolved, and then now it found this index error,

310
00:17:03,000 --> 00:17:09,006
which is actually an error we are not expecting.

311
00:17:09,006 --> 00:17:16,003
So the value error is something to expect,

312
00:17:16,003 --> 00:17:20,008
but there's nothing said about index error.

313
00:17:20,008 --> 00:17:25,005
And we can see there's more stuff going on,

314
00:17:25,005 --> 00:17:27,009
and the end of the evolution.

315
00:17:27,009 --> 00:17:32,009
And now we can take a look at what went on

316
00:17:32,009 --> 00:17:35,002
by looking at the important samples,

317
00:17:35,002 --> 00:17:38,004
and that's a representative of each coverage.

318
00:17:38,004 --> 00:17:44,001
So if I look at the coverage dictionary,

319
00:17:44,001 --> 00:17:47,005
you can see the different types of coverages

320
00:17:47,005 --> 00:17:50,000
that have taken place.

321
00:17:50,000 --> 00:17:51,003
And the important samples

322
00:17:51,003 --> 00:17:55,007
are just representatives, one from each.

323
00:17:55,007 --> 00:18:01,004
So for example, if you were to run this individual,

324
00:18:01,004 --> 00:18:04,006
then it would get this coverage.

325
00:18:04,006 --> 00:18:07,002
And if you were to run this one,

326
00:18:07,002 --> 00:18:12,005
then you would get this coverage, and so on.

327
00:18:12,005 --> 00:18:15,006
And now we can take a look at this index error,

328
00:18:15,006 --> 00:18:21,002
which is a genuine bug, the whole point of finding,

329
00:18:21,002 --> 00:18:24,001
the whole point of fuzzing.

330
00:18:24,001 --> 00:18:29,005
And we can even refer to it here

331
00:18:29,005 --> 00:18:38,006
and see what happened in more details.

332
00:18:38,006 --> 00:18:42,005
And then we look and we see

333
00:18:42,005 --> 00:18:46,004
that we are getting string index out of range.

334
00:18:46,004 --> 00:18:49,006
And the reason is that, if we look here,

335
00:18:49,006 --> 00:18:52,008
we see that if there is a percentage sign,

336
00:18:52,008 --> 00:19:00,005
then the next two characters should be existent,

337
00:19:00,005 --> 00:19:02,002
but they're not.

338
00:19:02,002 --> 00:19:05,009
For instance, I can also put in this input

339
00:19:05,009 --> 00:19:07,003
and that still doesn't work,

340
00:19:07,003 --> 00:19:11,003
but if I add in another character, then it's fine.

341
00:19:11,003 --> 00:19:16,002
So basically, our fuzzer found us a real vulnerability,

342
00:19:16,002 --> 00:19:22,006
and it tells us that there's actually a bug in our code.

343
00:19:22,006 --> 00:19:24,002
So congratulations to us,

344
00:19:24,002 --> 00:19:26,005
we've successfully built up from scratch

345
00:19:26,005 --> 00:19:28,009
and executed an evolutionary fuzzing

346
00:19:28,009 --> 00:19:31,008
and found a vulnerability in a piece of code.

347
00:19:31,008 --> 00:19:36,000
Next up, we'll be learning how to use AFL.


