
1
00:00:00,159 --> 00:00:05,471
[MUSIC]

2
00:00:05,471 --> 00:00:09,551
Good day, viewers. In this segment, I'll
talk about peer-to-peer content

3
00:00:09,551 --> 00:00:13,852
distribution systems using BitTorrent as 
an example. 

4
00:00:13,852 --> 00:00:17,872
So, peer-to-peer content delivery is 
interesting to us, because it provides an 

5
00:00:17,872 --> 00:00:22,195
alternative to client-server content
delivery using CDNs. 

6
00:00:22,195 --> 00:00:26,500
And it's an alternative that runs
without dedicated infrastructure. 

7
00:00:26,500 --> 00:00:30,130
That's what makes it interesting. 
We're going to look at these systems just 

8
00:00:30,130 --> 00:00:34,534
to understand a little bit about how they 
work using BitTorrent as an example. 

9
00:00:34,534 --> 00:00:37,584
And I'm really just going to give you
a quick overview of these systems,

10
00:00:37,584 --> 00:00:43,331
so that you see what they look like. 
So, here's a little bit of context for 

11
00:00:43,331 --> 00:00:47,684
where peer-to-peer systems came from. 
We've seen delivery with traditional

12
00:00:47,684 --> 00:00:50,811
client-server methods, which were scaled
up over time to become 

13
00:00:50,811 --> 00:00:55,570
content-distribution networks. 
These systems are efficient, and they're 

14
00:00:55,570 --> 00:00:58,874
able to scale up delivery for very 
popular content, and they work quite 

15
00:00:58,874 --> 00:01:02,305
effectively. 
They are also quite reliable and that's 

16
00:01:02,305 --> 00:01:06,410
because they are managed infrastructure.
People pay for content delivery, so you

17
00:01:06,410 --> 00:01:09,880
can hire people to carefully administer 
all of these systems. 

18
00:01:09,880 --> 00:01:12,825
And run them well and make sure they 
provide good service. 

19
00:01:12,825 --> 00:01:18,386
But traditional models, the CDN systems,
have some disadvantages too. There are two

20
00:01:18,386 --> 00:01:23,117
real disadvantages that come to mind.
First, the need for dedicated

21
00:01:23,117 --> 00:01:28,386
infrastructure. 
It has to be provisioned and carefully 

22
00:01:28,386 --> 00:01:30,790
managed. 
All of that costs money, so we're paying 

23
00:01:30,790 --> 00:01:34,860
maybe quite a lot, maybe more than we 
need to for this content delivery. 

24
00:01:34,860 --> 00:01:39,490
And second, there is an element of 
centralized control or oversight.

25
00:01:39,490 --> 00:01:43,648
Whoever has set up the CDN is really in 
control, and able to make decisions about 

26
00:01:43,648 --> 00:01:48,729
what goes on, on that CDN. 
Now that may present problems for some 

27
00:01:48,729 --> 00:01:52,702
people for many reasons. 
You might not be interested, for 

28
00:01:52,702 --> 00:01:55,462
instance, in some large internet 
companies such as Google and Facebook 

29
00:01:55,462 --> 00:01:59,063
delivering all of your content
and having a very good idea of what's

30
00:01:59,063 --> 00:02:01,654
going on. 
So if you care about privacy, 

31
00:02:01,654 --> 00:02:05,408
peer-to-peer might be more interesting. 
Other people have different examples, of

32
00:02:05,408 --> 00:02:07,520
course, such as downloading illegal
content. 

33
00:02:09,630 --> 00:02:13,790
So for peer-to-peer, the real goal of 
this technology is to provide delivery 

34
00:02:13,790 --> 00:02:19,215
without the dedicated infrastructure
or centralized control we've seen in

35
00:02:19,215 --> 00:02:25,060
these other systems while still retaining 
the advantages that we got from the CDN.

36
00:02:25,060 --> 00:02:27,902
So, we still want to efficiently
deliver content at large scale when 

37
00:02:27,902 --> 00:02:31,976
there's very popular content. 
It should be quite reliable, providing a system

38
00:02:31,976 --> 00:02:35,040
which is really very usable all of the 
time. 

39
00:02:35,040 --> 00:02:37,790
It should provide good performance, be always
up, and so forth.

40
00:02:39,450 --> 00:02:42,313
This all sounds great. 
The key question is, just how on earth 

41
00:02:42,313 --> 00:02:45,642
are we going to make this work? 
All we have is a pool of clients who

42
00:02:45,642 --> 00:02:48,762
want to exchange information among
themselves with no dedicated 

43
00:02:48,762 --> 00:02:52,670
infrastructure to help them. 
Well, the key idea here is that we're 

44
00:02:52,670 --> 00:02:55,910
going to have to have the participants 
help themselves. 

45
00:02:55,910 --> 00:02:58,610
The participants are called peers in
the system, that's why it's P2P 

46
00:02:58,610 --> 00:03:03,560
technology. 
Peer-to-peer took off very shortly after 

47
00:03:03,560 --> 00:03:07,848
some of the early systems such as Napster 
demonstrated a peer-to-peer system in 

48
00:03:07,848 --> 00:03:12,202
1999. 
This was really used for stealing music 

49
00:03:12,202 --> 00:03:17,619
or downloading illegally copied music. 
It's since become defunct because it was

50
00:03:17,619 --> 00:03:19,959
shut down. 
There was a flurry of activity on 

51
00:03:19,959 --> 00:03:23,296
different peer-to-peer systems, including
BitTorrent, which was one of the fairly early

52
00:03:23,296 --> 00:03:26,658
ones. 
It was developed around 2001 and it 

53
00:03:26,658 --> 00:03:30,725
quickly became very popular. 
It's probably the dominant peer-to-peer 

54
00:03:30,725 --> 00:03:34,276
system today, and it's a substantial
fraction of internet traffic, so it's

55
00:03:34,276 --> 00:03:40,837
really seeing widespread use.
Okay, so let's just talk a bit about

56
00:03:40,837 --> 00:03:44,520
peer-to-peer in general, and then I'll 
move on to BitTorrent.

57
00:03:44,520 --> 00:03:47,824
So for peer-to-peer in general for 
delivering content, there are a few 

58
00:03:47,824 --> 00:03:51,160
challenges that we're going to have to 
deal with. 

59
00:03:51,160 --> 00:03:54,890
Just because of this very different
setting from client-server.

60
00:03:54,890 --> 00:03:58,040
The real problem we have here is, there 
are no servers on which we can rely for 

61
00:03:58,040 --> 00:04:01,309
anything. 
So this means that all of the

62
00:04:01,309 --> 00:04:05,731
communication to deliver the content is 
going to be between the participants or 

63
00:04:05,731 --> 00:04:09,288
peers. 
This is why it's called a peer-to-peer 

64
00:04:09,288 --> 00:04:12,970
system, that's the communication pattern, 
not client-server. 

65
00:04:12,970 --> 00:04:16,876
So all of these peers are going to have 
to organize themselves, somehow, into 

66
00:04:16,876 --> 00:04:22,160
a coherent architecture to accomplish the
task of content delivery. 

67
00:04:22,160 --> 00:04:25,100
Well, you might be able to work out how 
to do this if you just have a group of 

68
00:04:25,100 --> 00:04:29,165
friends at small scale.
But we would like these peer-to-peer systems to

69
00:04:29,165 --> 00:04:32,350
scale up to large scale, hundreds of 
thousands of clients working together to 

70
00:04:32,350 --> 00:04:36,080
distribute content amongst themselves, 
for example. 

71
00:04:36,080 --> 00:04:41,120
At this scale, there are a few problems. 
And I'll just go through them here.

72
00:04:41,120 --> 00:04:45,790
The first problem is that of limited
capabilities of the participants. 

73
00:04:45,790 --> 00:04:49,054
So, all of the participants, the peers,
are just, you know, clients that

74
00:04:49,054 --> 00:04:54,160
want to download the content, too. 
We don't have the luxury of provisioning 

75
00:04:54,160 --> 00:04:58,645
highly powerful, high-end servers that can
distribute content to many clients, or N

76
00:04:58,645 --> 00:05:03,690
clients. 
Everyone's just a client in some sense. 

77
00:05:03,690 --> 00:05:06,825
While they may range in their
capabilities, we don't have the luxury

78
00:05:06,825 --> 00:05:10,490
of deciding exactly who's going to be
how powerful. 

79
00:05:10,490 --> 00:05:14,186
So, the question then is, how can one 
peer, which might not be a particularly 

80
00:05:14,186 --> 00:05:17,714
high-end peer, be able to deliver its 
content to all of the other peers in the 

81
00:05:17,714 --> 00:05:22,030
system if they want to download that 
content? 

82
00:05:23,110 --> 00:05:25,769
That's one issue. 
Another issue is that of participation 

83
00:05:25,769 --> 00:05:29,713
incentives. Peers are really helping
one another, at least that's what we want 

84
00:05:29,713 --> 00:05:32,970
to happen. 
But if as a peer all you really want to 

85
00:05:32,970 --> 00:05:35,970
do is download the content, you know, you 
want to get a copy of this file, we

86
00:05:35,970 --> 00:05:40,434
haven't really yet worked out
why it is that you're going to bother

87
00:05:40,434 --> 00:05:43,634
sending copies of that file to everyone 
else to help them out since that's not 

88
00:05:43,634 --> 00:05:49,076
what you're really trying to do. 
And the third issue is that of 

89
00:05:49,076 --> 00:05:52,505
decentralization. 
With the managed server infrastructure, 

90
00:05:52,505 --> 00:05:56,040
you know who to contact to find out where 
the file is. 

91
00:05:56,040 --> 00:05:59,410
With a peer-to-peer system, the set of 
peers is changing over time. 

92
00:05:59,410 --> 00:06:02,308
It's just whoever wants to be involved in 
the download and spreading of this 

93
00:06:02,308 --> 00:06:06,205
content right now. 
So, it's not clear in this system how you 

94
00:06:06,205 --> 00:06:11,680
would go about finding out who to contact 
to get a copy of the content. 

95
00:06:11,680 --> 00:06:15,190
So, we would like systems that work well 
when they're very decentralized. 

96
00:06:15,190 --> 00:06:19,690
No one's in charge and all the members of 
this system are constantly changing.

97
00:06:19,690 --> 00:06:23,810
So, these are actually three substantial 
challenges to deal with. 

98
00:06:23,810 --> 00:06:26,490
Well, there's
a rough line of approach which we could use

99
00:06:26,490 --> 00:06:30,260
to deal with each one. 
[COUGH] So, to overcome the limited 

100
00:06:30,260 --> 00:06:35,012
capabilities of an individual peer, we 
can still enable one node to distribute 

101
00:06:35,012 --> 00:06:42,070
its content to all of the other peers by 
simply making a distribution tree. 

102
00:06:42,070 --> 00:06:46,670
This is like multicasting, or really sort
of like what a CDN is doing over time. 

103
00:06:46,670 --> 00:06:50,310
So, if you just imagine this source node 
here has content that everyone wants to

104
00:06:50,310 --> 00:06:53,651
get. 
Well, we could get to all nodes by having 

105
00:06:53,651 --> 00:06:57,698
the source send only two copies to these 
two nodes. 

106
00:06:57,698 --> 00:07:02,120
And these two nodes each would only send 
two copies, and there, actually, I've

107
00:07:02,120 --> 00:07:08,420
reached all the nodes in the network. 
Let me just clean that up, here it is. 

108
00:07:08,420 --> 00:07:11,368
You can see by following the distribution
pattern that no node did a lot of work, yet we

109
00:07:11,368 --> 00:07:15,853
got the file to everyone in the system. 
Of course, with this tree, I have a small 

110
00:07:15,853 --> 00:07:19,818
picture here, but we could scale up to a
very large number of nodes using

111
00:07:19,818 --> 00:07:25,282
this technique. 
And actually we wouldn't have to

112
00:07:25,282 --> 00:07:28,780
go through very many levels to get to a 
very large system because of the way we 

113
00:07:28,780 --> 00:07:34,530
are growing exponentially. 
So, this kind of transfer is typically 

114
00:07:34,530 --> 00:07:40,470
done not instantaneously, but with 
replicas over time. 

115
00:07:40,470 --> 00:07:43,470
So, each of the other nodes that you're
sending to, like here, stores a replica of the

116
00:07:43,470 --> 00:07:46,970
file. It started off here.
A replica is being stored here. 

117
00:07:46,970 --> 00:07:50,960
And sometime a little later it'll be 
transferred to these other two nodes. 

118
00:07:50,960 --> 00:07:55,548
Now, this approach means that in some 
sense the capacity of a peer-to-peer 

119
00:07:55,548 --> 00:08:01,209
network is self-scaling with its size.
If you have more nodes in the system,

120
00:08:01,209 --> 00:08:04,749
you have more capacity to distribute
the content to all of the nodes 

121
00:08:04,749 --> 00:08:09,198
themselves. 
So it certainly has the potential

122
00:08:09,198 --> 00:08:12,870
to be able to work and deliver content to 
all of the nodes. 
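
The exponential growth of the distribution tree can be checked with a little arithmetic. This is only a sketch: the function name `levels_needed` and the default fan-out of two copies per node are assumptions chosen to match the picture in the lecture, not part of any real protocol.

```python
def levels_needed(num_peers: int, fanout: int = 2) -> int:
    """Tree levels below the source needed to reach num_peers other nodes,
    when every node forwards the file to `fanout` nodes."""
    if num_peers < 0 or fanout < 1:
        raise ValueError("num_peers must be >= 0 and fanout >= 1")
    reached = 0      # peers that hold a replica so far (source excluded)
    level_size = 1   # nodes at the current level, starting with the source
    levels = 0
    while reached < num_peers:
        level_size *= fanout   # each node at this level sends `fanout` copies
        reached += level_size
        levels += 1
    return levels


# The small picture: the source sends two copies, and those two nodes
# each send two more, which reaches six other nodes in two levels.
print(levels_needed(6))        # 2
print(levels_needed(100_000))  # 16
```

Under these assumptions, a hundred thousand peers are reached in 16 levels, even though no node sends more than two copies.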

123
00:08:14,800 --> 00:08:18,643
In terms of participation incentives I'll 
note that the peers here play two 

124
00:08:18,643 --> 00:08:23,540
different roles. 
The pink arrows here, like these arrows, are

125
00:08:24,850 --> 00:08:28,130
downloads to this particular node that's 
shaded in gray. 

126
00:08:28,130 --> 00:08:31,820
Let's just call this one the peer that 
we're interested in looking at. 

127
00:08:31,820 --> 00:08:35,140
So, this is really the node helping 
itself. 

128
00:08:35,140 --> 00:08:37,635
It's downloading the content that it 
wanted all of the time. 

129
00:08:37,635 --> 00:08:41,055
At the same time, because it
participates in a peer-to-peer system,

130
00:08:41,055 --> 00:08:45,080
this node is doing some other work. 
These are the blue arrows. 

131
00:08:45,080 --> 00:08:47,600
Oh, I forgot a pink arrow there; it doesn't
matter. 

132
00:08:47,600 --> 00:08:50,966
These blue arrows, from the point of view
of that same node, are uploads to other

133
00:08:50,966 --> 00:08:54,806
nodes. 
These uploads don’t help it directly at 

134
00:08:54,806 --> 00:08:57,530
all. 
They’re really doing other nodes a favor. 

135
00:08:57,530 --> 00:08:59,905
So, these are the dual roles and why it’s
called a peer. 

136
00:08:59,905 --> 00:09:03,628
It’s performing, if you like, the roles 
of both the client and the server. 

137
00:09:03,628 --> 00:09:08,668
To encourage participation in a network, 
we can try and couple the two roles 

138
00:09:08,668 --> 00:09:12,890
together. 
Essentially, we can try and create a 

139
00:09:12,890 --> 00:09:18,250
system where nodes are working together 
so that one of them is sort of saying. 

140
00:09:18,250 --> 00:09:22,010
Well, I’ll upload content to you if you 
upload it to me. 

141
00:09:22,010 --> 00:09:25,482
If they both do that, then we can create 
a system that works that way and will 

142
00:09:25,482 --> 00:09:30,420
encourage a bit of cooperation and 
they'll all satisfy their goals.

143
00:09:30,420 --> 00:09:33,620
Of course, we just have to come up with a 
protocol to do that, and that's part of 

144
00:09:33,620 --> 00:09:36,626
the trick. 
And finally, in terms of 

145
00:09:36,626 --> 00:09:42,800
decentralization, even though the
set of nodes is changing,

146
00:09:42,800 --> 00:09:46,481
the peers in the system need to be
able to learn where to get the content. 

147
00:09:46,481 --> 00:09:51,449
To solve this problem, we're going to use 
something called DHTs, distributed hash 

148
00:09:51,449 --> 00:09:55,590
tables. 
Some of you might have heard of DHTs, 

149
00:09:55,590 --> 00:10:00,224
maybe not. 
DHTs started as a line of

150
00:10:00,224 --> 00:10:03,424
work
in the academic community about 2001,

151
00:10:03,424 --> 00:10:08,410
with work on systems called Chord and CAN,
which you might have heard of.

152
00:10:08,410 --> 00:10:12,025
They turned out to be all the rage in
the networking research community. 

153
00:10:12,025 --> 00:10:15,160
There was a huge amount of work on them, 
there still is. 

154
00:10:15,160 --> 00:10:18,480
They're some of the most highly cited
network papers of all time. 

155
00:10:18,480 --> 00:10:22,190
Now, I don't really have time to tell you 
how these new algorithms work. If you like,

156
00:10:22,190 --> 00:10:25,830
I'm sure you can read about them a little 
further. 

157
00:10:25,830 --> 00:10:29,862
Actually, in the text there's a section 
on the distributed hash tables in the 

158
00:10:29,862 --> 00:10:34,180
same section
as peer-to-peer systems, which will

159
00:10:34,180 --> 00:10:37,624
explain how one of them, Chord, works.
But what I'm going to tell you now is 

160
00:10:37,624 --> 00:10:44,440
simply what a DHT does. 
So, a DHT is essentially an index. 

161
00:10:44,440 --> 00:10:47,430
You can look things up by their name to
find out where their peers are.

162
00:10:47,430 --> 00:10:50,390
That's what we want to do. 
Now it's a distributed index, so the 

163
00:10:50,390 --> 00:10:54,704
index is spread across all of the peers. 
And it's fully decentralized and 

164
00:10:54,704 --> 00:10:57,620
efficient. 
These are the properties we want. 

165
00:10:57,620 --> 00:11:00,810
So, what does all of this mean? 
Well, I guess I'm repeating this a

166
00:11:00,810 --> 00:11:04,096
little bit, but let me just say, the 
DHT's an index that's spread across all 

167
00:11:04,096 --> 00:11:08,500
of the peers. 
So, it's distributed. 

168
00:11:08,500 --> 00:11:13,659
The index provides a mapping from
say, the name of a file or something you 

169
00:11:13,659 --> 00:11:18,279
want to get to the set of peers who 
currently have pieces of it that you can 

170
00:11:18,279 --> 00:11:24,183
contact. 
And any peer who's in the system is, by

171
00:11:24,183 --> 00:11:29,650
using the DHT algorithms, able to look at
this index and find out who the peers are 

172
00:11:29,650 --> 00:11:37,730
to contact to get a copy of this file. 
So, an index is a very simple thing. 

173
00:11:37,730 --> 00:11:42,190
It turns out to be difficult to do in a
way which is fully decentralized. 

174
00:11:42,190 --> 00:11:45,860
Even when the peers are changing, it is
still very efficient and fast to look up. 

175
00:11:45,860 --> 00:11:48,280
That's the magic of DHTs, but I won't go 
into it. 
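
To make concrete what a DHT does, as opposed to how it does it, here is a toy sketch of the index interface. Everything in it is an assumption for illustration: the class `ToyDHT`, the peer names, and the use of SHA-1 with a sorted ring are hypothetical. It also cheats by letting one object see every peer, which a real DHT such as Chord avoids by routing lookups between peers.

```python
import hashlib
from bisect import bisect_left


def ring_id(name: str) -> int:
    """Hash a name (a peer or a file) onto one shared identifier space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class ToyDHT:
    """An index from file name to peer addresses, spread across the peers."""

    def __init__(self, peer_names):
        # Assumes at least one peer. Each peer holds only its own slice
        # of the index, kept here in a per-peer dictionary.
        self.ring = sorted((ring_id(p), p) for p in peer_names)
        self.store = {p: {} for p in peer_names}

    def responsible_peer(self, key: str) -> str:
        """The first peer at or after the key's position, wrapping around."""
        ids = [i for i, _ in self.ring]
        pos = bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[pos][1]

    def announce(self, key: str, peer_addr: str) -> None:
        """Record that peer_addr currently has pieces of `key`."""
        owner = self.responsible_peer(key)
        self.store[owner].setdefault(key, set()).add(peer_addr)

    def lookup(self, key: str) -> set:
        """Return the addresses of peers known to have pieces of `key`."""
        owner = self.responsible_peer(key)
        return set(self.store[owner].get(key, set()))


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.announce("file.iso", "10.0.0.1:6881")
dht.announce("file.iso", "10.0.0.2:6881")
print(sorted(dht.lookup("file.iso")))
```

The point of the sketch is only the mapping from a name to the set of peers to contact, with each entry stored on exactly one of the participating peers.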

176
00:11:50,050 --> 00:11:54,640
Okay, so let's move on to BitTorrent to
see an example of a peer-to-peer system.

177
00:11:54,640 --> 00:11:57,392
So, BitTorrent is the main peer-to-peer 
system that's in use in the internet 

178
00:11:57,392 --> 00:12:01,670
today. 
It was developed starting in 2001 by 

179
00:12:01,670 --> 00:12:05,070
Bram Cohen.
Here he is shown on the right. 

180
00:12:05,070 --> 00:12:07,994
Finally, we found someone younger than
myself to show you a picture of, that's 

181
00:12:07,994 --> 00:12:13,620
great. 
BitTorrent, after its introduction in

182
00:12:13,620 --> 00:12:19,567
2001, grew very rapidly.
And it sends a very large amount of 

183
00:12:19,567 --> 00:12:23,386
traffic because it's used to transfer 
very large files; for instance,

184
00:12:23,386 --> 00:12:28,970
copies of movies are moved around. 
As a result, today,

185
00:12:28,970 --> 00:12:32,508
BitTorrent traffic constitutes a 
significant fraction of all internet 

186
00:12:32,508 --> 00:12:37,830
activity, so this is one of the main 
kinds of traffic on the internet. 

187
00:12:37,830 --> 00:12:41,300
And it's used for much legal as well as 
illegal content. 

188
00:12:41,300 --> 00:12:45,020
Maybe I should actually say illegal and
legal content,

189
00:12:45,020 --> 00:12:48,852
depending on what you think is mostly
transferred with BitTorrent. 

190
00:12:48,852 --> 00:12:52,842
So, just to switch to something a
little closer to the BitTorrent

191
00:12:52,842 --> 00:12:56,716
terminology. 
Data is delivered using these things 

192
00:12:56,716 --> 00:12:59,298
called torrents. 
A torrent you might imagine is just a big 

193
00:12:59,298 --> 00:13:02,350
gush of data that's going to give you the 
file you want. 

194
00:13:02,350 --> 00:13:06,839
The BitTorrent system transfers files in 
pieces, so it's going to break up a file 

195
00:13:06,839 --> 00:13:11,320
into pieces. 
Because by using many pieces in parallel, 

196
00:13:11,320 --> 00:13:16,110
this parallelism is what's going to give 
us a high level of performance. 

197
00:13:16,110 --> 00:13:19,386
And lets you get a fast transfer rate 
even though the individual peers might 

198
00:13:19,386 --> 00:13:24,056
not be very fast themselves. 
It's notable for its treatment of 

199
00:13:24,056 --> 00:13:27,402
incentives as part of the protocol. 
What this really means is that the 

200
00:13:27,402 --> 00:13:30,489
protocol does actually consider 
incentives and tries to do something about

201
00:13:30,489 --> 00:13:32,850
it. 
We'll see that in just a minute. 

202
00:13:32,850 --> 00:13:37,830
And in terms of decentralization,
some of the early versions weren't really

203
00:13:37,830 --> 00:13:40,734
decentralized, they used an index which 
was stored on just another server that 

204
00:13:40,734 --> 00:13:45,702
you had to find and know about. 
But the most recent versions have now 

205
00:13:45,702 --> 00:13:50,162
switched to using decentralized indexing 
technology using a DHT. 

206
00:13:51,420 --> 00:13:53,920
Okay, so here's the architecture. 
Oh, sorry. 

207
00:13:53,920 --> 00:13:56,296
Actually, before we get to the 
architecture, here are the steps to

208
00:13:56,296 --> 00:13:58,996
download a torrent. 
And then, I'll show you a picture and 

209
00:13:58,996 --> 00:14:02,480
we'll go through these steps again. 
This is what you do. 

210
00:14:02,480 --> 00:14:04,832
Step one, start with a torrent 
description, you're going to know what 

211
00:14:04,832 --> 00:14:08,294
you want to download. 
Maybe someone, your friend, mailed you a 

212
00:14:08,294 --> 00:14:12,710
link or a description of a torrent and
said, check this out, it's great.

213
00:14:12,710 --> 00:14:16,740
Okay, once we've got this description, 
step two is to find out the peers you can 

214
00:14:16,740 --> 00:14:20,710
contact to get the torrent, download the 
file. 

215
00:14:20,710 --> 00:14:24,678
You can do this here by contacting this 
special tracker to join the set of live 

216
00:14:24,678 --> 00:14:30,550
peers and get the list of other peers 
that are currently downloading the file. 

217
00:14:30,550 --> 00:14:33,730
This list will contain at least a seed 
peer,

218
00:14:33,730 --> 00:14:36,978
a special node which is initialized
with a copy of the content to

219
00:14:36,978 --> 00:14:41,345
bootstrap this whole process. 
This is the older, centralized way. 

220
00:14:41,345 --> 00:14:46,305
Alternatively, you can use this
fully decentralized DHT

221
00:14:46,305 --> 00:14:49,040
index. 
Look it up under the name of

222
00:14:49,040 --> 00:14:53,470
the torrent and find a set of peers who you
can contact to get a copy of this file. 

223
00:14:54,850 --> 00:14:58,690
Now you know who to contact, the peers,
and what you begin to do is start trading 

224
00:14:58,690 --> 00:15:02,446
pieces with them. 
These peers are all trying to download

225
00:15:02,446 --> 00:15:05,206
the same file at the same time, so 
everyone trades or swaps the different 

226
00:15:05,206 --> 00:15:08,694
blocks they have. 
As you contact different peers, they may 

227
00:15:08,694 --> 00:15:12,502
have some different blocks and you may 
have some blocks they want. 

228
00:15:12,502 --> 00:15:15,635
So, you begin to trade them until you get 
the complete file. 

229
00:15:15,635 --> 00:15:20,499
And as you do so, the rule, the incentive 
rule is that you're going to favor peers 

230
00:15:20,499 --> 00:15:26,325
that upload to you rapidly. 
And choke peers that don't upload to you 

231
00:15:26,325 --> 00:15:31,655
rapidly, by slowing your upload to them. 
So, somehow you're providing a little bit 

232
00:15:31,655 --> 00:15:34,813
of feedback. 
People who are helping

233
00:15:34,813 --> 00:15:40,170
the system by uploading to you, well, you'll
also provide them good service. 

234
00:15:40,170 --> 00:15:43,810
People who aren't really helping, you're
going to try and cut off service to them.

235
00:15:43,810 --> 00:15:46,393
That way, if they're not helping anyone in
the system, no one's going to give them

236
00:15:46,393 --> 00:15:49,252
good service. 
I talked about that a little bit

237
00:15:49,252 --> 00:15:54,395
through the steps, but I think they'll 
become clearer when we look at them

238
00:15:54,395 --> 00:15:57,650
a little more and talk about how they solve
the challenges. 

239
00:15:57,650 --> 00:16:02,029
So, here is a peer-to-peer system. 
You can see, I've circled one peer we're 

240
00:16:02,029 --> 00:16:05,820
going to look at, but all of these other 
nodes are peers, too. 

241
00:16:05,820 --> 00:16:09,975
Peer, peer, peer. 
So, all of these peers are trying to 

242
00:16:09,975 --> 00:16:15,060
download the file, receive the torrent 
that is, at the same time. 

243
00:16:15,060 --> 00:16:17,290
So that they all have the same goal in 
mind. 

244
00:16:17,290 --> 00:16:21,370
One of them here, the seed peer, already 
has a special copy of the content.

245
00:16:21,370 --> 00:16:23,920
That's to get the system going. 
Someone has to have the content after 

246
00:16:23,920 --> 00:16:27,165
all. 
Now, in the initial versions of 

247
00:16:27,165 --> 00:16:31,455
BitTorrent, the way it works is that
you have a description of

248
00:16:31,455 --> 00:16:35,556
the torrent.
It's a metafile that just says

249
00:16:35,556 --> 00:16:40,157
what it is you're trying to download. 
In this, there is information about who 

250
00:16:40,157 --> 00:16:43,904
the tracker node is. 
If you then contact the tracker node, 

251
00:16:43,904 --> 00:16:47,670
this line here, you can get the list of 
live peers. 

252
00:16:47,670 --> 00:16:51,280
So that will have like, one, two, three, 
all of these nodes. 

253
00:16:53,190 --> 00:16:55,360
It will tell you who they are, so you
know who the peers are.

254
00:16:55,360 --> 00:16:59,290
And then, you proceed with the BitTorrent
protocol.

255
00:16:59,290 --> 00:17:03,448
You contact a randomly chosen set of 
peers, like maybe these peers, to trade 

256
00:17:03,448 --> 00:17:07,330
chunks with them. 
You'll ask what bits and pieces of the

257
00:17:07,330 --> 00:17:12,546
file they have and then you will try to 
download those pieces from them. 

258
00:17:12,546 --> 00:17:15,770
And they will also be asking you what 
pieces you have. 

259
00:17:15,770 --> 00:17:18,808
And as you're talking to multiple 
different peers you'll get multiple 

260
00:17:18,808 --> 00:17:21,727
different pieces. 
And you'll be able to trade them for

261
00:17:21,727 --> 00:17:25,690
other pieces, and this way everyone will
be helping everyone. 
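
The trading step boils down to comparing which pieces each side holds. The sketch below is an assumption for illustration only: the function `plan_trade` and the idea of numbering pieces with integers are hypothetical, and the real protocol exchanges this information through its own messages.

```python
def plan_trade(mine: set, theirs: set):
    """Return (pieces we can request from them, pieces we can offer them)."""
    wanted = sorted(theirs - mine)   # they have it, we do not
    offered = sorted(mine - theirs)  # we have it, they do not
    return wanted, offered


# We hold pieces 0, 1 and 4; the other peer holds 1, 2 and 3.
wanted, offered = plan_trade({0, 1, 4}, {1, 2, 3})
print(wanted)   # [2, 3]
print(offered)  # [0, 4]
```

When both lists are non-empty, each side has something the other wants, which is what makes the swap worthwhile for both.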

262
00:17:25,690 --> 00:17:28,772
And some peers eventually will leave this 
system when they have the whole file, but

263
00:17:28,772 --> 00:17:32,452
other peers will arrive. 
Especially if many peers are trying to 

264
00:17:32,452 --> 00:17:35,386
download at the same time. 
You know, there'll be a live set of

265
00:17:35,386 --> 00:17:40,742
people available to help everyone else. 
So, let's just revisit those challenges 

266
00:17:40,742 --> 00:17:43,780
and see how it all works. 
How we've met them. 

267
00:17:43,780 --> 00:17:47,608
In terms of dealing with the limited 
capabilities of an individual peer, we've 

268
00:17:47,608 --> 00:17:51,146
dealt with that by dividing the file into 
pieces that can be downloaded in 

269
00:17:51,146 --> 00:17:55,180
parallel. 
This peer here that we're looking at is 

270
00:17:55,180 --> 00:17:59,420
downloading from three different other 
peers concurrently. 

271
00:17:59,420 --> 00:18:02,178
That means even if these other three peers are
very wimpy,

272
00:18:02,178 --> 00:18:05,402
well, we're able to get a download from
all of them so the rate at which we can 

273
00:18:05,402 --> 00:18:09,567
download is the sum of these. 
It can be quite fast; the infrastructure

274
00:18:09,567 --> 00:18:13,730
can be providing us a fairly good rate
because of this parallelism, even though

275
00:18:13,730 --> 00:18:16,750
the capabilities of each individual peer
are limited.
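
The claim that the download rate is the sum of the individual peer rates can be turned into a quick calculation. This is an idealized sketch: the function name, the 700 MB file, and the 2 Mbit/s peers are made-up numbers, and it assumes the peers send different pieces and that our own link is not the bottleneck.

```python
def parallel_download_seconds(file_size_mb: float, peer_rates_mbps) -> float:
    """Idealized time to fetch a file in pieces from several peers at once.

    file_size_mb is in megabytes, peer_rates_mbps in megabits per second.
    """
    total_rate = sum(peer_rates_mbps)  # rates add up when pieces arrive in parallel
    if total_rate <= 0:
        raise ValueError("at least one peer must be uploading")
    return file_size_mb * 8 / total_rate


one_peer = parallel_download_seconds(700, [2])
three_peers = parallel_download_seconds(700, [2, 2, 2])
print(one_peer)     # 2800.0 seconds from a single slow peer
print(three_peers)  # about 933 seconds from three slow peers
```

Three slow peers together behave like one peer that is three times as fast, which is the parallelism the lecture describes.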

276
00:18:16,750 --> 00:18:22,780
In terms of participation incentives, I
talked about this technique of favoring 

277
00:18:22,780 --> 00:18:28,430
the peers which are uploading to you.
You upload to them. 

278
00:18:28,430 --> 00:18:32,470
On the other hand, we choke peers that 
aren't uploading to us. 

279
00:18:32,470 --> 00:18:37,842
So this peer here, this unfortunate peer, 
is shown as choked by all the other three 

280
00:18:37,842 --> 00:18:42,350
peers it's talking to. 
That's the stops here. 

281
00:18:42,350 --> 00:18:46,060
This is the choking. 
And these other peers must be choking

282
00:18:46,060 --> 00:18:49,819
the rotten peer here, the one with the
X's.

283
00:18:49,819 --> 00:18:55,784
I'll say rotten, it's not really helping 
out, or unhelpful might be better. 

284
00:18:55,784 --> 00:19:01,960
This unhelpful peer is not providing
downloads to all of the other nodes. 

285
00:19:01,960 --> 00:19:05,070
And so, they cut it off and they're 
favoring other peers.
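
The favor-and-choke rule can be sketched as a ranking of peers by how fast they upload to us. This is a simplified illustration, not the actual BitTorrent algorithm: the function `choose_unchoked`, the three upload slots, and the rates are assumptions, and the basic level of service given to newcomers is left out.

```python
def choose_unchoked(upload_rate_to_us: dict, slots: int = 3) -> set:
    """Pick the peers we keep uploading to; everyone else is choked.

    upload_rate_to_us maps a peer name to the rate at which that peer has
    recently been uploading to us (any consistent unit).
    """
    ranked = sorted(upload_rate_to_us,
                    key=lambda peer: upload_rate_to_us[peer],
                    reverse=True)
    # Favor the fastest uploaders, and never reward a peer that sends nothing.
    return {peer for peer in ranked[:slots] if upload_rate_to_us[peer] > 0}


rates = {"a": 50, "b": 0, "c": 20, "d": 35, "e": 5}
unchoked = choose_unchoked(rates)
choked = set(rates) - unchoked
print(sorted(unchoked))  # ['a', 'c', 'd']
print(sorted(choked))    # ['b', 'e']
```

Peer "b", which uploads nothing, ends up choked no matter how many slots are free, which mirrors the unhelpful peer in the picture.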

286
00:19:05,070 --> 00:19:09,162
This is going to provide an incentive for 
everyone who has any content to trade, 

287
00:19:09,162 --> 00:19:12,750
to actually trade it.
If you don't have anything because you're 

288
00:19:12,750 --> 00:19:15,978
just beginning, other nodes will give you
a basic level of service.

289
00:19:15,978 --> 00:19:18,582
But your download will take a very long 
time, unless you're willing to help 

290
00:19:18,582 --> 00:19:22,848
everyone out. 
Okay, and finally in terms of 

291
00:19:22,848 --> 00:19:25,630
decentralization. 
When we switch to the fully

292
00:19:25,630 --> 00:19:30,630
decentralized model, the DHT index is
spread over all of the peers. 

293
00:19:30,630 --> 00:19:34,284
So, a small piece of it you can see here 
is implemented as part of every peer;

294
00:19:34,284 --> 00:19:39,862
together they're implementing
this DHT, this overall index, using the DHT

295
00:19:39,862 --> 00:19:45,251
technology, so any peer that can find
another peer can join the system

296
00:19:45,251 --> 00:19:49,399
and look up this index, and will end up
carrying different parts of this

297
00:19:49,399 --> 00:19:53,151
index. 
So, it's fully decentralized and we've met

298
00:19:53,151 --> 00:19:57,231
our goal of knowing how to get around 
without having a well known central 

299
00:19:57,231 --> 00:20:02,743
server that provides information. 
Okay, so that's P2P systems, with

300
00:20:02,743 --> 00:20:07,377
BitTorrent as an example.
P2P systems are interesting to us as an 

301
00:20:07,377 --> 00:20:12,600
alternative to CDN-style client-server
content distribution.

302
00:20:12,600 --> 00:20:16,032
They certainly have some potential
advantages if they can do away with 

303
00:20:16,032 --> 00:20:19,790
dedicated infrastructure and
centralized control.

304
00:20:19,790 --> 00:20:22,280
They also have some challenges to meet to 
do that. 

305
00:20:22,280 --> 00:20:25,064
It's probably not as easy with 
peer-to-peer technology as it is with a 

306
00:20:25,064 --> 00:20:27,950
managed, carefully-provisioned 
infrastructure. 

307
00:20:29,160 --> 00:20:32,010
So if we can make it work, we're in good 
shape. 

308
00:20:32,010 --> 00:20:34,634
But, you know, there are still some
problems to solve to make them work very 

309
00:20:34,634 --> 00:20:39,252
effectively. 
At this stage, today, CDN technologies 

310
00:20:39,252 --> 00:20:44,214
are probably more mature. 
Nonetheless, P2P technologies are coming 

311
00:20:44,214 --> 00:20:49,715
along, and DHT technologies, too. 
And over time, we're finding that they're 

312
00:20:49,715 --> 00:20:54,020
more widely used in systems. 
And these are the technologies that are 

313
00:20:54,020 --> 00:20:57,350
adopted in different kinds of content 
distribution systems. 

314
00:20:57,350 --> 00:21:01,839
Actually, P2P and DHT-style technologies
are used as part of the Skype system and

315
00:21:01,839 --> 00:21:07,244
as part of Amazon's cloud
computing services.

316
00:21:07,244 --> 00:21:11,398
So, I think in the future what you can 
expect is more and more hybrid systems 

317
00:21:11,398 --> 00:21:16,393
which may be based on peer-to-peer style
techniques along

318
00:21:16,393 --> 00:21:21,016
with a carefully provisioned infrastructure,
for example, rather than relying solely

319
00:21:21,016 --> 00:21:24,390
on clients to do it themselves, okay?


