1
00:00:00,025 --> 00:00:04,650
Hi, today we're going to be learning how 
to analyze our StreetSeen surveys. 

2
00:00:04,650 --> 00:00:08,362
So it's been really great to see so many 
people developing so many different, 

3
00:00:08,362 --> 00:00:12,317
interesting surveys,
asking about all kinds of interesting

4
00:00:12,317 --> 00:00:16,270
things from where do you think it's safer 
for our children to play to the busyness 

5
00:00:16,270 --> 00:00:21,080
of a street. 
Which street would you like to walk down? 

6
00:00:21,080 --> 00:00:25,550
Which park would you prefer to play in?
Lots of different questions about our 

7
00:00:25,550 --> 00:00:28,282
cities. 
So, I'm going to walk you through step by 

8
00:00:28,282 --> 00:00:32,297
step how to analyze StreetSeen. 
And I'll give you one option,

9
00:00:32,297 --> 00:00:36,617
which is the straightforward, simple way
to analyze StreetSeen and draw some basic 

10
00:00:36,617 --> 00:00:40,081
conclusions. 
And then I'm going to explain how to

11
00:00:40,081 --> 00:00:44,930
conduct a more detailed discrete choice 
model way of analyzing this. 

12
00:00:44,930 --> 00:00:47,270
And this requires a statistical 
background. 

13
00:00:47,270 --> 00:00:50,510
So if you know how to use statistics, 
then I'm going to be showing you some 

14
00:00:50,510 --> 00:00:55,460
ways that you can dig deeper and create 
statistically significant results. 

15
00:00:55,460 --> 00:00:58,725
Many of you participated in this simple 
survey. 

16
00:00:58,725 --> 00:01:02,611
Bicycling preferences in Columbus, Ohio.
We wanted to know which street would

17
00:01:02,611 --> 00:01:07,785
you prefer to ride a bicycle down 1 or 2? 
And so one of the key things in designing 

18
00:01:07,785 --> 00:01:11,384
a StreetSeen survey is that you want to 
consider what it is that you're asking

19
00:01:11,384 --> 00:01:15,306
people about. 
So in this case, which street would you

20
00:01:15,306 --> 00:01:18,612
prefer to ride a bicycle on? And then we
want to understand what are the 

21
00:01:18,612 --> 00:01:23,810
differences between these two images,
and among the images that you're

22
00:01:23,810 --> 00:01:27,612
particularly interested in. 
So a discrete choice model is a method of 

23
00:01:27,612 --> 00:01:31,836
describing, explaining, and predicting 
choices between two or more discrete 

24
00:01:31,836 --> 00:01:35,784
alternatives. 
So for example, in the picture that you 

25
00:01:35,784 --> 00:01:39,546
saw before we had two discrete choices 
that you can choose between, the picture 

26
00:01:39,546 --> 00:01:44,997
on the right or the picture on the left. 
And what we want to be able to do is just 

27
00:01:44,997 --> 00:01:48,702
statistically relate the choices made by 
each person to the attributes of the 

28
00:01:48,702 --> 00:01:54,270
alternatives available to the person. 
So in each case, you were given two 

29
00:01:54,270 --> 00:01:57,850
choices, and you had to choose
one of them each time.

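One common way to make "relate the choices to the attributes" concrete is a binary logit, assumed here because the lecture does not name a specific model form: each image gets a utility that is a weighted sum of its attributes, and the probability of choosing image A over image B depends only on the difference in utilities. The weights and attribute names below are made up purely for illustration; they are not estimates from the Columbus survey.

```python
import math

def utility(attributes, weights):
    """Linear utility: sum of each attribute value times its weight."""
    return sum(weights[name] * value for name, value in attributes.items())

def prob_choose_a(attrs_a, attrs_b, weights):
    """Binary logit: P(A over B) = 1 / (1 + exp(-(U_A - U_B)))."""
    diff = utility(attrs_a, weights) - utility(attrs_b, weights)
    return 1.0 / (1.0 + math.exp(-diff))

# Hypothetical weights and attribute codings, for illustration only.
weights = {"vehicles": -0.3, "lanes": -0.5, "sidewalk_sides": 0.4}
image_1 = {"vehicles": 0, "lanes": 2, "sidewalk_sides": 2}
image_2 = {"vehicles": 10, "lanes": 4, "sidewalk_sides": 0}
print(prob_choose_a(image_1, image_2, weights))
```

With these made-up weights, the quiet two-lane street with sidewalks is picked about 99% of the time. Estimating the real weights from the votes is the job of the discrete choice model.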
30
00:01:57,850 --> 00:02:02,074
And so we want to know which people chose 
which images and what are the attributes of

31
00:02:02,074 --> 00:02:06,225
those alternatives. 
So what's the difference between image 

32
00:02:06,225 --> 00:02:09,360
one and image two? 
And when we're doing a discrete choice 

33
00:02:09,360 --> 00:02:12,520
model we want the set of alternatives to 
be exhaustive. 

34
00:02:12,520 --> 00:02:15,496
And we want it to be mutually exclusive 
and we want it to be a finite number of 

35
00:02:15,496 --> 00:02:19,335
alternatives. 
So, for example, you could only choose 

36
00:02:19,335 --> 00:02:23,484
one of the two images. 
And there was a

37
00:02:23,484 --> 00:02:27,114
finite set of alternatives.
So, for example, you selected from a

38
00:02:27,114 --> 00:02:30,562
fixed number of images. 
In my case, there were 60 different 

39
00:02:30,562 --> 00:02:33,948
images in that image set. 
And so, you were voting between two 

40
00:02:33,948 --> 00:02:38,310
images, and there were only a certain 
number of images that were in the set. 

41
00:02:38,310 --> 00:02:41,090
Although you could vote indefinitely if
you wanted to,

42
00:02:41,090 --> 00:02:45,474
you were limited to those 60 images that
were randomly shown in pairs. 

43
00:02:45,474 --> 00:02:51,110
Now in this case, we want a defined set 
of variables. 

44
00:02:51,110 --> 00:02:54,892
So when you look at the images that 
you've selected, you've identified key 

45
00:02:54,892 --> 00:02:59,388
characteristics, okay? 
So in my example I selected ten variables 

46
00:02:59,388 --> 00:03:03,700
that were different between the various 
images. 

47
00:03:03,700 --> 00:03:07,110
So the amount of traffic. 
So there might be zero cars shown in a 

48
00:03:07,110 --> 00:03:12,050
particular image, up to ten or more cars. 
And so I just classified into different

49
00:03:12,050 --> 00:03:16,840
segments the number of vehicles that were
seen on the street. 

50
00:03:16,840 --> 00:03:19,990
Whether or not there was a sidewalk, 
whether there was a sidewalk on one side 

51
00:03:19,990 --> 00:03:23,050
or whether there was a sidewalk on both 
sides. 

52
00:03:23,050 --> 00:03:26,440
I looked at the type of parking. 
Was there no parking on the street? 

53
00:03:26,440 --> 00:03:30,440
Parallel parking, pull-in parking or a 
parking lot. 

54
00:03:30,440 --> 00:03:32,600
I looked at the character of the street 
surface. 

55
00:03:32,600 --> 00:03:37,020
So was it a well maintained street? 
Was it a poorly maintained street? 

56
00:03:37,020 --> 00:03:39,480
Was it a brick street? 
An asphalt street? 

57
00:03:39,480 --> 00:03:42,530
A concrete street? 
I looked at the number of lanes. 

58
00:03:42,530 --> 00:03:46,290
Is it a simple one lane alley? 
Is it a two lane street? 

59
00:03:46,290 --> 00:03:50,529
A six or more lane street? 
And I characterized each image by the

60
00:03:50,529 --> 00:03:53,540
number of lanes. 
I identified whether or not any 

61
00:03:53,540 --> 00:03:56,840
pedestrians could be seen in the image, 
or whether any bicyclists could be seen 

62
00:03:56,840 --> 00:04:00,466
in the image. 
I looked at the grade of the street, so, for

63
00:04:00,466 --> 00:04:03,900
example, is it a flat street?
Is it a curved street?

64
00:04:03,900 --> 00:04:07,352
Is it a hilly street? 
And then I looked at the different kinds 

65
00:04:07,352 --> 00:04:11,008
of land uses. 
So is this a street where there are 

66
00:04:11,008 --> 00:04:15,170
individual houses? 
Are there apartments? 

67
00:04:15,170 --> 00:04:18,580
Or, is it a downtown with office 
buildings? Or, what is the

68
00:04:18,580 --> 00:04:22,862
character of the land use? 
I looked at trees, so are the trees along 

69
00:04:22,862 --> 00:04:26,050
the street? 
Are they set back from the street? 

70
00:04:26,050 --> 00:04:29,949
Or, is it a heavily forested area? 
And then I looked at traffic calming

71
00:04:29,949 --> 00:04:33,126
measures.
So, for example, were there pedestrian

72
00:04:33,126 --> 00:04:36,720
crossings? 
Were there bumps in the road to calm 

73
00:04:36,720 --> 00:04:39,822
traffic? 
And so these were the ten variables that 

74
00:04:39,822 --> 00:04:42,395
I looked at. 
In your case, you'll have your own set of 

75
00:04:42,395 --> 00:04:46,089
variables that you were thinking about. 
So if you're thinking about

76
00:04:46,089 --> 00:04:49,470
parks, or whether a place is safe for children, what
are the things that are different between 

77
00:04:49,470 --> 00:04:53,380
the images? 
And so you may not have thought about 

78
00:04:53,380 --> 00:04:56,667
this in this type of detail when you 
created the survey. 

79
00:04:56,667 --> 00:04:59,574
But you can just as easily go back now, 
and think through, what are the 

80
00:04:59,574 --> 00:05:02,560
differences between these different 
images? 

81
00:05:02,560 --> 00:05:07,569
In my own case, I looked at literature 
around streets and bicycling

82
00:05:07,569 --> 00:05:11,664
to better understand which
characteristics the research

83
00:05:11,664 --> 00:05:15,580
identifies
as making people want to bicycle or not

84
00:05:15,580 --> 00:05:19,630
bicycle on a particular street. 
Now, I'm not asking you to go into

85
00:05:19,630 --> 00:05:22,614
that level of detail for this particular 
exercise. 

86
00:05:22,614 --> 00:05:25,664
But it's something you can keep in mind, 
and certainly you can use your own 

87
00:05:25,664 --> 00:05:29,506
intuition about what the differences are
between the images,

88
00:05:29,506 --> 00:05:31,675
and how you understand those
differences.

89
00:05:31,675 --> 00:05:36,640
Now, you get your results. 
So when you go to StreetSeen, you can 

90
00:05:36,640 --> 00:05:39,430
simply click on the Analyze button for 
your survey. 

91
00:05:39,430 --> 00:05:43,780
And it's going to pull up these results. 
And so you'll see, in this case, that 

92
00:05:43,780 --> 00:05:48,400
the images are ranked, with the top
image favored 75% of the time,

93
00:05:48,400 --> 00:05:54,380
and it goes down from there.
Now, I'm just going to take two

94
00:05:54,380 --> 00:06:00,590
images, the image that was most preferred 
and the image that was least preferred. 

95
00:06:00,590 --> 00:06:03,824
And so when you look at these two images, 
what are the things that you notice that 

96
00:06:03,824 --> 00:06:06,839
are different? 
And so we can think back to the variables 

97
00:06:06,839 --> 00:06:10,838
that we were referencing. 
So in this case, in the top image, there's

98
00:06:10,838 --> 00:06:13,677
no traffic. 
And in the bottom image, there's a lot of 

99
00:06:13,677 --> 00:06:16,978
traffic. 
In the top image, this is a two-lane 

100
00:06:16,978 --> 00:06:20,502
street. 
In the bottom image this looks like it's 

101
00:06:20,502 --> 00:06:25,496
a four-plus-lane street. 
When we look at the top image, there's a 

102
00:06:25,496 --> 00:06:28,998
lot of landscaping. 
There are trees along the

103
00:06:28,998 --> 00:06:32,510
road, as well as trees that are set back 
from the road. 

104
00:06:32,510 --> 00:06:37,720
In the bottom image, yes, there are trees.
They're set back in the distance. 

105
00:06:37,720 --> 00:06:41,230
We can see that in the top image there are
sidewalks on both sides of the road, and

106
00:06:41,230 --> 00:06:45,770
in the bottom case there are no sidewalks 
that are visible, right? 

107
00:06:45,770 --> 00:06:49,460
So you can continue to go through this 
process of looking at each variable. 

108
00:06:49,460 --> 00:06:52,882
And so for the simple analysis, I would 
ask for you to look at the top five 

109
00:06:52,882 --> 00:06:58,260
images, and the bottom five images,
and understand what the differences are.

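If you want to check the top-five and bottom-five ranking yourself from the raw votes, a minimal sketch is below. It assumes each vote can be reduced to a (winner, loser) pair of image IDs and defines "percent favored" as wins divided by the number of pairings an image appeared in. That definition is an assumption and may not match exactly how StreetSeen computes the number on its Analyze page.

```python
from collections import Counter

def percent_favored(votes):
    """votes: iterable of (winner_id, loser_id) pairs.
    Returns {image_id: wins / number of pairings the image appeared in}."""
    wins, shown = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        shown[winner] += 1
        shown[loser] += 1
    return {img: wins[img] / shown[img] for img in shown}

def top_and_bottom(scores, n=5):
    """Image IDs of the n highest- and n lowest-scoring images."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]

# Hypothetical votes between three images.
votes = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]
scores = percent_favored(votes)
top, bottom = top_and_bottom(scores, n=1)
print(scores, top, bottom)
```

With a 60-image set and n=5 the two lists are disjoint; with fewer than 2n images they would overlap.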
110
00:06:58,260 --> 00:07:02,068
So using your ability to simply measure, 
what's different about these different 

111
00:07:02,068 --> 00:07:06,670
images, and how do they compare? 
So why are people voting for certain 

112
00:07:06,670 --> 00:07:10,576
images over other images? 
So this is just a really simple analysis 

113
00:07:10,576 --> 00:07:13,904
for you to be able to explain what the 
differences were between these various 

114
00:07:13,904 --> 00:07:17,163
images. 
Some of you may have wanted to compare 

115
00:07:17,163 --> 00:07:19,857
different areas. 
So, you might've compared different 

116
00:07:19,857 --> 00:07:22,840
cities, or you might've compared 
different neighborhoods. 

117
00:07:22,840 --> 00:07:27,460
And, so you'll want to report out which 
areas were favored more than others. 

118
00:07:27,460 --> 00:07:30,160
In my own case, this is irrelevant for 
me. 

119
00:07:30,160 --> 00:07:33,210
I did wind up picking some additional 
areas of the city, but it's because I 

120
00:07:33,210 --> 00:07:36,820
wanted to get different types of streets 
to be included. 

121
00:07:36,820 --> 00:07:41,167
But all of them were from within Columbus,
and I don't see that it's particularly 

122
00:07:41,167 --> 00:07:46,662
valid to compare between areas, and the 
percent favored is quite small. 

123
00:07:46,662 --> 00:07:50,442
Then we have the heat map. For most of
you, the heat map is going to be

124
00:07:50,442 --> 00:07:54,520
irrelevant. 
This really depends on having a very 

125
00:07:54,520 --> 00:07:58,810
large number of votes that are included 
in your study. 

126
00:07:58,810 --> 00:08:03,088
So if you have an extremely high number 
of votes, then the colors of the map will

127
00:08:03,088 --> 00:08:08,413
change, and what will be shown
is that where there are areas that are

128
00:08:08,413 --> 00:08:12,702
most favored, the heat maps are going
to grow. 

129
00:08:12,702 --> 00:08:16,980
And you'll see higher concentrations of 
the heat that is shown on these 

130
00:08:16,980 --> 00:08:20,626
individual maps. 
But for most of you, you have a small 

131
00:08:20,626 --> 00:08:25,339
number of votes and so it's just going to 
show each individual location as a dot. 

132
00:08:25,339 --> 00:08:27,720
And that's not going to be very helpful 
to you. 

133
00:08:27,720 --> 00:08:30,575
So, for most of you, you don't need to 
worry about this heat map. 

134
00:08:30,575 --> 00:08:34,481
Okay, now for those of you who want an extra
challenge, who know something about

135
00:08:34,481 --> 00:08:38,697
statistics and want to take this on and really
understand things in a more rigorous

136
00:08:38,697 --> 00:08:42,720
way,
I'm going to walk you through how to do a 

137
00:08:42,720 --> 00:08:46,162
discrete choice model. 
The first thing you're going to do is 

138
00:08:46,162 --> 00:08:50,284
you're going to go to the results and you're
going to download the full results. 

139
00:08:50,284 --> 00:08:53,602
You can download it as a CSV or an XLS 
file, it's really up to you. 

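Before fitting a discrete choice model, the downloaded file usually has to be joined with the variables you coded for each image. Here is a minimal sketch using Python's standard csv module. It assumes a CSV export with hypothetical winner and loser columns (check the real column names in your download) and a hand-coded attribute table with made-up values. Each vote becomes one row holding the chosen image's attributes minus the rejected image's. An XLS download would need a spreadsheet reader instead, or can be saved as CSV first.

```python
import csv
import io

# Hypothetical export; the real StreetSeen column names may differ.
VOTES_CSV = """winner,loser
img1,img2
img2,img3
img1,img3
"""

# Attributes you coded yourself for each image (hypothetical values).
ATTRIBUTES = {
    "img1": {"vehicles": 0, "lanes": 2},
    "img2": {"vehicles": 5, "lanes": 2},
    "img3": {"vehicles": 12, "lanes": 6},
}
FEATURES = ["vehicles", "lanes"]

def difference_rows(csv_text, attributes, features):
    """One row per vote: chosen image's attributes minus the rejected image's."""
    rows = []
    for vote in csv.DictReader(io.StringIO(csv_text)):
        chosen = attributes[vote["winner"]]
        rejected = attributes[vote["loser"]]
        rows.append([chosen[f] - rejected[f] for f in features])
    return rows

print(difference_rows(VOTES_CSV, ATTRIBUTES, FEATURES))
```

For a real file, replace `io.StringIO(csv_text)` with `open("your_download.csv", newline="")`, where the filename is a placeholder.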
140
00:08:53,602 --> 00:08:57,082
Now, I don't have time in this class to 
teach you how to use statistical 

141
00:08:57,082 --> 00:09:00,544
programs. 
But I am going to offer you some simple 

142
00:09:00,544 --> 00:09:03,291
instructions. 
So first of all, you can use any

143
00:09:03,291 --> 00:09:07,281
statistical program. If you already have
one, you're welcome to use whatever you'd 

144
00:09:07,281 --> 00:09:10,927
like. 
Or you can use a free statistical program 

145
00:09:10,927 --> 00:09:17,153
and the one I'm going to recommend is R. 
R is available from the r-project.org

146
00:09:17,153 --> 00:09:20,610
website. 
And then one of my faculty colleagues 

147
00:09:20,610 --> 00:09:24,090
Professor Brighton at Ohio State 
University has created a guide, How to Use a

148
00:09:24,090 --> 00:09:28,925
Discrete Choice Model. 
And so I'll be providing that link up on 

149
00:09:28,925 --> 00:09:32,696
the website as well. 
And he walks you step by step through how to run

150
00:09:32,696 --> 00:09:37,776
a discrete choice model using R, okay? 
So this is a free, simple way, and if you 

151
00:09:37,776 --> 00:09:41,440
want this challenge you can walk through 
it and see how it works. 

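Professor Brighton's walkthrough uses R. As a language-neutral illustration of what the estimation does (not a reproduction of that walkthrough), here is a sketch that fits the pairwise binary logit by maximum likelihood with Newton-Raphson in Python and NumPy. It takes difference rows (chosen minus rejected) as input. The function name fit_pairwise_logit and the simulated data are hypothetical. Because each row is a difference, any attribute that is the same for both images cancels out, so there is no intercept; a left/right position effect, if you suspect one, would need its own term.

```python
import numpy as np

def fit_pairwise_logit(diffs, iters=50, tol=1e-10):
    """Maximum-likelihood fit of P(observed choice) = sigmoid(diff @ beta),
    where each row of diffs is (chosen image attributes - rejected image
    attributes). Uses Newton-Raphson; returns (beta, standard_errors)."""
    d = np.asarray(diffs, dtype=float)
    beta = np.zeros(d.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))   # P(observed choice)
        grad = d.T @ (1.0 - p)                   # gradient of log-likelihood
        w = p * (1.0 - p)
        hess = -(d * w[:, None]).T @ d           # Hessian of log-likelihood
        step = np.linalg.solve(hess, grad)
        beta = beta - step                       # Newton update
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(d @ beta)))
    info = (d * (p * (1.0 - p))[:, None]).T @ d  # observed information matrix
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

# Simulated data under hypothetical true weights, to check the fit recovers them.
rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 0.8])
x = rng.normal(size=(5000, 2))                   # attribute differences, A - B
chose_a = rng.random(5000) < 1.0 / (1.0 + np.exp(-(x @ true_beta)))
diffs = np.where(chose_a[:, None], x, -x)        # chosen minus rejected
beta, se = fit_pairwise_logit(diffs)
print(beta, se)
```

Newton-Raphson is used because the logit log-likelihood is concave, so it typically converges in a handful of iterations; it will fail if one attribute perfectly separates winners from losers.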
152
00:09:41,440 --> 00:09:44,740
I ran the discrete choice model, and I got the
results. 

153
00:09:44,740 --> 00:09:49,220
And so in a very simple explanation, the 
good news is that the model results match 

154
00:09:49,220 --> 00:09:53,890
my hypotheses about what I thought was 
going to happen. 

155
00:09:53,890 --> 00:09:57,245
So for example, I believed that the 
larger the number of vehicles that are 

156
00:09:57,245 --> 00:10:00,390
seen,
the less likely people would be to choose

157
00:10:00,390 --> 00:10:03,880
that particular image when paired with 
another image. 

158
00:10:03,880 --> 00:10:07,840
And that's in fact what I found, is that 
the larger the number of vehicles and the 

159
00:10:07,840 --> 00:10:13,400
larger the number of lanes, the less 
likely an image is to be selected. 

160
00:10:13,400 --> 00:10:17,333
On the positive side, where you have 
sidewalks, where you have pedestrians and 

161
00:10:17,333 --> 00:10:22,430
bicyclists, and where you have trees
along the side of the road,

162
00:10:22,430 --> 00:10:25,768
you are more likely to choose that
particular image. 

163
00:10:25,768 --> 00:10:31,120
And what was interesting to me is that I 
expected that parking would be negative:

164
00:10:31,120 --> 00:10:34,081
that if you had parallel parking,
you would be less likely to choose the

165
00:10:34,081 --> 00:10:36,839
image,
because there's a danger to bicyclists

166
00:10:36,839 --> 00:10:41,220
from car doors opening, and so you can run
into the car door and be injured. 

167
00:10:41,220 --> 00:10:44,433
Or if it's pull-in parking, then people 
can't see you as easily when they're 

168
00:10:44,433 --> 00:10:48,914
trying to back out of that parking space. 
Or, with a parking lot, you could have

169
00:10:48,914 --> 00:10:51,700
cars that are pulling out from a parking 
lot. 

170
00:10:51,700 --> 00:10:55,670
But in my model this was not 
statistically significant. 

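Whether a coefficient is "statistically significant" is commonly judged with a Wald z-test: divide the estimate by its standard error and compare the two-sided p-value with a threshold such as 0.05. A small sketch of that calculation follows. The coefficient and standard error values are hypothetical, not the parking results from this study.

```python
import math

def wald_p_value(coef, se):
    """Two-sided p-value for H0: coef == 0, using z = coef / se and the
    standard normal distribution: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical estimates: one clearly significant, one not.
print(wald_p_value(-0.45, 0.08))
print(wald_p_value(0.10, 0.12))
```

The standard errors would come from the fitted model, for example the square roots of the diagonal of the inverse information matrix.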
171
00:10:55,670 --> 00:10:57,570
All right, so, that was an interesting
finding for me. 

172
00:10:57,570 --> 00:11:00,298
So, those were the variables that I was 
looking at, and I was happy to see that

173
00:11:00,298 --> 00:11:03,540
the model results matched what I had
expected.

174
00:11:03,540 --> 00:11:08,326
Now, the next step
is, well, are there differences between

175
00:11:08,326 --> 00:11:11,630
people who are in Columbus or another 
area? 

176
00:11:11,630 --> 00:11:15,050
Are there differences between the 
different countries that people are from? 

177
00:11:15,050 --> 00:11:19,137
And one of the things that's captured by 
StreetSeen is the voter latitude and the 

178
00:11:19,137 --> 00:11:22,907
voter longitude. 
And so what you'll be able to do is use 

179
00:11:22,907 --> 00:11:26,994
the voter latitude and longitude to 
determine the location of an individual 

180
00:11:26,994 --> 00:11:32,700
that took your survey. 
So, you can go to any number of free 

181
00:11:32,700 --> 00:11:36,200
websites that convert latitude and
longitude into an address.

182
00:11:36,200 --> 00:11:40,507
So, for example, at latlong.net you
can click the LatLong to Address button, 

183
00:11:40,507 --> 00:11:44,704
and then you can enter that latitude and 
longitude. 

184
00:11:44,704 --> 00:11:47,586
And then you can find out where people 
are from,

185
00:11:47,586 --> 00:11:50,536
and you can enter that country or
city or whatever it is that you're 

186
00:11:50,536 --> 00:11:54,720
particularly interested in. 
So in this example this is the latitude 

187
00:11:54,720 --> 00:11:59,354
and longitude for the Taj Mahal. 
And so, as you saw in

188
00:11:59,354 --> 00:12:04,840
this list, I have gone in and
filled in the latitude and longitude. 

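If you have many voters, looking up each coordinate by hand gets slow. For the simple "Columbus versus elsewhere" split, one alternative is to compute the great-circle distance from each voter's latitude and longitude to the city with the haversine formula and treat anyone within some radius as local. This is an assumption on my part, not a StreetSeen feature. The city-center coordinates below are approximate, and the 25 km radius is an arbitrary choice for illustration.

```python
import math

# Approximate coordinates of downtown Columbus, Ohio (assumption; adjust as needed).
COLUMBUS = (39.9612, -82.9988)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def is_local(voter_lat, voter_lon, center=COLUMBUS, radius_km=25.0):
    """True if the voter is within radius_km of the center point."""
    return haversine_km(voter_lat, voter_lon, center[0], center[1]) <= radius_km

# Hypothetical voters: one a few kilometres north of downtown, one near the Taj Mahal.
print(is_local(40.0067, -83.0305))  # True
print(is_local(27.1751, 78.0421))   # False
```

The resulting True/False label can then be used to split the vote rows into local and non-local samples before fitting the model separately on each.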
189
00:12:04,840 --> 00:12:08,497
And I can see that somebody completed the 
survey from Buenos Aires in Argentina,

190
00:12:08,497 --> 00:12:13,408
and others from a number of locations in Australia.
In my particular case, what I'm really 

191
00:12:13,408 --> 00:12:16,970
interested in is comparing the results 
for Columbus. 

192
00:12:16,970 --> 00:12:20,392
So, for example, many of the people who 
took my survey may recognize the 

193
00:12:20,392 --> 00:12:24,072
individual streets. 
And so they may have different opinions

194
00:12:24,072 --> 00:12:28,340
because they have firsthand experience from
bicycling on the streets. 

195
00:12:28,340 --> 00:12:32,010
And so I want to understand whether or 
not what they've said is different

196
00:12:32,010 --> 00:12:34,290
from what the people who are not in
Columbus have said. 

197
00:12:34,290 --> 00:12:37,700
And then I want to understand some of the 
cultural differences, so are there

198
00:12:37,700 --> 00:12:41,433
differences between what people in Asia
might think

199
00:12:41,433 --> 00:12:44,843
and what people in South America might
think, and their preferences around

200
00:12:44,843 --> 00:12:48,150
bicycling?
So the next step that I'll be

201
00:12:48,150 --> 00:12:51,500
undertaking is integrating that into my 
discrete choice model to understand if 

202
00:12:51,500 --> 00:12:55,559
there are differences. 
And this is fairly simple

203
00:12:55,559 --> 00:12:58,170
to set up.
I can segment my samples, and you're

204
00:12:58,170 --> 00:13:01,210
welcome to do this in your own studies as 
well. 

205
00:13:01,210 --> 00:13:06,040
So I encourage you to experiment, to play
with the data, and to just have some fun

206
00:13:06,040 --> 00:13:09,800
seeing how you can use discrete choice
models in a visual survey setting. 

207
00:13:09,800 --> 00:13:12,874
And the last thing I would say is if 
you're not a statistical person, or 

208
00:13:12,874 --> 00:13:16,478
somebody who has these statistical
skills, you're welcome to try using

209
00:13:16,478 --> 00:13:21,130
R and the walkthrough, but if you'd rather
not, that's okay.

210
00:13:21,130 --> 00:13:24,330
Stick with the simple analysis
that I've shown you, by just

211
00:13:24,330 --> 00:13:28,610
understanding and comparing the
images that are in your data set. 

212
00:13:28,610 --> 00:13:30,890
Good luck. 
I look forward to seeing your results.