Hello everybody, welcome back. I hope you had a chance for a quick break. We're now going to move on to some of the material that the alumni have prepared for us for the second session, for the next hour. Of the 126 projects I mentioned earlier, we've asked seven alums to speak today. They're representative not just of the 62 percent who've gone into commercial data science careers but also of those who have continued on an academic trajectory. When we lined up these invitations, we were keen for them to focus on one of a number of dimensions: their original project with the original sponsor; their career journey, how they got to where they are now over the last few years; or an aspect of their current work that they thought would be of particular interest to this audience. I think the other advantage of holding this as an online webinar is that we're able to beam people in from around the world, so as well as, I think, a presentation from the [unclear] coast, we also have a presentation from Nairobi, amongst other places, over the next hour for you. Just a reminder again: if you have any questions as we go along, please post them in the chat. You may have to log in again to do that, but I can see a couple of comments already in the chat. We have indeed restarted a couple of minutes late, but we'll try to make up a bit of time.

Let me start by welcoming our first alum speaker. Nombuyiselo Murage is currently working with Gro Intelligence, which is an agriculture technology company. She's based at the moment, I think, in Nairobi. It's great to welcome you, Nombu. You're one of the fortunate people who were actually offered a role from your original internship with Tamoco, and you've been with them for a while but have recently joined Gro. I think you're going to be talking a little bit about a specific project that you've been involved with while
you're at Tamoco, but I'm going to pass the baton over to you and leave you to talk a bit about that work. Welcome, and thank you.

Thank you, Jonathan. Hi everyone, my name is Nombuyiselo Murage. As Jonathan said, I was an alum of the CDRC scheme last year, 2020, and I'm also a master's alum of the University of Liverpool. Today I'm going to be talking about my experience with the CDRC and the project that I have been doing with Tamoco for the last year. I joined the CDRC through the University of Liverpool, where I was doing my master's in geographic data science. I learned about this opportunity through our lecturers, Dr Dani Arribas-Bel and Alex Singleton, who told us that this was probably an interesting opportunity that we should look into applying for. One of the specs given by Tamoco was very interesting to me: dealing with trajectory data, which had been my interest throughout the master's dissertation. The project I'll be presenting to you today is the methodology that we developed for deriving spatio-temporal geographies from mobile GPS data, using a data-driven approach. We aptly named this project Living Boundaries, and I hope that by the end of the presentation you'll be able to appreciate why.

If you come over to the next slide: when I first engaged with Tamoco, they had a very broad spec question on what they wanted to do with their data, and they were very hands-off in terms of the discussion points that they wanted me to explore. One of the interesting questions posed to me was how we can model and quantify the collective experiences of city dwellers and, after that, determine regions of shared common experience. It was a very abstract question, and it was sort of a blank slate for me to develop my own techniques and explore new areas that they as a company had not been interested in exploring, or possibly
did not have the opportunity to. As Jonathan and Keith mentioned, these are sort of the long-haul projects, the ones that you'd like to do, but it's more of an experimentation kind of process. So that was the question: how do we quantify collective experiences? This was the methodology that we came up with.

Next slide. I had a data set of 1.2 billion data points, and it came as a shock to me as a student, because most of the case study data sets that we'd been working on had been clean data sets with at most maybe a hundred thousand or two hundred thousand data points. As soon as I started my project, Tamoco gave me about 1.2 billion data points of GPS traces that were generated by about 6.7
unique users in New York City, and this was the data set from which I was supposed to explore how we can map out shared common experiences. So the key focus of the approach that I took to this project was on data munging and data pre-processing. As we are all aware, it's a focal point for the accuracy of your analysis that you make sure the data set you have is clean and correct, so that it can give some level of confidence in the kind of outputs that you get. I can break this down into two main parts. The first three steps relate to pre-processing and data munging: removing erroneous data, then a unique compression algorithm that I developed, which the company and the university were very excited to explore, and then classifying the data. Then I'll talk about the intersection of GIS and network graph theory, which has also shaped the work that I'm currently doing at Gro Intelligence, and finally I'll show you some of the boundaries that came out of this.

Next slide. One thing that I want us to keep in the background when we're working with GPS data is the idea of low-dimensional representation, and that was the main goal of the data pre-processing steps. We have this vast data set, about 1.2 billion data points, and you want to run Python scripts on it, and it becomes a nightmare, almost impossible. You'd want first to clean the data and reduce it to a manageable state. That is the goal of low-dimensional representation, and it was the key output of the unique algorithm that I developed, the H3 trajectory compression algorithm. In the next slide: we were taking advantage of gridding libraries that are out there to actually
play two roles, in a sense. One is to cater to low-dimensional representation when we're looking at data pre-processing, and two is privacy. As we all know, data privacy and data ethics are coming into focus and are very central to how we're moving forward in data science. One of the things that we wanted to build into our methodology was a level of obscurity that allows us to reduce the data set and at the same time protect the locational privacy of the individual data points collected for this data set, taking their privacy concerns into account. For this particular project we used Uber H3, because of the infrastructure currently running in the company, where they use GCP and BigQuery to process their data, so it was a very seamless library to use, and because of the flexibility and reproducibility of this gridding method.

Next slide. We moved from translating our data points to H3 grid cells to, finally, using a single user's trajectory to detect stops. The reason for this is that when you're talking about experiences, you have to think about how much time someone is spending in a meaningful place. So for the points that had now been reduced to Uber H3 cells, we applied a time threshold to single out the stops within that user's trajectory. The idea is that these stops are, in a sense, a proxy for someone's experience in that particular area: if, for example, someone stayed in a cell for more than 20 minutes, then there's something meaningful they're doing in that cell, as opposed to just moving through a particular region. So that was the thinking behind the reason why we
filtered for stops only. If you move to the next slide, this is the outcome of the H3 compression. There's an example here: if you look at the data set that I'm showing you right now, some of the data points attributed to a particular H3 cell were continuously within the same region of that cell. You can see in the first nine rows that this user had a meaningful stop in that area between, I think, six or seven and around 8:00 a.m., and all those data points have been summarized into one data point that takes the minimum and maximum times of entering and exiting that cell. That was the whole point: we were able to maintain the spatial integrity, the spatial structure, of that user's trajectory path while still reducing the data set to a very interesting and manageable representation of the same data.

If you go to the next slide, these were the results. We were able to summarize monthly inputs of about 390 million, almost half a billion and 357 million data points down to about 31 million for October, 25 million for November and 23 million for December. So we achieved very useful compression, a 92.7 percent compression rate across all three months. The interesting thing about this was that we were able to reproduce this algorithm on several other data sets within the organization, taking advantage of Uber H3, which allows for the reproducibility of those grid cells. The H3 trajectory compression was developed in both Python and SQL. For this particular use case, because I was dealing with such a vast data set, I was using the SQL implementation, but with parallel computing on, say, Spark nodes you can achieve
the same in Python.

Next slide. Then we go to the meat of the project, which is the combination of network graph theory and GIS. One of the things that was very exciting for me was to understand and map out user-to-place experiences and translate them into numerical figures that can be quantified as an edge in a network graph. Another concept that I would like to highlight is spatial interdependency, the idea that two places are connected to each other based on the common set of visitors they share over a period of time. Understanding these relationships between venues or places, based on how a common set of people visits them, is the core of this project, which was trying to take a very abstract question of mapping out common user experiences and translate it into a network graph in which the edge between a user and a place is signaled by the distance of that location as well as the temporal signatures. This idea of a user-to-place relation speaks to the idea of semantic proximity. We all know that Tobler's law says that near things are more similar than things that are far away, and building on that idea, past researchers have implemented distance decay functions on a network graph. The contribution of this project was introducing temporal signatures as proxies for user experience, thereby bringing in the idea of semantic proximity. In this particular use case I incorporated average visit duration, summarized as a weight factor on the edge list, and that enabled us to come up
with our communities. In the next slide, this is the visual representation of those individual user-to-place interactions, and of summarizing those interactions, using community detection in network graphs, into one geography where you can say that there is a common set of people who visit this place, and this is their shared experience. In the next slide, this was the output. We got a cluster of several nodes on our network graph, and at that moment it wasn't meaningful, because it just looks like a cluster of nodes. But the interesting thing about this is that every node here actually has a real-world representation, an H3 grid cell. So when you look at the next slide, we were able to take this interesting collection of nodes and map it out to the actual real-world cells.

So, in the demo, if you go to monthly profiles. Sorry, please just go to the monthly profiles. Yep, October, November. This was the output. We have self-organizing maps and a methodology that allows us to develop geographies that are based on user experience and that have no real delineation by pre-prescribed boundaries. If you zoom in, you can see what has been highlighted by various other researchers: that there's a regularity to the spatial and temporal characteristics of human movement. When you compare the profile of October to the profile of December, understanding that different sets of users contributed to this data, they match up to almost regular boundaries and geographies. You can see, in that long strip there in Manhattan, the regularity of the profiles from October to November and to December: we had about five geographies naturally delineating themselves across different sets of users. One thing that's also interesting about this
self-organizing approach to mapping is that people, and how they experience their city, tell you how the natural boundaries form. Something that we had theorized, and are exploring further, is that smaller geographies actually give you an intuition for the distribution of amenities. As you can see, in Manhattan the distribution of amenities is very close-knit and the geographies are smaller. That basically means that for anyone who happens to fall in, say, the pink cell there, these are the extents within which we theorize they'll move around that area of Manhattan, because previous people, in previous experiences, moved around the city in such a manner. Another thing that was very interesting was the high stability of some geographies. At the very bottom, or left, of New York is the area near John F. Kennedy Airport, and that geography has been very stable across all three months, with the boundaries delineating themselves consistently. That shows that the area is very stable and the distribution of amenities is very precise, and that if someone happens to be in that area, these are the extents of the boundaries that we think they may explore and move through.

We had several profiles. I did profiles for October, November and December, and also profiles for weekend versus weekday, as well as day profiles for how people move from afternoon to evening. One of the interesting things that we were able to see, and that is backed by the literature, is that in Brooklyn and the Bronx the geographies developed are very large. That actually lines up with some of the case studies that have been mapped out in urban planning in New York City: the distribution of amenities in the area is not that precise, so people have to move through a wide space to get to whatever they need.

So this was the project. It's still ongoing: we had a collaboration with Tamoco
after my dissertation to work towards publishing this methodology, or some part of it. It was a very interesting and fulfilling experience, and it allowed me to get into a very interesting space between GIS and network graph theory. That's some of the work that I'm doing at my current company, where I'm looking into geospatial semantics and GIS workflows. It's the interesting intersection of network graph theory and natural language processing, trying to see how we can use ontologies and semantic layers to introduce question-answering workflows for GIS. Yep, that's it from me. I'm excited to take any questions.

That's great, thank you so much, Nombu. That's a really fascinating worked example of the kind of long-haul project that Martin and Keith were talking about earlier on: how you quantify collective experiences. What fascinates me is that this has academic and commercial impact, but potentially policy impact as well. I had a couple of quick questions. I'm conscious that we're close to time on this, but I couldn't resist asking. One is: you mentioned your shock at discovering the 1.2 billion data points
that you had to come to terms with, for unique users in New York City. I'm just curious how well your university experience prepared you for working at that scale, making sense of large and complex community data.

Yeah, so this is one thing that I actually mentioned to Dr Dani. The data sets we'd been working with at the university are at most 200,000 to 500,000 rows, and they're usually very clean, and we appreciate the experimentation factor in that. But I must admit the company really took time to help me orient myself in terms of industry, because industry will not work at experimental data levels. Actually, the first data set I got was 2.9 billion, just so you know, and then I was like, let's reduce it to 1.2 and we'll see how we can work with that.

Yeah. I mean, someone's just commenting in the chat, "I'd love to see multiple spatial methods with use cases", so it's a great worked example, as I say. I was also interested in what you thought the impact of your work had been on the attitude and behavior of the business towards new data science projects. Do you think it opened their eyes to some extent to what was possible?

Yeah, it did. If I can answer your question in two parts: the company was very impressed with the kind of work we were able to achieve, and like I mentioned, the H3 trajectory algorithm is actually in use inside the company. I was very excited to know that work I thought was just dissertation work has actually been used in industry. But more so, there was also the feedback that I gave to the university on how to incorporate more real-life work experience into the university setup, into learning, because it really throws you into the deep end really quickly.

Yeah. And there's a question, finally, in the chat from Samantha: have you tried
Alteryx? Great software to support big data if you want to go bigger, bigger than 1.2 billion, that is. They also support spatial data.

Oh, I will definitely check it out.

Fantastic. This is the benefit of a network community here, offering suggestions on future work. So, Nombu, thank you so much for that, really a fascinating case study. Thanks so much indeed, and I hope you can carry on joining us for the rest of the event.

We're going to move to our second alum speaker, and in this case I'm delighted to welcome Alec Davies, who is a data scientist at Pets at Home, a colleague of Martin, who we heard from earlier on today. Alec is going to talk us through his original project, his background as a PhD student in geographic data science at Liverpool, and how all those pieces connect to where he's ended up at present in terms of his work at Pets at Home. So, Alec, welcome, and over to you.

Thanks, Jonathan, and thanks for the invite today. I'm not sure if I can match the introduction Martin's already given me, but I'll give it a go. So yeah, I'm a data scientist at Pets at Home. To give a bit of background, I was in the first cohort of geographic data science master's students at Liverpool University and then went on to do a PhD in geographic data science. I was funded through the ESRC, but my PhD was part of the CDRC as well, so I had a lot of interaction with the CDRC, and I was based in the Geographic Data Science Lab. A few people have already been mentioned, like Dani Arribas-Bel and Alex Singleton. Alex was my secondary supervisor on my PhD, and Mark Green was my first supervisor. Just as background, my PhD was mainly around using new forms of data and data science to explore health in new ways, but I also have research interests in the retail environment.

So, moving on. I'm going to talk about the project I did, then I'm going to talk about the impact it had on my journey at the
stage of being a master's student at the time. I'm going to talk over the benefits, then go on to the progression, and just briefly mention the future as well.

To start with the project: the project title was "How does competitive presence influence the performance of click and collect sites?" It was a 2016 master's dissertation, partnered with Sainsbury's. To give a bit of background, online grocery really started to take off towards the end of the 2000s, towards 2010. There was heavy investment from some of the big players in the supermarket industry, and the main original investment tended to be in home delivery. There's a lot of research, particularly in France, around what are called last-mile costs, because there's quite a heavy additional cost in delivering from store to customers' homes. So there was a need to reduce that cost, make it more efficient, and also remove the potential for failed deliveries, which are quite high-cost. These are the benefits to a retailer of launching click and collect. But also, although online home delivery meant that the customer didn't have to go into a store and have the inconvenience of things like queuing or searching for items, there was a new inconvenience of having to wait around for a delivery. What click and collect allowed was for someone, typically on a commute or something similar, to stop at a dedicated site, typically in the car park, immediately load the shopping into the back of the car and then drive off. It wasn't necessarily focused on cars; there were examples of click and collect sites at train stations across the market. But it primarily focused on cars. So in order to look at click and collect,
there's a need for some form of catchment. At the time it was a new offering for Sainsbury's. They put quite heavy investment into it and opened a number of sites, but they wanted to know the things that impacted performance, particularly competitive presence. In order to analyze this, catchments needed to be created. The most basic level of catchment would be a linear catchment, where you just draw a buffer around a point, which would be the store's location. It could be 5, 10 or 15 kilometers, anything that's determined by some form of research; it could be a typical drive time. But the problem with linear buffers is that they don't account for things like pedestrianized areas, one-way systems or bodies of water, which mean that a simple buffer isn't a realistic catchment: catchments vary considerably with the road network and with things like attractiveness. So the first step of the project was to build a bespoke set of catchments. The method ended up being a Huff model, which is a type of spatial interaction model, specifically a gravity model, where you have an attractiveness feature, which is typically the size but can be a composite index of other pull factors. It also includes the road network, so it's a delineated catchment and it's closer to an actual catchment in real life. Once we had that, we were able to explore, first of all, spatial rural-urban differences, to see if anything unexpected happened in certain areas. You'd typically expect urban catchments to be smaller and rural catchments to be larger, but this allowed for an examination of that once the model had been tuned. Then, extending beyond that, we could answer the question of how competitive presence influences performance. We looked at three sets of characteristics: competition, store characteristics, and geodemographic or
socio-economic factors. We looked at them in isolation against a proxy for performance, which was demand, and then we combined them all into models as well, in order to look at the effects once all the features were included together.

In terms of the impact it had on my journey, there were a lot of firsts. It's kind of been discussed previously, but it was my first opportunity to apply what I'd learned in modules in the classroom. At Liverpool, as an undergrad, Alex Singleton taught a module that included spatial interaction models, and then as part of the master's at the time, and when I was teaching during my PhD, Les Dolega taught spatial interaction, retail catchments and that kind of thing. So I had a good understanding and knowledge, and I was able to actually apply it. The second point is closely linked: it was the first opportunity to use real data on something that isn't just for a mark. You know it matters. It's not just a piece of coursework; there's an industry partner who has a vested interest. So it's that first real project to get stuck into. Beyond this, it led to my first research paper and first conference presentation, and this really helped my development during my PhD, knowing what to expect once research is done. Finally, I've got a point in there purely because of my background in geography: it's an opportunity to focus on something that's truly geographic. Without the geographic aspect of catchments, the analysis doesn't really work. In a lot of data science you might just have a geographic feature, but this is highly focused on the geography, and then the statistics come after. So it was a really good one for me as a geographer, or as someone with a background in geography anyway.

I've got quite a few benefits on here. These are five areas that I perceived to be the most beneficial for
me when I was a master's student, but there's also some inclusion here of the benefit for the research body and for the partner. Something that's probably overlooked a bit is the first step: you have to apply, and in my case, and I think in a lot of cases, you have to have an interview, and it's quite technical. That's not comparable to anything you've done before. As Jonathan showed, there is a lot of transition from the master's into industry, and you really need to have some experience to be able to do well. So this is the first experience that anyone will probably get of something that's technical, of proving you know the technique and can explain how you've used methods. I think that's a really important skill that the scheme offers and that's probably overlooked. Beyond that, there's the technical approach. Like I've already mentioned, you're applying the techniques you've learned to real-world scenarios, and you have to become the expert, which is something that you do throughout the PhD and that I've found in my experience in industry so far. You've spent a lot of time learning the skills, so it's now your time to say this is a method that should be used and this is a method that shouldn't be used; for example, why Huff would be used over all the other models. That's an important skill, particularly in academia when you defend a new paper to reviewers, but also in industry when you're trying to prove that the method you're suggesting is the one that should be used. Then there's a benefit to the partner: you get master's-level students focused on a problem, and they have a vested interest in doing well because it's fundamentally going to be a mark towards their overall grade for the master's. Beyond this, you've obviously got supervision, in my case from Dani Arribas-Bel but also Les Dolega, so there was a
really good level of experience from both of them. So it's beyond just a master's student, because there's obviously support there from the lecturers and researchers. It's already been mentioned quite a lot, so I'm not going to go over it too much, but there's a facilitation of new technologies and new data sets. With this one in particular, there are data sets such as Geolytix Retail Points, which is an open data set. That was a new data set for me, and it meant getting used to the caveats in the data, cleaning the data, and getting it filtered down to what you need. So it's something beyond the classroom, which is a really important skill that is driven by this. There's a project handover. A lot of the time, and I did it myself and saw it a lot in teaching, code will be written or GIS will be done in a certain way, and you'll have quick fixes, or version one, version two, final version, all that kind of stuff. But with this you're forced into using a proper data science style of working, and there's a repository handover of code and documentation. So it has to work. You can't just say, at this point you have to do this random bug fix or whatever; you're producing something that can be repeated, and that's a really important skill down the line. Obviously there's documentation that comes with that, and then it extends to research output. I did the poster that was part of the scheme, then presented at GISRUK in 2017, I think it was, and then published the paper. So beyond the documentation, the company can see a lot more detail, but the research area also benefits because there's a case study with real data there as well. Finally, there's the networking, just getting people in the same room as part of the project. I went to Sainsbury's quite a lot, but also invited Matt, who was at Sainsbury's and was their partner contact at the time, up to Liverpool, and it's getting
people like Martin in the room with people like Les Dolega, who has a wealth of research in the area. So it's facilitating that.

This is probably my last detailed slide. In terms of what's happened since the project: the project was turned into a paper, and it's published in the International Journal of Retail & Distribution Management, so if you want to read any more beyond what's on the CDRC project archive, there's a lot more detail there. In terms of click and collect, there's a new factor involved now, where it's beyond just convenience and has turned into safety as well, because of the coronavirus pandemic. This has meant that a lot of retailers now have to offer it as a standard offering to be able to still operate, with restrictions, from retail parks, and to reduce the strain on distribution. So essentially it's taken off massively recently, beyond the growth that had already happened. Personally, it gave me domain-level expertise, and that meant that when teaching and demonstrating on undergraduate and master's modules as part of the PhD, I had a real understanding of how what I was teaching could be applied. I went on to use CDRC data sets in my thesis, so beyond just the CDRC master's scheme, using further consumer data through the CDRC. And then the understanding of the data science project and how that works was just really important for both academia and industry.

So, just finally, hopefully there are many more great master's dissertations. I've looked through the projects list for this year and there are some great ones on there, so it's really exciting. I just want to take this opportunity to thank the CDRC. It's been a great experience for my master's and then, beyond that, my PhD. CDRC data sets and the network
provided have given me a wealth of experience and enabled me to really build a strong CV and some really good research and experience. So yeah, I just want to say thanks to everyone involved.

That's great, thank you so much, Alec. It's a really nice journey presentation, taking you from that original master's work through the academic experience and the commercial experience. Just to remind those of you looking at the chat: you might need to refresh your web page and put your name back in again if it has gone offline, but again, please do ask any questions on the chat as we go along. I had a couple of points I wanted to raise, Alec, on your presentation. You're different from many of the other colleagues we have presenting today in that you're also a PhD student, and I was curious: although you'd had the previous experience as a master's student with the scheme, do you think that working on the PhD affected the way in which you were able to work with business?

I think the difference with the PhD is that there's not necessarily direct contact with the company. But definitely some of the secure data sets that I used are in SQL databases, and that's not something that's typically taught in geography, but it's a key skill in industry: everything's in a SQL or SQL-type database. So that experience was really important.

Yeah, because I was thinking about some PhD students here in Oxford: one of the challenges sometimes is that they revert from a more practical engagement with the company in the master's and become very academic in the PhD, and perhaps lose some of that applied perspective in what they're doing. The fact you're working on click and collect I think is also very interesting and, as you say in that final slide, it's come into its own during the pandemic. I was just curious: what's your sense of how the insights you were
able to bring to bear for Sainsbury's have actually helped the business over the past year in understanding click and collect?

So at the time when it was done, in 2016, the goal of the project was also to build something that could be usable over the next five years, so I suppose that kind of ends about now. That was very much the idea from Matt, who was the contact at Sainsbury's. I don't know how that's affected it, or what's going on, because I haven't got any view of that, but the vision was definitely there for it to be used long term.

Yeah, I was just thinking of the value added in terms of early insights into this developing area of distribution. But Alec, thank you so much indeed for that. As I say, we might have a few questions on the chat a bit later on, but thank you for your contribution. Brilliant. We're going to move on now to our final presentation of this session. If you remember my slides earlier on this morning, showing you the companies that have been most engaged in the CDRC MDS scheme over the years, Movement Strategies featured in the top three, and so it's quite fitting that we have a couple of contributions from colleagues who are working within Movement Strategies and who were part of the scheme. Christian Tonge, who's now a senior consultant within Movement Strategies, has actually come full circle, because he's now in a position where he's co-organizing the involvement of the firm with the MDS scheme. So this is an ideal kind of closing of the loop, if you like, with some word-of-mouth effects and so on. He's joined in this presentation by Cristobal Montt, who is a graduate analyst at Movement and who was involved in the scheme in 2019. As you'll see from the bio, Cristobal trained as a sociologist, saw the error of his ways and
became a data scientist instead. They're going to do a joint presentation, I think, to conclude our second block of sessions, so over to you.

Thanks, Jonathan. As a kind of prelude to our talk, thanks to everyone at CDRC and the University of Liverpool, from my perspective, for everything that's been done for me over the last few years. My focus today is going to be on how much the CDRC has helped me but also, now that I'm four or five years in at Movement Strategies, how much the CDRC has helped Movement Strategies as well. We are a commercial body, but we really do have a focus on research, and the research done by the CDRC really is important to us as a company. So, just a quick introduction for anyone on the conference today who isn't aware of Movement Strategies. We were founded in 2005. We're about 35 people, although growing quickly since we've just merged with GHD, who are a ten-thousand-person-strong professional services company. We're sort of the world leader in people movement and crowd dynamics, and we have a real specialism in what we call movement analytics, one of our sectors. The focus there is really any type of interesting data that can help us understand people movement, whether that be GPS, spend, Wi-Fi, computer vision or AIS, and any other alternative new data sources that we can find. What I've found about Movement Strategies, as a geographer and geographic data science student who came through Liverpool, is that it's really interesting to work with people from all different backgrounds, whether that be data scientists (people who actually probably put me to shame these days, like Cristobal, who I'm joined by today) or criminologists and urban planners: a real wealth of insight and different perspectives. So, as I say, we were formed in 2005, and our traditional business is around crowd dynamics
consultancy: helping to design, deliver and operate some of the biggest or most complex venues and spaces in the world, whether that be a hundred thousand people in a stadium, specific religious sites like the Holy Mosque in Mecca, where we see incredibly complex movements, or, more recently, in light of there being a lack of crowds, designing social distancing solutions for venues, so trying to maximize the capacity of spaces, whether that be offices, the return to commercial spaces, or stadiums and major events and suchlike. As you can imagine, we use a hell of a lot of data in our world, and on the back of that we created our sub-business, Movement Analytics, which as I say focuses on all types of interesting data, whether that be cellular data with our partners O2 Telefónica, or Wi-Fi (we're official Cisco partners, so whether that's smart Wi-Fi or smart cameras via Cisco Meraki), and then alternative new data sources. The use cases really range from whole-country transport networks, understanding where people board, say, the West Coast Main Line, where they alight, and where their home and work locations are, to retail use cases: where are people spending money, how are they spending it, what are they spending it on? We partner with Visa. I think the last estimate I heard is that Visa is 95 percent of the UK's plastic spend, so card spend, so having their data is incredibly powerful for understanding what people spend and where. And also at the city-wide scale as well: if you're a business improvement district, we've got a hell of a lot of data that probably tells you how people move through that space, and also how they got there, what they do when they're there, and their behaviors throughout their customer or visitor journey. And so, in light of
this combination of data sources, we've combined and blended them into a dashboard output, so that cities, retailers or property owners can now access these and understand the whole consumer or visitor journey. So whether that be on a very quiet day, what's happening in, say, the centre of Liverpool, or a really busy event day where we've got, say, 90,000 people at a stadium: how does that affect spend? Does that mean that residents aren't spending money within the city, or even visiting the city that they live in? It's a really useful tool for planning and economic understanding, so at the moment it's incredibly powerful for understanding economic revamping, trying to get our high streets back alive after the tough few years we've just faced during the pandemic. Just quickly going over this: sports and events, transport, cultural, healthcare, education. As I said, effectively anywhere where there are large numbers of people or crowded spaces. And again, just flying through this in the interest of time, but we're working with some incredibly high-profile clients. The reason I'm touching on that today is that, as someone with four years' experience, the exposure to clients from Movement Strategies, via the CDRC scheme, has been absolutely great. You look at the people who are in my contact list from four years, and it's really not to be sniffed at: it's a huge list of very high-profile venues and clients that we've worked with, which is all you could ask for as a graduate or someone quite young in their career. And that all stems from the CDRC giving me the opportunity to open the door of Movement Strategies. So, the CDRC at Movement Strategies: we've participated since 2015 and
someone in the audience might be able to correct me, but I believe we've won prizes in five out of six years, whether that be first or runner-up prizes, or prizes in the poster competitions. At present, three former students are full-time employees and colleagues at Movement Strategies. So it's important to us not just for the research benefits but also for finding really talented colleagues: it's a way to hand-pick good minds from the universities. That's reflected in the way we approach the scheme. We're really keen on training, mentoring and proofreading, and we're lucky that in-house we've got adjunct professors and people who are research-based as well, and again ongoing support from 10,000 people globally now that we're part of GHD. So, for example, last year we ran an AIS shipping project, and we have people who are experts in shipping, and that project proved really valuable for understanding the impact of Brexit on shipping lanes in the English Channel. So, examples from previous years. Without going into the details here, you can see just from looking at these headlines (not the prettiest slide) everything there from shipping data to GPS data. GPS data has been a big focus for us, and I think we were always on the fringes in the early days in the types of data we were using, because GPS probably wasn't commonly used in the commercial or consumer space. But looking at the presentations today, I can see that there are now more GPS data sets coming through. At the time when I did my thesis, we said there's a GPS data set, I think probably in the region of the other data sets we've talked about today, but quite daunting to look at as a student. That billion data
points number certainly does scare you when you've been using nice, tidy, clean data sets. But that's why we're here to support, and we're really keen to help our students get through that, because the likelihood is that some of us have been through it before. So, in terms of my journey: my project was a blank slate, almost, but the idea was to work towards proximity-based passenger sensing. What that meant was that TfL at the time had a great understanding of who was joining the Underground, because you tap in and tap out via the Oyster card system, but if you boarded a bus there was no real way of understanding where people alighted, as you just leave the bus whenever you like. So we used the TfL bus API and the GPS data set in conjunction, and said that if you're within close proximity of a bus stop, we would have a level of certainty that a person was near that bus stop, but that wasn't sufficient to prove they were on a bus. But as we went further along each route, if you were within the temporal and spatial bounds of the bus and continuing on the route at the same time, we would get increasing certainty as you moved along that route that you were actually on board the bus. Once we'd managed to prove that that was a valid methodology, we then started to look at: if I'm a transport operator or a city planner, what's really important to me? Well, it's understanding the first mile and the last mile. So with the GPS data we could now understand not only where people were alighting but also what they were doing in the last mile: where they were going to spend, what they were doing, whether that be work activities or leisure activities. But also the first mile. The first mile is often overlooked, but really important as a study area: are people disadvantaged because they're not within
close proximity to these bus stops? So the value added was a methodology for processing this raw GPS data set and inferring public transport and last-mile activity. I would say that the value overall, from a personal level, is working with Movement Strategies, and I'm sure, having been here four years, that Movement Strategies are pretty happy to take on colleagues as part of the scheme as well, and that's definitely value added from the corporate perspective. So, my journey. Again, just the diversity of experiences I've had along the way. Before even finishing my dissertation I was working two days a week with Movement Strategies, certainly very rare amongst my peers at the time, who were in a graduate world wondering whether the grad scheme they'd picked was right for them. I feel incredibly lucky to have missed the grad scheme and definitely accelerated my career, and that's 100 percent on the back of the CDRC and Movement Strategies offering. After joining as a consultant, I worked as an analyst with our Telefónica O2 team on TfL EDMOND, estimating demand matrices for the whole of Transport for London's network, whether that be HGVs, the tube network, taxis and all other types of vehicles; I think 2020 was the number at the time. As Jonathan said, I'm a CDRC mentor, and I attend these conferences when invited, to really be an advocate for the scheme. I can't really give it enough praise, to be honest. I've been really lucky to work with some very, very talented people as a mentor. In fact Cristobal, who I'm going to hand over to in a second, I tried to mentor, but actually his skills are way beyond the level that mine will ever be. So it's really exciting, as an advocate for the data science industry, to see how quickly things have progressed in the two years since
I finished mine, two or three years. Most of the time I'm actually not in the data science industry as such, but within the people movement world, so still using lots of data, and as I've said, really lucky to work on some interesting projects with some really big clients: the likes of Manchester United, internationally known, but also major events like the World Cup and Commonwealth Games, and huge transport infrastructure like Manchester and Stansted airports as well. So, drawing my input to a close, I'll hand over to Cristobal, and just again, thanks to everyone at CDRC and Liverpool. I don't need to name-drop people, but they know who they are. Thank you.

Well, hello everyone, and thank you, Christian, for the nice introduction. Actually, I didn't know any SQL before doing the project, and Christian was the one who taught me that, so I'm very thankful. About my experience of the CDRC project: I found out about the scheme in one of my classes, when one of the teachers encouraged us to look into it when looking for an internship. So I just applied online, wrote a cover letter and sent my application to Movement Strategies, and actually it was Christian who interviewed me. I guess he liked the interview, and then I started the project. The project was initially on understanding the impacts of network disruption on mode, route and travel time using GPS data, but the scope became narrower over time, especially because I needed to understand how to work with spatio-temporal data and I didn't have many of the skills for that. I was doing a master's in data science, but it was not geographic data science, so that took me some time. You can go to the next slide, Christian. So in the end, instead of focusing on the effect of disruptions on individual behavior, I focused on the effect of disruptions on the
transport network itself. And because I didn't have a nice data set about when disruptions occurred and what the scale of those disruptions was, I just used a black cab protest in Elephant and Castle in London as an experiment. I took two weeks of data, which was still a lot of GPS data, and compared the days with and without disruptions. My project was based on the paper 'Spatial Generalization and Aggregation of Massive Movement Data', which basically works like this: you have the GPS traces; you create individual journeys from them; from those journeys you take some characteristic points; you cluster those points; you take the centroids of those clusters and use them as seeds for creating Voronoi cells; you divide the territory into these Voronoi cells; and you use them to aggregate the movement data between cells. As the final product I created a web app, done with Shiny, to explore the data, which is the image you can see there. With that you could see the differences in traffic flow and average speed for each link between the days with and without disruption. Next slide please, Christian. So the value added to the business from this process was that a lot of the skills I got through the project have become useful for different aspects of the business nowadays. As you will see in my career path, it's now divided between two branches: front-end development and data analysis. I learned skills for creating web apps, and I have done some Shiny apps for other data products of the company. On the data analysis side I'm still working with GPS data, and I have now developed a better workflow than the one I did for my thesis. So now, from the GPS data, we get the journeys on one side, we have the stops on the other side, and then we also have these flow maps. You can click, I think that's a
video, Christian, you can click on play. That's the outcome of the flow map, so you can basically see the traffic flow for the different links, depending on the mode of transport, for example for a city or a specific part of the transport network. The use cases for GPS data are something we're still working on; that's an ongoing project, but very interesting indeed. And then my career after CDRC: after my internship I started working with Movement Strategies, and as I said before, my work is divided between two branches that I do at the same time. One is the front-end development that I started doing with Shiny, but now I'm working on a project with O2 which involves working with React, which is another framework for doing that. On the other side, in data analysis, besides continuing to work on the GPS data, I'm now also working with the mobile phone data from the O2 network to try to do real-time traffic analysis for the main roads in the UK. So that has been my path after CDRC, which has of course been quite a change, coming from sociology and now getting more into the data science part.

So these are two of our projects this year; I think one of them is still available. We're using computer vision algorithms to automatically detect people with restricted mobility, and the final one is inferring mode of transport from GPS data. So you can see that, from my project in 2017 to Cristobal's last year, we're still trying to find interesting use cases for our GPS data sources. In the interest of time I won't go into those. Thank you.

Fantastic, thank you very much indeed, both of you. They're both really great use cases that you've described. We've been talking about this ladder of engagement, and I think Movement Strategies is very much at the top of that ladder in terms of closing the loop, moving to commissioning
new projects and so on. And having three or more students in the business, and five out of six years getting prizes, I think is a great credit to the business as well. I'm conscious of time, but I did have one question I wanted to ask Christian in particular. I was just curious what it felt like being on the other side of the fence, I suppose, moving to a discussion, negotiation and commissioning role for projects, and particularly now as part of GHD, what that meant you had to do in terms of internal marketing within the organization to convince people of the merits of continuing with this kind of association.

Well, in some ways it speaks for itself; it doesn't take much convincing. We probably start with ten or so ideas every year and refine them down. But honestly, these are real-world business problems that we're facing. They might be client-driven, and certainly the AIS project last year was driven by GHD's need to understand the impact of Brexit on the European shipping lanes. So they're real-world problems that are live and happening right now, and our students, sometimes brighter minds with more time than we have, can answer questions that we really need to answer ourselves. But yeah, on the other side of the fence it's great. It's scary: as I say, with Cristobal last year, I'm interviewing him and he's already heads above me at the interview phase, so by the time we actually got to some coding he was teaching me a lot, rather than the other way around. But it's nice to be on that side of the fence now.

Yeah, but I think it's all about leadership, isn't it? That goes with the territory: being a leader is actually bringing on new talent and so on. So I think, again, closing the loop in that sense as well is really interesting. I love that phrase, 'brighter minds with more time than we have'; that's a great one. Thank you very much
indeed, both of you, and that brings to a close our second session. We've overrun slightly, but we'll get things back on track with a restart at two o'clock. We have two further alumni presentations and then a panel discussion with three great panelists to conclude the overall event. If in the meantime you have some time, and you're having a bit of a late lunch or want to have some lunch while you're doing it, we have set up a Zoom link for those who would like to network socially. We're conscious that this platform doesn't allow as much interaction as you might have on Zoom, with private chats and so on, but do feel free to find the Zoom link in your original email and click on that. There's a chance to say hello to colleagues and people that you might know, and we'll keep that open for the next half an hour or so. But we'll be due back for the final slot of the event at two o'clock, so we'll see you then. Thank you.
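
For readers who want to see the shape of the aggregation step described in this session (assigning GPS points to Voronoi cells and counting movement between cells), here is a minimal Python sketch. It is not the code used in the project. It assumes the cluster centroids ("seeds") have already been computed, that coordinates are planar (for example projected metres), and that each journey is simply an ordered list of points. The names `nearest_cell` and `aggregate_flows` are hypothetical. It relies on the fact that a point lies in the Voronoi cell of whichever seed is nearest to it, so no explicit cell geometry is needed.

```python
from collections import Counter
from math import hypot


def nearest_cell(point, seeds):
    """Return the index of the seed closest to `point`.

    A point belongs to the Voronoi cell of its nearest seed, so this is
    equivalent to a Voronoi cell lookup (planar coordinates assumed).
    """
    x, y = point
    return min(
        range(len(seeds)),
        key=lambda i: hypot(x - seeds[i][0], y - seeds[i][1]),
    )


def aggregate_flows(journeys, seeds):
    """Count moves between different cells across all journeys.

    `journeys` is a list of journeys, each an ordered list of (x, y) points.
    Returns a Counter keyed by (origin_cell, destination_cell).
    """
    flows = Counter()
    for journey in journeys:
        cells = [nearest_cell(p, seeds) for p in journey]
        for origin, destination in zip(cells, cells[1:]):
            if origin != destination:  # ignore movement inside one cell
                flows[(origin, destination)] += 1
    return flows


# Illustrative (made-up) seeds and journeys, not project data.
seeds = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
journeys = [
    [(0.5, 0.2), (4.0, 0.1), (9.0, 0.5), (10.2, 9.0)],
    [(1.0, 1.0), (8.0, 1.0)],
]
flows = aggregate_flows(journeys, seeds)
for (origin, destination), count in sorted(flows.items()):
    print(f"cell {origin} -> cell {destination}: {count} move(s)")
```

Running the same aggregation separately on days with and without a disruption, and comparing the counts per cell pair, would give the kind of per-link difference the web app in the talk displayed. Average speed per link would additionally need timestamps on each point, which this sketch leaves out.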
2021-05-10