Erez Dagan: The Key Technologies for Unlocking AVs at Scale

Thank you for the opportunity to speak at the Car of the Future summit by Reuters. In my short talk today I will share with you what we at Mobileye identify as the core technological enablers of autonomous driving at scale. At Mobileye we have been harnessing computer vision and machine learning to promote driver assistance and road safety since 1999. Today our three major business pillars are ADAS, the bread and butter; crowdsourced AV mapping, which targets the autonomous driving, driver assistance and smart cities markets; and our self-driving full-stack solution, targeting the consumer AV and mobility-as-a-service markets. The combination of all three core technology engines is imperative to unleashing AVs at scale.

The first element is the RSS formal safety model. It is an explicit, digitally interpretable and enforceable model for the safety of the autonomous vehicle's decision making. The second principle is the realization that the AV-ADAS divide is not a different scope of capabilities but rather the mean time between failures in executing those capabilities. We rely on what we call True Redundancy™: truly redundant perception subsystems which perceive the environment in parallel to one another, allowing us both to perceive the environment with a low failure rate and to prove that our system has a lower failure rate than that of a human driver, which in turn allows us to take the driver into an eyes-off, mind-off position. The third element is enabling seamless geographical scalability.

Autonomous driving has to be deployable anywhere, and in order to make that a reality we have designed a crowdsourced mapping solution which utilizes the plethora of driver assistance cameras already traveling out there to harvest information and automatically build high-refresh-rate AV maps, catering to that need for seamless geographical scalability of the autonomous driving system. What do we mean by True Redundancy? Having two perception subsystems, one comprised of cameras only and the other of radar and lidar sensors only, each creating a comprehensive view of the environment covering the four elements that constitute it: the road users, the road boundaries, the road geometry and the road semantics. Being able to do that independently in two subsystems gives rise both to the very low failure rate, or high mean time between failures, that our system yields and to the provability of our system: our ability to prove that our system exceeds the capabilities of a human driver in terms of failure rate.

Having two independent subsystems, each proven to a mean time between failures on the order of 10,000 hours, gives rise to a proven safety case for the overall perception system that is roughly the square of that figure: around 100 million hours of driving. Designing a computer vision subsystem whose mean time between failures is on the order of 10,000 hours is not a trivial undertaking. It entails a combination of several standalone, mutually informative computer vision cores.
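
To make that independence arithmetic concrete, here is a back-of-the-envelope sketch; the 10,000-hour figures come from the talk, while the independence assumption and the simple per-hour failure model are illustrative, not Mobileye's actual safety-case methodology:

```python
# Back-of-the-envelope version of the claim above: if the two perception
# subsystems fail independently, the probability that both are wrong in the
# same hour is the product of their per-hour failure rates.
mtbf_cameras_h = 10_000       # assumed MTBF of the camera-only subsystem
mtbf_radar_lidar_h = 10_000   # assumed MTBF of the radar/lidar subsystem

p_both_fail_per_hour = (1 / mtbf_cameras_h) * (1 / mtbf_radar_lidar_h)
combined_mtbf_h = 1 / p_both_fail_per_hour

print(f"{combined_mtbf_h:,.0f} hours")  # 100,000,000 hours: the square of 10,000
```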

One of these cores is the vidar, which produces high-angular-resolution, high-accuracy depth perception around the vehicle by utilizing the multiple cameras placed around the vehicle and their overlap regions. This standalone computer vision engine is of course independent of the more classical pattern recognition methods that we deploy around the vehicle, which perceive the objects of the environment based on their appearance rather than their depth.
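
The talk does not go into the internals of the vidar engine, but as a rough, generic illustration of how overlapping camera views yield depth, here is plain two-view triangulation; the focal length, baseline and disparity values are made-up numbers, not Mobileye parameters:

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in metres of a feature matched between two overlapping camera views."""
    if disparity_px <= 0:
        raise ValueError("zero or negative disparity: point at infinity or bad match")
    return focal_px * baseline_m / disparity_px

# Made-up example: 1000 px focal length, cameras 0.8 m apart, 40 px disparity -> 20 m
print(depth_from_disparity(1000.0, 0.8, 40.0))
```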

Another such standalone, comprehensive perception engine is designed to directly yield a semantically explicit representation of the host's surroundings. As you can see in this slide, it outputs a top view of the environment: road users, road boundaries and even the velocity of vehicles around the host. The resulting computer vision subsystem is actually a product in itself. This is the SuperVision™ product, which we designate as a premium driver assistance solution. It is a standalone end-to-end stack comprised of computer vision, REM™, and Mobileye's driving policy and RSS layers. The Road Experience Management technology, or REM, is Mobileye's crowdsourced AV mapping technology, designed to seamlessly enable AVs everywhere. The technology is comprised of three core stages: harvesting, aggregation and localization.

By harvesting we refer to the process of perceiving the environment through single-camera-equipped vehicles, that is, driver assistance vehicles, identifying the road geometry and road semantics around us in a way that allows us to transmit this information up to the cloud with very low bandwidth. We interpret the road geometry and road semantics out of the video stream, pack it into small packets of 10 kilobytes per kilometer and send them to the cloud for aggregation into a map. The second stage, the aggregation of the map, takes these snippets of information and turns them into a coherent map of the environment, the AV map, which contains the road geometry and road semantics with a very high level of accuracy, as well as landmarks which will then be used to localize the vehicle within the map.
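
Putting rough numbers on the harvesting stage: the sketch below multiplies the roughly 10 kilobytes per kilometer quoted above by the roughly 8 million kilometers covered daily that comes up later in the talk, and shows a toy, entirely hypothetical road-segment packet; the real REM format is not public and this is not it:

```python
import json
import zlib

KB_PER_KM = 10             # talk: roughly 10 kilobytes harvested per kilometre
KM_PER_DAY = 8_000_000     # talk: roughly 8 million kilometres of road covered daily

# Aggregate daily upload volume implied by those two figures, across the whole fleet
daily_upload_gb = KB_PER_KM * KM_PER_DAY / 1e6
print(f"~{daily_upload_gb:.0f} GB/day in total")   # ~80 GB/day

# A toy, hypothetical "road segment snippet" a harvesting vehicle might transmit
snippet = {
    "segment_length_km": 1.0,
    "lane_centerlines": [[[0.0, 1.8], [25.0, 1.8], [50.0, 1.85]]],  # sparse polylines (m)
    "semantics": [{"type": "speed_limit", "value_kph": 80, "along_m": 120.0}],
    "landmarks": [{"type": "sign", "along_m": 120.0, "lateral_m": 3.2}],
}
payload = zlib.compress(json.dumps(snippet).encode())
print(len(payload), "bytes for this toy snippet")
```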

This brings us to the third stage, localization. An autonomous vehicle, or a premium ADAS vehicle such as one running the SuperVision system, consumes the RoadBook by pulling it from the cloud, identifying landmarks in its vicinity, localizing itself inside the RoadBook and then gaining the electronic horizon with the rich semantics that the AV map has to offer.
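
Schematically, that localization step, matching observed landmarks against RoadBook landmarks near a rough position prior, could look like the following; this is a simplified illustration, not Mobileye's algorithm:

```python
import math

def localize(prior_xy, observed_landmarks, roadbook_landmarks, max_match_m=5.0):
    """Refine a rough position prior by matching camera-observed landmarks
    (given as (dx, dy) offsets from the vehicle, in metres) against absolute
    (x, y) landmark positions stored in the RoadBook."""
    if not roadbook_landmarks:
        return prior_xy
    corrections = []
    for dx, dy in observed_landmarks:
        seen_at = (prior_xy[0] + dx, prior_xy[1] + dy)  # where the landmark appears to be
        nearest = min(roadbook_landmarks, key=lambda p: math.dist(p, seen_at))
        if math.dist(nearest, seen_at) <= max_match_m:
            corrections.append((nearest[0] - seen_at[0], nearest[1] - seen_at[1]))
    if not corrections:
        return prior_xy  # no confident matches: fall back to the prior
    cx = sum(c[0] for c in corrections) / len(corrections)
    cy = sum(c[1] for c in corrections) / len(corrections)
    return (prior_xy[0] + cx, prior_xy[1] + cy)
```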

Having the crowdsourcing technology rely on camera-only agents is critical for the scalability we just mentioned, but it is done without compromising on accuracy. The accuracy of the map is actionable: we can demonstrably drive a vehicle based on this map. The road geometry nuances as well as the semantic nuances are well captured to provide a full view of the electronic horizon, as it is called. The map is not only accurate; it contains rich semantic layers which leverage both explicit and implicit cues captured by the crowd, allowing us to generalize to driving cultures and traffic rules across the globe. Such semantic information could, for example, be the common driving speed, the stopping points within a junction through which human drivers investigate the junction without taking unnecessary risks, or the association of different traffic lights with different lanes. Our large harvesting fleet today allows us to harvest 0.7 billion kilometers of road globally, with 8 million kilometers of road covered daily. By 2024, based on the set of agreements we have with our partners, we foresee one billion kilometers of road being covered on a daily basis.

A truly redundant perception system in combination with the AV maps yields a robust model of the environment, covering the road users, the road geometry, the road boundaries and the road semantics. A faithful model of the environment is not sufficient, however; to safeguard the vehicle's decision making from causing an accident we have formulated RSS, an explicit model for a road user's duty of care. Beyond complying with the RSS contract in our own systems, we are driving its standardization across the globe.
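
RSS itself is published as a set of explicit formulas. One of them, the safe longitudinal distance a following vehicle must keep, is sketched below; the formula follows the public RSS paper by Shalev-Shwartz, Shammah and Shashua, while the parameter values are illustrative defaults, not Mobileye's production settings:

```python
def rss_safe_longitudinal_distance(v_rear, v_front, response_time=0.5,
                                   a_max_accel=3.0, a_min_brake=4.0,
                                   a_max_brake=8.0):
    """Minimum gap (m) the rear vehicle must keep: even if the front vehicle brakes
    at a_max_brake, the rear vehicle, which may keep accelerating at a_max_accel
    during its response time and then brakes at only a_min_brake, cannot hit it.
    Speeds in m/s, accelerations in m/s^2."""
    v_rear_after_response = v_rear + response_time * a_max_accel
    d = (v_rear * response_time
         + 0.5 * a_max_accel * response_time ** 2
         + v_rear_after_response ** 2 / (2 * a_min_brake)
         - v_front ** 2 / (2 * a_max_brake))
    return max(d, 0.0)

# Example: both vehicles at 25 m/s (90 km/h) with the illustrative parameters above
print(f"{rss_safe_longitudinal_distance(25.0, 25.0):.1f} m")
```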

The resulting robustness, geographical scalability, safety and agility of our AV system are not theoretical. They are clearly demonstrated through our AV deployments in multiple cities across the globe, starting in Israel, Munich and Detroit, and shortly upcoming in Tokyo, Shanghai, Paris and New York. Thank you all for your time today. Look for our AVs on the streets and feel free to reach out to continue the conversation.

Thanks, Erez, for that presentation, that was brilliant. If I may ask: how do you envision the arrival of autonomous vehicles playing out? In other words, how long until we will be able to actually ride in or purchase a fully autonomous vehicle, and what needs to happen before that dream comes true?

Great question. There are two streams that are going to bring autonomy to end users. The first one is robotaxis, where a user can simply ride an autonomous vehicle. We are planning to launch such a service already by 2022, next year, and we see the industry probably opening up at a larger scale for robotaxis in around the 2023-2024 time frame.

The important next phase of autonomy, which already has some sprouts, is the introduction of consumer autonomous vehicles. Right now it is consumer autonomy under certain function or ODD restrictions, such as highways or traffic jams, and ultimately this will take us point to point autonomously. Now, as for the question of what needs to happen to make that a mass-market reality, there are of course several factors. Primarily I would note the regulatory discussion that needs to take place; I also referred to it in my talk regarding RSS, that contract between the autonomous vehicle and the society that is using the street along with it. There has to be a very clear, interpretable contract of what constitutes a safe negotiation between road users. Right now this contract is very vague in many ways.

It is common practice among humans, but if we expect computers to take part in that contract we really need to formalize it and make it something that vehicles can rigorously follow, in order to gain market acceptance and all of the safety and efficiency benefits that autonomy can bring. I think the second challenge we need to overcome is geographical scalability: how do we introduce autonomy everywhere, not just in pre-designated geo-fenced areas or roads?

Okay. And for instance, while the world waits for AVs to arrive, are there other ways in which the industry and the public could benefit now, or in the near future, from the technologies being developed for self-driving vehicles?

Most certainly. Mobileye actually takes pride in the fact that a lot of the value we produce while developing autonomous driving technologies trickles down into our driver assistance proposition. For example, our mapping was originally intended to cater only to autonomous vehicles and today is already marketed as part of premium driver assistance systems. Even the RSS model that I talked about, that contract between the vehicle and the other road users, is already being deployed as part of a driver assistance system. So imagine a world where this contract, developed originally for autonomous vehicles, can now safeguard the conduct of human drivers: a human is fully in control, but there is a very clear notion of where the boundary line lies between assertive driving and dangerous driving, and the vehicle can inhibit certain actions to make sure that the human driver does not overstep and violate that important safety contract. That is another example.
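
To illustrate that guard-rail idea, a driver assistance system could gate a requested maneuver on the same RSS safe-distance check sketched earlier; this is a hypothetical fragment, not a Mobileye interface:

```python
def allow_maneuver(gap_to_lead_m, v_ego, v_lead, safe_distance_fn):
    """Let a driver-requested maneuver through only if the resulting gap still
    satisfies the safe-distance rule; otherwise inhibit it.
    safe_distance_fn: e.g. the rss_safe_longitudinal_distance sketch above."""
    return gap_to_lead_m >= safe_distance_fn(v_ego, v_lead)
```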

There are many other examples concerning our perception systems, which are evolving in giant leaps towards autonomous driving, and these capabilities have been migrated into the driver assistance arena, offering advanced functionalities such as animal detection, free space detection, the REM mapping that I mentioned earlier, and that safety contract of which actions are safe versus not safe in the most comprehensive sense.

Okay, another thing. You mentioned REM; Mobileye seems to be taking a clean-sheet approach to mapping for AVs. In what way do REM and the resulting RoadBook differ from other mapping solutions?

Excellent. So first and foremost, the fact that it is crowdsourced allows us to benefit from a fleet of millions of vehicles equipped with a single camera, no other sensing device, in order to continuously update the map. A critical attribute of a map is that it has to faithfully reflect reality. If there is a change or construction in the road that happened five hours ago, the only way to get a map that faithfully reflects reality is to have a crowdsourced approach to it, scanning the roads continuously and updating our understanding of the road structure. So that is the first element: the fact that it is crowdsourced allows that high refresh rate and seamless geographical scalability; anywhere our driver assistance cameras travel is being mapped along the way.

The second unique aspect of our mapping approach has to do with the semantics we are extracting. When you are driving on the road there are explicit cues, such as the lane marks, the pavement structure or the traffic signs next to you, but there are also implicit semantics that are very critical to driving decisions, and I will give a few examples. The association of which traffic light belongs to which lane seems like a very simple assignment problem that humans solve very easily, but when you really try to make a computer resolve that problem in real time you run into difficulties. What a map can do is encode this information from the crowd that travels the road and understand which lane corresponds with which of the traffic directives; that is a very critical element.
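
Schematically, once the crowd has resolved that assignment, the map can simply store it, so the vehicle looks the association up instead of solving it live from a camera frame; the structure below is a made-up illustration, not the RoadBook's actual schema:

```python
# Hypothetical map fragment: the crowd-resolved association between lanes and
# the signal heads that govern them, stored explicitly so it can be looked up.
lane_to_signal_head = {
    "lane_12_straight": "signal_head_A",
    "lane_13_left_turn": "signal_head_B",
}

def relevant_signal_head(current_lane_id):
    """Return the signal head governing the given lane, or None if unmapped."""
    return lane_to_signal_head.get(current_lane_id)
```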

Another nice example is when a human is investigating a junction, entering a junction to check whether a vehicle is coming. There is a very specific pattern of how we proceed into the junction while minimizing risks and maximizing our visibility, and that pattern of the places where human drivers stop within certain complex junctions is very valuable semantics that we can harvest from the crowd and that is then very useful for the autonomous vehicle. So we call this approach not HD maps but rather AV maps: this category of crowdsourced, rich semantic maps.

Okay, but Erez, Mobileye is not the only company pursuing self-driving technologies, right? How does Mobileye's approach stand out from other well-known players in this field? What factors give Mobileye an edge over the competitors?

I will not repeat myself; I will just note the ones already mentioned. The REM mapping is one critical factor. RSS, which is the decision-making or driving policy contract of safe negotiation, is a key factor that enables both the safety and the agility of our vehicles; sitting in the vehicles you will experience a very human-like drive, because the boundary line between assertive and dangerous is very clearly formulated and we can maximize assertiveness and agility. Another factor that is critical and very fundamental to our approach is True Redundancy: having two perception systems perceiving the world completely independently gives us both high robustness of our environment model, meaning the environment model is correct to a higher degree and with a lower failure rate, and in addition the ability to validate our perception system with a much more pragmatic approach. Because we decouple the system into two independent systems, we can validate each of them independently against a much lower validation criterion.

If each system fails every one thousand hours, both systems will fail together only every one million hours. And that is a very strong indicator of robustness and a strong validation method.

Okay, well, thank you very much, Erez, I really appreciate your time. Unfortunately, we have now run out of time, and I wish we had a bit more to carry on. We are now going to go to a quick five-minute break, and then we will turn to our last panel, Boosting Safety, Integrating Sensors and Technology. Thank you very much, Erez.

2021-07-08
