Panel: Data sharing in financial services: Unlocking new value with privacy-enhancing technologies
Hello and welcome, everyone, to this session. My name is Nick Lewins and I lead financial services at Microsoft Research. Today's topic is data sharing ecosystems powered by privacy-enhancing technologies. In just a moment we'll have a panel of industry experts in conversation on this topic, but first we're going to kick off with some visionary words from Bill Borden, who is Microsoft's corporate vice president for financial services. Over to you, Bill.

Thanks, Nick. Great to be here to talk to you about a vision for financial services. Post-pandemic, financial services institutions are moving full speed ahead to take advantage of a host of long-term benefits presented by digital transformation: to be agile, build resilience, innovate and grow. This has created an unprecedented demand for all kinds and forms of data and cloud technologies. Infrastructure as a service, platform as a service and software as a service solutions, with data analytics and machine learning at the core, are all top of mind. Data is on the move from thousands of on-prem data centers to a small number of hyperscale cloud providers, bringing huge opportunities to replace the industry's data-copying batch processes with in-place sharing of live data in the cloud. Looking forward, we see the industry fundamentally transforming along three dimensions. First, all interactions are delivered digitally, and every aspect is extensively personalized based on a deep understanding of customer needs and in service of their financial well-being. That includes things like embedding human sales and service interaction via digital, so we see a renewed focus on personalization of digital experiences, integrating human collaboration into digital online sales and service. Of course, digital interactions create a data set that can drive better and better experiences in what we call a digital feedback loop. Second, data and AI will be applied in every decision process of the organization, implementing digital feedback loops that
drive continuous optimization for customers, employees, products and operations. Norms for responsible use of AI will have made it commonplace to use sophisticated AI in more heavily regulated use cases such as credit risk assessment and financial advice. Rapid advances in natural language processing will make it possible to parse and reason over complex concepts found in regulation, contracts and legal interpretations. And thirdly, we see the embedding of financial services inside the experiences of other industries such as retail, logistics and supply chain. This means payments and similar financial processes are embedded in the digital experience that the customer is actually trying to achieve. This is because the financial processes are usually considered, in the eyes of the customer, friction that is necessary to achieve something else. So the endpoint of digital transformation of financial services is that this friction will be removed and the financial service will be entirely integrated, in real time, into the customer experience or business process that it is serving. So in summary, we're seeing a future that looks something like intelligent open banking on steroids, where humans are incorporated into the experience on demand. All of these trends speak strongly to the central role that data will play in the future of the industry. This includes customer data that you are the custodian of, data generated by your business, data sourced from your ecosystem, and open, commercial and alternative data sets. I'm really excited about what the industry will build and what new technologies will be invented to allow data to be used for insight while protecting data rights and the privacy of the data subjects and data owners. We are working closely with Microsoft Research and its industry research team to drive exploration of new technologies that can provide breakthroughs in all of the areas required to deliver this vision of the industry. It's a very exciting time for tech and a very
exciting time for financial services.

As we heard from Bill, the digital economy of the future is built on data and interconnectivity. Last decade's major trend was the ubiquitous adoption of cloud-connected devices, and it could well be that the equivalent in the next decade will be the ubiquitous adoption of cloud-connected data sets. These data products will be shared within an ecosystem to create truly frictionless experiences, to create AI and ML insights that power new products and new experiences, and to protect against societal harms such as money laundering and financial crime. In Microsoft Research we're running a portfolio of initiatives to bring forward this vision, including strong cryptographic protection of sensitive data elements in cloud data estates, AI for quants, responsible AI in the financial industry and, of course, the topic of today's session: policy-based data sharing powered by privacy-enhancing technologies. This research is driven by the realization that data liquidity of the type we're talking about will only ever happen to the extent that it's possible to protect privacy, to use data within its approved purposes, to protect commercial interests and licensing of the data, and to comply with bank secrecy laws and with customer expectations on privacy. Making these assurances is not easy when the data is copied from one industry participant to the next, as is common in the pre-cloud industry architecture. As the industry moves to cloud, there's the opportunity for in-place sharing in the cloud and cloud-native data management practices to replace data copying between market participants. We're seeing these themes come up in multiple industry contexts, not just in financial services, and as a key part of numerous policy initiatives globally. For example, there's the open banking movement and its extension into embedded finance, the international data spaces and trusts projects for European data sovereignty, the future of financial data intelligence sharing in the UK
and the Singapore banking association's trusted data sharing framework. As this new data economy is formed, one of the key pieces will be technologies that can provide the required strong privacy guarantees. These are often called PETs, or privacy-enhancing technologies. They're a family of different technical methods that provide strong guarantees against information leakage. They're necessary to move beyond the procedural and contractual controls, for example data use contracts, attestations and access control lists, which are the current best practice for controlling data usage between organizations. Luckily, there are a number of such technologies that are emerging and are a focus of effort in the research community broadly and in Microsoft Research specifically. These include trusted execution environments, which protect data by performing computation inside an isolated part of secure processors. Microsoft has a number of leading products in this space under the name of Azure confidential computing, and as an example we've worked with the Royal Bank of Canada to create a virtual clean room solution for data sharing. Then there's differential privacy, in which enough statistical noise is added to query results to limit a querier's ability to infer specific data values. We have a strong collaboration with Harvard and open source solutions in this space. And there's homomorphic encryption, which is computation on data whilst it's still encrypted. We have perhaps the most widely used open source SDK in that space. And finally there are a number of cryptographic schemes, such as secure multi-party computation and zero-knowledge proofs, which allow joint computation of results or proofs without sharing the underlying data, with a number of implementation SDKs available from Microsoft Research and others in this space. So to dive deeper into the use cases and the research status of PETs, let me introduce our panelists for today: Professor Florian Kerschbaum from the University of
Waterloo computer science department, who is an ACM Distinguished Scientist and program co-chair for the PETS 2021 symposium, and Dr. Cedric Mombre, PETs researcher from the emerging technologies group at Swiss multinational bank UBS, who holds a PhD in mathematics from ETH in Switzerland. So now on to our panel questions. This first question is for Cedric: what use cases for PETs are most commonly discussed in the industry, and what would it mean for industries such as financial services if these methods can be deployed at scale?

Hi Nick, thanks a lot for the introduction. There are currently many use cases, using various techniques, that are being discussed in the industry. As you can imagine, data is one of the most important strategic assets in financial services. It not only supports the competitiveness of existing business areas, it also creates new opportunities as part of insight-driven business models and value chains. The challenge in financial services is that this data often includes sensitive and private information, which means it has a high risk profile, and there are also many data privacy restrictions that apply. So protecting this data and at the same time extracting value from it is a key objective in financial services. When it comes to concrete use cases in the industry, I think we've seen several ideas that have been floating around for quite a while now, and these ideas employ techniques such as zero-knowledge proofs, differential privacy and secure multi-party computation, as well as many others. These potential use cases aren't limited to the financial services industry but are applicable in other places as well. I guess one common use case relates to the analysis of data that's distributed across data jurisdictions. These data jurisdictions can be real jurisdictions or internal ones defined by regulations, so to speak, and in certain cases the data cannot
move across the boundaries imposed by these jurisdictions. The idea here is that, using privacy-enhancing technologies, you can still do an analysis on this data, combine it in a private way across these jurisdictions and gain insight from it. One idea is to use private set intersection and apply it to different business areas that possibly have a common set of clients. You can use private set intersection to figure out which clients are common to two different business areas, and based on this intersection you can do something like a joint sales campaign. Another idea for using private set intersection is to compute something like the overlap of the customer bases of two companies that are planning a merger, for example. So we see that many potential use cases are based on the fact that regulations impose these restrictions on data movement, and in certain cases you don't really need the data itself but just some kind of aggregate statistic from the combined data set. That's one common use case in the industry. There are also many other use cases that rely on differential privacy and synthetic data. These techniques add noise to data sets and also generate statistically similar ones, and by doing this you can process and analyze the data in a different data jurisdiction. This means that you could develop quantitative models with synthetic data from other jurisdictions. Of course you have questions regarding things like the utility versus the privacy of the data sets, but in general these are valid use cases that are currently under investigation. Zero-knowledge proofs have also been a big topic in the financial services industry. For example, with zero-knowledge proofs you can demonstrate that a certain number lies within a certain range, and you can do this without exposing the value. This allows you to prove that certain financial
conditions are satisfied, which of course has implications for many other things. For example, if you apply for a mortgage, you could prove that your salary is within a certain range, which would then be a proof of your creditworthiness for a bank. Zero-knowledge proofs can also be used in other cases within banks, for example the know-your-client process. That's a lengthy process within banks, in which the bank has to verify the identity of a potential client and show that this client satisfies certain attributes. Using zero-knowledge proofs you can validate these attributes and keep the client's data private at the same time, so that is another really nice application of zero-knowledge proofs. When it comes to secure multi-party computation, there are also several ideas that have been floating around. One thing to note is that with secure multi-party computation the data processing is split between multiple parties but the inputs of the parties are kept private, and this allows for data collaboration. Using secure multi-party computation you could do interesting things like compute the aggregate risk exposures of multiple financial institutions in order to get a system-wide view of the total risk exposure. Or you could do other things, for example secure trading, which is also an interesting topic. Secure multi-party computation means that you can match buy and sell orders of financial instruments without any party ever seeing any prices or volumes, so only the outcome of the transaction is revealed to the parties. I think something similar has already been done in a practical example: there's a famous example of a secure auction in Denmark, and you can try to take that same idea
and extend it to other use cases. So as you can see, there are many ideas for privacy-enhancing technologies floating around. I only mentioned a few techniques, but of course there are other ideas related to trusted execution environments and homomorphic encryption, and I guess it's up to the financial firms to figure out how to employ these privacy-enhancing technologies to get the most benefit for their clients.

Thank you. Florian, would you like to add anything on use cases?

Cedric, I think you gave a very good overview, thank you very much for that. I want to add a little bit of systematization to it, as researchers like to do. I think we can see that privacy-enhancing technologies are applied along the entire data science life cycle. They are employed at the phase of data acquisition and data collection, which is where most of the cases that Cedric mentioned are deployed, but privacy-preserving and privacy-enhancing technologies are also used later down the road, when we actually train models or when we use machine learning models or other kinds of prediction models. So we actually see privacy-enhancing technologies in the entire data science life cycle.

Fantastic. So Florian, the next question is for you: what are the hard research problems that need to be tackled across these spaces of PETs, and which are the most active and exciting areas of research that you've seen to solve those hard problems?

Well, thank you, Nick, for this great question. Fundamentally, I think one of the big problems that we see in privacy-enhancing technologies is that we do not really know what privacy actually means. Cedric has been talking a lot about technical privacy, so data minimization, and that is certainly the concept that we need for technical privacy. On the other hand, we have legal aspects, where in a lot of cases the question is where data is
being processed, and Cedric actually talked about circumventing legal regulations with privacy technology, so it is another interesting aspect whether or not these things can meet. Most importantly, what we are currently seeing is that there is a very important user aspect in privacy. For a very long time privacy researchers have said that the only way privacy-enhancing technologies are going to be adopted is by regulation. However, what we are currently seeing is that there is a lot of pull from users to implement privacy-enhancing technologies in systems, and companies are trying to actively shape their privacy perception by employing such technologies. This is a fundamental shift, because as technologists we no longer only have to implement certain regulations, or try to circumvent regulations, but instead have to cater for a user's need, and we really don't know what that user's need means. I can give a couple of examples. We've been talking about differential privacy. Differential privacy protects an individual in a data set. However, what that specifically means, for example if we apply machine learning models, whether the entire data set has an impact on the inference I want to make or only the individual, is something we don't always know and cannot really determine a priori. So we really don't know whether differential privacy is a technology that can protect us in that way, and it is not a technology where we know whether users understand it or can appreciate its consequences. So having a more holistic view, and adjusting the privacy technologies to the needs and expectations of the user, is really what we need to tackle in privacy-enhancing technologies in the future.

Yeah, it's certainly true that modern concepts of privacy go beyond just keeping information secret, which has traditionally been the focus of
cryptographic researchers and so on, and extend to concepts such as whether consent has been granted for the use of a particular piece of data for a particular purpose. So I'm very keen to see the research community start to focus on strong controls for that element of privacy, as opposed to just cryptographic methods or methods that promise strong degrees of data secrecy. And so, to flip that question around the other way, Florian: which of these technologies do you see as closest to broad adoption in practice, in your opinion, given that, as you said, you need to consider the specific use case when you're contemplating which of these techniques, in isolation or in combination, could solve for that particular use case?

There are a couple of things to add to this. The first thing is, as I said, we've all been told, and there was this folklore knowledge, that privacy is not really something that is being picked up. However, we're currently seeing that most of these really advanced cryptographic and other techniques that Cedric has been talking about are now being used more and more, and what I believe we will actually see being deployed is a combination of techniques. To give you an example, Microsoft is already actively using differential privacy, local differential privacy, in Microsoft Windows in order to collect data. So we have a use case of data collection again, and we have a technology that we can use. However, in order to still get the benefit of the data, the mathematical protection, the guarantee that differential privacy provides, needs to be reduced to an almost unimportant quantity. If we now combine this with trusted execution environments, homomorphic encryption, multi-party computation or other forms of encrypted computation, we can really leverage the benefit of differential privacy and the other technologies. So what I believe is that, in
terms of future deployments, what we're going to see is the different privacy technologies, including, as you mentioned, consent or user perception, playing off of each other: technologies that complement each other and give a greater benefit than each individual technology by itself, so that we are able to get the best out of all of them rather than taking a single aspect. To pin that down further: we have these encrypted computations, we have consent, and we have output privacy by differential privacy, but only their combination is really capable of providing us protection, because they complement each other nicely.

Fantastic. Cedric, would you like to add anything on the hard research problems that you're seeing, and on which of these methods are closest to broad adoption in practice?

Yeah, I basically agree with everything that Florian has said. But from an industry perspective, I guess one of the research problems that we don't have enough knowledge of is related to the practical implementations of these technologies. I'm not really talking about the security of the underlying protocols but more about the security of the practical implementations of certain use cases. I'm thinking about situations where users can in some way game the system, so to speak. For example, if you're a company and you have competitive knowledge and you want to let users access it, you might do this in the form of an encrypted model, but then that raises the question of model information leakage attacks. I'm talking about model extraction and other possible security threats that might arise from this deployment. I guess that's a big question that arises when we try to deploy these technologies, and of course every financial firm is always interested in figuring out what can go wrong, and
that's pretty much where the industry desires more information or more research.

Thank you again, Cedric, these were very good comments. I want to relate them back to what I said initially. If we're thinking about privacy in later stages of the data science life cycle, as you mentioned, inference, protection of models, generation of models, we definitely still have several research challenges. But again, as I said in the other comment, if we combine different privacy technologies we might actually be able to tackle these challenges as well. Not that I would say that all of these challenges are solved, but I can give you some highlights, for example on the ones that you mentioned. We can use technology like encrypted computation again, all three forms that Nick mentioned at the beginning, in order to protect the inference. And if we want to protect against queries and leakage from the inference, we can use other techniques that allow, for example, tracing of models, such as fingerprinting or watermarking, or other types of technologies that enable these types of things. So overall we do have a range of technologies across the entire data science life cycle that we can hopefully employ in the future to protect also these use cases that you rightfully mention.

It sounds very much from what you're saying, Florian, that there's not one silver-bullet privacy-enhancing technology where you can just tick the box, apply this one technology to your algorithm, and have it solve all of these privacy protection problems. It's much more a case that you need to understand the use case, and you may need to design a solution that comprises a number of these different privacy-enhancing technologies to provide the specific privacy protections that your use case requires. Is that the way that you're thinking about it?

I fully agree, Nick. You also have to understand that people are still researching what data science is, and
the data science process changes all the time. We're inventing new data processing technologies, and very often privacy is an afterthought that needs to adjust to these new developments, enhance them and make them private and secure. So this is indeed something where we need a lot of different types of technologies, as you said, in order to combine them.

That's a really interesting point. Just by combining all these different privacy-enhancing technologies, kind of like building blocks, you shouldn't forget that you're also increasing the complexity of the system, and that's when human failure might start to creep in. So the challenge doesn't become easier, but it does enable many new applications, and it should be worth it.

Yes, perhaps it's a job of the future, the privacy-enhancing technology engineer or architect; we're going to need a lot of those people in the industry to deploy these technologies as we move forward. Well, I'm interested in what has attracted each of you to spend your careers looking into this space. So Florian, would you like to go first and give us some insight as to why you've chosen this space to focus your career on?

It's certainly a very interesting area, and I enjoy the aspects of these different things that play together here. It was a little bit of an accident for me as well, as it often is with a career, but one thing I particularly like is to have an algorithmic challenge. In security or in privacy we very often have an additional complication to the algorithmic challenge, which is to prevent something unwanted from happening, and to incorporate that into an algorithm that otherwise achieves the desired goal is a very interesting kind of challenge that has always fascinated me.

And Cedric, what attracted you to work in this space?

I kind of see these techniques as a way to give control back to the user or client. It's a way of
counteracting potential misuse of private data. Even though these techniques are still at an early stage when it comes to deployment, they enable collaboration across various players, across various companies and industries, and my hope is that this new form of collaboration, or this enablement of collaboration, will lead to new applications that make the economy much more efficient, and then everybody benefits.

Fantastic. Another question for you both, to start to close out the session. If people are interested in this space, and I'm thinking here both of people in the research community and of people more on the consumption side, people from financial services organizations and other industries, what conferences and resources would you highlight for them to start to get a better overview of this space and become involved in the community? Florian, do you want to kick off there?

Sure, happy to. Fundamentally, I believe this is a challenge generated by computers, and it has to be a challenge solved by computers, or by humans using those computers. So I very much believe that computer science will have to solve the privacy challenge that computer science generated, and as such there are a lot of interesting venues. One, which you already mentioned, is the PETS conference, which is the leading conference specialized in privacy-enhancing technologies. It's a very open community that welcomes practitioners and researchers. It is technically focused, but within technology it's open to a lot of different aspects of computer science, from human-computer interaction, of course over cryptography, and into machine learning and others. So there's really a confluence of technology coming together, and you can get a very good overview of what the current technical challenges are. On the other hand, there are a lot of interesting organizations that
have put a lot of effort into collecting information on the subject. In Germany there is, of course, always the famous Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein, which Cedric for sure knows, which has collected a great source of information and has actually made an effort to systematize it to some extent. And there are many other sources, and many other privacy commissioners, which have done a very good job of providing an overview of the challenges and the information that is out there.

Cedric, would you like to add to that?

Yes. One thing to note is that the regulators have also jumped on the topic. For example, the Financial Conduct Authority has run hackathons on privacy-enhancing technologies. There are other organizations out there, like the World Economic Forum and the EU, that have working groups devoted to this topic, so you can find a wealth of information with those organizations. And maybe on a personal note, what I like to do is use arXiv, which is an online repository for scientific preprints. Just by looking at the latest publications there and reading the abstracts you can find out a lot about what's going on in the field, and I think that helps immensely.

Yeah, some of the resources that I point people to are the summary report from the World Economic Forum on the use of privacy-enhancing technologies in financial services, which is a very good summary, and the Royal Society report from the UK, which also provides a great overview of these technologies and their strengths and opportunities. I also saw that the UK Information Commissioner's Office recently came up with a PETs adoption guide, which is a nice play on words, and that's a great resource as well for understanding these technologies and the ways that they can be deployed. Well, I'd like to start to close out here by thanking you both for spending the time with us today and for providing such a
fascinating insight into the future of data sharing with these privacy-enhancing technologies. It's been a most enjoyable conversation. I'm really excited to see how this technology changes the world in the years to come. Thanks again to both of you.
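
To make the aggregate risk exposure use case from the panel concrete, here is a minimal sketch of additive secret sharing, one simple building block of secure multi-party computation. It is an illustration under stated assumptions, not the protocol used by any panelist or product: the function names, the modulus and the exposure figures are all made up, and the scheme as written only protects inputs against honest-but-curious institutions that do not all collude.

```python
import secrets

# Hypothetical modulus: any value comfortably larger than the true total works.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate_exposure(exposures):
    """Compute the total exposure from per-institution shares only."""
    n = len(exposures)
    # Row i holds the shares institution i hands out, one per institution.
    distributed = [make_shares(x, n) for x in exposures]
    # Institution j adds up the j-th share from every row; on its own,
    # such a partial sum looks uniformly random.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    # Only the combination of all partial sums reveals the total.
    return sum(partial_sums) % MODULUS


exposures = [120, 340, 75]  # made-up figures for three institutions
total = aggregate_exposure(exposures)
print(total)  # 535
```

Each institution only ever sees random-looking shares and partial sums, and the total of 535 emerges only when the partial sums are combined. A real deployment would additionally need authenticated channels between the parties and protection against participants that deviate from the protocol.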
2022-02-02 20:40