What's the next big thing in tech? - XConf SEA 2021
Here we have the keynote talk from Somwya and Thao. Somwya is a passionate data engineer with more than a decade of experience in the software engineering field, most recently in the big data space. Her expertise and interest lie in backend engineering, predominantly in data architecture, data platforms, and distributed systems. She also recently led a large-scale project in the healthcare sector, building a data platform to power health applications at scale for the citizens of Singapore.
So myself, I'm actually using the app that is powered by the platform that Somwya and her team have been building, and I'm sure many of us are using the same tool. And as for Thao, Thao started at ThoughtWorks as a software developer before working as a tech lead and technical principal leading delivery teams. Her experience and interest lie in the creation of large-scale distributed applications as well. And outside of work, Thao is actually a very nurturing teacher. She taught me how to do a headstand, and right now I'm practicing every day at home to keep myself calm and sane during such lockdown situations.
I'm very grateful for that, Thao [INAUDIBLE]. So let's all welcome Thao and Somwya for our keynote talk of today. Thank you so much, Dongyu. Let me just share my screen. Bear with me one second. All right.
So let's get started, and thank you everybody for joining us today. For the first talk of today, my colleague Somwya and I will be talking about some interesting technology trends that we see across ThoughtWorks clients and projects in Southeast Asia.
So this talk aims to bring some trends and techniques to your attention. Due to time constraints, we won't be able to go into too much detail, but the hope is that if you find any trend useful to your context and situation, you can then take a deeper look yourself. All right, so let's get started. A little bit about the ThoughtWorks Technology Radar first, because that's where our trends and themes were derived from. Twice a year, ThoughtWorks publishes the Technology Radar to look at what we have observed in the last six months in the technology trends that our teams and projects are using across the whole world.
And based on our teams' experience, we document what we think is interesting in the world of software development. Within the radar itself, it is categorized in a few dimensions: the quadrants denote the different types of technologies, and the rings denote our recommendations for them. In the center you see the adopt ring, which represents the technologies that we think are proven and mature to use. And then the trial ring is for technologies that we think are ready to use,
but you need to try them out in your project to decide the impact. Then the assess ring is for things we think have a lot of potential and you should keep an eye out for, but they still have a bit of maturing to do. And finally the hold ring, the outermost ring, is for things that we haven't had a good experience with, and so we are calling them out to warn you in case you decide to use them in your project.
As the head of technology for ThoughtWorks Southeast Asia for the last three years, I was lucky enough to be part of the technology collection and reflection for this region. So in this talk, we'll be concentrating on some common themes that we saw coming out of the Southeast Asia technology radar in the last year. What are these themes? There are two key themes that we see emerging, based on the technologies and techniques that our teams are introducing into their projects. The first theme is around the continuing struggle to move to the cloud. Moving to the cloud is nothing new, but the important thing is how you move to the cloud, to ensure you fully utilize the cloud's features in order to have resilient, scalable systems that allow for a fast turnaround of your features to meet your business goals.
So I'll be talking about some key struggles and the tools and techniques that will help in this journey. The second theme that we have identified is around data privacy, which is a key concern in most organizations, especially when doing data analysis. My colleague Somwya will be talking about some techniques our teams have explored to help a client in analyzing sensitive data. So let's get started with the struggle to move to the cloud.
What are the struggles? The first one we see is around the question of lift and shift versus re-architecting for cloud native. Let's take a look at this in a little more detail, and let's consider the cloud migration spectrum. On the one hand, you have lift and shift, which means using the cloud as a hosting solution.
We replicate the same architecture, the same security measures, and the same operating model onto the cloud. Depending on your system's architecture, you might be able to leverage some capabilities. For example, if you are moving your services and your application onto VMs and you use the cloud provider's databases, you might be able to start utilizing the high availability and certain scalability of the database services. And for the VMs, you might be able to take VM snapshots and change the vCPUs, but that's about it; the utilization is pretty low. On the other hand, we have cloud-native systems, which are systems that are architected to fully utilize cloud capabilities. What does that mean? I'm just quoting a definition from the Cloud Native Computing Foundation.
These are systems that are loosely coupled, resilient, manageable, and observable. When we combine this with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal downtime. So in your move to the cloud, our recommendation for organizations looking at the cloud migration journey is: don't just settle for lift and shift, but formulate an organizational migration strategy, with the aim of enabling your teams to deliver features frequently and safely. Some key considerations to think about: one is around having very clear objectives for your systems. What do you want your IT applications to do to meet the business objectives? If you plan to expand to another region, for example, are your systems able to cater for the load? Then, work out the key systems that need to be performing and their dependencies, as that can help you prioritize your migration.
Also very important is how you want to structure your teams and how you build your capability for the cloud. For example, Pinterest had quite an interesting approach: they embedded a specialist consultant in each team who was able to provide the relevant tools and knowledge to help the team make the right decisions and ease the transition over to the cloud platform. And then finally, based on your migration priorities and your team structure, you can start working out which systems you want to refactor and re-architect.
For example, your apps need to be able to work with the elasticity and ephemeral nature of the cloud, so you may need to consider re-architecting your system so that your services can run in parallel, in multiple instances, to allow for scaling, or making them stateless. The other thing to think about is how you can break your application into smaller components or services so that you can gradually migrate them over to the cloud.
In the next few slides, I'll be talking about some useful tools and techniques in our technology radar that could help with the journey to refactor and re-architect your systems. The first thing I want to bring to your attention is the four key metrics, which come out of the State of DevOps report. This report is a multi-year research effort concentrating on high-performing organizations. What this research has found is that there is a direct link between organizational performance and software delivery performance. That means how fast the organization can innovate, pivot, and cater for changes in the market is directly dependent on your software delivery performance: how fast you can bring changes to production, safely and with quality.
So what are these four key metrics? The first one is lead time for changes: how long does it take for you from code commit to production? Then deployment frequency: how often can you go to production? Then time to restore: if a service goes down or there is a failure in your services, how long does it take for your system to restore to normal operating status? And then change failure percentage:
out of all the changes that go into production, what is the percentage of failures? These are really powerful metrics for you to assess your system, to see how it performs in terms of the ability to push changes to production and the quality of those changes. Going back to your cloud migration journey, these are a great set of metrics for you to assess your existing systems, see how they perform, and then identify where the key issue areas are. Where are your bottlenecks? Is it because of the way your system is architected that your build time takes a very long time? Or is it because of the processes that are introducing delay into your system? So it is a great way to look at it and then change what really matters.
Then, for your cloud migration journey, you can start looking at these metrics and identifying where you want to start refactoring and rewriting as you migrate. And once you're in the cloud, the metrics collected before and after migrating also serve as a good key indicator of how well your migration has gone: are your systems performing better now that you are on the cloud, based on these metrics? So they are great and powerful metrics to start bringing into your projects and your systems.
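To make this concrete, here is a minimal sketch of how the four key metrics could be computed from delivery data. The record structures and field names are hypothetical; adapt them to whatever your CI/CD and incident tooling actually exports.

```python
# A minimal sketch of computing the four key metrics from delivery data.
# The deployment and incident records below are hypothetical examples.
from datetime import datetime, timedelta

deployments = [
    {"committed": datetime(2021, 5, 3, 9), "deployed": datetime(2021, 5, 3, 15), "failed": False},
    {"committed": datetime(2021, 5, 4, 10), "deployed": datetime(2021, 5, 5, 11), "failed": True},
    {"committed": datetime(2021, 5, 6, 8), "deployed": datetime(2021, 5, 6, 12), "failed": False},
]
incidents = [
    {"start": datetime(2021, 5, 5, 11, 30), "restored": datetime(2021, 5, 5, 12, 15)},
]
period_days = 7

lead_times = [d["deployed"] - d["committed"] for d in deployments]
lead_time = sum(lead_times, timedelta()) / len(lead_times)       # lead time for changes
deploy_frequency = len(deployments) / period_days                # deployments per day
restore_times = [i["restored"] - i["start"] for i in incidents]
mttr = sum(restore_times, timedelta()) / len(restore_times)      # time to restore service
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

print(f"Lead time for changes: {lead_time}")
print(f"Deployment frequency : {deploy_frequency:.2f} per day")
print(f"Time to restore      : {mttr}")
print(f"Change failure rate  : {change_failure_rate:.0%}")
```

Tracked over time, before and after migration, these numbers give you the trend line rather than a one-off snapshot.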
OK, so another pattern that I want to bring in is zero trust architecture for the cloud. As we migrate to the cloud, our thinking on security has to change vastly, because on-prem security and cloud security are two very different things. What we see with our clients is that organizations are still making very heavy investments into securing their assets by hardening the virtual walls around the system, and still adhering to static security processes based on static resources, which is no longer the reality on the cloud. Of course, network perimeter security is still very applicable, but it is not enough. Zero trust architecture is a paradigm shift in security architecture and in how we strategize the architecture of our systems. We need to look at implementing zero trust security policies on all assets, from devices to infrastructure to services to data.
We can use practices such as securing all access and communications regardless of network location. I just want to call out here that Google's BeyondProd is a great introduction to cloud-native security, and it has good coverage of zero trust architecture. The suite of tools and techniques enabling zero trust architecture is also growing fast in the industry, so I want to bring a few of them to your attention as well. The first technique that we see gaining in popularity is binary attestation.
It's a technique to verify the origin of binaries and images before you execute or deploy them into production; it is important to show that they come from a trusted source. The other technique is secure enclaves. If you do have to execute certain sensitive operations in your application, for example loading a private key to decrypt data, you don't want to load these private keys into your common memory space. You may want to load them into a secure enclave, where the memory is secure and no other process can access the sensitive data.
Increasingly, the cloud providers are catering for this. For example, Azure, AWS, and GCP are already introducing confidential computing features that you can start looking at and utilizing in your applications.
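Going back to binary attestation for a moment, here is a minimal sketch of the idea: deployment only proceeds if the artifact's digest appears in a manifest signed by a trusted build pipeline. The manifest format and key handling are hypothetical simplifications; real platforms provide managed binary authorization services for this.

```python
# A toy sketch of the binary attestation idea: verify an artifact's digest
# against a manifest signed by the build pipeline before deploying it.
# The symmetric key and manifest format are hypothetical simplifications;
# real systems use asymmetric signatures and managed attestation services.
import hashlib
import hmac

TRUSTED_KEY = b"pipeline-signing-key"  # in reality, a proper asymmetric key pair

def sign_manifest(digests):
    """The build pipeline signs the list of digests it produced."""
    payload = "\n".join(sorted(digests)).encode()
    return hmac.new(TRUSTED_KEY, payload, hashlib.sha256).digest()

def verify_before_deploy(artifact, digests, signature):
    """Refuse any artifact whose digest is not in a validly signed manifest."""
    payload = "\n".join(sorted(digests)).encode()
    expected = hmac.new(TRUSTED_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return False  # manifest was tampered with
    return hashlib.sha256(artifact).hexdigest() in digests

artifact = b"\x7fELF...binary contents..."
digests = [hashlib.sha256(artifact).hexdigest()]
signature = sign_manifest(digests)
assert verify_before_deploy(artifact, digests, signature)
```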
And finally, I want to talk about security policy as code as another technique. This is a technique where security policies, for example access control or rate-limiting policies, are treated as code. This means you can keep these policies in version control, automatically validate them, deploy them to production, and monitor their performance. Tools such as Open Policy Agent and Istio are great here because they offer flexible policy definition and enforcement mechanisms.
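As a minimal sketch of what policy as code looks like in practice, here is a hypothetical policy, "no security group may expose SSH to the whole internet", codified as a test that runs on every build. The input structure mimics a parsed infrastructure plan but is made up; tools like Open Policy Agent express the same idea declaratively in Rego.

```python
# A minimal sketch of security policy as code: the policy lives with the
# codebase and is validated automatically on every CI run.
planned_security_groups = [
    {"name": "web", "port": 443, "cidr": "0.0.0.0/0"},
    {"name": "admin", "port": 22, "cidr": "10.0.0.0/8"},
]

def violates_ssh_policy(rule):
    # Policy: SSH (port 22) must never be open to the whole internet.
    return rule["port"] == 22 and rule["cidr"] == "0.0.0.0/0"

def test_no_public_ssh():
    violations = [r["name"] for r in planned_security_groups if violates_ssh_policy(r)]
    assert not violations, f"Security groups expose SSH publicly: {violations}"

test_no_public_ssh()  # would normally be collected and run by pytest in CI
```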
Tools like these really help enable your security policies as code. So that's a very quick overview of some of the key tools and trends that we've seen in this area, and now I'm going to move on to the next struggle that we see, which is automating compliance and security for your cloud environments.
So now that you've started moving your systems to the cloud and you've put in the right practices for your systems to be resilient and scalable, the next challenge we see that gets in the way of teams being able to move fast is the length of time spent on security and compliance audits. Let's take a look at this in more detail as well, and consider the security and compliance spectrum. On the one end, we have security as an afterthought. This is where we do security at the end.
This is a very common pattern that we see across our clients: we do our development work and then, at the end, we bring in security. Thinking about security only at the end might leave blind spots that you have not thought about. So there is the risk of introducing issues, and therefore more issues might be found, and that could result in a lot of rework.
Depending on the complexity of your system and on the issues found, you might have to rework a huge amount of code, or you might have to introduce new features to adhere to certain compliance requirements that had been missed. So it is a very long and intensive process; we've seen teams spending between a few weeks and a few months just to cater for the security audit. On the other end of the spectrum is the technique called continuous security. This is where we build security in throughout the whole development cycle, and with the constant assessment of security for every feature that we develop and every story that we write, we identify security issues and risks a lot earlier on.
And then we can rectify them earlier as well. What this means is that there will be fewer security and audit findings, and with the automation that we put in place for validation, it is also easier to demonstrate compliance through the automation artifacts. So I'm just going to zoom into continuous security in a little more detail. It's also known as building security in, and what it means is that we bring security forward.
We start from requirements gathering and analysis, where we identify compliance and security acceptance criteria for the feature. Then in technical design, we look at threat modeling and architecture reviews to identify exposures. And in coding, we ensure we take into consideration secure coding practices such as the OWASP Top 10, dependency scanning, and automated testing for our security acceptance criteria. For security and compliance rules, it's also important to codify them and validate them on every test run in the CI pipeline. What this means is that when issues are identified at code check-in, they are much faster and easier to fix than at the end, once you've completed your whole system. And when it comes to software release, it is also considerably faster and easier to pass the compliance and security audit, because the artifacts from your automated tests can be shown for demonstration.
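As a small illustration of a codified security acceptance criterion, here is a hedged sketch of a test asserting the expected security headers on an endpoint. The URL is hypothetical; in a real pipeline this would run against a freshly deployed test environment on every build.

```python
# A minimal sketch of a security acceptance criterion as an automated test:
# every story exposing an endpoint also asserts its security headers.
import requests

def test_security_headers():
    # Hypothetical staging endpoint; point this at your own environment.
    response = requests.get("https://staging.example.com/health")
    assert response.headers.get("Strict-Transport-Security"), "HSTS header missing"
    assert response.headers.get("X-Content-Type-Options") == "nosniff"
    assert "Content-Security-Policy" in response.headers
```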
With the rise of the cloud and infrastructure as code, I also want to call out that infrastructure code should be treated just the same way as application code: bring in automation and bring in security just as we do for the application, as part of the infrastructure setup. In this space, I want to zoom quickly into security unit testing and automated security checks for your environments in the cloud.
The tools in this area are maturing quite a lot, and that enables a lot of automation for us. The first tool that our teams have found useful is tfsec for Terraform, a great static analysis tool which scans Terraform scripts for security issues. It also checks for violations of AWS, Azure, and GCP security recommendations, so it's a great tool to plug into your CI pipeline to start identifying issues early on. Another tool suite that we have is dev-sec.io, which our teams are starting to use a lot as well.
It's an open source tool suite for server hardening and security. It checks your server setup against hardening baselines, and it has profiles for various different types of servers, for example Linux or nginx. The dev-sec tooling is built on top of Chef InSpec, which is another great declarative infrastructure testing tool that can be run continuously against a provisioned environment, including production. So these tools combined can really help you in automating compliance and security for your infrastructure in the cloud.
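As a sketch of how tfsec might be wired into a CI step, the snippet below scans a Terraform directory and breaks the build on any finding. The directory path is hypothetical, and while the --format json and --soft-fail flags reflect tfsec's documented interface, treat the exact flags and output field names as assumptions to verify against your tfsec version.

```python
# A sketch of a CI step that runs tfsec over Terraform code and fails the
# build on any reported issue. Flags and JSON field names are assumptions
# based on tfsec's documented interface; verify against your version.
import json
import subprocess
import sys

result = subprocess.run(
    ["tfsec", "infra/", "--format", "json", "--soft-fail"],
    capture_output=True, text=True,
)
findings = json.loads(result.stdout).get("results") or []
for finding in findings:
    print(f"{finding.get('severity', '?')}: {finding.get('description', '')}")
if findings:
    sys.exit(1)  # break the build on any security finding
```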
So I have spoken about some key struggles we see in moving to the cloud, and I just want to close off with an example from Netflix's cloud migration journey. Netflix took seven years to complete their cloud migration, and in this journey, they completely re-architected the whole system. They broke the monolith up into hundreds of microservices.
They introduced continuous security and self-provisioning, and built the automation to enable continuous delivery. They also brought in new ways of working with the DevOps mindset, changing the team structure and the culture of operating and collaborating within the organization as well. At the end of this journey, they had one of the world's most well-known and resilient cloud infrastructures, which allowed the business to be very successful. And as you can see in their monthly streaming hours, as they moved over to the cloud they were already seeing the benefits and were able to start scaling much earlier on, and as the journey progressed, usage also grew exponentially. So I will conclude my segment here and hand over to Somwya, who will be talking about data privacy.
All right. Thank you, Thao, that was very insightful about cloud migration. The next topic that we are going to look at is how we can analyze data without compromising on privacy. So the trend that we are looking at here is privacy-preserving data analysis. Let's start with: what is data privacy in the first place? Data privacy has two aspects.
One is personal data privacy and the other is organizational data privacy. In today's world, when you go to any online registration, or when we are talking about banks and hospitals, a lot of personal data is collected. It could include your address, your email, your phone number, your health data; we are giving out a lot of information about ourselves. So it's important that as part of data privacy, we secure this data and keep it confidential so that it doesn't get into malicious hands. That's the first aspect of data privacy.
The second aspect is organizational data privacy. Every organization deals with a lot of financial data and a lot of intellectual property that is very sensitive to the organization. So these are the two aspects of data privacy: personal data privacy and organizational data privacy. And given this criticality, that is why we have seen so many regulations in the last decade, such as GDPR from Europe, PDPA from Singapore, and HIPAA in the health sector, that aim at protecting data confidentiality and integrity.
All right. With this, let's move on to actually see what the technique is: privacy-preserving data analysis. This includes a set of techniques that aim at computing on top of the data without revealing any sensitive information. Let's see what this involves.
Moving on, we're going to talk through a few techniques on our radar, which are in the assess ring that Thao talked about: homomorphic encryption, secure multi-party computation, and differential privacy. These terms sound daunting, but don't worry. The aim is to give you an idea of what these are at a high level, talk through how they work conceptually and how they can be applied, and close off with an example that brings all these techniques together. We won't cover them in depth in this session because of the time constraints.
Let's start with homomorphic encryption. As the term suggests, "homo" means same and "morphic" means shape. This is a class of encryption methods which enables us to compute on top of encrypted data without having access to the raw data. When you decrypt the result, you actually get the same answer as if the computation had been done on the raw input itself. Let's look at this example.
So we have two raw inputs, and I want to share these inputs with an external party. I encrypt the inputs, creating encrypted values EV1 and EV2 out of them. The external party can then perform the sum on top of the encrypted values, deriving encrypted value EV3 as the output. They give it back to me, and I decrypt the result. And the result that you see here is actually equal to the sum of the two initial numbers. Isn't this interesting?
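To ground this, here is a toy sketch of an additively homomorphic scheme, the Paillier cryptosystem, with deliberately tiny primes so the arithmetic stays readable. Real deployments use 2048-bit keys and vetted libraries rather than hand-rolled code like this.

```python
# A toy demonstration of additively homomorphic encryption (Paillier).
# Tiny hard-coded primes for readability only; never use such sizes in
# practice. Requires Python 3.8+ for pow(x, -1, n).
import math
import random

p, q = 293, 433                  # toy primes
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)  # modular inverse of L(g^lam mod n^2)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

ev1, ev2 = encrypt(42), encrypt(58)
ev3 = (ev1 * ev2) % n_sq         # multiplying ciphertexts adds the plaintexts
assert decrypt(ev3) == 42 + 58   # the third party never saw 42 or 58
```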
So let's look at what a real-world example looks like. In a typical setup in any organization, we have lots of teams, and each team has one or multiple data owners who are responsible for the data. This data owner wants to leverage a third party to perform certain operations on the data. What can they do? They can homomorphically encrypt the data sets and send them off to a third-party cloud service provider, who can compute on the encrypted data and send back the results in encrypted form; the data owner can then decrypt them with their secret key. This is pretty much a flow that we can leverage for many applications. In the next slide, we will look at the applications that we are seeing for homomorphic encryption. One is predictive analytics on medical data; in some of our projects we have even done proofs of concept with homomorphic encryption. Another is privacy-preserving customer analytics.
With that, we can personalize based on the data without knowing the actual data itself. E-voting is another one, where we don't need to know the individual votes but still get the aggregate result underneath. Then there is secure multi-party computation, which we'll talk about very soon. And quantum safety: it is also said that the strength of homomorphic encryption will remain intact when quantum computing kicks in, in a decade or so, so it should stand strong.
As for libraries, IBM and Microsoft have been making a lot of contributions in this area with HElib and SEAL, and PALISADE is one more very important open source contribution in this space. So do check these out if you're interested.
Moving on, let's look at the next topic, which is secure multi-party computation. As you can gather from the term, this is a subfield of cryptography as well, in which multiple parties come together to perform a joint computation on the data without knowing the input data itself, and still generate the correct output. Let me clarify with a very simple example. Say you have a data set A, some financially sensitive data in an organization, and we want to leverage three parties to compute on this data.
This data is split into input shares, and each share is given to one of the parties X, Y, and Z. And these are encrypted; by encrypted, I mean homomorphically encrypted.
As we have just seen, the parties can then perform the operations on the encrypted data itself, giving outputs in the form P, Q, and R, which are encrypted as well. When they return these to the data owner, the data owner can reconstruct the data, combining the outputs to arrive at the final computation result, which is the same result as if the function had been performed on A itself. So how is this possible? It is possible because the parties adhere to a protocol whereby they are able to communicate with each other and follow a set of instructions as they participate in the distributed computation.
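Below is a toy sketch of the secret-sharing idea at the heart of many MPC protocols: the secret is split into random additive shares, each party computes on its share alone, and only the recombined result reveals anything. The salary figures are made up; real frameworks add secure channels plus support for multiplication, comparison, and more.

```python
# A toy sketch of additive secret sharing, the building block behind many
# secure multi-party computation protocols.
import random

PRIME = 2**61 - 1  # all arithmetic happens in a finite field

def share(secret, parties=3):
    """Split a secret into random additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Two data owners share their (hypothetical) secret salaries across X, Y, Z.
a_shares = share(70_000)
b_shares = share(55_000)

# Each party adds only the shares it holds; it never sees either salary.
sum_shares = [(x + y) % PRIME for x, y in zip(a_shares, b_shares)]

assert reconstruct(sum_shares) == 70_000 + 55_000
```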
So what are some of the applications of this? Some very interesting applications that we see are end-to-end encrypted relational databases, where SQL query computation is done in a distributed fashion using secure multi-party computation. Similarly, statistical analysis in R can use MPC underneath to secure the intermediate results. E-auctions are a fantastic example of people coming together to participate in a function while the individual bids stay hidden. And in teleconferencing, in terms of securing voice calls, we don't need fully trusted servers in the network; we can still secure everything using MPC.
This is just a subset of the libraries that you can look at; there are plenty of them out there. For PyTorch we have CrypTen, TensorFlow has tf-encrypted, and there are a lot of other MPC frameworks; a curated list such as awesome-mpc is a good place to start.
All right, moving on. We are looking at the last topic in this space, which is differential privacy.
The idea behind this is that for any valuable data analysis to happen, we really don't need the individual data points; rather, we need the distribution of the data to be accurate. Let's look at an example. In a typical health-sector scenario, many of us own smartwatches, and the smartwatch collects a lot of personal data. It could be your activity data, it could be your sleep. All of this is personal data that's collected.
Now, say some analysis needs to be done on this data. How can we send it off from the contributor's device to the servers securely? What we can do is look at the salient points in the data and add some noise to these salient points. And when I say noise, this is very mathematically curated noise; let's not get into the detail, but we are adding some kind of randomness to these data points.
With that randomness, we are making sure that individual data points are not identifiable, but the distribution is still intact. All right.
So from the smartwatch, these randomized data points are sent to the collection server, which can reconstruct the stream and store it in the database, and this data can then be sent out for any kind of analytical use case without fear, because there is no way you can re-identify individuals from it. What differential privacy has done here is make sure we can still perform valuable analysis on the data without knowing the individual data points. That's the beauty of differential privacy.
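A minimal sketch of the core mechanism is below: answer an aggregate query with calibrated Laplace noise, so that no single person's record is identifiable while the statistic stays useful. The step counts and the epsilon value are illustrative only; production systems use dedicated libraries rather than this hand-rolled version.

```python
# A minimal sketch of the Laplace mechanism for differential privacy:
# a count query answered with noise calibrated to sensitivity / epsilon.
import math
import random

daily_steps = [8200, 10400, 4300, 12100, 9800, 7600]  # one value per person

def laplace_noise(scale):
    # Inverse-CDF sampling of the Laplace distribution with the given scale.
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_count_over(threshold, data, epsilon=0.5):
    true_count = sum(1 for x in data if x > threshold)
    sensitivity = 1  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon)

print(private_count_over(8000, daily_steps))  # noisy, but close to the true 4
```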
So let's look at some of the applications in this area. In any kind of privacy-preserving statistics or machine learning, differential privacy can play a huge part. For customer behavior, Apple uses differential privacy in iOS to understand which emoticons individuals are using, so it can do behavioral analysis without knowing any individual's emoticon usage. Then there is secure sharing of demographic data.
The US Census Bureau used it in 2008 to share census data using differential privacy. And recommender systems can be built with differential privacy as well. As for libraries, there are plenty of them out there again, and Google, Apple, and Facebook have been huge players in the last decade.
We have TensorFlow Privacy from Google, a differential privacy library from IBM, and a recent one from Facebook called Opacus. With this, let me move on to a concluding slide referencing a research paper on this same topic: how can all of these techniques be put together to form a solid, secure intelligence lifecycle? Starting from the right side, Jane is our algorithm owner, and say she has an untrained algorithm. What she can do is homomorphically encrypt this algorithm, because she doesn't want to reveal the algorithm to the rest of the world, but she still wants to engage multiple parties in training it.
This is called federated learning, where parties can participate in training your models with their own heterogeneous data sets, without revealing the intermediate algorithm, and that's done using secure multi-party computation. This is secured AI. Now, this trained algorithm can then be run on hospital data, medical data, and patient data from clinics.
And the output of this model can be sent back. But prior to that, there would still be information about the actual data leaking as part of the model output, which can be obfuscated using differential privacy, and it can run through layers of anonymization before the trained algorithm goes back to the owner. So here you see that the algorithm owner doesn't see the data, the data owner doesn't see the algorithm, and this whole cycle is pretty much secured.
With this, I want to conclude that in our day-to-day applications, it's extremely important to design with data privacy as a consideration and to leverage these techniques, and the many others in this space, to make sure that privacy is taken care of in transit, at rest, and during computation. Thank you, and [INAUDIBLE] Thao for the closing remarks. OK, cool, thank you so much, Somwya. So just closing off: Somwya and I spoke about some very interesting trends that we see arising from our projects in Southeast Asia. Now I would like to invite you to also think about the trends in your organization and in your projects. For your organization, you can consider building your own technology radar.
There is a great tool available on the ThoughtWorks website where you can go in and actually start building your own technology radar. Doing this will help you objectively assess what's working and what's not working within your organization. It's also a great way for your teams to share innovation across the whole organization and to start sharing their experiences with different types of technologies. Having a holistic view of your technology radar also gives you a complete view of your technology portfolio, and that can then help you decide the capability strategy and technology strategy that will help your organization be more successful.
And then within the project itself, as a tech lead in a project, I would recommend that you run regular technical retrospectives to assess the technologies and techniques that you use within your project. What you see here is something called the Comfort-O-Meter, something that my ThoughtWorks colleague Stefan Dowding came up with to run with his teams. It is a great framework for the team to categorize technologies into different areas based on the comfort level and the dependency that you have on them. The team can then be very intentional about what is not working and should be removed from the project, and what looks like a good thing that should be coming into the project.
So, again, do consider these. I think it's important to keep track of all the technology trends out there in the technology world, but it's equally important to see what is working and what is not working within your project, so that you can make the right decision at the right time. If you have any further questions, please raise them in the channel for our talk.
Somwya and I will be here for the next 30 minutes to answer any questions that you may have. Awesome, thank you so much for spending time with us. SOMWYA: Thank you. Thank you. Thank you so much, Thao and Somwya, what a wonderful.