Ten technology trends that you need to know – Bharani Subramaniam and Vanya Seth – XConf India 2022


We are all meeting after two years. And Vanya and I were thinking, what should we talk about? And we decided, let's do a tech trends talk. And then we said, OK, let's do 10 trends. The way we went about picking these is that we wanted them to be here and now.

These are not far-fetched tech trends. These are not things that will only be available, let's say, years from now. In some sense, these trends are disrupting our day-to-day life right now. A few of them you may be aware of, some of them could be new.

Our hope is, this is a mixed audience, we know that, so we curated it in such a way that there is something for everyone in the audience. So let us know, give us feedback, and we're happy to talk. It's going to be a lot, so I'll pace it through.

So the first trend is scripting the kernel. If you take any serious system software, think of Kubernetes for example, or a browser, or a serious gaming engine with a physics engine in it, or even trading platforms, somebody innovates on the core things in the center and lets the innovation on the edge happen, because not everyone can solve every other problem. So you solve something core in the center and let others script it, extend it through APIs and so on. All the system software I spoke about has that: you have JavaScript for the browser, you have the operator framework for Kubernetes, and so on and so forth. But let's say you want to do something serious with your operating system. Say I have Linux and I want to do something extra beyond what is available. Let's say I want to observe a running application in production, beyond logs. We want to observe it.

And let's say I want this feature to somehow be available in the operating system, if you come up with that idea. The only way you can make this work is, you're supposed to know somebody in the Linux community, and this person is sold on the idea. Think of how many ifs. And then this person convinces the rest of the Linux community. And if you're lucky, in one year it will be available in the mainline, you know, upstream kernel. But you have to wait another five years before you can use it, because it takes that long for the distributions to take it, test it at scale, and then make it available to you.

That's too long a time. If you want to extend something as systemic as your operating system, that feedback loop is five years. OK, I'm going to change gears now. We all know Linux is, I don't know, a treasure trove of Easter eggs. How many people here have used tcpdump or heard of tcpdump, show of hands? Awesome, that's a good number.

So you know that you can actually give an expression to tcpdump and sniff what's going on in the packets. But the designers of tcpdump, the way they built it, they didn't make it just an expression evaluator. They actually built a really optimized VM inside the kernel. The problem with this kernel VM is that it only does networking, and it's very limited, because you don't want to crash the kernel by doing something exotic. So what's the realization? You already have a highly optimized VM inside the kernel.

It's just that it is limited. It had a lot of constraints. That VM is called the Berkeley Packet Filter, BPF for short.

A few smart people decided, let's extend this, let's drop some of the constraints, let's give more flexibility so that you can actually program this and make it useful. And that is eBPF, extended BPF. Nowadays you don't even have to say eBPF; if you say BPF, it means the same thing. So what happens now? If you want to do something new that's not available in the Linux kernel, you can script it, and you can script it in a number of languages, because BPF is a compile target: if you have an LLVM toolchain that can compile to BPF, you can script in that language.

So what used to take five years can now literally be done in a day. That is a big leap. And if you were following what I was saying, you should have a concern, because we have all programmed. Imagine your first-time experience writing JavaScript for your browser. We all know that experience. You may do something.

You may put in a loop. You may crash the browser. We have all seen the "this app is taking too much memory" warning from the browser. So the kernel community is very aware of this.

There is no way you can write a BPF program and compromise the kernel. The way it works is, there is an explicit verifier, and only when your program is safe enough is it actually accepted. Otherwise it is rejected; your program does not even load. So it is safe.
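
To give a feel for how low the barrier now is, here is a minimal sketch using the BCC Python bindings; it assumes a Linux box with bcc installed and root privileges, and the syscall being traced is just an illustrative choice.

```python
# Minimal eBPF sketch using the BCC Python bindings (requires Linux, bcc, root).
# The BPF program below is compiled to BPF bytecode, checked by the in-kernel
# verifier, and only then attached; an unsafe program is rejected at load time.
from bcc import BPF

bpf_source = r"""
int trace_execve(void *ctx) {
    // Runs inside the kernel every time the execve syscall fires.
    bpf_trace_printk("execve called\n");
    return 0;
}
"""

b = BPF(text=bpf_source)
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")

print("Tracing execve... Ctrl-C to stop")
b.trace_print()  # streams the kernel trace pipe to stdout
```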

The concept of scripting the kernel may not be your first choice, but it's now possible to safely extend the kernel and write this logic at the user level, something which was only possible with a kernel extension or kernel module before, which took years to do. So that's the trend I want to highlight. If you ask me, OK, what can you do now? Pretty much all the layers of the operating system can be extended and probed.

You can have your own code in it. This is a snapshot from Brendan Gregg's book; you can go check out the number of tools already built on BPF. And if you're wondering, OK, what is in it for me, I'm an application developer: we have a talk later today called Future of Service Mesh. BPF is already disrupting how service meshes actually function. I don't want to give away too much of that talk, but please watch out for it.

And this is not only a Linux thing anymore. eBPF now has a foundation, and there are efforts from Microsoft to implement this on Windows as well. So eBPF is now literally the superpower to script your kernel. That's the first trend.

I'm going to change gears now and talk about data. Pretty sure every one of us is building systems, platforms, data pipelines; almost all apps are now super hungry for data. We've stopped talking about gigabytes of data; it's all terabytes or petabytes.

It depends on how seriously you are crunching data. But they all have one problem. I used to joke about it: the database systems have become so performant and optimized that you don't even realize it, but fundamentally the database has not evolved at all. What I mean is, let's say I'm a user of a database.

I issue queries, the database responds. If I'm an app and I issue a query, the database responds. But an app and a user are not the same, right? The app is not going to change the shape of its queries two times a day. That's not going to happen. But the DB has no knowledge of that. Think of it this way.

You walk into a tailor shop. If you visit the tailor shop once a month, it's OK to be measured. But say you walk into the tailor shop 10 times a day, and every time you get measured again. That's how the DB actually behaves.

Because every time the app issues the same set of queries, it has to keep redoing the work. Things like this we keep redoing every single time we make a read. Do we all agree? Maybe you're using a different database and it's doing some other clever tricks underneath, but they all have this problem. People get away with partitioning and this and that, but at the end of the day, when you do a read, most of the computation happens then, and you end up redoing the same thing. So the question is: all this is fine, but I don't do this, I'm smart, right? I'm going to put a cache in front of it and solve all the problems.

We do this, but we all say this about caches, right? If you have a cache, now you have two problems, and this is how a developer reacts: OK, cache, you have two problems, because I need to invalidate stuff. But if you go talk to an SRE, someone who runs these systems, they will see caches in a different way. They will think: if you have a cache invalidation, you will have a thundering herd problem. What this means is, let's say you have a few celebrities in your system, or some leader who's followed a lot, or some really hot keys.

And when that key is invalidated, you're going to be getting hundreds of thousands of requests a second, and it's just a cache. If the value is not there, the requests go through to the database, and the DB is going to be busy answering them while more and more pile up, and you have this thundering herd effect.

So the cache does not solve the problem of recomputation on read. So the idea is: OK, I have a database.

I have, you know, an application. Is it possible to say to the database: here is the shape of the data I want, and irrespective of my source, can you do this transformation automatically and keep it up to date? Why not? You can ask this question, but it is a really hard problem. If you are from the database community, you might say, OK, Bharani is a fool, he doesn't know about materialized views.

In some ways, what I'm talking about is the problem that materialized views, in a way, try to solve. But the problem with most implementations of materialized views is that you have no idea when your data is invalidated or what data it is reflecting. You have no control.

It's a black box. You create a materialized view that joins all your 20 tables and does your aggregation or whatever you want to compute, and you have no control over it. It's a black box.

And if you're from the functional world: I will do this with memoization, I'll do it with continuations. Not possible. Because if you have a bag of data, a collection of data, and a small portion of that data changes, think of all the computation you do. It's going to create a number of intermediate stages.

They're all going to go inconsistent. So you cannot trivially compute the delta for what you want. The point I'm trying to convey is that this is a really hard problem. Luckily, there are people who have been doing research on this. The first iteration came from Microsoft Research.

The paper is called Naiad. A few people discontinued the work, and then some of them continued researching.

This is called differential dataflow. Basically, it lets your computation be incrementally updated when a small amount of your input changes; you don't have to go and recompute the entire thing. But the problem with this library, well, it's not really a problem, is that it is written in Rust, and not every data engineer is comfortable with Rust right now.

So there is a bit of a learning curve here. People did realize that, so they built a SQL layer on top of it. You don't have to work directly with differential dataflow; you can write SQL-92.

You can create views the way you used to create views in a database. But unlike traditional materialized views, Materialize will use differential dataflow and compute the output incrementally from the input. That's the illustration you see: you could have Kafka or Postgres or any number of sources, and you publish into it.

And then it will use differential dataflow to compute exactly the shape of the data that you want to use. So no more waiting tens of seconds on a read; your reads are instantaneous. You can have really interactive, real-time analytics powered by differential dataflow, and you're assured that the output is theoretically, and also verifiably, correct. So that's the key. Good, all right.
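
As a rough sketch of what this can look like in practice, here is a small Python example, assuming a Materialize instance speaking the Postgres wire protocol on its default port and a hypothetical `orders` source; the view is declared once and then reads are cheap.

```python
# Sketch: talking to Materialize over the Postgres wire protocol with psycopg2.
# The connection details and the `orders` source are hypothetical; the point is
# that the view below is kept incrementally up to date by differential dataflow,
# so the SELECT at the bottom is a cheap read, not a recomputation.
import psycopg2

conn = psycopg2.connect("postgresql://materialize@localhost:6875/materialize")
conn.autocommit = True
cur = conn.cursor()

# Declare the shape of the data we want once; the engine maintains it.
cur.execute("""
    CREATE MATERIALIZED VIEW revenue_per_city AS
    SELECT city, SUM(amount) AS revenue
    FROM orders
    GROUP BY city
""")

# Reads are served from the incrementally maintained result.
cur.execute("SELECT city, revenue FROM revenue_per_city ORDER BY revenue DESC")
for city, revenue in cur.fetchall():
    print(city, revenue)
```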

All right. Can you hear me? Nice and well. Excellent. So even before I start talking about this trend, I want to show you something.

Have you had this sort of a reaction when you have laid your hands on your cloud cost bill? Show of hands, quick. Show of hands. Do you relate to this emotion, or have you had this reaction from your clients when they saw the cloud bill? Excellent. Now, I'm more happy about the fact that, you know, some of you have seen your actual cloud cost bill.

And I'm going to get to that in a minute. But before we start talking about this trend, we need to understand that there are two specific themes underpinning it. The very first one is: how does an organization or a company choose to spend its money? And the second one is: who owns the cloud costs? Now, to think about who spends the money for an organization, let's take a step back into the on-prem world.

I'm sure a lot of you here are still on prem, and that's the reality. There is a procurement team or a procurement department sitting between the product teams and all of these sanctions and approvals to get the hardware acquired and provisioned. It's a long, sluggish trail. I'm sure a lot of you can resonate with that.

That's the reality of life. But now, if we take a step forward towards the cloud, we have seen that in the last decade or so a lot of organizations have started moving their workloads to the cloud.

There is an increased awareness about DevOps: we want to bring the developers and the operations people together, create that sort of momentum, and give the power of creating and provisioning infrastructure right into the hands of the developers. In that case, it's sort of fair to ask, is procurement still in charge of spending the company's money? Probably not. Probably not, because now the responsibility, or the power, lies in the hands of the engineering teams. They decide when they want to provision infrastructure or when they want to, you know, create something.

And that's their free will. So it's fair to say that DevOps and the cloud have broken the traditional procurement model. We have moved from a CapEx-heavy model, where it was a capital expenditure for an organization to acquire hardware, to an OpEx-heavy model, where it's an operational expense, and the teams start incurring their cloud costs at their own free will.

Procurement is probably not looking at it right now. It's fair to say that. But there is a big but here.

We still see teams, product teams, engineering teams, not taking complete ownership of their cloud cost. They are not taking account, when they are making infrastructure decisions, of what the financial implication is, how it is going to introduce new costs. They are not thinking about this. I believe there are probably two themes under that as well. First, we have come from a procurement background, so there is still that thought for us that it is somebody else's problem.

Probably somebody else is managing the cost. So that's the mindset part. And second, these teams are not measuring their cloud cost against their business outcomes.

They are not putting these two on the same plane. And I'm going to steal a dialogue from the Spider-Man movie and say, I'm sure you have heard this a gazillion times: with great power comes great responsibility. Now the power of provisioning infrastructure is in your hands.

You have that power. But are you exercising the responsibility to also manage your cost responsibly? It's as good as saying, I am going to exercise my rights, but don't ask me about my duties. That's not fair. And that's where we believe that product teams, engineering teams, should start to take ownership of their cloud cost and start asking themselves very hard questions about the cost of the business transactions happening in their product. For instance, imagine an e-commerce sort of setup. There are a lot of business transactions in an e-commerce setup. There is a customer acquisition cost; there is a cost when a customer places an order.

So there are lots and lots of business transactions happening in that entire system. Now, do the teams know what the cost of acquiring a new customer is? What is the cost, the cloud cost I'm talking about here, when a customer places an order? And that's where our advice comes in, and this is not a trend that's far-fetched.

It's happening right now. We encourage all autonomous teams to own and manage their cloud costs and create this virtuous feedback loop for themselves. The first thing is to observe: monitor and measure against your business outcomes.
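
As a toy illustration of what "measuring against business outcomes" can mean, here is a tiny sketch of cloud cost per business transaction; all the numbers are made up, and in practice they would come from your billing export and your product analytics for the same period.

```python
# Illustrative unit-economics check: cloud cost per business transaction.
# Figures are assumptions for illustration only.
monthly_cloud_cost_inr = 1_200_000   # assumed monthly cloud bill
orders_placed = 400_000              # assumed orders in the same month
new_customers = 15_000               # assumed new sign-ups in the same month

print(f"cloud cost per order: {monthly_cloud_cost_inr / orders_placed:.2f} INR")
print(f"cloud cost per acquired customer: {monthly_cloud_cost_inr / new_customers:.2f} INR")
```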

As I was mentioning earlier, you own your cloud costs, but also look at the return on investment on your product. Every time you add a new feature, how is it yielding value relative to the cost it's incurring for you? The second bit is to optimize rates and usage: you bring this process of keeping a tab on and monitoring your cost into your software development lifecycle. There are interesting tools like Infracost.io, also featured in the latest Technology Radar that Thoughtworks produced, which help you evaluate the financial impact of a change, in terms of cost, at the Terraform code level. So that's a very interesting thing.

And one more important thing about the observe part: I'm sure a lot of you are aware of putting dashboards on Grafana; when you want to see and monitor something continuously, you put it up on a dashboard. So here also, we encourage you to think about putting your cloud cost right on your dashboard, so that you can observe it, so that you can monitor it at all times. And last but not the least, talk about continuous improvement in terms of operations, and think about the right mindset, where you are thinking about your cloud costs and associating them with business outcomes. So that's the first trend.

The second one is privacy-preserving record linkage. Now, this one is sort of my favorite, so I'm going to spend a little bit of time on it. But before we go deep inside this, let's start with a question. If I were to ask you, what is the burden of lifestyle diseases like diabetes in the city of Bangalore, what is it that you're thinking? What's going through your mind? Probably: I have to get data from different types of medical institutes.

Because it's possible that I, as a single person, ended up consulting different doctors at different medical institutes. So you don't want to double count me or triple count me. You're probably thinking about that. Now, if I ask you how what people eat correlates with the incidence of diabetes, so I want to hypothesize that what people eat has an impact on their likelihood of getting diabetes, now I'm talking about more data.

I'm talking about probably including data from different sorts of organizations who understand this. And similarly, if I ask you, what is the relationship with how much people exercise? So now you see that there are a lot of data points which belong to different institutions, or data custodians. That's where it is fair to say that data sharing at scale is a hard problem. And if we start talking about industries which are regulated in nature, and I'm sure a lot of you are also working in the healthcare domain, you would know that patient confidentiality, keeping the data safe, the PII data safe, is of extremely high importance in such industries, because they are regulated.

You cannot just share unencrypted data in any format with other organizations. Right so data sharing at scale whilst maintaining privacy, I would say is an even harder problem to solve. Right and that's why this specific area has been a center of focus.

There is a lot of research happening in this specific area, and one of the techniques is privacy-preserving record linkage. It has been, and still is, an area of active research. But this has been implemented; it has been implemented by one of our teams in Singapore. I cannot give you a lot of details, but it has been implemented.

It has been implemented with the help of several research papers from academia. So it is now slowly becoming a reality. And before we deep dive and put this all together, let's break it down into simpler concepts. The first one is: what is record linkage? There are two databases, and I want to identify the same entity across these two databases, with the constraint that the data is not available in unencrypted format.

It's not directly accessible in a way that you can make sense of what's in it. So I need to be able to identify that it's the same person, that, say, Bharani here is the same person in the other database. And it's possible that the spellings are captured wrong; that's highly likely.

So that's the record linkage part: I need to be able to deterministically say that this person in database A is the same person in database B. The next concept that's important to understand here is that of a Bloom filter. A Bloom filter is a probabilistic, space-efficient data structure.

I'll explain that in a bit. Let's take a set S with n elements. Now I take these elements through standard HMAC hashing functions like SHA-256, SHA-512, and so on and so forth. I execute these hash functions on top of this set of elements, and every time I do that, I get a position value.

I get a position value, and that is the position I turn on in my bit vector array. Something very simple. So now you can see that if there is a large, complex data point, I have represented it in a space-efficient manner, in a bit vector array of ones and zeros. I can take it through a series of functions, and every time I get a position value, I toggle that bit on in my bit vector array.
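
Here is a toy Python sketch of that idea; real privacy-preserving record linkage implementations use keyed HMACs and carefully sized filters, so the hash choice and the 32-bit size below are purely illustrative.

```python
# Toy Bloom filter: hash each element with several SHA-based functions,
# map each digest to a bit position, and set that bit. Membership checks
# re-hash the candidate and verify every corresponding bit is set.
import hashlib

FILTER_BITS = 32  # illustrative; real filters are much larger

def positions(item: str, num_hashes: int = 3):
    """Derive `num_hashes` bit positions for an item using salted SHA-256."""
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
        yield int(digest, 16) % FILTER_BITS

def build_filter(items):
    bits = [0] * FILTER_BITS
    for item in items:
        for pos in positions(item):
            bits[pos] = 1
    return bits

def maybe_contains(bits, item):
    # False means "definitely not in the set"; True means "probably in the set".
    return all(bits[pos] == 1 for pos in positions(item))

bloom = build_filter(["foo", "baz"])
print(maybe_contains(bloom, "foo"))  # probably present
print(maybe_contains(bloom, "bar"))  # False means definitely absent
```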

Now, having created this data structure, there are two specific operations that I can run on it, the first being set membership. I want to know whether an item, say "bar", is part of this set or not. This particular Bloom filter that you see here encodes some value; we don't know what this value is, but it's some value. Now, I want to verify whether "bar" is part of this Bloom filter or not.

So what I'm going to do is take "bar", run it through the same series of hashing functions with which I created this Bloom filter, and see which bits it toggles to one. The first hash gives me position four; I check bit number four, and it is still zero. Similarly, the second function gives me six; I go to the sixth bit and it's one.

But because the fourth bit is zero, I know that this particular value is not in the set. This is where a Bloom filter shines: it says that this particular value is definitely not in the set of elements. All right, now let's take a look at the comparison for "foo". Again, you do the same set of steps, take it through the hashing functions, and I get bits number 1 and 6.

I can see that bits 1 and 6 are toggled to 1 for "foo" as well. Now here, I'm saying that it's likely that this particular element is part of the set. I'm not saying it deterministically, because it is still a probabilistic data structure, not a deterministic data structure.

I can possibly say that, yes, it exists in the set. You could then ask: there is a chance of false positives, right? It's possible that something is not in the set but gets identified as being in the set, because it's a probability we are talking about. And that's where we talk about the similarity coefficient. A similarity coefficient is a simple mathematical formula that lets you quantify the similarity between the Bloom filter I'm comparing against and the item under consideration. You can see that our team ended up using the Jaccard index: it's a simple formula of area of overlap divided by area of union.

From a computer vision perspective, a full overlap is an excellent match, the second one is a good-enough match, and the third one is a poor match. Based on my use case, based on my sample size and population set, I can set a threshold value, and any match that scores above the threshold will be a match; anything less than that will not be a match.

So you see how, in a very interesting way, without knowing the actual values, based on just the Bloom filter data, we can compare data sets across different organizations while maintaining the compliance and regulatory aspects of sharing that data. Again, say there are two databases and I want to compare the spellings of Smith and Smyth. Very simple difference, but it's possible. In medical systems specifically, these errors are very common: some institution will record it as Smith, some will record it as Smyth, and so on.

So how do we disambiguate that? This is a very simple example. As you can see, two Bloom filters are being compared, Bloom filters that belong to two different organizations, and I am trying to ascertain: are these the same people? Are these the same entities? There is some preprocessing that has happened here: the word Smith has been broken down into bigrams. Depending upon my use case, I can also choose to go to trigrams and others, but just for this example, bigrams are being compared. For instance, if you see here, the difference in the spelling is the "i" and the "y", so in the first Bloom filter you can see that bit number 3 and bit number 28 are toggled to one.

And in the second spelling, bit number 3 and bit number 8 are toggled to one. So there is a lot of similarity, but there are a few bits that are different. Depending upon the coefficient threshold that I have set for myself, these two entities can match. Now, putting it all together, this is the overall architecture that is followed: there are two sites, site A and site B, having two different databases, two different data custodians. There is some bigram creation happening.

In the previous slide you saw there were bigrams set up; there is some pre-processing happening, and Bloom filters are being created for all the personally identifiable information. And then there is a trusted site C where this linkage is going to happen. In this sort of a setup, there always has to be a trusted site C where the linkage happens.

And then after this, a link table is going to be created, which is ready for consumption. So this is how the view is going to look. There are two data custodians.

These are the two different tables. Of course, they're going to have a lot more data; it's kept simple just for the purpose of explanation. There are Bloom filters for the encrypted values, and now I'm trying to figure out whether these two match, whether these two belong to the same person. Based on my Bloom filter similarity coefficient threshold check, I'm going to say that yes, probably they belong to the same person. And now I have created an interesting link table which is ready for consumption.
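
Putting the pieces above together, here is a small, self-contained sketch of the comparison step: each custodian turns a name into bigrams, encodes them into a Bloom filter, and only the bit vectors are compared using the Jaccard index. The filter size, hash scheme, and threshold are illustrative, not taken from the actual implementation described in the talk.

```python
# Compare two name spellings without exchanging the raw values:
# split each name into bigrams, encode the bigrams into a Bloom filter,
# then score the two bit vectors with the Jaccard index (overlap / union).
import hashlib

FILTER_BITS = 64  # illustrative size

def encode(tokens, num_hashes: int = 3):
    bits = [0] * FILTER_BITS
    for token in tokens:
        for i in range(num_hashes):
            pos = int(hashlib.sha256(f"{i}:{token}".encode()).hexdigest(), 16) % FILTER_BITS
            bits[pos] = 1
    return bits

def bigrams(name: str):
    name = name.lower()
    return {name[i:i + 2] for i in range(len(name) - 1)}

def jaccard(bits_a, bits_b):
    overlap = sum(a & b for a, b in zip(bits_a, bits_b))
    union = sum(a | b for a, b in zip(bits_a, bits_b))
    return overlap / union if union else 0.0

filter_a = encode(bigrams("Smith"))  # computed by data custodian A
filter_b = encode(bigrams("Smyth"))  # computed by data custodian B

score = jaccard(filter_a, filter_b)
THRESHOLD = 0.8  # illustrative; tuned per use case and population
print(round(score, 2), "match" if score >= THRESHOLD else "no match")
```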

So in my mind this is a very interesting breakthrough, because this allows research to happen, this allows innovation to happen, because we all know that value can come out of data only when we are able to join it, while at the same time maintaining the privacy, confidentiality, and compliance-related aspects as well. Hey, so in Thoughtworks, and Thoughtworks India, at any given point of time we would have a hundred or 120 projects going on.

And we have this privilege of seeing the pattern across all of them. Nowadays most of the projects are like a platform, the old abused term. So we end up talking: what's the biggest challenge that you face when you're building the platform? And in, let's say, a one-hour conversation, we ended up talking about authentication and authorization for like 40 minutes. I see a few laughs, yeah. So you may ask me, this is 2022 and you're still talking about auth? Have we not gone beyond OAuth? And the sad answer is no. So if I ask you, what are you using for authorization, what would be your answer? What do you use in your platforms and your projects? Keycloak, OK. What else? So the thing is this.

OK, maybe Keycloak does a lot of things in one, but a lot of people think that OAuth2 is an authorization framework. This is a testimony that we really suck at naming things. You're authorized to enter, that's what OAuth2 gives you; you're not authorized to do anything after you enter. OAuth2 is basically an authentication protocol, and they created OpenID Connect after that.

So the point is, authorization is still a really hard problem to solve, because the kind of things that we now do in platforms are also not trivial. You need to give it some credit. So we're going to talk about Zanzibar; I'll get to it. This is a really tough problem; few people solve it well, and we keep re-implementing it.

There are any number of rule-based authorization frameworks out there. Let's say I want you to ask this in your head: do I have access to the wallet? Ask your authorization system. I'm sure your system will tell you yes, no, maybe. That's not a problem.

Let's just add to the problem. Do I have access to the money? Now, you may think logically; I intentionally picked this example. The money is inside my wallet, so obviously I have access to the money.

But in your complicated world, this may or may not be true. And this suddenly becomes a graph traversal problem: I have access to A, and A is connected to B, so am I connected to B, and so on. So far so good. Most of the authorization frameworks and systems out there will answer all these questions for you. But let me complicate it one step further.

Tell me everything that I have access to. Can you get this answer in a performant, scalable way, so that you have a very responsive app? I'm sure all of us use, take banking or retail apps: their menus are so complicated you'll get lost. And then you go click on something.

They will tell you that you don't have access to this product, or you're not subscribed. That's because nobody is asking this question at runtime for you, because it's extremely costly. Let's say I have 100 services to offer. If I go ask my authorization system "do I have access to this, do I have access to that" for each one, it's not going to scale. Even if you build a batch API over it and send it as one bulk call, it's just going to loop internally. This is a fundamental problem in most authorization systems.

So stay on that problem. Have you seen this dialogue pop up in your day-to-day life? You have done an awesome presentation in Google Slides and you want to send it somewhere, and this pops up: you are about to send it to somebody who hasn't been given rights. This did not use to happen before, even inside Google. What I'm building up to is that Google solved this really, really well for their systems at scale, and they actually wrote a paper about it.

I think this came out in 2019. The paper is called Zanzibar. It's built on Cloud Spanner; it has a lot of systems behind it. And Google being Google, Zanzibar as a paper is awesome, but the actual system is internal to Google. If you're building an awesome product, you can't use Zanzibar, because it's an internal service.

But the industry took some time, say two or three years, to realize the potential of Zanzibar. There are quite a lot of tools right now building Zanzibar-inspired products. Think of the Hadoop paper, how it actually opened up the whole big data momentum and the open source ecosystem. A similar thing is happening in authorization.

If you watch the news, every other day somebody implements Zanzibar. We just cherry-picked one: it's called SpiceDB, from a company called AuthZed. There are a number of solutions; we really like this one because of the way they modeled it.

You can read more about SpiceDB on their website. But the point I want to make is that in this new wave of authorization systems, people talk about separating code and policy. You've heard that advice before: keep your policy, of who can do what or what should be done,

separate from the code. But these systems take that a step further: your code is separated from policy, but the data that the policy depends on is also decoupled. And that is what enables these systems to answer queries like the one I gave, which is called an ACL-filtered list. If you are building a rich user interface and you want to show only the products the user has access to, you need to do that ACL filtering.

And that's possible because of this decoupling: your authorization data is decoupled and actually maintained in a graph form, so that such questions can be answered efficiently at scale. So that's the key.
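
To make the idea concrete, here is a conceptual Python sketch, not the SpiceDB API, and the tuples are invented for illustration: authorization data lives as relationship tuples forming a graph, and an ACL-filtered list becomes a reachability query from the user. Real Zanzibar-style systems also evaluate relation and permission semantics; this only shows the graph shape of the data.

```python
# Conceptual sketch of Zanzibar-style relationship tuples (not the SpiceDB API).
# Authorization data is a graph of (object, relation, subject) tuples; an
# ACL-filtered list is then just reachability from the user through the graph.
from collections import defaultdict

# (object, relation, subject) -- e.g. "document:plan is viewable by group:finance"
TUPLES = [
    ("group:finance", "member", "user:vanya"),
    ("document:plan", "viewer", "group:finance"),
    ("document:roadmap", "viewer", "user:bharani"),
]

# Index subjects -> objects they are related to.
edges = defaultdict(set)
for obj, relation, subject in TUPLES:
    edges[subject].add(obj)

def accessible_objects(user: str):
    """Everything reachable from the user via relationship edges.
    Note: a real system would also check which relation grants which permission."""
    seen, stack = set(), [user]
    while stack:
        node = stack.pop()
        for obj in edges[node]:
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen

print(accessible_objects("user:vanya"))  # group:finance and document:plan
```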

Watch out for these. And by these I mean, we say this, right? What should be in the platform? Don't stuff everything into the platform. Authorization is a very good citizen to be inside your platform, and these are interesting choices out there right now. OK, so let me change gears again. There are a few product people in the room, I suppose.

We are very passionate technologists. If you give us something, we will tell you: engineering excellence matters, how you build software matters, there should be rigor in it, you need to have test coverage, it should be clean code.

All that is fantastic, right? Kent Beck, the person who came up with XP, after 20 years of XP recollected what the next iteration should be, and he came up with this 3X model, the Kent Beck 3X model. So think of yourself in one of these three phases. If you are a startup, you're probably in the Explore phase, and you're trying to dig as many holes as you can to see where the gold is, to see what would be that big idea for your company. And then maybe you have found a few and you are in the Expand phase, where a few of them are succeeding, where everything is haywire, and you're trying to fight fires in production.

I'm fixing things, but something is working. Then there is the Extract phase: I know this is working, I'm no longer a startup, I need rigour, I need process, I need repeatability, I need automation, all of that to scale, because I know my idea is succeeding and I'm making money.

The message I want to send is: all of what I spoke about, rigor, engineering culture, is awesome, but most of it applies to Extract, and to some extent to Expand. But what if you are in the Explore phase? You need to know which phase you are in.

So if I ask you: you have this shiny idea and you want to test it in the market, what would you do? What would you do? Next big idea. Test it. Prototype it.

But the point I want to convey, and I gave it away in the hashtag, is that prototyping makes sense here, but at times prototyping is too costly, too slow to give you that quick feedback, because these are just ideas, and ideas live only in your head. So if I tell my idea to you, how would you respond? There will be a bunch of people who will say, wow, Bharani, I've been waiting to do this for years, glad you have thought of it.

This is going to be a big hit, go build it. You've heard that. And then a few people will say, this is not going to work, you're too crazy.

The market is not ready. But the point is, all of our ideas live only in our heads, and this gives you nothing. So the question is, what other tools do you have in your toolbox? How do I test my ideas and get real validation from the market that they will work? You might have heard this term, get market data. OK, it's like this: if I'm coming up with some new tech, let's say I invented blockchain. It's like, now I've invented blockchain.

Now I want to hire. I put up a job description: I want people with two years of experience in blockchain. Doesn't happen, right? This idea only lives in your head, and you are asking and advising others to get market data for a product that doesn't exist, that has not been built, that is not even physical. Market data for a new idea that's in your head is like asking for a unicorn; it does not exist. So what do you do? This is a quote from Alberto.

He says that people will open their mouths very easily. They will say yes or no very easily, because it means nothing to them; there is no skin in the game. The real question you need to ask is, will they open their wallet? And that is your data. If you're testing with the market and somebody is ready to pay for an idea that only exists in your head, that means you're on to something that is worthy.

It may still fail; there is research that says around 80% of new ideas fail, but you have a better chance of succeeding. And this technique is called pretotyping; it's not prototyping.

It's a stage before that. And it's not new. All right.

So I want to take you back to the era when IBM introduced, people might know this, when IBM introduced their first personal computer, it had a keyboard and no one knew how to type. No one at that time, no manager that they were trying to sell this to, knew how to type. And then they thought, OK, if only I could give them voice-to-text.

You will hear a talk later in the day about how difficult it is to do voice-to-text. But this was the '70s. They had neither the tech nor the hardware to actually do this, but they thought, I want to solve this problem, because nobody will buy my machine if they don't know how to use it.

What if I gave them voice-to-text? So they asked managers to dictate all day. But they didn't build anything; they only had a long wire to the next room, where somebody was listening to the person talk and typing it so it showed up on their system. This is probably the birth of pretotyping. And then they interviewed the managers: at the end of the day, will you buy this product if I build voice-to-text? Pretty much 90% had said yes before they walked into the room.

At the end of the day, almost everybody said no. And the reasons were: I'm super tired from talking all day; most of the work I do is confidential, I can't dictate it, so this is going to be useless.

I would rather learn how to type. And this is not just a '70s or '80s thing. This is the example of CarsDirect. They wanted to test the hypothesis: is there a market for second-hand cars on the internet? So they built a web page, put up a product catalog of all models of cars, with zero inventory, and wanted to see, will people buy? They had four sales in a day.

They shut down the website, actually went to buy the cars and deliver them, having validated that, OK, there is a market here. So that's another pretotype. I think the most famous one is the case of Tesla. Elon Musk literally ripped out the engine and said, OK, imagine I'm going to put an electric engine in here and it's going to go 0 to 60. I have not built the car.

I've never built it in my life. But if I build it, will you buy? Like, I think most of the people said you're crazy, but a few people actually gave him $5,000 to say, OK. Take this. If you build it, let me know.

This is literally how Tesla was born. So if you think, oh, will this work only for Tesla, will this work only for the IBMs and CarsDirects of the world, I think there is something in here. That's where I want to tie it back to the 3X model: which stage of the product lifecycle are you in? Just because we have tried-and-tested engineering principles does not mean it's a cookie-cutter approach that you take and apply to all phases.

All right. So, green software. Now, if you talk about the challenges that we as humanity are faced with, I can very easily say that climate change is one of the most important challenges humanity faces. And that's the reason why businesses are under tremendous pressure from governments, regulators, investors, customers, and consumers

to reduce their carbon emissions and have a positive impact on the planet. Now, all of us are techies here, and we could ask this question: what role do technology and technologists have in this entire equation? What can we do about it? How can we be part of the solution and not part of the problem? Before we get to that, let's look at some interesting numbers. The internet will soon be responsible for 1 billion tons of carbon dioxide; that's 10% of global electricity usage.

Data centers consume 2% of the world's electricity. Now, do you know how many data centers there are worldwide, across cloud providers and privately owned ones? Less than 10,000. Less than 10,000 entities are consuming 2% of the world's electricity. Two percent looks like a small number, but if you think about it, it's an astounding number. It's an astounding number.

A very simple thing: if you leave a browser tab open, and I'm sure all of us here are guilty as charged, we have 30 or 40 browser tabs open at the same time, leaving a single browser tab open for a duration of time consumes more energy than producing a piece of newspaper. So we cannot just be like an ostrich and put our head in the sand and say that it's not our problem and it's somebody else's problem, because we are part of the problem. We have to now also be part of the solution. And that's where, coming back to the question that I first asked, what is it that we as technologists can do? The first step is we need to start measuring, because it's very true:

what we can't measure, we can't improve; out of sight, out of mind. So we need to start measuring the energy consumption of our applications. We need to start measuring the carbon emissions of our applications, at various levels.

What is the network cost of it? What is the energy consumption at the database layer? How does our database query perform, and what energy consumption does it incur? Is it a full DB scan? Is it a partial scan? Just hold on to that thread and think about it: today all of you must be doing database query optimization; all of us are aware of that. But the real reason why we do that activity is to reduce latency. Reducing latency is at the top of our minds, because we want to serve results to our customers very fast. Now, the invitation is: instead of just wearing the latency hat, also wear the sustainability hat. Think about what the energy consumption of this query is.

Can I bring the energy consumption down in any way while I tune my query? Right simple examples. There are two microservices. They are heavily dependent on each other.

There is a lot of interaction, there is a lot of data exchange, and all of these are energy-consuming activities. Can I co-locate them, or even better, can I merge them? We used to think about microservices as: I need the best possible slice according to domain-driven design. But now I'm wearing a different hat: I'm thinking about domain-driven design, but at the same time I'm also thinking about how I can minimize that data exchange over the wire so as to reduce the emissions coming out of my application. I don't think we are talking about any rocket science here. We are just talking about the same set of trade-offs that you already evaluate, viewed from a different lens.

Now think from a sustainability lens as well. So the invitation to all of you is: include sustainability in your value definition. Make sustainability a first-class cross-functional requirement. When we talk about sustainability, trade it off with accuracy: do you need 100% accurate results, or can you do with approximate computing? That's the idea. Nothing new, but in your entire path to production, whatever optimizations you are doing today, wearing different hats, thinking about performance, scalability, availability, now also bring another element into the system and say, I'm going to value sustainability and keep it as a first-class NFR in my system, so that whatever decision I'm taking on a daily basis, I'm actively thinking about it. That's a very simple ask for all of us here. Having said that, the tech industry as a whole is definitely accelerating towards sustainability.

The Green Building Council has developed a green data center rating system, which ensures that data centers are working at some level of resource efficiency. Similarly, we can choose cloud stacks and algorithms that are going to reduce carbon emissions. Technologists like us can leverage resources like the Green Algorithms website. It's a fantastic website: you go there, you put in your specifications, your hardware details and what your code does, and it's going to give you the carbon emission for your particular piece of code in three months.
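
To make that concrete, here is a back-of-the-envelope sketch of the kind of estimate such calculators produce; the power draws, PUE, and grid carbon intensity below are placeholder values, not the Green Algorithms coefficients.

```python
# Back-of-the-envelope carbon estimate for a batch job.
# All constants are placeholders for illustration; real calculators such as
# Green Algorithms use measured hardware power draws and regional grid data.
RUNTIME_HOURS = 3 * 30 * 24      # e.g. a job running continuously for ~3 months
CORES = 8
WATTS_PER_CORE = 12.0            # assumed CPU power draw per core
MEMORY_GB = 32
WATTS_PER_GB = 0.4               # assumed memory power draw
PUE = 1.6                        # data-centre overhead factor (assumed)
GRID_G_CO2_PER_KWH = 700.0       # assumed carbon intensity of the local grid

power_watts = CORES * WATTS_PER_CORE + MEMORY_GB * WATTS_PER_GB
energy_kwh = power_watts * RUNTIME_HOURS / 1000 * PUE
carbon_kg = energy_kwh * GRID_G_CO2_PER_KWH / 1000

print(f"~{energy_kwh:.0f} kWh, ~{carbon_kg:.0f} kg CO2e over the run")
```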

And it's homework for all of you to figure out what the definition of "three months" is. These calculators help us find out what carbon emission I am accruing for a given piece of logic. As I was mentioning earlier, can I trade off sustainability versus accuracy? Can I make do with approximate computing? If my business use case allows for approximate answers, can I do that instead of always going for the 100% accurate answer? Similarly, from an innovation standpoint, there are a lot of physical processes out there in the world, like the testing of cars.

Can I think about technologies like digital twins? There is a lot of innovation waiting to happen in this space, so that I can reduce waste in the physical world with the help of technology. And similarly, can I codify my carbon emissions as fitness functions? I'm a big fan of evolutionary architectures. Anywhere, at any opportunity I get, I say you should make this information visible to developers, you should make it observable for developers.
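
As a sketch of what codifying that as a fitness function could look like, here is a small pytest-style check; the budget value and the estimation hook are hypothetical, standing in for whatever carbon or usage data source a team actually has.

```python
# Sketch of codifying a carbon budget as an architectural fitness function,
# e.g. run in CI. `estimate_monthly_co2_kg()` is a hypothetical hook that would
# pull numbers from your own cost/usage or carbon dashboards.
CARBON_BUDGET_KG_PER_MONTH = 500.0  # illustrative budget agreed by the team

def estimate_monthly_co2_kg() -> float:
    # Placeholder: in a real setup this would query your cloud provider's
    # carbon/usage reports or an internal metrics store.
    return 420.0

def test_carbon_budget_not_exceeded():
    assert estimate_monthly_co2_kg() <= CARBON_BUDGET_KG_PER_MONTH
```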

And therefore, if we can keep a dashboard where we can see our carbon emissions right in our face, I think as a team it builds awareness and urges us to act. And that's about it. But do you know how many trends we have covered so far? Does anybody know the count? Seven, exactly. So in the interest of time, we are dropping trends eight, nine, and ten. These were the three trends that we wanted to talk about

additionally. Besides this, I just want to give a special mention to one of them: there's a trend over here which talks about protocols over platforms, and that's exactly the concept on which ONDC is built. So I invite all of you to think about these remaining trends as well, and if you want to know more, catch us in the hallway and we are happy to talk about it. And last but not the least, as Bharani said in the beginning, these trends are not far-fetched; they are not something that's going to arrive only in the future.

They are happening today. And I'm hoping that when you go back from here, all of you can take something away and start applying it in your daily work. That's the ask and the expectation from you all. Thank you.

2022-09-05
