Technology Radar — preview webinar Vol 29 Eastern


- Hello, everyone. Welcome to the sneak peek Tech Radar webinar. I'm Shraddha, your host and moderator for this webinar on volume 29. This is a preview webinar, which means we will be discussing a few of the technology blips that we find quite interesting from the upcoming Tech Radar, which will be released on the 27th of September. What is the Tech Radar? The Radar is a document that sets out the changes we think are currently interesting in software development: things in motion that we think our clients should pay attention to and consider using in their projects. The Radar is our opinionated take on technology.

It's about tools, platforms, languages, and frameworks. It is built bottom up: experience from projects and feedback about these technologies are collected from teams working across the globe at Thoughtworks, and that is 51 offices in 18 countries.

So we work with a wide and diverse range of clients. Our experience with these technologies is brought forward, and some of it makes its way into the Radar. We have with us today Scott and Silva as panelists to give us a peek into some of the blips and themes of the next Tech Radar.

I invite Silva and Scott to please introduce yourselves. Silva, if you could go first. - Hey, thanks, Shraddha. Hello, everyone. Silva here. I'm a tech principal with Thoughtworks India, and I'm a member of the Doppler group, which is responsible for putting this Radar out every six months. For my day job, I help our customers in North America build products and solutions, especially in the emerging tech space. That's about me.

Scott, over to you. - Hi, I'm Scott Shaw. I lead technology for Thoughtworks in the Asia-Pacific region. I've been part of the Doppler group that puts the Tech Radar together since, I think, the third volume.

And we're on volume 29 now. So it's been a little while. And it's the most fun part of my job. I love talking about this stuff and learning about it, too. - Awesome. And I'm Shraddha. I'm a data scientist and moving more towards AI research and reasoning aspects. I'm currently also playing the role of the global data community leader at Thoughtworks.

So, about the rings of the Tech Radar: we have four rings, as you can see in the diagram, Adopt, Trial, Assess, and Hold. Adopt: these are technologies that we have had good experience with. If one matches your use case, we strongly advise you to adopt it and use it wherever applicable in your projects.

Trial: these are technologies that are worth pursuing. We have tried them in production, and projects that can handle the associated risks should try them. Assess: these are technologies we have evaluated and that show some promise; we may or may not have tried them in production, but they are worth assessing. Hold: with these, we ask you to tread with caution, because we have not had good experience with them. Now, for today's webinar, we will discuss two themes that are quite popular right now: a large number of large language models, and how productive is measuring productivity. After that, we will also discuss a few blips that we find interesting. These are related to vector databases,

securing the path to production and infrastructure security automation. So that's it. Let's start with our first theme for today's sneak peek, a large number of LLMs. Silva?

- Right. Thanks, Shraddha, again. So this may not be a surprise for many of you if you're following what's happening in the tech landscape these days. The modern breakthroughs we are witnessing in the AI space are largely powered by these large language models, or LLMs for short. And there are many such large language models out there, built by different organizations for different purposes.

These are the models powering OpenAI's ChatGPT, Google's Bard, Meta's Llama, and Amazon's Bedrock. All these LLMs, the tools emerging in this space, and the products being built on top of them occupied a good amount of our discussion during our Radar blip collection, and many of them made it into the Radar as well. We know these LLM tools are capable of solving many interesting problems, especially in content generation, whether it is text, image, or voice, and even code generation, summarization, and translation.

Especially in these areas, we are seeing interesting ways to solve problems using these models and tools. So we'll be looking at some blips covering prompting techniques, how we can use these LLMs in a self-hosted fashion, and how databases that are tuned or tweaked for these use cases, like vector databases, are applicable and useful even today for building interesting applications. So let's jump into some of the blips, starting with the prompting techniques.

The first one is what we are calling ReAct prompting, and it is in Trial as a technique. Basically, this is a technique which promises improved performance and reduced hallucination, which is a known problem in the LLM space. I won't go deep into what hallucination is, but the technique has the model itself reason and then take a next action, and this happens iteratively, so that there is a guided path towards the final result or answer from the LLM. This is especially useful for plugging in external sources or knowledge bases during the reasoning and action phases.
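
As a rough illustration of the general pattern (a sketch, not code from the Radar; `call_llm` and `run_tool` are hypothetical stand-ins for your LLM client and your external knowledge source):

```python
# Minimal ReAct-style loop: the model alternates Thought/Action steps and we
# feed factual Observations back in from an external tool between steps.
REACT_PROMPT = """Answer the question by interleaving Thought, Action and Observation steps.
Available actions: search[query], finish[answer]

Question: {question}
{scratchpad}"""

def react_answer(question: str, call_llm, run_tool, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # Ask the model for its next Thought/Action given everything so far.
        step = call_llm(REACT_PROMPT.format(question=question, scratchpad=scratchpad))
        scratchpad += step
        if "finish[" in step:
            # The model has decided it can answer.
            return step.split("finish[", 1)[1].split("]", 1)[0]
        if "search[" in step:
            query = step.split("search[", 1)[1].split("]", 1)[0]
            # Inject a factual observation from the external knowledge source.
            scratchpad += f"\nObservation: {run_tool(query)}\n"
    return "No answer found within the step budget."
```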

Grounding the steps this way can result in more reliable, more factual responses from the LLM. One particular thing I'll call out, which is close to this technique, is a recent announcement from OpenAI: they call it function calling. Basically, it's a capability that allows us to teach the model about a function, along with its signature.

The model can then figure out the parameters for that function call, and it allows us as developers to plug in the response for that function call, so we have the opportunity to inject factual content while the LLM is finding the right answer. There is an excellent blog post from OpenAI about it.
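
As a minimal sketch of how that flow looks with the 2023-era OpenAI Python SDK (the model name, the get_weather function, and its schema here are illustrative assumptions, not something from the Radar):

```python
import json
import openai  # pre-1.0 SDK interface, current at the time of this webinar

# A hypothetical local function whose factual result we want to inject.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 31})

messages = [{"role": "user", "content": "What's the weather in Chennai?"}]
functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# First call: the model decides whether to call the function and with what arguments.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages,
    functions=functions, function_call="auto",
)
message = response["choices"][0]["message"]

if message.get("function_call"):
    args = json.loads(message["function_call"]["arguments"])
    # Second call: feed the function's real result back so the answer stays factual.
    messages.append(message)
    messages.append({"role": "function", "name": "get_weather",
                     "content": get_weather(**args)})
    final = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
    print(final["choices"][0]["message"]["content"])
```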

I'll let you go and find that blog post if you are interested in this space. Moving to the next one: like I mentioned, LLMs are finding a place in a variety of use cases. But running an LLM needs significant resources, GPU, CPU, and memory, so it may not be feasible for everyone in that form. There have been a lot of attempts to make LLMs suitable for self-hosting, whether for privacy needs, very specific niche domain needs, security needs, or even just for trying them out on our developer workstations before we start using them out in the wild. The specific technique that helps here is called quantization, which essentially reduces the precision of the model's parameters so that the model can fit into a constrained environment, say the limited CPU and memory we have on commodity hardware or a laptop, or sometimes even a small form factor computer like a Raspberry Pi. And there are frameworks and tools like GGML or llama.cpp,

which help do the quantization for these models. And there are some interesting bindings, like the Python binding CTransformers, which allow us to rapidly build solutions using the quantized models derived from these tools.
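
For instance, loading a GGML-quantized Llama model through the CTransformers binding looks roughly like this (a sketch; the model file path is a placeholder for whatever quantized weights you actually have):

```python
# Illustrative sketch: running a GGML-quantized model locally on CPU via CTransformers.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path to quantized weights
    model_type="llama",                          # tells CTransformers which architecture to expect
    gpu_layers=0,                                # CPU only, i.e. commodity-hardware friendly
)

print(llm("Explain in one sentence why quantization helps self-hosting: ",
          max_new_tokens=64))
```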

OK, so let's move to the next one, which is again a technique related to prompting. The default interaction mechanism for these LLMs is natural language; we interact with them through something like a chat interface. But at the same time, we have been finding increased success when we prompt using structured languages like JSON or YAML, and when we force or nudge the model to respond back in a structured form as well, whether JSON, YAML, or even Markdown for that matter. Doing this helps nudge the model to give answers in the right direction without letting it diverge too much, and again reduces hallucination. At the same time, there is another category of models, for example the code generation models, which are less about natural language and more about language syntax. Even there, one thing we have found interesting is that small natural language comments annotating a piece of code or a function help the models behind offerings like Copilot respond with better results.
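
As a small, made-up illustration of the structured side of this (the schema, the prompt, and `call_llm` are all hypothetical, not something specific from the Radar):

```python
# Illustrative sketch: nudging an LLM towards structured output by prompting with
# a schema and asking for JSON back. `call_llm` stands in for any chat completion client.
import json

SCHEMA = {"product": "string", "sentiment": "positive | neutral | negative", "issues": ["string"]}

def extract_feedback(review: str, call_llm) -> dict:
    prompt = (
        "Extract the following fields from the customer review and respond with "
        f"JSON only, matching this schema:\n{json.dumps(SCHEMA, indent=2)}\n\n"
        f"Review: {review}"
    )
    # Parsing fails loudly if the model drifts away from the requested structure,
    # which is exactly the kind of divergence this technique tries to reduce.
    return json.loads(call_llm(prompt))
```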

So, yeah, two categories, but mixing natural language and structured language in the right way results in better outcomes with these models. Moving to the next one, which is a technique for guiding models to be autonomous. This is in Assess, and it's something we've been trying out in some interesting use cases. Basically, it's a technique where we make the underlying model understand the goal we are trying to achieve, but let the model figure out its own way towards it. This is sometimes useful where the problem space is wide open

and there is no predefined path towards it, right? There are multiple frameworks emerging in this space, like AutoGPT and GPT Engineer, and a few frameworks which help us build applications where we can guide the various models that are out there. This technique has been used by some of our teams for implementing customer service or client service chatbots. The interesting bit is that through these autonomous inner steps, the models can also be made to recognize their limitations, where they cannot find a meaningful answer on their own. For use cases like the client service application, the request can then be redirected to a human so it can be better addressed, instead of the model trying to hallucinate on its own. But at the same time, this technique is still maturing. It is not something that is proven, and it is going through an evolution.

There are a lot of cases where it has met with a high failure rate. There are startups building products with this technique, and there are startups which started with it but pivoted to something else. So we invite you to assess this with caution. Those are a small peek into the blips that we are highlighting in the LLM space right now. I'll stop with that,

and more details will be in the Radar when it is out. For the next part I think I'll hand it off to Scott to talk about productivity. - Well, I think before we go to the next theme, there is a question that we have on hallucinations and testing of LLMs. Are there techniques

or tools already for testing LLMs, or oracles? - Yeah, OK, so testing or grading these LLMs is another broad topic where a lot of tools are coming up. But this testing is very different from test-driven development or unit testing, because there is no deterministic answer for any of these problems, and the same model can respond with different results over the course of time. At the same time, there are tools emerging in this space. And in fact,

some of those tools, I think, made it into some of the content, if not as a blip on their own. Anything that you want to add on top of that, Shraddha or Scott? Feel free. - I think Promptfoo is one of the blips on the Radar this time around, which is a testing tool for LLMs. So there's a variety of tests you can apply to completions. And

I'm interested in how well it works. I think there was a lot of skepticism in the group when we discussed that one. But they claim to get good results.

- Yeah, yeah, testing for LLMs is very different, even compared to machine learning models like supervised learning, where we know what the correct answer is. LLMs are generative models. So in that sense, it's interesting how this testing space will evolve, I guess. Cool, so there is a lot to discuss around the LLM space, and I'm sure this is a theme we will go deeper into in the coming webinars and over the next year.

But let's go to our next theme, which is: how productive is measuring productivity? And Scott will guide us through this. SCOTT SHAW: OK, thanks, Shraddha, and I will probably touch on LLMs in this as well. So the idea of measuring the productivity of developers is one that comes around perennially. It's often non-technical managers who are frustrated by the fact that they find it difficult to compare one developer to another, or to compare teams to one another. One of the first things I did when I joined Thoughtworks was to help Martin Fowler analyze some of the coding samples that we asked our applicants to submit, to see if there was any correlation between lines of code and quality. And there wasn't, predictably.

But we're in the midst of a wave of interest in developer productivity right now, some of it good and some of it kind of suspect. I think remote work, the need to cut costs, and the emergence of LLM coding assistants have all contributed to people's interest in being able to actually measure how productive software developers are. One thing that's generated a lot of controversy, commentary, and discussion is an article about a proprietary offering from McKinsey, provocatively titled "Yes, you can measure developer productivity." There were a lot of interesting things in that article, not all of which I agree with. But

if you want a really great response to that article, there's one written by Kent Beck and Gergely Orosz talking about some of the risks. I think one of the problems we have is that we use the word productivity in two different ways. In one sense, we know when we feel productive; we know when we've had a productive day. That's a qualitative use of the word. And then there's the economic meaning of productivity, which is the amount of output produced by a given unit of work input. That's easy to measure when you've got a very clear output and a very clear unit of work.

And it's easy to measure, perhaps, for things like sales teams or for groups of people doing repeatable tasks. But the output of a developer is far removed from the impact that the coding might be having on the business. And probably people don't really care about productivity; what they care about is what kind of impact developers are having on the business. In fact, you might have very high output, lines of code, commits, all these things that people might measure directly of developers, and actually have very low impact on the business. And that's a risk.

And inevitably, when we create these metrics of productivity, they become targets. And then developers learn to game them. You can make lots of small commits if you're being measured on pull requests; there's a variety of things. It always ends up being a game. But I think we do actually want to measure the impact that engineering is having on businesses, and so I think we still need to try to measure engineering effectiveness.

And there's actually been a lot of research in this area at places like Google and Microsoft Research, and at Uber and Etsy; all of the big internet companies have been doing empirical studies on what makes developers productive. What they've found is that it's the conditions. It's not the developers themselves; it's the conditions in which those developers work. And those things we can measure. So

we know that conditions that allow developers to achieve a state of flow, that provide short feedback loops, or that reduce cognitive load all contribute to productivity. And one of the interesting things that we've blipped (we should go forward on the slides), one that we've had limited but very good experience with so far, is a product called DevEx 360 from a company called DX, which includes some of the people from Microsoft Research who did this research and found that these three dimensions of developer productivity are critical.

So DevEx 360 is a way to cleverly collect information from developers about how their work is going: how is their flow going, how long did you have to wait for that build, and so on. It asks these questions throughout the development life cycle, and the answers are collected into a dashboard that gives you a real-time view of how effectively your engineers are working: are the conditions good for those engineers to feel productive? So I think this is a much more effective way. It gives you leading indicators. It gives you a real-time view of productivity.

We'll go on to the next slide. Platform engineering is one way that we know we can improve the productivity of developers. And there's something called platform orchestration that's kind of emerging.

People are going beyond simply offering platforms as a service. The idea is to publish contracts for engineering platforms and then have a self-service way for developers to submit code or workloads to the platform. There are a variety of tools, like Kratix, Humanitec's Platform Orchestrator, or KubeVela, that operate on this principle: they have a published contract, receive configuration files, and then orchestrate the provisioning, deployment, and whatever else needs to happen in the platform. So that's a space to watch. But I can't really talk about this without talking about large language model coding assistants.

If we go on to the next slide: of course, Copilot is a big one. The way we write the Radar, we use GitHub. We all write our little pieces, those of us who contribute, and we combine them in GitHub so that we can collaborate easily and track versions. And many of us use Visual Studio Code to do the writing.

Well, many of us had Copilot activated, just because it was in Visual Studio Code and we used it for coding, but it was activated on Markdown as well. So a number of us experienced Copilot while we were writing the Radar this time. Sometimes the sentence completions it gave me were uncannily accurate, but the longer the paragraphs went on, the more nonsensical it became. So part of the Radar this time around was actually written by Copilot, sentences here and there. There are some other coding assistants that have risen up to challenge Copilot.

Tabnine is probably the one that's been around the longest. One of the things that these alternatives, Codeium and Tabnine, do is filter out some of the code in the training sets, the open source code that could potentially lead to copyright infringements. So non-permissive licenses like the GPL have been filtered out of those training sets. Codeium also offers a self-hosted version.

So there's some alternatives. I've also heard anecdotes that Codeium may be less intrusive. But that's probably up to your particular experience. One thing I wanted to say about this is I heard an anecdote just yesterday of a team that was puzzled, because we get reports of Copilot, say, doubling developer productivity. And there was a team using Copilot, and they found that their actual throughput was only increased by about 10% by using Copilot.

And this flies in the face of some of the other published data on Copilot and the level of productivity it offers. But what you have to consider is that coding is really a small part of the entire cycle of software delivery. Maybe estimates are like 25%, 30%, something like that.

And so if you say on the coding part of the work, I'm being twice as efficient, I'm still only increasing the overall throughput of my delivery team by about 10%. So I think we have to be realistic in our expectations of what these things can do. But I think there's tremendous potential in the entire software delivery life cycle.
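
As a back-of-the-envelope sketch of that arithmetic (the percentages are the rough estimates mentioned above, not measured data):

```python
# Illustrative sketch: if coding is only a fraction of the delivery cycle,
# doubling coding speed improves overall throughput far less (Amdahl's-law style).
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    new_total_time = (1 - coding_share) + coding_share / coding_speedup
    return 1 / new_total_time

# Coding at ~25% of the cycle, made 2x faster: roughly a 14% overall gain.
print(f"{(overall_speedup(0.25, 2.0) - 1) * 100:.0f}%")
# At ~20% of the cycle it is roughly 11%, in the ballpark of the 10% anecdote.
print(f"{(overall_speedup(0.20, 2.0) - 1) * 100:.0f}%")
```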

And that's why we've put Kraftful up there. This is a large language model tool that helps user experience researchers draw insights from user research. And so I think those are the areas that we're really going to see the biggest impact of large language models, how we break down business requirements and task things out and so on. I think that's all I wanted to say at the moment. I think that's definitely a space to watch.

- Thanks, Scott. Absolutely, large language models are making their way everywhere. And I think you put it very rightly: coding is a small part of the whole software delivery cycle, and that's the part people tend to miss when thinking about how much productivity is being improved by these tools.

We do have a question. How are IDP-- and I'm not sure what the full form of IDP is over here. But how are IDP and developer experience in building the right understanding of the business value rather than productivity focus? - I think, if I'm understanding that question correctly, it's entirely possible that we could be very productive and not produce any business value. Measuring business value is difficult.

But I think if you read that article by Kent Beck and Gergely Orosz, actually there are things that you can measure. And there are examples that Gergely puts in there of what he's used. But things like, sometimes you're able to measure changes in conversion rates of customers, or revenue, and/or internal impacts, voice of the customer studies internally, and things like that. There are ways to measure business impact. And I think we should be focusing on those over how many lines of code or how many commits or how active somebody is in the code base.

- Now let's move on to the next. We finished with the two themes that we had, and we have a few blips to discuss which we have grouped together. The next one is on vector databases, and Silva will walk us through it. SILVA KATEIAN: Well, vector databases do have some association with the LLMs, yes. We saw what LLMs are in general, the different techniques and ways to run them, and how they are making an impact in different kinds of applications, as well as in software development itself.

These vector databases, again, are not new; they have been in existence as a concept for a long time. But thanks to the rise of AI and LLMs, there are now a lot of these databases, and we are finding their usage interesting. So we have picked three of these vector databases. But first, maybe a brief summary of how these databases work and what these vectors really are: basically, you take the content, and the content can be anything, like product attributes in a retail application, or a text blob, or an image, or anything.

And we apply a transformation, or embedding function, on that data, and the result is these vectors. The embedding functions themselves are the actual ML models being applied. This vector representation has some interesting uses: it allows us to find similarity, such as the nearest neighbor, or the distance between two pieces of content that we want to compare. Traditionally, for doing these kinds of comparisons, we have to write a piece of code, most often in Python, though it can be done in different languages.
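
As a tiny illustration of the kind of code being described (the embedding function would be whatever ML model you actually apply; this sketch only shows the comparison step):

```python
# Illustrative sketch: comparing content by the distance between embedding vectors.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point the same way; values near 0 mean unrelated content.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query_vec: np.ndarray, candidate_vecs: list[np.ndarray]) -> int:
    # Brute-force nearest neighbour: fine for a handful of items, and exactly
    # the work that vector databases take off our hands at scale.
    scores = [cosine_similarity(query_vec, v) for v in candidate_vecs]
    return int(np.argmax(scores))
```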

You have to write a piece of code to do these comparisons. Now, these databases abstract that away and let us use known syntaxes, SQL in the case of pgvector, or well-defined APIs. So basically these vector databases act like long-term memory for these AI models, for the particular content that we want to process with the model. The three databases themselves are quite different, though. The first one, Chroma, is an open source vector database written in Python.

It has bindings for Python and JavaScript/TypeScript, and it can run in two modes: either embedded in your Python application, like SQLite, or in a standalone server mode where you can connect from multiple apps or processes. It allows us to store the data and the vectors for that data, so that we can perform these operations in the database itself. Pgvector is an interesting Postgres extension which does the same. But since it is a Postgres extension, it allows us to keep the actual data and its vectors in the same place, so that we don't have to carry the data across multiple stores.

And since everything is in the same place, it allows us to perform these operations with SQL, a known syntax, and to query results that carry both the distance or similarity and the actual data itself. Whereas if you are looking for a hosted or managed option, Pinecone is the answer there. It is a cloud native vector database that is fully managed for you by the company behind Pinecone, and it lets us do the same operations we discussed. So essentially, all of these databases make it easy for us to do similarity searches and retrieve data based on vector distances, and they simplify building applications around the model and the operations we do with the content on the model. These products and tools are available today for us to make use of in our applications.
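
To make that concrete, here is a minimal sketch using Chroma in its embedded mode (the documents and query are invented for the example, and Chroma's default embedding function is used for simplicity):

```python
# Illustrative sketch: Chroma running embedded inside a Python process.
import chromadb

client = chromadb.Client()
collection = client.create_collection("support-articles")

collection.add(
    ids=["a1", "a2"],
    documents=[
        "How to reset your password from the login page.",
        "Shipping times and tracking for international orders.",
    ],
)

# Similarity search: the database handles embedding the query and the distance math.
results = collection.query(query_texts=["I forgot my password"], n_results=1)
print(results["documents"][0][0])
```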

And some of our teams are exploring these, and it increases the pace at which we can build applications using the models. - I do see your question, Vimal; maybe this is something we can come back to towards the end of the session, because it's about the reasoning abilities of LLMs. So before we do a general Q&A, let's go through the blips that we have. The next one is securing the path to production. Scott, if you can walk us through this.

SCOTT SHAW: We always find a number of blips related to security in the process of developing the Radar. We have to be very selective, as we had something like 350 items that were input to us, and we only have room for about 100. But because security is so important, and it's something we often need to call people's attention to, because we see so many abuses in the field, we always end up with quite a few security blips. I think a blip gets a little bit of a boost if it's security-related. So I wanted to talk about a few of the security-related blips that are on the Radar this time around.

The first one is a re-blip, actually. I think it's the second time we've had it on the Radar, because we feel it's such an important topic. It's the concept of zero trust security for CI/CD, for your continuous integration and continuous delivery pipelines.

What we've noticed is that there's been a big shift: build systems for software have moved from self-hosted things on prem, which would have been dominant even maybe five or six years ago, to almost entirely cloud-hosted pipelines as a service now. Services like Buildkite and CircleCI have become very popular, and most people are using these services for lots of really good reasons. And of course, GitHub Actions is the big one that has really changed the way people construct their build pipelines.

But what we've found is that when people port these things over from on prem to a cloud-hosted service, they are perhaps less attentive to the security aspects of their build pipelines than they should be. Build systems make a huge attack surface, because there are plenty of things that can be compromised, and if a compromise occurs, that's an opportunity to put malicious software into deployables that could go anywhere. I think SolarWinds was the poster child for that happening. And it woke a lot of

people up to the importance of this. So when we talk about zero trust security for our CI/CD pipelines, we mean allowing only as much privilege as you absolutely need, and trusting your network, your endpoints, and your external services only to the extent required to do the work that needs to get done. One of the practices is having ephemeral task runners: rather than a long-running process that runs a task over and over when the pipeline asks it to, you spin up a container or an instance just to run that task and then destroy it afterwards. Doing that ensures you're starting from a secure starting point every time, and any compromise is destroyed along with those instances every single time. That's a good practice for security in general. You also need to monitor and apply security patches to your build system, just as much as you would your production system.

Your build system is a production system and you need to treat it that way, so making sure that you apply those patches as soon as they're available. And one big thing is running your build processes under a specific identity. So rather than having a process with a tremendous amount of privilege that can invoke actions on other systems, you should run your build pipeline with a specific identity assigned to that build pipeline.

And then you can protect the resources it needs to access with role-based authorization, so it accesses those things under a particular role. That way we're able to control the privilege it has and prevent any escalation of privilege that might occur, which brings up the next blip: using OpenID Connect for GitHub Actions. This is a way to provide that identity. It seems like an obvious thing, and it's not a

particularly new thing. GitHub offers a way to do federated identity for your build tasks using OpenID Connect, but it's something we find people aren't taking advantage of. There are three things that people typically do. One is, if they have to access some cloud resources from a build task, they just leave those cloud resources open on the internet.

That's the easiest thing, quick, expedient, and tremendously insecure, yet I see people doing it. So you need to protect those somehow. So the other thing is that they provide the build task with secrets necessary to log in and access those cloud resources. Again, having access to those secrets, having those secrets stored with the build, presents a huge security risk. And we advise people not to do that.

The third alternative is that you can do federated identity. You can give that pipeline an identity that is recognized by the cloud environment, by your cloud account, so that it accesses the external cloud resources it needs under a known identity with a particular role. GitHub supports this natively. But we felt it was necessary to call attention to it, because we don't see people using it nearly enough.
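
As a hedged sketch of what that looks like in practice with GitHub's OIDC support and AWS's configure-aws-credentials action (the role ARN, region, and bucket below are placeholders):

```yaml
# Illustrative sketch: a GitHub Actions job that assumes an AWS IAM role via OIDC
# instead of storing long-lived cloud secrets alongside the build.
name: deploy
on: [push]

permissions:
  id-token: write   # allow the job to request an OIDC token from GitHub
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-deploy-role  # placeholder
          aws-region: ap-southeast-2
      # From here on, AWS calls run under that role's (limited) permissions.
      - run: aws s3 sync ./site s3://example-bucket   # placeholder bucket
```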

Next slide: a loose categorization of the tools that are on the Radar. The first is CDK-Nag. People have used Tfsec for a long time to do static security scanning of their Terraform files. It uses Rego, the language the Open Policy Agent folks developed for expressing constraints and security rules for infrastructure; Tfsec implements that, but just for Terraform. CDK-Nag is a relatively new static scanning tool that looks at CloudFormation and CDK code, not just Terraform files. So it opens up this technique of statically scanning your infrastructure as code to a wider range of ways of specifying it. One of the cool things is that CDK-Nag comes with predefined rule sets for specific security compliance regimes: PCI compliance, HIPAA compliance, and NIST compliance. Those rule sets come packaged with the tool and let you scan for those particular things, which is really interesting, and there is also a rule pack for the AWS Solutions library, with specific rules for some of the standard solutions that AWS recommends people use.
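
A minimal sketch of wiring CDK-Nag into a CDK app, assuming the Python flavour of the CDK and the AwsSolutions rule pack (the stack here is an empty placeholder):

```python
# Illustrative sketch: attaching CDK-Nag's AwsSolutions rule pack so every
# synthesized resource is checked against those rules at build time.
import aws_cdk as cdk
from cdk_nag import AwsSolutionsChecks

app = cdk.App()
stack = cdk.Stack(app, "ExampleStack")  # placeholder stack; real resources go here

# Apply the rule pack as an aspect across the whole app; findings surface
# as warnings and errors during `cdk synth`.
cdk.Aspects.of(app).add(AwsSolutionsChecks(verbose=True))

app.synth()
```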

Checkov is a similar tool. It takes Tfsec to another level: it looks across your entire infrastructure estate, so it will look at Helm charts, Kubernetes manifests, and CloudFormation, as well as Terraform. And you can write custom rules in Python, which is pretty cool. I just wanted to mention Orca and Wiz. We have both of these on the Radar, and we have teams experienced with both of them.

And they both get pretty good reviews from our developers. These are tools that scan your infrastructure in place. So they scan your cloud infrastructure, and they look at your containers, they look at your Kubernetes clusters.

They look at the whole thing, not in an agent-based way, but as a privileged process that runs in your infrastructure. And they produce really useful reports and dashboards and alerts and things. So these are very similar tools. They compete head-to-head. And I don't want to get into the fact that one of them has sued the other one for copyright infringement. But we've had great experiences with both of them. And it represents a new era, I think, in infrastructure security scanning.

That's all I had in this segment. So we can probably do questions. - Yeah, sure. The question for this segment goes back to the zero trust security topic. We have a question from Santiago: with zero trust security, how do companies in the real world approach things like K3s' recommended install method (you can see the actual command in the Q&A, a curl piped into bash)?

Recently in our QA at SUSC, we raised this concern. But I'd like to know more about what it looks like in the real world, as they are testing the OS, not the containers part. - I understand, yeah. This is a huge problem, I think, and I'd be interested in hearing what Silva has to say too. But this is something we constantly have to be alert to: people accessing the internet from privileged or internal networks. And

you need to do that to do your job, but you have to be very careful about what you are accessing. One of the things that we talked about was using checksums. Sometimes people download packages but don't actually do the checksum verification. There are multiple things that could go wrong here. One is that you're accessing the wrong URL, or there's a man in the middle and you're not downloading the thing you think you are. So you do need to check those things,

verify them against the checksum that you got through some other channel. But there are a lot of potential security problems with using tools that you find on the internet, and with using internet-based tools. Do you have any experience with that, Silva? - Yeah, Scott, actually, you might remember this. In the Radar discussion, we debated mentioning the plain old SHA256 checksum as a reminder for people that, hey, it is still relevant and you still have to do this even in 2023, and that forgetting it can have a dramatic impact. But, yeah, plus one to what Scott mentioned.
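
The kind of verification being described is simple to script. Here is a minimal sketch (the URL and digest are placeholders; the expected checksum should come from a separate, trusted channel such as the project's release notes):

```python
# Illustrative sketch: download an installer script and refuse to use it unless its
# SHA256 digest matches one obtained out-of-band, instead of piping curl into bash.
import hashlib
import urllib.request

URL = "https://example.com/install.sh"    # placeholder
EXPECTED_SHA256 = "0123456789abcdef..."   # placeholder, obtained from a trusted channel

def fetch_and_verify(url: str, expected_sha256: str) -> bytes:
    data = urllib.request.urlopen(url).read()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise RuntimeError(f"Checksum mismatch: got {actual}")
    return data

script = fetch_and_verify(URL, EXPECTED_SHA256)
# Only now would you write `script` to disk and execute it.
```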

So it's a reminder to all the developers and techies out there who are doing this. While it helps you get started quickly when you want to try a tool out, and that's why many product companies suggest it for getting started, I wish everyone understood the trade-off and used it with caution. There are techniques like verifying the signature of the binary, especially in the Windows world, and similar things happening in the Android world; in the Linux world, I think signature verification is not as established a standard. But simple techniques like checksums, scripting things in such a way that they can be verified, and using techniques that result in deterministic builds all help. A single curl-piped-to-bash can throw your ability to arrive at a deterministic artifact for a toss: if you use it in your build scripts, you never know when things are going to change and what you will end up with. So that's why,

while it is OK for people to try it on a local workstation where they understand the trade-offs, I think we should completely avoid it in CI or any build automation steps, and instead use techniques that give us deterministic builds and let us verify how we arrived at every step. - Great. And following up on the security part, in your experience, have you seen companies and customers actively doing these checks before shipping to production? - We probably do, always. I would say we do SAST and DAST scanning on code when we're delivering it. Does everyone else do it? I think probably not nearly enough.

Probably those who are working in regulated industries like financial services, or who are working for the US government in particular, where software bills of materials are required, are paying more attention to their supply chain security, and doing cryptographically verifiable attestations that these scans have been run before the software can actually ship. But in unregulated industries it's probably not as common. - Great. Thanks, Scott, for that. So Vimal asks: there are different techniques to improve the reasoning abilities of LLMs, like chain-of-thought prompting, tree-of-thoughts prompting, internal monologue, et cetera. Based on your experience, any thoughts on which techniques are most effective for improving the reasoning ability of an LLM? - Go ahead, Shraddha, please go ahead with your thoughts on this.

I'll add if I have anything. - OK. So yeah, LLMs are generative models. What I have been researching recently is actually the reasoning capabilities of AI, and where this stems from is the fact that an LLM is essentially a generative model.

So it tries to predict the most likely next word to come. And the reasoning abilities that LLMs do have, I think of as reasoning by analogy. One has to show the LLM how the reasoning process needs to work.

And then we hope that the LLM will pick this up and be able to reason in a similar manner. That may well work for quite a few applications. Where I did find it lacking is in critical applications, such as medicine, or maybe banking and finance, where concrete reasoning is required, in the sense of cause and effect: this has happened because of that, along a concrete path. To improve the reasoning ability of LLMs in that segment, they probably need to be combined with the other area of AI, which is knowledge representation and reasoning, more in terms of logical reasoning.

But for the other techniques you mentioned, like chain-of-thought and tree-of-thoughts prompting, in the creative space, or for example in the software development lifecycle where we are interacting with the LLM, I think that is the space where these techniques can help improve reasoning abilities. Yeah, Silva, do you want to add? - No, I think that's fantastic. Nothing more to add. Thanks for that, Shraddha. All right, I think with that, we have come to the end of the webinar today.

- So thank you, everyone, for joining in, and thank you to the panelists for your insights on the technologies that we discussed. The Tech Radar will be out on the 27th of September, so do look out for that. And that is all for this sneak peek. Have a good day, everyone.
