Tech Radar Vol 30 — Western Webinar


- Welcome, everyone, to the sneak peek of the 30th volume of the Technology Radar. I'm Brandon Cook. I'm a principal technologist here at Thoughtworks, and we have two speakers who are here along with me. Mike, you want to introduce yourself? - Sure, yeah, I'm happy to. My name is Mike Mason. I'm the Chief AI Officer at Thoughtworks. I've worked on the Radar since we did the first edition, and I'm always excited to do that and to talk about the stuff that we're seeing in the tech industry.

- I'm James Lewis. I've been working with Mike, actually, on the Radar for many, many years now-- over 10 years. I'm a technical director. I'm based out of London in the UK. So nice to see everyone on this webinar.

- And yeah, just welcome again to the Radar, a sneak peek of the Radar. The Radar, for me personally, has been a great tool during my time at Thoughtworks, just being able to learn about all the different technologies that we are using across the globe. And what's great about the Radar for me is that it's really grounded in what our teams are actively using and actually experiencing.

And basically, it's a great knowledge-sharing tool for all of us internally, as well as a great landscape of all of the technologies that we are using, and a way to gauge which tools we should be using and which techniques we should be practicing on our projects. So without further ado, I'll hand it over to our speakers to start talking about the different blips and themes that came up during the creation of the Radar this go-round. - It's probably not a surprise to everyone who's joined us that this edition of the Tech Radar is fairly heavy on AI: Gen AI, AI in general, LLMs, and the techniques and tooling platforms associated with AI. So we thought we'd bring a few interesting things along to this sneak preview. And I'm going to start with something that's pretty close to my heart, actually, which is this question: what

new techniques can we use in order to solve some of the big problems that we've got as an industry? And I would say, for example, legacy modernization and legacy displacement is one of those big problems, right? And so the first blip we'd like to talk about or introduce is this idea of using Gen AI to understand legacy codebases. Now, I'll give you some flavor of a recent conversation I had with a client in Italy, where they're talking about migrating off their mainframe. And when we used to talk about monoliths back in the day, we used to talk about 2.5 million lines of code, right-- Java code.

And they're talking about 250 million lines of COBOL, right? So how do you even start as a developer, or as a team, to understand that-- understand the seams in that code base and how to start identifying bits or parts of that code base that you can break up? This is one of the techniques we've identified. There's a bunch of tools that we've sort of-- we've taken a look at internally that we're trialing, which we call out. So for example, there's one called GroupAi, which we've had some good experiences with so far, and another, DriverAI. But again, like everything in this space, it's so completely new at the moment. We're exploring this, but very excited about some of the results. - And I think this is one where the underlying tech is going to keep improving and allow us to do more and more stuff.

So one of the things associated with large language models is the context window size, and I remember early on, I think it was Claude that had a 100,000-token context window. And that was seen as a big deal, because you can now put entire books into that and then get an LLM to give you something useful. We're starting to see that token count crank upwards for the code-specific LLMs. And so that's going to give us the ability to interpret bigger and bigger chunks of a code base and to use AI to start to point us in the right direction. So I think this is one where, as well, you know, the underlying tech improvements are going to be a big deal as we go on. - Yeah, absolutely. And I think it's-- I mean, as I say, I've got a personal interest in this.

I'm currently writing a lot about the subject of breaking up legacy systems on MartinFowler.com. And so any help we can get to actually solve this problem, I think, is going to be massive going forward. So yeah, super cool. All right, the next thing I'm going to introduce, again, is from the world of LLMs and Gen AI.

This is the idea of LLM-powered autonomous agents. And the reason this is close to my heart is actually, I've been playing around with agent-based models for a number of years now. If you've seen any of the talks I've been doing recently publicly, I talk a lot about how you can use agent-based models to simulate the real world, to ask interesting questions about the real world and then use these models to reason about things, which is one aspect of agents and using agents in a traditional sort of AI world. Of course, what we're talking about now with LLM-powered autonomous agents is like a step forward, right? I mean, for me, this is almost a science fiction view of the future. But actually, we have the ability to use this sort of science fiction-type technology today. So what are LLM-powered autonomous agents? Well, there are a number of frameworks out there.

So there's a couple from Microsoft. There's one called, I think, AutoGPT-- let me just check. So AutoGPT isn't from Microsoft, but there is one called AutoGen that is from Microsoft, and there's also CrewAI, which is an independent project.

And the idea is that rather than use an LLM directly to solve problems or to address problems, what you do instead is you create networks of agents that communicate with one another to collaboratively work together to solve problems. So as I say, this is almost like a science fiction view of the future today, where we can create these small, highly targeted agents that can do one thing. And then, they can use other agents to do other things as well. I think this is actually a fascinating view of where things might be going, where we can create these almost ephemeral pieces of technology that can go off and do stuff and then, when they've accomplished their tasks, they go away.
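
To make that idea a bit more concrete, here's a rough, library-free sketch of a tiny agent network. This isn't AutoGen's or CrewAI's actual API, and the call_llm helper is a hypothetical stand-in for whatever model client you'd use:

```python
# Toy sketch of LLM-powered agents collaborating; not any particular framework's API.
# call_llm is a hypothetical placeholder for a real model client (OpenAI, Bedrock, a local model...).
def call_llm(system_prompt: str, message: str) -> str:
    raise NotImplementedError("plug in your model client here")

class Agent:
    """A small, highly targeted agent: one role, one system prompt."""
    def __init__(self, name: str, role_prompt: str):
        self.name = name
        self.role_prompt = role_prompt

    def respond(self, message: str) -> str:
        return call_llm(self.role_prompt, message)

planner = Agent("planner", "Break the user's goal into small, concrete steps.")
worker = Agent("worker", "Carry out the next step and report the result.")
critic = Agent("critic", "Check the result against the goal; reply DONE if it is satisfied.")

def solve(goal: str, max_rounds: int = 5) -> str:
    plan = planner.respond(goal)
    result = ""
    for _ in range(max_rounds):
        result = worker.respond(f"Goal: {goal}\nPlan: {plan}\nProgress so far: {result}")
        verdict = critic.respond(f"Goal: {goal}\nResult: {result}")
        if "DONE" in verdict:
            break  # the agents have accomplished the task and can "go away"
        plan = planner.respond(f"Goal: {goal}\nCritic feedback: {verdict}")
    return result
```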

Mike, I know we talked a lot about this in New York. So-- - Yeah, I mean, I think what these agents do can range from the mundane to the sublime. And as you get more advanced and ambitious, the risk level goes up because, if you've got an agent-based system and it has the ability to take action on its own, then you need to be pretty sure that that thing is going to do the right stuff. So for example, Amazon, towards the end of last year, released a bunch of agent-based stuff on their Bedrock platform, and that included interesting features about how you would go about verifying and debugging and testing an agent before kind of unleashing it on the world. And I think agents are interesting because, and this gets back to legacy a little bit, there are a lot of manual workarounds in the software world, right? I mean, I remember working for a telecommunications client-- they did cable TV. So this was back in the cable TV days-- cable TV, home internet, home phone kind of stuff. And they had three different systems that were running all of those.

And occasionally, the systems would get out of sync. And in order to fix a customer problem, you would kind of go into the mainframe and de-provision a thing, wait a couple of minutes, and then re-provision it, and then go and check whether that had solved the problem. It wasn't fixing the underlying data synchronization or whatever was causing the problem, but it was this workaround that call center reps would manually go and do when a customer called up with that kind of a problem. Today, you could think about an agent-based approach to that, where you had agents that were actively going out into your systems and looking for known problems like that, and then taking some action autonomously, so that you don't even need to wait for the customer to phone up and say, hey, my stuff is not working, and then have a human call center rep fix that. So I don't know. I think agents are going to be really interesting.

We're going to start having agents on our phones. We already do to a certain extent with all the digital assistants, but those are going to become more and more rich as we go on. And then, the next one, I guess-- this one is from the cautionary section of the Tech Radar, the Hold ring, which is overenthusiastic LLM use. We often see on the Radar, as certain technologies, or tools, or practices, or platforms, or languages and frameworks become popular, that immediately, everyone jumps on the hype train.

Everyone jumps on the bandwagon. It's probably no news to anyone on this webinar that that's the case. And certainly, we're starting to see that happening with LLMs. Now, I guess the reason we wanted to call this out more than anything is that there's a whole bunch of really interesting other machine learning and NLP-type techniques we can use that don't require jumping straight into using large language models or jumping straight into Gen AI use. And actually, being sensible and being sort of careful about how we plug the different tools and the different techniques together can often give us better results overall than just trying to hit everything with the LLM hammer. I still think we should have had "hitting everything with the LLM hammer" as the title for this, but I was overruled in the meeting, you know? - Well, I think you can see why this is happening, right? Because-- let's be fair to the majority of developers in the world, and I include myself in this category-- most people are not deep machine learning or AI experts.

And suddenly, this LLM stuff came along, and it became remarkably easy to start coding useful stuff against an LLM's API or whatever. And you could quite easily do a proof of concept on something that was AI-ish and that was going to provide some value, and solve a problem, and all that kind of thing. But it's one thing getting that working on top of an LLM. It's quite another to decide, OK, now we're going to deploy that thing to production. So one classic example, I think, is sentiment analysis, right? This is not a new field, sentiment analysis, but I think people are newly accessing the ability to do that through LLMs.

But if you look at it, using an LLM for sentiment analysis is 10,000 times more expensive than it needs to be, because this is an established, well-known technique within the machine learning sphere. So what we're calling out here is not that people shouldn't get enthusiastic about LLMs, but that you should temper that enthusiasm with reality and be a little bit careful before rolling to the next stage of productionizing and scaling something that started off as a proof of concept on an LLM.
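
As a rough sketch of the cheaper, classic route-- assuming the Hugging Face transformers library and its small default sentiment model are acceptable for your use case-- sentiment analysis is a couple of lines against a compact, purpose-built classifier rather than a call to a general-purpose LLM:

```python
from transformers import pipeline

# A small, purpose-built sentiment classifier, run locally; far cheaper per call
# than prompting a general-purpose LLM for the same judgment.
classifier = pipeline("sentiment-analysis")

reviews = [
    "The checkout flow is fast and painless.",
    "Support never answered my ticket.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(review, "->", result["label"], round(result["score"], 3))
```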

OK, so the next thing we're going to talk about, we're going to take a slight sidestep. Although actually, there is a question I just noticed in the Q&A, Mike. Do you want to take that one, just to put you on the spot? The basic question: what's the difference between an LLM and Gen AI? - That's a great question. So an LLM is a form of generative AI. Generative AI is the basic category of new AI models that generate something. We're all familiar with the text-based models-- that's what ChatGPT does: it generates text in response to your prompts. So an LLM is one form of generative AI, but there are other forms of generative AI as well that are interesting.

It's not just the text-based ones. The other thing is, the large language models are evolving to be multimodal now. And "multimodal" means that they can cope with text input and things like image and video input, and can create video and images-- all that kind of thing.

Each of those models varies in its architecture and in the cost to train and run it. But yeah, basically, that's what's happening here: the AI is generating something based on, basically, millions or billions of examples that it has been shown and trained on, and then it produces similar kinds of things. So it's actually interesting. When you're chatting with ChatGPT and you're asking a question, it doesn't actually know the answer to the question.

It knows what an answer to the question would look like, which is actually fairly useful, right, and might be quite closely aligned to an actual answer to the question. But we should be a little bit careful about not assigning too much intelligence to these things, because they are just kind of spitting out things that look like the right answer. But then, often, something that looks like the right answer is good enough and useful enough for you to use going forward. I think there's some amazing-- I mean, it's hard to keep up with everything, right? In multimodal especially, when you're seeing things like being able to generate high-definition video of a character playing Minecraft when there's no actual game there-- nothing other than a prompt generating frames that look a bit like Minecraft-- it starts to get a bit mind-blowing at a certain point, right? OK, so the next thing we're going to call out-- and I guess this is kind of a theme for me, which-- well, I won't say more than that.

But it's a sort of theme we talked a lot about during the week when we put the Radar together: recent advances in infrastructure as code, different techniques within infrastructure as code, different tools or new tools that are appearing to solve some of the perceived downsides of the current tooling that's out there. And so we're calling out in this slide that there are a number of these tools and platforms appearing-- which are taking us away from having to do everything with the Terraform hammer, say, right? So Pulumi, for example-- this is basically infrastructure as code where you're writing actual imperative code to generate your infrastructure. So you've got access to all the techniques that you would normally use when you're writing code.
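
For a flavor of what that looks like, here's a minimal sketch using Pulumi's Python SDK with the AWS provider; the resource names and tags are just examples:

```python
import pulumi
import pulumi_aws as aws

# Ordinary imperative code: loops, functions and conditionals all work here.
environments = ["dev", "staging"]

buckets = []
for env in environments:
    bucket = aws.s3.Bucket(
        f"asset-bucket-{env}",          # example resource names
        tags={"environment": env},
    )
    buckets.append(bucket)

pulumi.export("bucket_names", [b.id for b in buckets])
```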

Winglang, again, is another fairly new framework or tool which allows you to write code which is then turned into CloudFormation, or Terraform, or something else. But it also has this split between being able to define your infrastructure in imperative code and then also defining runtime behavior in that code as well. And they have this interesting split between kind of a run-once thing, which sets up your infrastructure, and then runtime behaviors which can run when, say, you know, functions are executed in the cloud. So that's another thing. And then, OpenTofu.

And OpenTofu was brought up, and there's a question in chat, which is: it's GA, so why is it only in Assess? Well, OpenTofu is, essentially, a fork of Terraform after there was some controversy-- which we won't really go into on this call, on this webinar-- some controversy around potential changes to Terraform's licensing. So OpenTofu is a fork which is compatible with, I think, Terraform 1.6 at the moment, and it's starting to bring in some new features that are kind of paid-for features with Terraform. It's in Assess mainly because we don't have production experience of OpenTofu yet. But we also have some concerns around it because, whilst it's a fork, and whilst it was forked for the right reasons, often, when you see these things happening in the open source world, the longevity of the fork-- of OpenTofu-- is still up for question.

It's very dependent at the moment on a fairly small number of maintainers. So our recommendation or advice around it is to keep an eye on it, absolutely, as an alternative to Terraform, and then think about it in the future if you do need to migrate away from Terraform for whichever reason. So that's advancing infrastructure as code.

We also have in this section-- which we haven't got here-- System Initiative, which is another really interesting, almost visual, Smalltalk-like, image-based platform for creating infrastructure as code. So that's another interesting one to look out for. Mike? - Hey, James, can you-- there were a couple of comments in the chat about the difference between Trial and Assess. And you spoke about it a tiny bit there, with the fact that OpenTofu is GA, so why is it not Trial? But maybe more explicitly just handle that Trial versus Assess question? - Sure, of course. Yeah.

So we have the four rings on the Radar: Hold, Assess, Trial, and Adopt. And we have pretty clear distinctions around where items-- techniques, platforms, tools, languages, and frameworks-- sit. So briefly, Hold is for things that we're either kind of unsure are ready for the mainstream or that we're actively worried about, essentially-- things like the overenthusiastic LLM use we talked about, or stored procedures-- logic in stored procedures is the canonical example of that.

But then we have Assess and Trial. And the barrier to getting into Trial, for us, is that we have active production use of this thing inside Thoughtworks. Obviously, for a technique, it's a bit blurry. But really, it's about our teams actually having experience, and then putting their hands up and saying, we're using this actively in production, and we like it. And we think it's a recommendation. Trial is actually a pretty strong recommendation for us, eh, Mike? - Yeah, I mean, Trial is a pretty strong recommendation.

Adopt is, obviously, a very strong recommendation. With Assess, we're happier to just kind of say, hey, this is new. You should take a look at it. We're not giving a strong recommendation there.

And that whole question of how much experience we've really got with it is kind of one of the key deciding factors there, because we would rather make sure that our teams on the ground have really used something and kind of learned what it's good or not good for before we put it on the Radar-- or certainly before we put it in Trial. - So this is the James Show at the moment. So I'm going to finish off my little bit of this by talking about another item, another blip, which is Infrastructure Orchestration Platforms. Again, this is showing that there is continued innovation and adoption of new ideas and new technologies in the infrastructure as code space-- in the infrastructure space. So in the last Radar, we talked a bit about platform orchestration.

And we talked a bit about tools like [INAUDIBLE] and Aquatics and a few others, which operate at the level of platforms and platform engineering teams, providing contracts, if you like, between the developers who are using those platforms and the platform engineers providing features to development teams. Well, this is a slightly lower-level idea. So infrastructure orchestration platforms-- this sort of sits underneath that. Here, you have build tools like Terragrunt and Terraspace.

You have things like Terraform Cloud. You have Pulumi Cloud. You have things like env0 and Spacelift. So there's a number of tools and infrastructure platforms, if you like, becoming available that solve the problem, essentially, of: I've got a giant ball of Bash scripts tying all this infrastructure code together-- how do I solve that problem? I mean, to me, it's amazing. Back in 2012, I had a whole bunch of Bash scripts tying some Ansible together, right? Why haven't we solved that problem yet? And this is kind of what the idea of infrastructure orchestration platforms is about.

Cool, so one of the things that we do for each Radar-- so the Radar has like 100 blips on it, and we also include four or five themes to try to tie together all the blips, so that if you're new to the Radar, or if you're trying to get a sense of what's going on, you don't have to read every single individual blip to get a sense of what we think is important. For this upcoming edition of the Radar, one of the themes is AI-assisted software development teams. OK, that's a lot of words. What do we really mean by that? That's about holistic use of AI across the entire team for building software, and how that changes aspects of software development. We talked about a whole bunch of tools while we were building the Radar, including things like AI-assisted terminals-- there's a piece of software called Warp that does that-- and the emerging ability to turn a screenshot into code, or at least into basic kind of JavaScript and CSS. We talked a bit about ChatOps backed by LLMs, where you can have a chat with a bot that will then do some useful deployment activities and stuff like that.

So the theme here for us is that we think all aspects of software development can be improved by pragmatic use of AI tooling. This is an area that we're actively investing in ourselves-- I'll talk a bit more about that later-- and everybody is looking at it, because, I think, in the tech industry, when somebody like GitHub comes out and says, hey, you can code stuff 55% faster using our tool-- we obviously see a lot of things wrong with that metric, but it does get people's attention, and it does get people asking: how can we use AI to improve the way we build software? So that's really what this theme is about. And I think, at the same time as the excitement around AI for software development, there are also risks that people need to take into account and be mindful of, especially to do with software quality and security. People, unfortunately, have a tendency to engage AI, disengage brain, and that's bad. So there's a bunch of pitfalls that you need to worry about.

So that's the theme, and we've got a few specific items within that theme. The first one I'll talk about is AI Team Assistants. So there's a lot of focus on coding tools and individual productivity coding tools. I think part of the reason that there's a focus on that is, if you're not a developer, I think the act of writing code is fairly magical. And so if you've got AI doing that, you've got magical AI producing magical code. It's like magic squared.

It's very easy to get people excited about that. And I think also, again, if you're not a developer, the perception is that writing lines of code is the hard thing. We all know that's not the hardest part of software development, but that's the thing that these AI tools can magically help with and help us spit out reams of code, which is why there's been all this focus on individual coding tools. We think what's more interesting than this is team assistants. So these are AI systems that are designed to help a team do a whole bunch of its activities better. So what we're doing is combining prompts and knowledge sources with a thin wrapper around a large language model in order to help the team make progress and do stuff.

So one example of that is story writing. If you have a requirement and you need to turn that into a specification-- within Agile, we call those stories-- then a developer can pick up a story and actually work on it. But an analyst is the person turning that short description of a piece of functionality into a story. We built something so that you can have a conversation with an AI.

The analyst has a conversation with the AI about what the story functionality should really be-- a conversational back-and-forth, digging into the story, asking about different aspects of it. And then, at a certain point, the AI will offer: hey, I think I know enough about this piece of functionality to actually write you a story. And it will spit out an embellished story, as well as a set of acceptance criteria in given/when/then style-- which is super useful, because you can take that and turn it into an automated acceptance test later on, which is, again, a thing that we would advocate for. That ability is based largely on something that we call the reusable prompt, or the team knowledge, where you bake into this system a description of what kind of software you're building and any context that helps that discussion with an AI actually do the right thing.
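
And the "thin wrapper" really can be thin. Here's a rough sketch of the shape of it-- the team knowledge, the prompt wording, and the call_llm helper are all illustrative, not the actual tool we've built:

```python
# Sketch of "reusable prompt + team knowledge" wrapped around an LLM.
# call_llm is a placeholder for whichever model client you use.
TEAM_KNOWLEDGE = """
We build an online ticketing system for regional rail operators.
Stories must include given/when/then acceptance criteria and note any GDPR impact.
"""

SYSTEM_PROMPT = (
    "You are a business analyst's assistant. Ask clarifying questions about the "
    "requirement until you understand it, then offer to write a user story with "
    "given/when/then acceptance criteria.\n"
    + TEAM_KNOWLEDGE
)

def call_llm(system_prompt: str, conversation: list[dict]) -> str:
    raise NotImplementedError("plug in your model client here")

def analyst_turn(conversation: list[dict]) -> str:
    """One back-and-forth turn; the analyst keeps appending messages until the
    assistant offers a finished story with acceptance criteria."""
    return call_llm(SYSTEM_PROMPT, conversation)
```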

We've also done this for threat modeling. So you tell it a little bit more about the technical aspects of what you're building, and then you can have a threat modeling conversation with an AI. And all of this stuff is not to say the AI-produced result is the final answer, or is a better answer than a human could have got to, but it's useful, right? Like I was talking about earlier, it's not, maybe, the exact answer to what the right threat modeling approach is, but it sure as heck looks like a reasonable answer.

And then, as somebody who knows about threat modeling, you can build on that and improve it. And we've built a tool around this. We're going to be open sourcing it shortly.

It's called Team AI at the moment, although we'll likely have to change the name because there's already a Team AI out there. And, of course, if we're open sourcing something, we want to make sure the code underneath actually looks good and we don't have any problems there. But this is a really interesting approach. It doesn't require a ton of development work. It can just be a thin wrapper on top of the large language model of your choice. - Yeah, I think for me, this is one of the most exciting things that's actually happening at the moment internally for us-- because it's one of these questions, right? We've got this incredible institutional knowledge in Thoughtworks about what good software looks like, how to build good software, how to build maintainable software.

The techniques that we've advanced around that aren't just about writing code. They're about software architecture and design, about better practices and all these different things-- how you write better stories so developers are more productive, and so on. And if you can use Gen AI in a holistic way around all of this, then I think it's going to be game-changing, actually, for software development in general.

So yeah, I'm super excited about this, about open-sourcing this. - Yeah, and to be clear, I don't think we've solved all the issues with it. And we're not saying AI is going to replace developers.

And we'll get to that a little bit later, I think. But it's definitely a useful tool. And I think the history of software engineering is applying the latest tools to the craft of building software and then doing a better job of building software. And we think AI can help there. We're definitely excited about it. OK, then the next blip on the Radar is something called NeMo Guardrails. So that's a toolkit.

It's an open source toolkit from NVIDIA that allows you to put guardrails around large language models when you're building conversational apps on top of them. That's important because, as we've all seen in the news, there are concerns about factual accuracy with LLMs. Air Canada just recently had to pay out because a chatbot had invented a bereavement policy. And Air Canada's defense was, well, we're not responsible for the chatbot, or, actually, anything our customer service people say, or anything on our website-- which is, to me, a slightly implausible defense. But whatever-- you are responsible for the stuff that your chatbot puts out, it turns out.

And so you need guardrails. Actually, the most recent example is that Amazon had a little product-description chat interface in their iPhone app a couple of weeks ago, and you could immediately get it to go off the rails. One of my colleagues posted some screenshots where people had been asking it about shelf-stable compounds for breaking rocks in their back garden, and it was giving you recipes for dynamite or something like that within the Amazon chat application. And so these things can go off the rails quite easily, which is why you need something like NeMo Guardrails.

And it allows you to put these programmable guardrails between your application and the underlying LLM-- or multiple LLMs-- that it's using, in order to be able to say, I want to steer this conversation path, I want to detect when we're going into malware territory-- all of that kind of thing. Actually, there's an LLM vulnerability scanner called Garak that this thing, NeMo, is tested against.

Garak is a vulnerability scanner kind of like Nmap is for networks, but this is vulnerability scanning for LLMs. So pretty fascinating. Helps you stop accidentally leaking your training data or your instructions. Can help with hallucination, that kind of thing. - I think for me, this is one of those-- when I started looking into the guardrails, it was, for me, one of those really, like, light bulb moments.

Not because of what it's trying to do-- because, obviously, we need guardrails and tests around these things, whether it's user input or whatever it is, right? And self-checking output, checking whether you should execute the next step in a multi-agent process, whatever it is-- obviously, this needs to happen. What I found game-changing with this is that it really made me think about how it actually works underneath. I'm sorry about my phone ringing. That was really, really rude. Because when I think about writing tests or putting guardrails around something, I think about myself writing some code, right? Code which is going to be executed somewhere, and that's the test it's going to pass or not.

But with something like NeMo Guardrails, you write the guardrails in natural language as well. And this, for me, really made me think, oh my god, actually, this is totally different to how I've ever thought about working with technology or working with software before. You know, I don't say, assert this contains harmful data. I basically ask the LLM to check whether it thinks it contains harmful data, and then flag it up later if it does.
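
To make that concrete, here's roughly what a natural-language rail looks like in NeMo Guardrails' Colang dialect, wrapped in the library's Python entry points. The flow wording and the model config are illustrative, and the exact API is worth checking against the current docs:

```python
from nemoguardrails import LLMRails, RailsConfig  # pip install nemoguardrails

# The guardrail itself is written in natural language (Colang), not as code assertions.
# The flow wording below is illustrative, not taken from any real config.
colang = """
define user ask about dangerous activities
  "how do I make an explosive"
  "what chemicals will break rocks"

define bot refuse dangerous request
  "Sorry, I can't help with that."

define flow
  user ask about dangerous activities
  bot refuse dangerous request
"""

yaml = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo
"""

config = RailsConfig.from_content(colang_content=colang, yaml_content=yaml)
rails = LLMRails(config)
print(rails.generate(messages=[{"role": "user", "content": "What will break rocks?"}]))
```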

It's just a completely different way of thinking about solving these problems. And for me, if anything, that's what's super exciting about the future: I think we haven't even started to see what we can do using the combination of things like guardrails, like agents, like LLMs in general. Yeah, as I say, super exciting. - So the next one is vLLM-- excuse me. So this is basically a serving engine for running large language models. It allows you to do high-throughput, memory-efficient inferencing, which is important if you want to run LLMs yourself. And I think this is important because not everyone wants to use a cloud or SaaS-based LLM, and sometimes there are good reasons for that.

If you're already on one of the big hyperscalers, and you're going to use AI on their platform, likelihood is that's a fairly low-barrier path forward because you've already got a relationship with that cloud provider. You already trust them to a certain extent. But not everybody is in that situation. Or even if you are, you might have use cases where you don't want the data to go outside of your organization.

We worked for a large financial client whose policy is not to use any of that kind of stuff. They do everything on-prem, and they wanted to do code completion. And so we ended up building them a system based on some open source code LLMs, fine-tuning that for their situation, and then running it on a bunch of GPUs in the corner of their data center. So there are definitely use cases where you might want to run this yourself. vLLM seems to be an area where the open source community is advancing the state of the art, even a little bit ahead of what the big players are doing-- or at least ahead of the way the big players are releasing frameworks-- because if you're a big organization, you can maybe just throw bigger GPUs at this problem.

Whereas if you are not an OpenAI or a Microsoft, maybe you actually need to think about the kind of hardware that you can run this on. And so getting the best out of your hardware with something like vLLM starts to become important. Our teams have had good results with it. They've run a bunch of different models-- Code Llama 70 billion, Code Llama 7 billion, and Mixtral. So it certainly seems like this is useful if you want to serve your own LLM results.
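
If you want to try it, vLLM's offline batch API is only a few lines-- a sketch, assuming you have a GPU with enough memory and are happy pulling one of the Code Llama checkpoints from Hugging Face:

```python
from vllm import LLM, SamplingParams  # pip install vllm

# Load an open code model and generate batched completions with vLLM's
# high-throughput, memory-efficient serving engine.
llm = LLM(model="codellama/CodeLlama-7b-hf")
params = SamplingParams(temperature=0.2, max_tokens=128)

prompts = ["def fibonacci(n):", "# read a CSV file and sum the second column\n"]
for output in llm.generate(prompts, params):
    print(output.prompt)
    print(output.outputs[0].text)
```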

OK, and we'll move on to the last blip. So do think about putting your questions in the Q&A section, and we'll try to get to those. This is coming back to LLM-powered autonomous agents, but with a bit more of a software development spin on it. So I think the two things that have been making headlines-- well, making headlines since last year at least-- were Copilot Workspace from GitHub, which is the ability to actually use AI to-- you know, spin up an environment in the cloud, then fix bugs in it and create you a pull request that you could then approve.

And more recently, something called Devin AI, which over the last couple of weeks has gotten a lot of attention with some really impressive-looking demos, where Devin has the ability to do what I call "chain of thought on steroids." So chain of thought is a prompting style for LLMs where you say, hey, how would you answer this question? And it turns out that rather than just asking for the answer, if you ask for all the steps, you actually get a better, more consistent result from an LLM. So this does chain of thought, and it allows the AI to make a plan-- and this is for bug fixing, actually, this particular use case-- set up an environment, reproduce the problem, add debugging output to the system, fix what it thinks is the problem, add a test, confirm that it's fixed, and tidy up afterwards. I would try to add the test first, personally, but whatever. Maybe we have to adapt slightly to what the AIs are going to do.
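
As a trivial illustration of that prompting difference-- the question itself is just an example:

```python
# Two ways of asking the same thing. The second, chain-of-thought style tends to
# give a more consistent answer because the model is asked to show its working first.
direct_prompt = "A depot receives 3 crates of 12 parts and 7 arrive damaged. How many usable parts are there?"

cot_prompt = (
    "A depot receives 3 crates of 12 parts and 7 arrive damaged. How many usable parts are there?\n"
    "Work through the problem step by step, then give the final answer on its own line."
)
```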

And so that's really interesting, right? Because you see a snazzy demo that says, hey, look, we can fix bugs in software. And actually, the AI is doing the bulk of the work. We know that computer systems get better, faster, cheaper, so again, this kind of comes back to the potential promise of us being able to build software faster, or maybe take some of the maintenance drudgery out of it. And maybe us humans can work on the cool new features, and the AI can work on fixing all the bugs that we made when we invented, when we created the cool new features. And there's definitely a question about the difference between a demo and using something for real. I think there's been a lot of demoware over the past year where things have looked really great, but actually, the agent is-- if the agent is 70% reliable, is that useful enough? If it's only 50% reliable, it's probably not useful at all other than running it 10 times and recording one of them as a demo.

But this might imply a change in the way that we build software as humans. If you're bugfixing, do you need to bugfix any more? Or does your job become an overseer of AI agents that are trying to reproduce the problems and fix them? Maybe your job becomes somebody who is-- maybe you understand a lot more about the context of the problem. Maybe you're doing a lot of translation between end users who are reporting problems and actually setting up real failing test cases in order for an AI to be able to work on them.

I think that's interesting. And then, the other thing that this gets into is: if you've got a single agent that can do something useful, can we build multi-agent systems where you have multiple agents, each doing, maybe, a task-specific job? Maybe it's a very customized agent that can do one thing in your software process or in your business process, and then you kind of chain those things together. That, to me, is interesting. But the thing I worry about at the moment is that a lot of those things are chained together using English as the glue language between them-- which, as James was pointing out earlier, is incredibly powerful, but on the other hand is wickedly imperfect and loose as a language, right? Is that where we end up-- that we're going to actually use English to specify the contracts between multiple agents in a system? I think it's very unclear where all this stuff is going to go. But it's hugely interesting for dev teams if you can get an AI to do some of the work for you. - OK, so yeah, we have quite a few questions.

I think some are-- starting off with, there was a question earlier in the chat around, essentially, what is the future of developers and coding? Is coding going to be dead, like everyone's predicting? There are also other questions around engineering practices, how you can use these tools along with best practices, and the challenges there. So maybe we can start with those types of thoughts and thinking. - Yeah, I mean, I can react to that.

And I think the thing about development is that it's actually really hard, right? Like, I think we all know this. You get one thing subtly wrong, and everything falls apart. You misinterpret a requirement in some way, everything falls apart. You don't ask whoever's asking for this software feature exactly why they're trying to accomplish that thing, right? There's a whole bunch of stuff to building software that is not just writing lines of code.

And even just the writing of lines of code is a difficult problem for AI to solve, because you have to be so precise. But one thing that leads me to think that there are some legs to this particular AI revolution, and that we're maybe not going to see another AI winter after this, is just the continued pace of capability advancement since-- I don't know, ChatGPT 3.5. Let's call that the moment when the world woke up to the fact that AI was a thing again. That was like December 2022, I think, right? So since then, there's been continued improvement, continued new stuff-- vision capabilities, coding, all sorts of stuff just improving.

And it's all getting faster. It's all getting cheaper. And at the same time, companies like OpenAI are saying, no, we need a trillion dollars because we're going to build even bigger models. NVIDIA last week was saying, hey, we built a bigger GPU-- and actually, I think they're getting to the point where they're claiming that the data center is the GPU. That's kind of the direction in which they're going.

So: bigger, more compute, more training power, more data used to do the training. The folks who are really on the cutting edge of this research see no reason to stop investing in it, or to not build bigger models and all that. So I think this is going to continue. I do think-- again, software is so complex that I don't think you're going to remove the human in the loop. But I do think we're eventually going to get to the point where humans don't need to do some of that stuff anymore, right? We don't do transistor placement in a CPU, right? Actually, it's not even transistors-- but you know what I mean, right? We don't do gate placement in a CPU anymore.

There are hardware description languages where a chip designer figures out what kinds of things they want to use, and then the system compiles it down to stuff you would actually fab in silicon. Even that is changing today. NVIDIA is saying, no, we use AI to design our chips now, right? So you're getting higher and higher-level abstractions. And I think we'll see that in software. So maybe the developer's job becomes thinking more about high-level concerns like system architecture, large components and how they interact-- I don't know, messaging strategy, persistence strategy.

I don't know. I would expect us to evolve to slightly kind of higher-level concerns than dealing with lines of code eventually. But I don't know how quick we're going to get there.

- And then, going off that, I think some people have had questions around: with AI generating code, where do intellectual property concerns come into play? And what's our thinking on that currently? - Yeah, so I mean, the first thing is, I'm not a lawyer, and I'm definitely not your lawyer. So you definitely shouldn't take my advice on this and should look internally for legal and risk advice. But it's definitely something you need to look at.

It does seem like the larger players-- GitHub, for example, and Microsoft-- are doing things like indemnifying people for use of their tools. So if you use GitHub Copilot, and you use it to generate some code, and you get sued, Microsoft will back you up in that court battle. Now, maybe that's not helpful to you because you're worried about getting sued in the first place, or you're worried about the murkiness of all of this-- which is fair, you know? There are multiple models.

So there are open source, open-ish models where they are much clearer about the training data that was used. And so if you're uncomfortable, you can look to one of those kinds of models. I think there's a ton of competition in the code generation space. JetBrains just released their system to general access a few weeks ago. So we're going to get to the point where this is kind of like an IDE. And I know people will have lively discussions in the bar about their favorite IDE and whether Vim or Emacs is better or whatever.

You know, I said Vim-- whoops. I think this is going to become a standard thing that we all know how to use, very standard, like the IDE. And it's going to stop being this kind of competitive battleground.

And we're going to sort out the legal issues and all that kind of stuff. But until we do, it's definitely worth looking at those things and making sure that your organization is comfortable with the risk that you're taking on. - I do find the whole indemnification thing very strange. I mean, we've been through so many giant battles around copyright over APIs over the years that, like, what, suddenly we fix that by Microsoft just throwing money at it? It's a very, very strange place to be. So I don't think legislation, or law, or anything is anywhere near keeping up with what's actually happening in this space, right? - No, and I mean, I liken it to, actually, the state of Uber and Airbnb, who went out and built big businesses that were arguably quite illegal, right? Like, it's an illegal taxi business.

It's an illegal hotel business. And they just became big enough that the momentum that they had meant that those things just got kind of approved after the fact. I think from a practical perspective, that's probably where we're at with AI. There's lots of people saying, hey, what data did you use for training? But Sam Altman at OpenAI has said himself, that's OK. We don't need any of The New York Times' data. We're good-- because they've either bootstrapped a synthetic data generation system that is going to provide for their needs, or they've transcribed the entirety of YouTube, which doesn't have any copyright problems with it, or whatever, and they've now got trillions of tokens of training data.

So I don't know. I think it's sucky to say this, but the realist in me feels the horse has left the barn on this one and we're not going to put the genie back in the bottle, to massively mix my metaphors. - I think that kind of goes nicely into another question that I saw, around on-premise LLMs, running LLMs on device, and concerns about using LLMs in a regulated space, as well as other privacy concerns. So any thoughts around that? - I mean-- sorry, James, I seem to be hogging the mic here, but feel free to [INAUDIBLE].

There's the reason you're the Chief AI Officer, Mike, right? So-- - Well, with an unpronounceable royal title. - Ciao. - My advice on that would be to experiment. So actually, a lot of these things, you can get a relatively lightweight code-generating LLM. For example, you can get Code Llama 7 billion or something like that and just play with it yourself.

There's an application called LM Studio that allows you to just, like, find models off of Hugging Face, download them, and run them on your local machine. If you've got an Apple Silicon laptop, the unified memory on those laptops lets you run those models really nicely. If you're a VR gaming nut like me, you've probably got an NVIDIA graphics card sitting in your tower PC, and you can run stuff on that.
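
One low-friction way to play with this: LM Studio can expose whatever model you've downloaded behind a local, OpenAI-compatible endpoint, so a few lines of client code are enough to poke at it. The port and model identifier below are assumptions-- use whatever your local server reports:

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API shape; nothing leaves your machine.
# base_url and model are assumptions -- check what your local server actually exposes.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio shows the loaded model's identifier
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```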

And so there's a lot of competition, and things are moving fast in the open source and open-ish model space. So my advice would be to play with them and see what you can get. I think they are definitely useful. - And then I think there's another question. Since we just listed out a whole bunch of tools, this kind of goes nicely into this next one as well: is there going to be any consolidation, or do we think there will be any guidance on which tools people will start leveraging in the future? - I mean, our convention has been-- I mean, sorry.

Yeah, I mean, I'm not sure about, you know, in terms of the wider industry-- probably. I'm sure there will be at some point. I mean, I know we ourselves are actively looking at which tools are most appropriate, which are the best players in the space, and these kinds of things. But that's in order to do a number of things-- both to make our people more effective in writing and delivering software for our clients, and also to figure out how we can use these tools to deliver interesting solutions for our clients as well. So we're definitely doing that.

I mean, I think probably the recommendation, like most of these things on the radar, is, we can give you our experiences with some of these things. We can say, hey, these are really interesting. We're having some good results, and we're looking internally to do a lot of research into these products, into these tools and techniques. But I think, like everything else, you should probably be thinking about making some investments yourselves. And if we can help point you in that direction, then that's probably the way to go.

But yeah, I mean, I think there will be. I mean, I think there was a joke in the room while we were putting this together-- some blip would come up, and it would be like, all right, that's literally just a web page with something-dot-AI on it. Clearly, they've got a bit of money from somewhere, but there's nothing behind it, right? There's so much of that going on at the moment-- how much of that is going to shake out? I'm still waiting. I guess we're still waiting.

- Yeah, and I think that goes well with another question. Since we do this Radar every six months, was this one more difficult than others, given the huge influx of all the AI hype and things like that, to truly assess what's out there? - If I were to put my tongue firmly in my cheek, I'd say I'd prefer putting this one together to the ones, probably, about five years ago when everything was a JavaScript MVVM framework, right? I really did not enjoy those. [LAUGHS] - Yeah. Yeah, and I think just from a volume of blips perspective, what did we end up with? Like 40 AI-related blips made it onto the Radar or something like that? - Yeah, yeah. But we had way more. - So we were seriously thinking, do we need to rearrange the Radar and have an AI quadrant, right? Because we actually haven't changed our quadrants since we originally started doing the Radar 10 years ago. So we did at least talk about that and wonder whether it was something we should change.

And it also leads us to think that there could be a place for a state-of-AI-in-the-software-industry publication of some sort. Not the Radar-- not quite the same kind of categorizations as the Radar-- but just a broad set of what we think is important in AI and software development, with some guidance in there. So I'm seeing thumbs-ups and hearts in the chat, so I'm guessing that's positive affirmation that we should do something like that. It's definitely on our list. - OK, and then, I think this question is a little more forward-thinking, and will probably be our last, I think.

Are you envisioning Gen AI agents and RPA converging at some point? That's Robotic Process Automation, I believe. And then, I would think that's just a general question around Gen AI and the industry-- the convergence of technologies in general, I think. What's our perspective on that? - Yeah, I think a lot of the RPA tools claim to have added AI to them already. One of the things that we're always concerned about when someone talks about RPA is that traditional, old-school RPA usually sits on top of a GUI.

And so once you've created an RPA system on top of it, you're effectively freezing that GUI. So you've got an old system that you don't want to update anymore, and you're going to use RPA to automate it rather than doing the right thing and building an actual API. And then, you're locking that interface.

So we talk about pouring concrete over an old interface using RPA. Now, the potential of AI tools is that you can have fuzzier matching, and you can enable an RPA tool to work a bit more dynamically with that user interface. I'm still just wildly suspicious of the choice to operate something through something as loose as a GUI when you actually want to do machine-to-machine interaction. To me, I mean, I get why people do choose to do that, but I'm never going to get over the suspicion in the back of my mind, are we doing the wrong thing here? And I don't think AI magically helps that.

In terms of convergence, the big question is, are we just on the upward tick of an S-curve with AI, where we will get to a point where we understand what AI systems can and can't do, and that'll be that? Or are we at the beginning of Ray Kurzweil's Singularity, and at the end of this, we all upload our minds to the Matrix and then start worrying about the heat death of the universe? I don't know. - Well, maybe this is a bit too philosophical, but on the [INAUDIBLE] point, I think it seems to much of the world that this has happened very suddenly. Boom-- suddenly, everything is getting AI and LLMs. But actually, if you talk to a lot of the researchers in the space, what they've been seeing is a lot of incremental improvement over a number of years.

And is that going to continue? Is the incremental improvement going to continue? Or, as Mike says, are we suddenly going to be welcoming our AI overlords? I'm skeptical. Even though I'm a bit of a futurist, I have to say I'm skeptical about the second part there. I think it's probably more going to be, again, you know, incremental changes that bring more interesting things to the table. But yeah. - Yeah, OK. With that more philosophical ending, thank you all for joining the webinar. Thank you for your great questions.

I believe the Radar is publishing next Wednesday, so stay tuned for that, and enjoy the rest of your day.

2024-04-10
