Let's get into it. We won't hold everyone up too much. We've got a good talk today. It turns out we build developer tools, but we also want to talk about how we build them. We're a product that was born two to three years ago, and everything we do is based on AI: small models to do specific tasks, large models to do conversational interactions, vision-processing models, all that stuff.
We're going to show today how we've built a world-class product that ships across macOS, Linux, and Windows to tens of thousands of daily and monthly active users. That's what a next-gen AI product looks like. So, we're going to get into it. Real quick, quick introductions. My name is Tsavo, Tsavo Knot. I am a technical co-founder here at Pieces and also the CEO, which is not super fun as of late. We just closed our Series A, so the past three months were not much code and a lot of Excel and PowerPoint. I'm glad to be here doing another tech talk. We're really excited about what we're building. Sam is here with us as well. Sam?
Yep, I'm the Chief AI Officer. I'm based out of Barcelona. So, we really span the globe, right? And with that comes a lot of challenges and informs what we do with our workflow. Awesome. Well, let's get into it. Okay, so the main thing we're going to get into first is what we're building and why. Then we're going to talk about how we've layered in machine learning in a very thoughtful way into every experience and aspect of our product. These days, as people hear about generative AI, vision processing, and all these capabilities, it's super important to think about how they compose into a product experience that makes sense. That's what we're going to show you. We also are building developer tools,
which means every single person here can use our product. We'll take the feedback from it, and we want to get your thoughts. We'll have some time at the end for questions. It's going to be a bit of a technical talk. Sam's going to get into some of the challenges and nuances of bringing AI into production, be it small machine learning models or large language models at scale. We were one of the first companies in the world to
ship Mistral and Llama 2 across macOS, Linux, and Windows with hardware acceleration. Sam's an absolute beast when it comes to that. We'll show you some really cool stuff that's coming up with phase three and phase four of where the product is going. Anything to add there, Sam? No, that's about it. Great. Let's start out with what we're building and why. Maybe some of you were in the main
stage earlier and got a little teaser. Long story short, myself as a developer, I've been writing software since I was 16. That's over a decade now, 11 years. I feel a lot of pain in the form of chaos, especially as I navigate large projects. Be it in the browser where I'm researching and problem-solving, or across a ton of projects where I'm looking at this Rust project, this Dart project, this C++ project. I'm jumping around a bunch. Our team went from six people to 12 people to 26 people very quickly, jumping around on cross-functional collaboration channels. For me, it was already painful a year ago, and it feels like it's getting more painful. Raise your hand here if
you've used GitHub Copilot or some type of autocomplete tool to write code. Well, guess what? Every single person now can go from the couple of languages they used to write, three or four maybe five if you're an absolute beast, to six, seven, eight, nine, ten. Your ability to move faster, write more code in half the time in different languages is going up.
But what happens when you increase your ability to write a couple of languages in a couple of different environments to now five or six? Well, that's five times the amount of documentation, five times the amount of PRs you're able to get through, or the cross-functional teams you can work on. We're seeing a shift where the volume and velocity of materials that you're now interacting with, the dots to be connected, are fundamentally hinting at how the developer role is changing in this era of generative AI. For us, we want to build a system that helps you maintain your surface area as a developer across those workflow pillars: the browser, the IDE, and the collaborative environment. So, this chaos, how can we distill it down and make it a little bit more helpful, a little bit more streamlined for you throughout your work-in-progress journey? I'll kind of power through some of these other slides. Maybe it's hard to see, but how many people here, this is what your desktop looks like at the end of a workday: lots of browser tabs, lots of IDE tabs, maybe Gchat open or Teams or Slack. That's what chaos feels like. Then you close it all down because you want to go home and drink some tea. Then you wake up the next day and you're like, "Oh,
it's all gone. Where do I get back into flow?" That's what we're talking about, developer chaos. In reality, it starts to feel like there's so much to capture, so much to document, so much to share, and there's not enough time. How do we do that for you in a proactive way? How can we make these experiences where it's like, "I found this config, it's super important. I changed this implementation but didn't want to lose the old one," or "Here's a really great bit of documentation from a conversation I had, just fire and forget and save that somewhere," and have that somewhere be contextualized and fully understand where it came from, what it was related to, who you were talking to about that material, and then maybe give it back to you at the right time given the right context? What we're talking about here is the work-in-progress journey. Most of the tabs you interact with in a day, most of the projects you have, most of the messages inside Gchat or Slack, all of that is lost. You maybe capture 10% in your mind, and that's why, as developers, our job is to half remember something and look up the rest, because we simply cannot capture all these important context clues and these dots. We're kind of like in this
era now where that stuff doesn't have to be lost. We have machine learning and capabilities that can capture this stuff for us. That's what we're going to get into. Just to put it simply, we're talking about an AI cohesion layer, a layer that sits between the browser, the IDE, and the collaborative space and says, "This is related to that, and that is related to XYZ." We're going to capture that and save that for you, and then give it back to you given that same context at the right time. Phase one was really simple. It was a place for humans to save. This is our first principle, fundamental truth. People need to save things. As a developer,
you don't realize you should have saved something until you realize you didn't save it, and that's really painful if your job is to half remember something to look up the rest. We want to make a really great place for people to save things, and that was phase one, roughly two years ago. This is kind of like an AI-powered micro repository. I'll do a quick live demo here, where when you save something, all that stuff you didn't want to do but should have done to organize it—tags, titles, related links, capturing where it came from, who it's related to—all of that we want to do with AI. That's what we started out doing two years ago. I'll just do a quick example here.
Sam, maybe if you want to hold that for me, and Sam's about to get into the tech on this as well. By the way, I'm not even lying, look at the tabs I have open right now, just like 50 tabs, docs, GitHub, the whole nine yards, got my email, everything like that. But I wanted it to feel natural, right? I'm not up here with just a couple of tabs; I'm in the same workflow every single day as a technical co-founder. Let me go ahead and show you this. Maybe you're on, you know, open source, right? You're looking at an image picker implementation. So, I'll kind of show you what this looked like before. You're here, you're checking it out, maybe I'll use this, maybe I won't, I don't really know, but I'm going through the doc and I'm like, "Okay, I want to capture this."
So, this is what it looked like before. Maybe you go to your notes, I don't know, maybe use text files or something way more primitive, you paste it in there, right? It's not capturing the link it came from, you're not titling it, tagging it, it's going to be really hard to find this later, whether it's Google Docs, Notion, you name it. A couple of years ago, we said, "Can we build a product that has AI capabilities to do a little bit more magic for you?" We're going to grab that same code snippet, and by the way, I have very limited WiFi, so I'm actually going to do this in on-device mode. Here we go,
we're going to do this completely on-device, and I'll go ahead and... so what I just did there was I switched the entire processing mode to on-device to give you an idea of what this edge ML looks like. That same code snippet that's on my clipboard, I'm just going to go ahead and paste that into Pieces, and you're going to notice a couple of things. Maybe it's a little dark, so I'll switch to light mode. Classic. If there are Mac users in the crowd, you know we've got buttons for that. Either way, what you see there is I just pasted that bit of code, and it right away said, "Hey, that wasn't just text, that was actually code." This is using on-device machine learning, and it also said it wasn't just any code, it was Dart code. We took it a step further. So that title I
was talking about, it's an image picker camera delegate class for Dart. If you flip it over, you're going to see there's a whole bunch more enrichment as well. We gave this thing a description, we're giving it tags and suggested searches, and we're also going out to the internet and aggregating related documentation links. When you save something, we're pulling
the outside world in to make sure that stuff is indexed and contextualized. So I can say, "Hey, this image picker is highly related to this bit of doc right here on the animated class." That's the power of machine learning. I just took a paste from what it looked like three or four years ago into my notes app, and I pasted it into Pieces. Instantly, with no internet connection, it enriched it. It said, "Here's all the details, the metadata that you're going to use for organization, for search, for filtering, and sharing later on." It also took it a step
further and said, "Here are the things that it is associated with—the people, the links, the docs, etc." We're going to get into that a little bit. That's just super simple paste. There's a whole lot more to Pieces that we'll talk a bit more about shortly, but let's keep it rolling. Sam, can you tell us a little bit about how that happened, what was going on there? Take us away.
One of the big challenges we faced was this happened incrementally. We started off by classifying code versus text, then classifying the language, then tagging, then adding a description, a title, all the rest of it. Once we had all these models, effectively what we had were seven different architectures, each reliant on their individual training pipeline, each needing separate testing, optimization, distribution, and all the rest of it. One of the challenges we faced there was once you have all these models, how do you deploy those across these different operating systems, especially when you're training in Python but not deploying them in Python if they're on-device? We were really lucky. We sort of hit this problem just as the whole ONNX framework started stabilizing and being really useful. Eventually, we plumped for that. Of course, this had a lot of challenges. You're training seven different models,
each requiring different data, which is a huge overhead in cleaning and preparation. It doesn't scale well for local deployment. We were already creaking at five models by the time we got to seven. Things were taking up a lot of RAM, and it was quite intense. It's also super slow and expensive to train and retrain these models. Each one's working on a separate pipeline. We're using CNNs, LSTMs, there's not much pre-training happening there.
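To make that train-in-Python, deploy-with-ONNX pattern concrete, here is a minimal sketch using one of those classification tasks (code vs. text) as the example. The tiny model, vocabulary size, and file names are hypothetical stand-ins, not Pieces' actual pipeline.

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in for one of the seven classifiers (code vs. text); the real model,
# vocab size, and architecture are Pieces-internal and not shown here.
class TinyTextClassifier(nn.Module):
    def __init__(self, vocab_size=30522, dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, token_ids):
        # Pool token embeddings over the sequence, then classify.
        return self.head(self.embed(token_ids).mean(dim=1))

model = TinyTextClassifier()  # in reality: trained in the Python pipeline

# Export the trained weights to ONNX once, at build time.
dummy = torch.zeros(1, 128, dtype=torch.long)
torch.onnx.export(
    model, dummy, "code_vs_text.onnx",
    input_names=["token_ids"], output_names=["logits"],
    dynamic_axes={"token_ids": {0: "batch", 1: "seq"}},
)

# At runtime, the shipped app loads the same .onnx file through a native
# ONNX runtime binding; Python is shown here purely for brevity.
session = ort.InferenceSession("code_vs_text.onnx")
logits = session.run(["logits"], {"token_ids": dummy.numpy()})[0]
print("looks like code" if logits.argmax(-1)[0] == 1 else "looks like text")
```

The same export step is repeated per model, which is exactly why seven independent pipelines became painful.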
One of the solutions around this came with GPT-3.5. Suddenly, there's one model that can do all of these tasks. Great solution, done. You can wrap GPT, wrap a cloud model, get all this information, fantastic. The only problem is the LLMs are too slow. What you
saw there were seven models executing like that. You can't do that with cloud calls, even just from the network cost alone. It's not going to happen. What we saw when we integrated LLMs was this whole experience became super slow. Also, these cloud models, they're fantastic general learners, but they're really not there for coding, for your normal coding tasks. I was reading a paper a couple of weeks ago using GPT for
Python coding tasks. You had an issue, you gave it the bits of code it needed to change to solve the issue, and GPT-4 managed 1.4% of these tasks. So, two challenges: slow and not really there to work with coding. I would add one more: security, privacy, and performance. We have DoD users and banking users and things like that. They can't send their code over the wire and back. There was a huge thing around air-gapped systems that were able to do this on-device.
Exactly. We were looking for ways to take this sort of experience you have with the large language models, which is very nice, generating tags, generating descriptions, getting really nice stuff back, and put it locally, serve it on-device. One of the techniques that really helped was the low-rank adaptation (LoRA) paper that came out from Microsoft, which solved a lot of our problems. Lots of you, maybe — who's heard of LoRA before? There we go. I'll run
through what it does and why that really works for us when it comes to serving these models. With low-rank adaptation, you take a general learner, a large cloud model, and use that cloud model to generate data for your specific task. It doesn't have to be a lot of data. You then take a small model, something that executes very quickly, and you tune a very specific part of the weights in that model. We were using transformers, a T5, an excellent foundation model. Specifically, we were using a T5 that was trained extensively on code, so it was good at generating code, very bad at generating descriptions. We found
if we took this general learner, used it to generate maybe 1,000 examples, we were then able to take this T5 trained on code and distill that behavior into that model. The way you do that is you focus on a particular part of the architecture, the attention layer. An attention layer, there are lots of analogies to what it does, but it's basically a weighted mean pooling over its inputs. What you do is you parameterize that and learn a small set of low-rank weight matrices, which act as a transformation on the attention layer. At inference time, you run inference as normal. When you get to the end of the attention layer, you sum the LoRA update with the attention layer's output, pass it forward, and what you get is a highly performant task-specific learner from a more generalized one. This had several valuable features for us. The first was no additional latency. There are a number of ways you can take the knowledge from a large learner and distill it into a small one. This one was particularly nice because we don't have to pay any extra cost for inference. That was number one. Second, training is much faster. You can get nice results with maybe 1,000 training examples; with 5,000-10,000, you're solid.
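As a rough illustration of the mechanics Sam just walked through, here is a minimal PyTorch sketch of a LoRA-wrapped attention projection: the foundation weights stay frozen, only the small low-rank pair is trained, and the update can be merged at inference time so there is no added latency. The rank, scaling, and initialization values are illustrative assumptions, not Pieces' actual configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear projection (e.g., an attention Q/K/V matrix) plus a
    trainable low-rank update, as in the LoRA paper: W x + (B @ A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # foundation weights stay frozen
        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_f, rank))        # up-projection
        self.scale = alpha / rank

    def forward(self, x):
        # Same forward pass as the frozen layer, plus the low-rank correction.
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

    def merge(self):
        """Fold B @ A into the frozen weights so inference pays no extra cost."""
        with torch.no_grad():
            self.base.weight += self.scale * (self.B @ self.A)

# Example: wrap one projection of a shared foundation model.
base_proj = nn.Linear(512, 512)
lora_proj = LoRALinear(base_proj, rank=8)
out = lora_proj(torch.randn(2, 512))   # only lora_proj.A and lora_proj.B train
lora_proj.merge()                      # after merging, inference cost = frozen layer
```

Because the trainable part is just the two small matrices, a per-task adapter is tiny compared with the shared foundation model, which is what makes shipping several of them practical.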
What that meant is we went from training seven different models with different data pipelines on hundreds of thousands of examples, sometimes millions, to reducing that to 5 or 10K, which is great for speed of iteration. It also means our engineers can train these models on their Macs. We're talking maybe six or seven minutes to train all of these locally, which means we can move a lot faster, deploy a lot more, and retrain and update these models almost on the fly. The other really interesting thing is this works really well with other methods. You can use LoRA, and if you're not getting quite what you want, you can combine it with prefix-tuning methods. Most importantly, we went from serving seven separate models, which impacted the size of our app, the memory usage, and all the rest of it, to serving one foundational model, a small T5, coming in at 190 megabytes, and then just adding these LoRA modules on top. This
meant we could serve seven different specialized tasks on the back of one foundation model with these modules. It meant we could move a lot faster. It was a huge turning point for us. Phase two. Phase one was: okay, I can save small things to Pieces. I can paste some code in there, a screenshot in there. It's going to give me everything I need. When Sam's talking about those LoRA modules, those embedding modules, we're talking about modules for titles, descriptions, annotations, related links, tags, etc. We have this entire pipeline
that's been optimized. To sum it up, the LoRA breakthrough was pretty profound for us. Then we moved on to another challenge: how do we generate code, completely air-gapped or in the cloud, using the context of your operating system or a current code base or things like that? How many people out there are hearing about gen AI or AI features from your product teams or management? Who's getting pushed to do that? You're going to face challenges like which embedding provider to use, which orchestration layer to integrate, which vector database to use. These were all challenges we evaluated early on. We ended up with our current approach: we want our users to bring their own large language models. If you work at Microsoft, you can use Azure or OpenAI services. If you work at Google, you can use Gemini. We'll do a quick demo of this. From the initial idea that humans need a place to save, we moved to phase two, which was: how do I conversationally generate or interact with all the materials I have saved in Pieces, and then use the things saved in Pieces to ground the model and give me the relevant context? When I go in and ask it a question about a database, I don't have to say what database, what language, or anything about that. It just understands my workstream context. It even understands questions like who
to talk to on my team about this because Pieces is grounded in that horizontal workflow context. I'll do a quick demo of this and then we'll keep it trucking along. Let's take a look at what it means to have a conversational co-pilot on device across macOS, Linux, and Windows. We're going to be doing this in blended processing only. I am offline right now, probably a bit far from the building. One thing I wanted to mention super quick before I get into this is Pieces understands when you're saving things. What I was talking about is the additional context: when you're interacting with things, your workflow activity, what you were searching for, what you were sharing with whom, etc. That's the when,
which is an important temporal embedding space for the large language model. Over time, you begin to aggregate all your materials. All of these materials are used to ground both large language models and smaller models on-device. These are the things I have
already, but let me go over to co-pilot chat. Something interesting here is that you can basically think of this as a ChatGPT for developers, grounded in your workflow context. I can come in here and it gives me suggested prompts based on the materials I have. I'll do a little bit of a setup. I'll use an on-device model, as I mentioned. You can obviously use your OpenAI models, Google models, whatever you like. I'll use an on-device model. I'm going to use the Phi-2 model
from Microsoft. I'll activate that one. Let's ask a question. This is how you know it's a live demo. "How to disable a naming convention in ESLint?" Who's had to do that before, by the way? Any pain? Okay, well, you're lucky. Let's see what the co-pilot brings up here. I'm running on a MacBook Air here, an M2. We were really lucky that the hardware space started supporting these on-device models right as we needed it. That's looking pretty good. It looks like we got a naming style with the rules.
Ignore naming style check. Great. When in doubt, if you're having linting problems, just ignore it. Besides that, that's a generative co-pilot, grounded, air-gapped, private, secure, and performant. We think this world of proliferated smaller models working in harmony with larger models is going to be the future, integrated across your entire workflow. Everything you saw here in the desktop app is available in VS Code, JupyterLab, and Obsidian, and all of our other plugins. Definitely worth a try. Let's get back to the tech. Just wanted to show you how this stuff actually looks, all the crazy stuff that Sam talks about. How do we actually productionize that?
Sam, you want to take us away? For sure. What you saw there, a couple of things are fundamental. First, you need to design a system that's isomorphic across cloud models and local models, which is super challenging. You need something you can plug into any model because the entire model space is moving so fast. If you don't
do that, you're going to be locked in and left behind. Second, you need to take full advantage of the system. You've got to use any native tooling, any hardware acceleration. Finally, you need a system where you can pipe in context from anywhere. A lot of AI products out there, you're stuck within one ecosystem or application. We wanted to break that. We're a tool between tools. We want to enable you to add context to your AI interactions from anywhere. These were our design principles for the co-pilot.
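As a minimal sketch of what that "isomorphic" design principle can look like in practice, here is one interface that every backend, local or cloud, has to satisfy, with context items piped in from any source. The class and method names are illustrative assumptions, not Pieces' actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ContextItem:
    source: str          # e.g. "ide", "browser", "chat", "screenshot-ocr"
    text: str
    timestamp: float = 0.0

class LLMBackend(ABC):
    """One contract for every model, local or cloud, so product code never
    cares where inference actually runs."""

    @abstractmethod
    def generate(self, prompt: str, context: list[ContextItem]) -> str: ...

class LocalBackend(LLMBackend):
    def __init__(self, compiled_model):        # e.g. an MLC/TVM-compiled model
        self.model = compiled_model

    def generate(self, prompt, context):
        grounded = "\n".join(c.text for c in context) + "\n\n" + prompt
        return self.model.run(grounded)        # hypothetical local binding

class CloudBackend(LLMBackend):
    def __init__(self, client):                # whatever provider SDK the user brings
        self.client = client

    def generate(self, prompt, context):
        grounded = "\n".join(c.text for c in context) + "\n\n" + prompt
        return self.client.complete(grounded)  # hypothetical provider call
```

Swapping the model then means swapping one object behind the interface, which is what keeps the product from being locked to any single provider.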
How did we do this? It was an emerging ecosystem when we got into this. We looked at ONNX Runtime, which was great but didn't have the tooling to run a vast selection of models. We settled on TVM because it abstracted away a lot of the hardware optimizations. Specifically, we used MLC. Has anyone used MLC or TVM? This is your go-to if you want to deploy a local model. MLC was great because it's open source. A lot of the work when you're compiling a
machine learning model is designing specific optimizations for certain operations across a wide range of hardware. That's almost impossible with a four-person ML team. Leveraging open source was fantastic. This is all in C++. We had to get it into Dart. Excellent tooling in Dart to do that. We used FFI to handle the integration into the app.
So now you've got this thing running. Large language models, if you want them to perform well at a specific task, you have to put a lot of effort into your prompt. That's just a fact. You need a way to slam context into that prompt as well. If you're prompt engineering for a specific cloud model, that's not going to do well across specific local models or even other cloud providers. We built a system that allowed us to dynamically prompt these
models based not only on what the model was but where the information was coming from. We wanted to integrate multimodal data. A lot of your information is in text, but I watched a lot of videos when learning to code. My team insists on sending me screenshots of code, which is super annoying, but there you go. We wanted to take away the pain and allow you to add video and images to your conversations with these models. We did that using fast
OCR. Initially, we used a fine-tuned version of Tesseract. Does that ring any bells? Super nice open-source OCR system. We pushed that as far as it could go for us. Now we use native OCR, which at the time seemed like a great move away from Tesseract, a clunky system developed over two decades. Using native OCR had its own challenges.
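A toy sketch that ties those two ideas together: prompt templates selected per model family and per context source, with screenshots passed through OCR before being slotted into the prompt. pytesseract stands in for the OCR step, and the template wording, model-family keys, and file path are invented for illustration.

```python
import pytesseract
from PIL import Image

# Prompt templates keyed by (model_family, context_source); wording is invented.
TEMPLATES = {
    ("local-phi", "code"): "Instruct: Using this code:\n{context}\n{question}\nOutput:",
    ("local-phi", "screenshot"): "Instruct: A screenshot the user shared reads:\n{context}\n{question}\nOutput:",
    ("cloud-gpt", "code"): "You are a senior developer. Code:\n{context}\n\nQuestion: {question}",
}

def build_prompt(model_family: str, source: str, context: str, question: str) -> str:
    # Fall back to a plain template for unknown model/source combinations.
    template = TEMPLATES.get((model_family, source), "{context}\n\n{question}")
    return template.format(context=context, question=question)

def screenshot_to_context(path: str) -> str:
    # Screenshots of code become text via OCR before entering the prompt.
    return pytesseract.image_to_string(Image.open(path))

# Example usage (the image path is hypothetical):
prompt = build_prompt(
    "local-phi",
    "screenshot",
    screenshot_to_context("teammate_snippet.png"),
    "What does this function return?",
)
```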
Real quick on the Tesseract, who here uses Linux? Okay, we'll have to figure that out. What we're talking about next is the evolution of our product. Phase one was humans can save to Pieces, and Pieces will enrich that, index that, make it searchable, shareable, and all the like. Phase two is you move into the generate, iterate, curate loop, using conversational co-pilots to understand, explain, and write code grounded in the things you have saved. Now we are moving into phase three and phase four. We've just closed our Series A, but let's show what all this is about. You guys are some of the first in the world to check out this crazy stuff we're about to release before it's even on TechCrunch or whatever.
We want to build systems that are truly helpful, systems that you do not have to put in all the energy to use. You don't have to memorize what this button does and so on. We want systems that are fast, performant, and private, that feel like you're working with a teammate. We know you're going to need this, you're about to open this project, talk to this person, or open this documentation link. Can we, based on your current workflow context, give you the right things at the right time? The other thing is I don't want to build a 100-person integrations team. There are so many tools that developers use every single day. Building and managing those hardline
integration pipelines is a really difficult thing to do. Standardizing those data models, piping them into the system, and so on. We've spent a lot of time in the past year and a half on some in-house vision processing that runs completely on-device and is extremely complementary to the triangulation and the large language models that process its outputs: it allows us to see your workstream as you do, understand who you're talking to, the links you're visiting, what you're doing in your IDE, what's actually changing. This is the same approach that Tesla took with their self-driving stuff. Those cars run on-device, see the world as you do, and aren't hard-coded with every rule for every country and every driving scenario. We wanted to scale very quickly across all the tools, both in an air-gapped environment
or connected to the cloud. From there, can we truly capture and triangulate what's important? That's hilarious, actually, because this slide is straight from our Series A pitch deck. It says "for designers and knowledge workers"; we're talking about developers today. Sam, do you want to add anything here? I would say something super important when we're this deep in your workflow is that we need to stop any secrets or sensitive information from being captured or surfaced. A big component of this is filtering and secrets removal and giving you the ability to run it completely air-gapped on-device.
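A minimal sketch of that filtering idea: pattern-based secret redaction applied to captured text before it is stored or handed to a model. The patterns shown are a tiny illustrative subset; a production filter would be far broader and paired with entropy checks and allow-lists.

```python
import re

# Illustrative subset of secret patterns; not an exhaustive or production list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key id
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),    # PEM private key header
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
]

def redact_secrets(text: str) -> str:
    # Replace anything matching a secret pattern before storage or prompting.
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

captured = 'export OPENAI_API_KEY="sk-..." # captured from a terminal window'
print(redact_secrets(captured))   # the key assignment is scrubbed before anything persists
```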
That's right. Just like the human brain, you take in so much more than you actually process. The job is to filter out 90% of the information and capture and triangulate and resurface the relevant 10%. That's what Sam's talking about. There's a lot of machine learning and high-performance models involved in the filtration process, but it's not just machine learning. I don't know about you, but I'm fed up with interacting with AIs through a text chat. I think there's so much more room for innovation around how we interact with these models beyond just assuming that's the best way. Part of this is really exploring that. Yeah, I would say the joke I make to investors is we went from a couple of keywords in a Google search bar and then opening a link and scrolling to now full-blown narratives in ChatGPT, trying to prompt and reprompt and dig it out. It feels like it's become a bit more verbose
and laborious to use these conversational co-pilots. We think the way they're going to be integrated into your workflow is going to be different. They're going to be more heads-up display or autocomplete-esque systems. That's what we're going to show you. Sam, do you want to hit on anything here? I think we've done it. That's good. Great. Let's do a quick demo here. The first one, which is going to be pretty awesome, hopefully we can see some of this. Basically, I'm in my IDE and Sam,
you could probably talk through some of this as well. This is your demo. For sure. Let me remember. This is my workflow. I'm just moving between windows. This is how you knew I was offline during the demo. There we go. I'm just doing a bit of simulation here. This is a super early proof of concept from a while back. Basically, we're moving between screens. To do this previously, we'd need four integrations: one for the IDE, one for chat, one for Notion, and one for GitHub.
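A toy sketch of the idea behind this demo and the "last few hours of workstream" grounding described next: treat window switches as a stream of workflow events and filter them by time window when building context for the model. The event fields and example values are invented for illustration.

```python
import time
from dataclasses import dataclass

@dataclass
class WorkflowEvent:
    timestamp: float
    app: str             # e.g. "chrome", "vscode", "slack"
    title: str           # active window or tab title
    extracted_text: str  # OCR / accessibility text, after secret filtering

class WorkstreamLog:
    def __init__(self):
        self.events: list[WorkflowEvent] = []

    def record(self, event: WorkflowEvent):
        self.events.append(event)

    def window(self, hours: float) -> list[WorkflowEvent]:
        """Everything captured in the last N hours, used to ground a chat."""
        cutoff = time.time() - hours * 3600
        return [e for e in self.events if e.timestamp >= cutoff]

log = WorkstreamLog()
log.record(WorkflowEvent(time.time(), "chrome", "Pull request review - Dom", "diff summary ..."))
recent_context = log.window(hours=3)   # "use the last three hours to ground the model"
```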
Let's see if I can switch to the hotspot real quick. There we go. T3 for the win. By the way, that's why you need on-device machine learning. If we can't load a video, how do you think we can do all the processing in real-time for you? It's not going to keep up with your workflow. That would be an absolute nightmare. All right, here we go. What happened here is we've skipped around the workflow, capturing the information from the active windows. We're using your activity in your workflow as an indicator of what's
important. We're not capturing everything, just what you're looking at. We're adding it in as context to part of the chat. This enables me to ask, "Could you tell me a little bit about Dom's PR?" and it's all there. This highlights one of our key principles:
AI is interesting, and these large language models are very convincing, but without the actual context, the outputs are useless. When you give them the context, you can get these magical experiences where you can talk about your team and all the rest of it. We decided to take that and push it further. I'll just add real quick. The way this will
actually manifest is in reality, you can go to your co-pilot and say, "Give me workflow context from the last hour or the last six hours." Instead of adding a folder or adding a document, you're just saying, "I want to use the last three hours of my workstream to ground the model." Then you can ask it, "What was Dom's PR? What was going on? What are the specific changes?" In this video, you can see this real-time output, real-time aggregation, or real-time triangulation as Sam is just moving through the browser, in Gchat, looking at research papers, looking at documentation. Over on the left, it outputs the links, who you're talking to, the relevant documents, filtering a bunch of stuff. That will also be used to ground the model. You can say for the last hour, "What was I doing?" We think those conversational co-pilots, as I mentioned, are a bit laborious. We
want to give you a heads-up display given your current context. Wherever you are, if you're in the browser looking at a webpack config, can we give you relevant tags that you can click on and expand? Can we give you snippets you've already used related to webpack? Can we tell you with high confidence you're about to go look at this npm package or check out this internal doc? Can we give you a co-pilot conversation you've had about this webpack config previously? Can we tell you to go talk to Sam or someone on your team that's worked on webpack? Also, can we point you in the direction of an introductory webpack project? I didn't have to do anything here. Pieces is running in real-time on your operating system, giving you what you need when you need it based on your current context in the browser, in the IDE, in collaborative environments. I don't have to type out a sentence. I don't have to go search for anything. I switch my context, and the feed changes. These are the experiences we think are going to be profound not only for developers but for a lot of different knowledge workers at large. Having the right thing at the right time. You look at Instagram, YouTube, TikTok,
Facebook, every feed out there in the world. It's really good at giving you interesting content. Why can't we apply that to your workstream to help you move faster each day? That's a little about the workstream feed that'll launch later this quarter. We've got some crazy enterprise capabilities that'll launch later this year. I think we are at a wrap here. We'd love to have you come talk to us. We've got a booth with no words, but it's got a really cool graphic. You
can't miss it. It's black and white. Sam and I will be around to answer questions. Thank you. Did you say it was watching you scroll through a browser window? Yes, that's right. In real-time, on-device, no internet. So it would see the URL that you might hover, seeing what you're reading also? That's right. In a really smart and performant way, by the way. A lot of processing, so we do a lot of active cropping. I wanted to know if you could talk more about how you decided to subdivide the tasks from a larger model to smaller models. Running these models locally, did these smaller models provide more leeway for the tasks? You want to talk about this? Well, it was one smaller model. That's the key thing. We went from seven to one. That's a huge
saving in terms of memory and all the rest of it. In terms of what tasks we decided to focus on, that was really informed by designing a snippet-saving app. All of this metadata we generate is very much keyed into you being able to find the resources you've already used, things that are useful for collaboration. That's why we hit up links,
related people, descriptions, all the rest of it. Those tasks almost designed themselves. There are constraints. If you want to generate a description, there's a limit to how much memory you can use at that time, and to whether you can generate that description in less than 100 milliseconds. We look at the problem and say, "What's the probability we can generate a two-paragraph description in less than 100 milliseconds?" We work backwards to solve for those tokens-per-second outputs. We work backwards to solve for GPU, CPU, memory footprint, whether it's cold started, you turn it off and on to free up that memory. There are all types
of challenges involved. I think generating a description is the largest, but generating tags, titles, those are all sub-problems of descriptions. We executed that pretty well. Other questions? When are we going to be able to use AI for no keyboard, no mouse interactions to do our tasks? We're waiting on Neuralink or Apple Vision Pro, not really sure which one's going to hit first. If I was thinking at the speed of OpenAI's output, those tokens per second, I'd probably be pretty bothered. We need to get that way up. We need to make it efficient. It's quite expensive right now to process all the data as we're just throwing it up at these large language models. It's kind
of like the human brain is a quantum computer walking around and operating on 13.5 watts. We need to hit that level first. I can't give you the answer, but I know we're well on our way. I love my keyboard. I don't think I'd ever want to be in a situation where I just talk to my computer. We're getting there, at least heads-up displays. You showed how you literally watch what's being done. Any way of going
the other way around in the future, where instead of just being in a tiny window, you have an overlay drawn on top of the apps to show what buttons to click and what the definitions are? I think it'll be really tricky, but if we fail at building developer tools, at least we could open source the system so robots could order at McDonald's. We know that button's important to click, we're going to do that one. We're looking at the URL, the tab name, what's scrolling, and so on. Vision processing is going to be important. Our eyes are pretty important for us. These systems need to be able to do that. If we could build a great foundation on that to overlay recommendations of actions to take, that would be cool. But we'll get there. That's maybe Series F or something.
One more question. That's exactly right. Our models strictly understand technical language. If you asked it a question about poetry, it would not give you a good output. That's how we're
able to detect if you're working on technical language or natural language. Our models are biased towards code. We've trained them on a lot of code. When you distill a model down, you have to be very specific in what type of environment it's going to perform well in. When you take it
outside of a technical language environment into a heavy natural language environment, it will still produce output, but not helpful output. If you ask the co-pilot about the bank statement, it might do all right, but we don't care about that. We filter all that out. Just don't tell your employer you're using your work laptop for 25% personal stuff. We
get around this with a lot of clustering. We came to co-pilots after we'd done a lot of work in embedding clustering and search. We leverage that work. At the end of the day, you can run this all on your device. If you're using that device half and half for personal and work, feel free to use our co-pilots half and half for personal and work.
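A rough sketch of that clustering approach: embed incoming text and compare it against centroids for technical versus general language. The embedding model named here is a common open one chosen as an assumption for the sketch, not necessarily what ships in Pieces.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: any small sentence-embedding model works for the illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

technical_seed = ["fix the webpack config resolve alias", "kubectl get pods -n staging"]
general_seed   = ["booking a table for dinner on friday", "monthly bank statement summary"]

tech_centroid = model.encode(technical_seed).mean(axis=0)
gen_centroid  = model.encode(general_seed).mean(axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_technical(text: str) -> bool:
    # Route text to the pipeline only if it sits closer to the technical cluster.
    v = model.encode([text])[0]
    return cosine(v, tech_centroid) > cosine(v, gen_centroid)

print(is_technical("disable the naming-convention rule in eslint"))  # likely True
print(is_technical("remind me to pay the electricity bill"))          # likely False
```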
One more. I'm trying to think of this in the context of my everyday workflow. Let's say I've been using Pieces for three months. It knows exactly what I've been doing. It knows my Kubernetes clusters and all that. Do you have something in your roadmap where it interacts with Kubernetes, where I say, "Hey Pieces, remember that API that I built that has this CICD workflow in GitHub and is deployed to this cluster. Do that again." Is that something?
We're really interested in human-in-the-loop systems. A lot of work is being done around these autonomous agents. We see a lot of interplay between co-pilots. When we get into the operational execution side of things, we'll slow roll that a bit before we say, "We know all the docs, all the people, go out and do it for you." We're quite far away from that and we'll let GitHub take the lead there. For us, the stuff you interact with as a developer, that's at least 30-40 tabs worth, a couple of conversations, and definitely a Jira or GitHub issue ticket. All those dots we
want to connect for you. When it comes to executing, hopefully, you're still the one doing that. Otherwise, if it's fully autonomous, we might have to change the way we write code. We'll deal with that later.
Cool, I think that's it. Awesome, thank you guys.
2024-06-21