Unveiling SurrealDB 1.0.0 – SurrealDB World Keynote with Tobie Morgan Hitchcock

[Music] SurrealDB: A Step Ahead [Applause] Welcome to SurrealDB World 2023! I'm delighted that you could join us today, as we release SurrealDB 1.0 and dive into several major areas of database development. It's been just one year since Jamie and I launched our idea of what a database could look like to the world, and the interest, enthusiasm and uptake has been incredible. All around the world we have developers, teams and organisations building and creating applications on top of SurrealDB in ways that we hadn't even imagined, from embedded IOT devices, offline research data stores for healthtech, to gaming or traditional database deployments within larger tech platforms. Today we have people who have joined us from Europe, the USA, Asia, Africa and of course here in London, with over 2,000 people joining us online. I think I can speak for the team to say that we are really excited about what we have been working on this last year, and are releasing today.
We know that SurrealDB will change how developers and organisations build and simplify their applications, taking their projects to the next level. Today has been the culmination of groundbreaking work both within the SurrealDB team and with contributions from the wider community, and I can't wait to show you the result. I'm excited to talk to you about SurrealDB version one today, but before we dive into the product I'd like to welcome onto the stage Developer Relations Marketing Manager, Aravind, to update you on the community and growth that we have seen over the last year [Applause] [Music] Thank You Tobie. Wow, it's so great to be here with you all today.
Over the last year, the growth of the community and the interest in SurrealDB have been infectious. During this time SurrealDB has been downloaded over 250,000 times and has reached an astonishing 22,250 stars on GitHub, as I checked this morning. Across our repositories we now have over 150 contributors and 500 active members, involved with bug fixes, feature requests, documentation improvements and community discussions. On Discord, our members now total over 4,500, with hundreds more joining every week. Members from the online community have made incredible contributions not only to the core database, but to client libraries in a number of different languages, from Rust and JavaScript, to Python, Dart and Erlang. Many of our team members have joined us from the community itself,
to work on building SurrealDB into the future. Internally, so that we can support and help our global community evolve, our Community Team at SurrealDB is now five people strong. Over the next few months we will be building on, and improving, our documentation, deployment guides and tutorials, to enable developers to get going with SurrealDB, utilising all of its powerful functionality even faster. The community to us is who we are, from ideas and use cases all the way to contributions; without you SurrealDB would not be what it is. Join us at the 'Driven by the Community' talk by my colleagues Naiyarah and Alex later here on the same stage, where we will dive more into the community that has shaped SurrealDB. [Applause] [Music] Thank you Aravind for that update. As Aravind mentioned, the community is so important to us
here at SurrealDB. Now, I'm sure that you will all have seen the SurrealDB clothing worn by all of our team members, on our online videos, and at our SurrealDB socials, or here today at SurrealDB World. Today I'm pleased to announce that the SurrealDB store is officially open. Let's take a sneak peek. The launch of the SurrealDB store was about strengthening and extending the brand, producing clothing that can be worn by both developers and the wider community too. All our clothing is exclusively produced by Stanley Stella, who specialise in sustainable and ethically produced garments and who are dedicated to using organic and eco-friendly materials. Visit SurrealDB.Store to check it out.
At SurrealDB we have been working hard internally to ensure that our clothing line reflects the SurrealDB brand and the things which we care about. Originality, quality, sustainability, design: these are some of the core attributes which we always want to keep in mind as we build SurrealDB into the future. Over the last year an important focus of ours has been on the tools and interfaces which developers use to interact and develop with SurrealDB. These come in the form of our query language and our client SDKs. We want SurrealDB to fit seamlessly within developers' workflows and tech stacks. In addition to improvements to the SDKs for JavaScript and Golang, we now also have community created SDKs for Java, C#, and .NET. But some of our most interesting work has been on our Rust SDK, which will form the basis of other client libraries in the future. Client SDKs,
built on top of the Rust engine, benefit from the SurrealQL type system, local query parsing and a binary communication protocol, which leads to better code and improved performance. On top of this users will be able to run SurrealDB natively within the programming language of their choice, using all the features that they currently experience with the SurrealDB database server. This month we'll be releasing SurrealDB.wasm, SurrealDB.node, SurrealDB.deno and SurrealDB.py. All of these client libraries will be built on top of our native Rust SDK, and will enable developers to run SurrealDB right within JavaScript in the browser, or on the server side, and within Python. Looking further forward we'll be releasing an SDK for C,
on top of which even more native SDKs can be built. Now when we launched SurrealQL alongside SurrealDB in August last year, to say that there were certain 'opinions' about a new query language would have been an understatement. But with comments like "the 'S' in SQL now stands for Surreal" spurring us on, we have added an incredible amount of functionality. SurrealQL, with a host of new functionality and statement types, is growing from an SQL-like query language into its very own programming language. Developers can now use even more advanced expressions and logic to model and query their data. FOR statements enable simplified
iteration over data, or for advanced logic when dealing with nested arrays or recursive functions. The THROW statement can be used to return custom error types, which allow for building advanced programming and business logic right within the database and authentication engine. Code blocks and multi-line sub-queries can be used alongside the looping and error functionality and allow for nested blocks of code with a single return type. This can
build on top of any of the advanced SurrealQL capabilities, making use of graph queries, record linking and data aggregation functions. In fact, over the last year we have added support for global parameters, custom function definitions or stored procedures, SurrealQL constants, range queries, complex Record IDs and a new strict typing system, all now available in SurrealQL 1.0. All of this new and improved functionality enables developers to build advanced logic right within the database, or to query their data remotely in simplified ways, saving development time and enabling developers to focus on the product. Alongside improvements to SurrealQL and the introduction of additional functionality, we've also enabled the ability for database administrators to configure their database in a more secure way. Capabilities introduced in SurrealDB 1.0 enable fine-grained control of the specific functions and network destinations
that can be used when running a SurrealDB server, or when operating in embedded mode. Now as a layered database platform, SurrealDB operates with the storage separated from the compute layer. As a result of this, SurrealDB supports the ability to run as an embedded database server in a number of programming languages, as a single node server or as a distributed database cluster. We now enable running SurrealDB on top of RocksDB, SpeedDB, FoundationDB and TiKV, and on IndexedDB in the browser.
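To make the layering concrete, here is a minimal sketch of a compute layer sitting on top of an interchangeable key-value store. It is an illustration only, not SurrealDB's internal interface: the `KeyValueStore` protocol, `MemoryStore`, `RecordLayer` and the `table:id` key layout are all assumptions made for this example.

```python
from typing import Dict, Iterator, List, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """Hypothetical minimal interface that a storage layer might expose."""

    def get(self, key: bytes) -> Optional[bytes]: ...
    def put(self, key: bytes, value: bytes) -> None: ...
    def scan(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]: ...


class MemoryStore:
    """In-memory stand-in for an engine such as RocksDB or TiKV."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def scan(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Iterate in key order, as an ordered key-value engine would.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


class RecordLayer:
    """Compute layer: maps table records onto plain keys and values."""

    def __init__(self, store: KeyValueStore) -> None:
        self.store = store

    def create(self, table: str, record_id: str, body: str) -> None:
        self.store.put(f"{table}:{record_id}".encode(), body.encode())

    def select_all(self, table: str) -> List[str]:
        return [v.decode() for _, v in self.store.scan(f"{table}:".encode())]


db = RecordLayer(MemoryStore())
db.create("person", "tobie", '{"name": "Tobie"}')
db.create("person", "jamie", '{"name": "Jamie"}')
names = db.select_all("person")
```

Because `RecordLayer` only depends on the three store methods, swapping the in-memory store for an on-disk or distributed one would not change the query side, which is the point of separating storage from compute.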
While all of these Key-Value storage engines have their own benefits and will be supported by SurrealDB, a really exciting area of focus this year has been on SurrealKV. SurrealKV is our native embedded storage engine built entirely in Rust. Unlike other B-tree based or LSM-tree based data structures, SurrealKV builds upon TART, our custom built timed adaptive Radix Trie which forms the basis of concurrent and versioned data storage in the SurrealKV storage engine. SurrealKV will be optimised for multi-writer workloads with the ability to query historically at any version. As a transaction-based ACID compliant data store layer, SurrealKV will form the basis of Version Control within SurrealDB, a foundational feature which will enable us to support data change auditing, graph versioning, historic network analysis and aggregate queries over time. This embedded SurrealKV engine itself is optimised for large data sets and version values, splitting the storage of keys (which will often reside in memory) from the values which are more likely to reside on disc. SurrealKV will enable us to deploy SurrealDB natively in
any programming language, without the need for complex bindings with C libraries or packages. Now we're not releasing SurrealKV with SurrealDB Version 1 today, but we are really pleased with the progress we have made and look forward to our native storage engine being available in a future release soon. In the meantime, however, you can follow along the development progress online, and today we are releasing SurrealKV as an open-source project with an Apache 2.0 license. I'd now like to invite onto the stage Software Engineer and Epidemiologist, Dr Caroline Morton, to talk about how SurrealDB is being introduced for research purposes within a clinical setting in the NHS. [Applause] [Music] Thanks Tobie. I'd like to introduce the concept of how SurrealDB can be used to create dummy data. Okay, so say you have a cough and you visit your GP. You get your blood pressure taken, you get
your lungs listened to and you get a diagnosis of pneumonia, maybe you get some antibiotics. This is the sort of thing which will get recorded. So, every time you visit your GP or a hospital your appointment gets stored as a series of time stamped codes; SNOMED codes for Primary Care, ICD-10 codes for hospital, and this gets used for research. So the underlying codes basically get picked, and it's a big tree, so for example cough, okay, I might put cough down and that's a finding, but the parent code of cough might be respiratory function finding, and cough in itself might be, well it is a parent to about 43 different codes; chesty cough, allergic cough, all sorts of different types of cough. Okay, the basic thing is it's a graph. Okay, so then researchers like myself, we use statistical code to carry out research. Now,
lots of researchers don't think of themselves as programmers but they write code, but unlike lots of startups no one ever really checks their code, and the output of what they do is a paper saying 'this is what I did', okay, and 'this is what I found'. Now, I think we could agree that the ideal situation would be that you share the paper, you know, what you found, and you also share the code, so we can see exactly what you've done and how it's been carried out, okay? And we can find errors and it makes the results more believable. But researchers don't share their code, and the number one reason they don't share their code is because the underlying data is not available. Okay, and that's appropriate. I'm not advocating that people release their private medical data online, so don't worry. So we can't release the data, but we do have a situation now where we've got researchers writing code that's quite important, and it's not really being checked, and they're working in these secure environments, and there are lots of problems, one of which is that the code doesn't typically get reused again, okay.
So by releasing a fake or dummy data set alongside the statistical code and the paper, I think this situation could be resolved, and so if we think about what do we want from our dummy data. So, there's a few complicated aspects of this, so we want similar code so I could code your pneumonia, okay, as pneumonia - finding. Okay, but somebody else perhaps my colleague next door will use a different code maybe infective pneumonia, which is a child code of pneumonia. Somebody else might code it as cough requiring antibiotics, so the data, the dummy data, needs to have this level of complexity that researchers need because they need to write statistical code which will capture all of the different ways you could denote this person has pneumonia. The second thing is conditions.
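As a rough sketch of the "similar codes" problem described here, the snippet below walks a small parent-to-child hierarchy and collects every code that sits under a starting code. The code names come from the talk, but the `CHILDREN` mapping is a hypothetical toy fragment, not real SNOMED data, and this is plain Python rather than a SurrealQL graph query.

```python
from collections import deque

# Hypothetical parent -> children edges; a real coding hierarchy is far larger.
CHILDREN = {
    "respiratory function finding": ["cough"],
    "cough": ["chesty cough", "allergic cough"],
    "pneumonia": ["infective pneumonia"],
}


def descendants(code):
    """Return the code plus every code below it, in breadth-first order."""
    seen = [code]
    queue = deque([code])
    while queue:
        current = queue.popleft()
        for child in CHILDREN.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


pneumonia_codes = descendants("pneumonia")
respiratory_codes = descendants("respiratory function finding")
```

A dummy data generator could pick at random from `pneumonia_codes` when recording a pneumonia diagnosis, so that the statistical code under test has to cope with more than one way of denoting the same condition.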
Conditions are related to each other, so simple example, if I was to take your blood pressure, your systolic blood pressure, that's the top number on the blood pressure reading, you probably have had a diastolic blood pressure, the bottom number, done at the same time. So those two things go together. And also conditions have shared risk factors, so we know that if you have had a stroke in the past you're much more likely to have a cardiovascular event like a heart attack, so those things coexist together more commonly. So why Surreal? So Surreal is a really good option
for this and Tobie's talked a little bit about why this is, but the one thing that I'm really, really interested in is how we can traverse a graph structure. So we want to model this complex relationship between different codes or nodes in a graph, this is something we can do in Surreal, we also want to run code snippets as part of a query, so as the node gets hit as we're traversing that graph we can send off an async thread and that will generate the data records and eventually come back to produce the dummy data. RELATE statements are super useful for finding similar codes, and pragmatically it's a single back end. I hope one day we could have a nice GUI on the front of it, and that will be available in a browser or maybe even a desktop app. And I'm going to hand back over to Tobie. [Applause] [Music] Thank you Caroline. SurrealDB Version 1. We think that SurrealDB can bring many benefits to applications of all sizes, regardless of how they run. Whether embedded on devices or running as a traditional database
platform. SurrealDB Version 1, released today, marks the beginning of SurrealDB's journey towards a stable database platform suitable for integration within large tech platforms. To enable this, we have been working on three integral core functionalities to SurrealDB. The first of these features is Change Feeds. Here is Senior Software Engineer Yusuke to explain more.
In order to integrate SurrealDB within the wider technology ecosystem, with SurrealDB version 1, we are introducing Change Feeds. This fundamental feature provides change-data-capture functionality to SurrealDB, enabling users and developers to track and respond to changes as they occur within the database. Whether exporting data in real time into third-party systems, moving data to object storage for backup or analysis purposes, or even for real-time cross-cloud synchronisation with other platforms, Change Feeds enable greater interoperability with other technologies within larger enterprise systems. In order to implement this core functionality, whilst at the same time
ensuring that it worked consistently, regardless of database deployment setup or environment, we needed to ensure that the logic itself was separated from the storage layer within the database. Change Feed functionality in SurrealDB sits within the ACID-transaction layer of the database, responding to any changes which occur, from schema and index changes to records and changes in the graph. Accessible to any database user with the correct permissions level, our initial implementation of Change Feeds can be applied to individual tables separately or to all tables within a database as a whole. What this means to a developer is that applications can subscribe to the specific data that they need, without impacting the performance of the database system or cluster. Change Feeds are beneficial both as an externally facing feature, enabling users to retrieve the data as it changes, and also as an internal feature. Looking forward to the future, SurrealDB will use Change Feeds as an underpinning of a number of long-running tasks,
including a non-blocking indexing system. This will enable SurrealDB to support improved background indexing for traditional full-text-search and vector embedding indexes, allowing for zero-downtime asynchronous generation or reconstruction of large data sets, without any need for table or database locks. Head to SurrealDB.com/cf for more information. [Applause] Externally, Change Feeds will enable SurrealDB to play a role within the wider ecosystem of enterprise, cloud or micro-service based platforms, giving users the ability to retrieve and sync changes from SurrealDB to external systems and platforms. Internally, in the future, Change Feeds will enable SurrealDB to handle long-running tasks, including the rebuilding of unique, full-text search and vector indexes asynchronously, without any downtime or blocking. I'll let experiences manager Lizzie go into a bit more detail about how this feature will be beneficial for developers going forward.
Change Feeds in SurrealDB are a foundational feature which form the underpinnings of change data capture. Integral for both internal uses to SurrealDB and user-facing benefits, Change Feeds enable a multitude of use cases for any application, from small projects to integration within enterprise platforms. Change data capture is the process of tracking changes in a database, in order to synchronise those changes with destination systems; this enables data integrity, data backup and consistency across systems and environments. From ingesting data into third-party systems, archiving data to object storage for backup or analysis purposes, or for real-time synchronisation with other platforms Change Feeds are a core feature for the enterprise. In SurrealDB, Change Feeds can be enabled on specific individual tables or applied to all tables within a database with just a single Surreal-QL command. Under the hood, SurrealDB tracks all of the changes made to table data by any user, whether running as an embedded or single-node instance, or if running in a distributed cluster with multiple SurrealDB nodes.
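The change data capture idea can be sketched as an append-only log that is written alongside every table mutation and read back from a known version. This is a conceptual illustration only, not how SurrealDB stores or exposes its Change Feeds: the `Change` and `ChangeLog` names, the integer version stamps and the "strictly greater than" semantics of `since` are all assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Change:
    version: int                      # monotonically increasing version stamp
    table: str
    record_id: str
    value: Optional[Dict[str, Any]]   # None marks a delete


@dataclass
class ChangeLog:
    """Append-only log written alongside every table mutation."""

    entries: List[Change] = field(default_factory=list)

    def record(self, table: str, record_id: str, value: Optional[Dict[str, Any]]) -> int:
        version = len(self.entries) + 1
        self.entries.append(Change(version, table, record_id, value))
        return version

    def since(self, version: int, table: Optional[str] = None,
              limit: Optional[int] = None) -> List[Change]:
        """Changes with a version stamp greater than `version`, oldest first."""
        found = [
            c for c in self.entries
            if c.version > version and (table is None or c.table == table)
        ]
        return found if limit is None else found[:limit]


log = ChangeLog()
log.record("reading", "r1", {"systolic": 120})
checkpoint = log.record("patient", "p1", {"name": "A"})
log.record("reading", "r1", {"systolic": 135})
log.record("reading", "r1", None)

# An external system that last synced at `checkpoint` asks only for what is new.
reading_changes = log.since(checkpoint, table="reading")
```

A consumer that remembers the last version it processed can resume from that point after a restart, which is what makes this pattern suitable for backup and synchronisation jobs.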
As a database administrator, you are then able to retrieve this data using the new SHOW CHANGES command, allowing you to retrieve changes since a particular version stamp, or by specifying a date and time after which the changes should be streamed. By introducing Change Feeds within the core engine of SurrealDB, we are enabling data consistency and synchronisation with your external platforms, regardless of the database setup or operating environment. In turn, this feature enables you to use SurrealDB as a central component of any enterprise, cloud or micro-service based platform. Head to SurrealDB.com/cf for more information. By introducing a new statement type, users can now retrieve changes since a specific version or timestamp for an entire database, or for a specific table. Version-stamps enable exact historic retrieval, whilst timestamps enable a more developer friendly way of retrieving changes. Our second major feature in SurrealDB Version 1 is Live Queries.
Here is Senior Software engineer Hugh to explain more. In order to enable modern, collaborative and responsive applications to be built on top of SurrealDB, we decided to go one step further. Live Queries, although similar to Change Feeds, open up a whole new type of application to be built on top of SurrealDB. Whilst Change Feeds give a historic view over time of the changes to a database or specific database tables, with the ability to listen to changes since a specific point in time, Live Queries give developers the ability to receive real-time change notifications to data as it is happening, but without any ability to subscribe to historic changes. The big difference, however, is that while Change Feeds can be accessed by database administrators, Live Queries are integrated directly within the table row and field level permissions of SurrealDB. What this means to a developer is that each Live
Query notification is unique and tailored to the authentication of the user who issued the query. When looking at other real-time databases, streaming functionality can be built in by subscribing to the database changes. Authentication and permissions logic needs to be built in a custom API layer sitting in front of the database. With Live Queries, users can build
applications that respond to specific document changes, full table updates or aggregate table views with just a single select query, using field projections or SurrealQL functions if desired. In order to implement this functionality so that it works both when running as an embedded database or in a highly scalable distributed cluster, we needed to build it as a layer above the data storage engine, using a combination of in-memory and persistent storage based techniques. This ensures that any change initiated from any SurrealDB node in a cluster can be sent to the relevant node processing the live query, with ordered, at-most-once delivery characteristics. The applications where Live Queries can be of benefit are numerous, whether for live updating user interfaces, real-time game notifications, dashboard visualisations, collaborative diff-patch-match based editing, live updating activity feeds, live chat, or even for responsive geofencing detection. Live Queries offer a much needed feature with effortless integration, and when pairing this with predefined aggregate views the functionality becomes even more powerful. We can't wait to see what people will build using this functionality. [Applause] As Hugh said, we really can't wait to see what people will build using Live Queries.
What was previously as complex as synchronising changes between multiple different databases, dealing with the permissions, authentication and business logic in a custom API layer and then handling the real-time communication with external users, now is possible by connecting directly to SurrealDB and issuing a single query. And this is because of the permissions and authentication layer built directly into the database. Here is Developer Experience Engineer Obinna to talk about the benefits that this will bring to developers and users. Live Queries in SurrealDB enable a simple yet seamless way of building modern responsive applications, whether connecting to SurrealDB as a traditional backend database or connecting directly to the database from the front end. With just a single query you can now subscribe to changes as they happen in the database, either for a whole table, or by filtering the real-time notifications, so that only the desired change data is delivered. With just a single word addition to the traditional SELECT query, a Live Query enables you to select all fields from a document or projected fields. In addition, by using a native JSON-diff-patch
implementation even the exact concise document changes can be received whenever a document is modified. Live Queries are built right into the core of the database and benefit from all the functionalities that you can use elsewhere on the SurrealDB platform. Take for example the need to modify each change notification as it is delivered to your users. Here, custom functions can be used within the field projections to alter the data before it is sent to the client.
Most importantly, however, Live Queries in SurrealDB are fully backed by the powerful authentication and permissions layer, meaning that regardless of what a user has subscribed to, notifications will only be delivered based on the authenticated session of that user. This all happens seamlessly within SurrealDB in the same way it does for normal SurrealQL statements. By bringing the simplicity of Live Queries alongside the advanced nature of predefined aggregate views, you can now build powerful dashboards that rely on aggregate data queries, computationally expensive analytics queries and filtered collections of massive data sets, that update in real time as your data and your database changes. Head to SurrealDB.com/iq for more information. [Applause]
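The way a permissions check, a subscriber's filter and field projections can combine for each notification is sketched below. This is a conceptual model in plain Python, not the SurrealDB implementation or its client API: `Session`, `LiveQuery`, `can_select` and the ownership rule are hypothetical names and rules chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Session:
    user: str
    inbox: List[Record]                 # notifications delivered to this user


@dataclass
class LiveQuery:
    session: Session
    where: Callable[[Record], bool]     # the subscriber's own filter
    fields: List[str]                   # projected fields to deliver


def can_select(session: Session, record: Record) -> bool:
    """Hypothetical table permission: users only see records they own."""
    return record["owner"] == session.user


def notify(queries: List[LiveQuery], record: Record) -> None:
    """Deliver one change to every live query that is allowed to see it."""
    for query in queries:
        if can_select(query.session, record) and query.where(record):
            query.session.inbox.append({f: record[f] for f in query.fields})


alice, bob = Session("alice", []), Session("bob", [])
queries = [
    LiveQuery(alice, lambda r: r["total"] > 10, ["id", "total"]),
    LiveQuery(bob, lambda r: True, ["id", "total"]),
]
notify(queries, {"id": "order:1", "owner": "alice", "total": 25})
notify(queries, {"id": "order:2", "owner": "alice", "total": 5})
notify(queries, {"id": "order:3", "owner": "bob", "total": 7})
```

Even though Bob subscribed to everything, he only receives his own order, because the permission check runs before the subscriber's filter; Alice's second order is dropped by her own filter rather than by permissions.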
Live Queries is such a powerful feature, allowing you to take a simple SELECT statement and turn it into a subscription-based query with change notifications. But the real power comes when you combine predefined aggregate views with Live Query functionality, allowing you to subscribe to aggregated data as it changes over time with support for custom grouping, rolling averages and grouped minima and maxima. This is perfect for live updating dashboards, charts and visual displays. Our third major feature in this release is Indexing. Here is Senior Software Engineer Emmanuel to explain more. An integral part of any database system is the secondary indexing used to optimise and improve the performance of database queries and data analysis. With SurrealDB, when it came to
implementing indexes within the database we wanted to ensure that whatever approach we took it would enable us to offer the same functionality whether running as an embedded database, single-node database with vertical scaling or horizontally scalable distributed database cluster. In SurrealDB version 1 we are really excited to now have support for traditional indexing, unique indexes and constraints, full-text search indexes and vector embedding indexing. The quickest and simplest approach with indexing would perhaps have been to rely on any of the popular indexing libraries or third-party platforms but this would have limited the functionality and the applications where the indexing could have been used. Instead we reimagined how indexing might be implemented, opting for a completely custom built indexing engine which sits within the SurrealDB core itself. The engine is agnostic
to its deployment environment, whether running on top of IndexedDB in the browser, an embedded runtime in Rust or Python, or distributed over multiple nodes in a highly scalable cluster. With this approach, instead of passing the document indexing, query parsing and data structure storage to an external library or platform, SurrealDB handles all of this logic itself, directly within the ACID transaction model of the database. What this means for a developer using SurrealDB is that the indexing engine is able to integrate and interoperate with the SurrealQL query language natively, without the need for an additional external query language or for indexing specific functions or plugins. Looking at the indexing functionality itself,
we are really excited about what can already be achieved. For traditional and unique indexes SurrealDB already supports simple single field indexes, multi-field compound indexes, nested object and array fields, and also has support for flattened indexing of array data. With full text search, SurrealDB allows developers to define custom analyzers which specify exactly how their text data should be processed, with support for multiple tokenizers and advanced filters including Ngram, EdgeNgram and Snowball, and support for 17 languages from English to Arabic. With vector
embedding indexing our initial implementation supports exact nearest neighbour retrieval for vectors of arbitrary size using Metric Trees, with support for HNSW-based approximate nearest neighbours retrieval coming in the future. Along with the indexing, SurrealDB Version 1 now has support for explaining complexity of any query which selects data from the database allowing developers to understand the performance implications and index usage of their SurrealQL queries, and also gives users the ability to force the database to use a specific index. As the index data is stored directly within the storage engine and not within the query nodes themselves, this opens up the possibilities of how data can be indexed and queried at scale with SurrealDB.
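To show what "exact nearest neighbour retrieval" means in practice, the sketch below compares a query vector against every stored vector and keeps the closest matches. Note that this deliberately swaps the Metric Tree mentioned in the talk for a plain linear scan: the results are the same exact neighbours, but a tree would avoid visiting every vector. The function names, the Euclidean metric and the sample embeddings are assumptions for the example.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vectors: Dict[str, Sequence[float]], query: Sequence[float],
            k: int) -> List[Tuple[str, float]]:
    """Exact k nearest neighbours, found by scoring every stored vector."""
    scored = ((record_id, euclidean(vec, query)) for record_id, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


embeddings = {
    "doc:1": [0.0, 0.0],
    "doc:2": [3.0, 4.0],
    "doc:3": [1.0, 0.0],
}
top_two = nearest(embeddings, [0.0, 0.0], k=2)
```

An approximate method such as HNSW trades this guarantee of exactness for speed on large collections, which is the distinction the talk draws between the initial implementation and what is planned.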
And with the indexing functionality implemented natively within SurrealQL, we are really excited to see the uses and applications which can benefit from this technology. [Applause] Indexing is such an important piece of any database platform and we're really excited with this initial implementation. Vector embedding indexing is now available in SurrealDB Version 1 as a beta feature, and we'll be working on the performance aspects of the indexing engine over the coming months. Here is Developer Advocate Pratim to go into how this can be used by developers and how it will affect applications built on top of SurrealDB. SurrealDB is designed for building applications of any size, whether an indie project or an enterprise platform. For that, query performance and improved data analysis workloads are key.
With SurrealDB secondary indexes you can now index data using traditional indexes, full text search indexing and vector embedding search for artificial intelligence use cases. All of these index types are native to the database, meaning that they interoperate with the SurrealQL query language and work the same way whether running on top of IndexedDB in the browser, an embedded runtime in Rust or Python, or distributed over multiple nodes in a highly scalable cluster. For you that means that defining and implementing these indexes can be as simple as running a single query. Take for instance a multi-field compound index with nested array data. Using a single index definition statement, we can easily implement this index, specifying whether arrays of values should be flattened into separate index entries or not. For full text search indexes more options exist for configuring the indexing behaviour. Custom analyzers allow you to specify which tokenization methods are used to split text into boundaries and a range of filtering algorithms, including Ascii, Lowercase, Uppercase, NGram, EdgeNGram and Snowball.
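The analyzer idea, a tokenizer followed by a chain of filters, can be sketched in a few lines. This is a plain Python illustration of the general technique, not SurrealDB's analyzer implementation: the whitespace tokenizer, the `edge_ngram` parameters and the `analyze` helper are assumptions for the example, and stemming filters such as Snowball are left out.

```python
from typing import Callable, List

Filter = Callable[[List[str]], List[str]]


def tokenize_blank(text: str) -> List[str]:
    """Split on whitespace, a stand-in for a simple tokenizer."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def edge_ngram(min_len: int, max_len: int) -> Filter:
    """Emit the leading prefixes of each token, useful for search-as-you-type."""
    def apply(tokens: List[str]) -> List[str]:
        out = []
        for token in tokens:
            # Tokens shorter than min_len produce no prefixes at all.
            for size in range(min_len, min(max_len, len(token)) + 1):
                out.append(token[:size])
        return out
    return apply


def analyze(text: str, filters: List[Filter]) -> List[str]:
    """Run the tokenizer, then each filter in order."""
    tokens = tokenize_blank(text)
    for f in filters:
        tokens = f(tokens)
    return tokens


terms = analyze("Chesty Cough", [lowercase, edge_ngram(2, 4)])
```

The order of the filters matters: lowercasing before generating prefixes means a search for "che" matches regardless of how the original text was capitalised.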
They allow for advanced processing and stemming of all types of text in a large number of languages. When retrieving indexed results, sophisticated methods for matching term highlighting allow for effortless integration with front-end interfaces. For indexing and searching AI based vector embeddings, SurrealDB now includes native support for exact nearest neighbour retrieval using Metric Trees. Similarly to the other index types, these indexes are simple to set up and native to the database core. Whether using SurrealDB as an embedded database, a single node server or scalable database cluster, the indexing functionality is designed to work seamlessly, giving you the power and performance that you can expect from a database. Head to SurrealDB.com/ix for more information. [Applause] Built directly into the SurrealQL query language, SurrealDB supports many different index types with a whole range of configuration options. Unique indexes allow for data constraints on single or multiple fields in a record, traditional indexes have support for multiple fields, compound indexes on array values or nested object values within arrays with the ability to combine multiple fields together.
Full-text search indexes are simple to define and once again sit natively within the SurrealQL query language, enabling users to efficiently index, search and retrieve results using relevance and scoring functionality. Vector embedding indexes mean that when working with artificial intelligence data and large language model data the index information can reside right next to the data itself. All of these indexing functions and functionality work the same way whether running as an embedded database or as a distributed cluster. Before I leave you to enjoy the rest of the day there is one more feature we are introducing today.
SurrealML is our first step towards bringing Machine Learning to the Surreal ecosystem. Here is Senior Software Engineer Maxwell to explain more. One feature we're extremely excited to introduce in SurrealDB Version 1 is SurrealML. Instead of just a feature within the database,
SurrealML is a whole suite of tools which is just the beginning of bringing machine learning and artificial intelligence workflows, inference and reasoning into the database itself. With this release we are introducing a new SurrealML file type for working with PyTorch and SKLearn models in Python. This file format powered by our Rust runtime allows machine learning model developers to train in Python and save the model and metadata to a portable and open source file format, allowing for seamless model versioning and execution across different Python versions, environments and platforms. Although powerful in its own right, the real benefit comes from
the ability to bring these pre-trained models into SurrealDB, enabling model inference within the Rust-based SurrealDB runtime. With embedded metadata and data normalisation logic stored within the SurrealML file alongside the pre-trained internal model, the database runtime understands what arguments and values the model expects, allowing any data within the database to be inferred against the supplied model. Whether running as an embedded database instance, as a single node database server or a distributed database cluster, the machine learning engine in SurrealDB scales effortlessly to meet the demands of today's applications. With SurrealML we are taking the flexibility and ecosystem of machine learning in Python and bringing it alongside the power and performance of Rust and SurrealDB. Whether working with raw data inputs or simplified model arguments, SurrealDB extends the power of Python machine learning without changing the traditional approach to implementing machine learning workflows in Python. This is just the beginning of our journey to bring machine learning and artificial intelligence to SurrealDB. We are eager to see how SurrealML and the accompanying tools, alongside the machine learning engine in SurrealDB, power and enable applications within any industry, from indie projects to startup products to mission critical enterprise applications operating at scale.
[Applause] With the introduction of our own file format with metadata and versioning included within the file, machine learning models can be greatly simplified, ensuring reproducibility and consistency in machine learning pipelines. Today, SurrealML can be used in beta within Python, with any PyTorch and most SKLearn models. In the future we will be integrating SurrealML with the Hugging Face ecosystem so that Large Language Models can be used and transported into SurrealDB for inference and reasoning directly on your data. Here is Software Engineer Misha to dive into how SurrealML can be used with SurrealDB. With SurrealML you can now use a portable and open source file format to package and embed PyTorch and SKLearn machine learning models whilst at the same time storing model metadata and normalisation logic alongside pre-trained models. This allows for effortless versioning of different
machine learning models and interoperability across different Python software versions, environments and platforms. Using our Rust-based SurrealML runtime, the file header stores the model name, description, version and data normalisation logic. This means that the processing logic that is required for all data as it is passed into a model for inference purposes no longer has to sit separately in the Python deployment. Instead, as this business logic now sits alongside the model itself, the stability and reproducibility of each model is guaranteed.
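The idea of keeping the header (name, description, version) and the normalisation logic next to the model can be sketched in plain Python as follows. This is not the SurrealML file format or its API: the `PackagedModel` class, its method names, the z-score normaliser and the linear "model" are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class PackagedModel:
    # Header metadata travelling with the model (hypothetical layout).
    name: str
    description: str
    version: str
    columns: list      # input field names, in the order the model expects
    normalisers: dict  # field name -> (mean, std) for z-score scaling
    weights: list      # stand-in for a trained model: linear weights
    bias: float = 0.0

    def predict_raw(self, values):
        """Raw input: already-normalised values in column order."""
        return sum(w * v for w, v in zip(self.weights, values)) + self.bias

    def predict_named(self, record):
        """Named input: look fields up by name and normalise them first."""
        values = []
        for col in self.columns:
            x = record[col]
            if col in self.normalisers:
                mean, std = self.normalisers[col]
                x = (x - mean) / std
            values.append(x)
        return self.predict_raw(values)


model = PackagedModel(
    name="house-price",
    description="Toy linear model for illustration",
    version="0.0.1",
    columns=["squarefoot", "num_floors"],
    normalisers={"squarefoot": (1000.0, 500.0), "num_floors": (2.0, 1.0)},
    weights=[2.0, 1.0],
    bias=10.0,
)

print(model.predict_named({"squarefoot": 1500.0, "num_floors": 3.0}))  # 13.0
print(model.predict_raw([1.0, 1.0]))                                   # 13.0
```

Because the scaling parameters live in the same object as the weights, the caller can pass a record keyed by field name and does not need to reproduce the preprocessing separately.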
In addition to the benefits that the SurrealML file format brings to Python, the Rust-powered machine learning engine in SurrealDB now supports ingesting a fully packaged SurrealML file, enabling performant inference within SurrealDB, whether at scale or embedded on any device. To import a pre-trained Python model into SurrealDB, a single command can be used to add this model to your database, scaling across database cluster nodes if desired. SurrealDB automatically reads and understands the model requirements, immediately setting up a custom inbuilt function which can be used to infer results from the model itself.
Inference works with either raw data inputs for advanced usage or with field name key bindings packaged into the SurrealML file format itself. This means that seamless integration with SurrealQL object types allows you to work more quickly and consistently with models as they are updated. Head over to SurrealDB.com/ML for more information. [Applause] Using an HTTP route and combining this with SurrealQL, we can now import our models directly into SurrealDB effortlessly. SurrealDB automatically reads and understands the model requirements, immediately setting up a custom inbuilt function which can be used to infer results from the model itself. Inference works with either raw data inputs for advanced usage or
with the field name key bindings packaged into the SurrealML file format. Developers no longer have to use external platforms or systems to run model predictions against data residing in the database. Instead the model logic can sit directly within the SurrealQL query language, extending the power of custom functions. As with Live Queries the real power comes when we combine this functionality
with the other powerful features within SurrealQL. Model inference can now be used seamlessly within events, Live Queries and other custom functions, both as a traditional database backend and when connected to directly from the browser. SurrealML within SurrealDB will be released in beta this month so that you can start working with machine learning models right within the database with ease. Thank you to all the team members whose hard work has been instrumental in the launch of SurrealDB Version 1, and thank you also to the incredible Experiences Team that has made SurrealDB World possible today. This is just the first step in SurrealDB's journey to simplify the lives
of developers and enable applications to interact with and build upon data in a newly imagined way. Each of the features announced today from Live Queries to indexing and to the start of bringing intelligence to SurrealDB brings its own power yet simplicity to applications, but it is the ability to combine all of this formidable functionality together within the Surreal ecosystem that will really lead to unbounded possibilities. Thank you.
2023-10-05 16:35