Knowledge Systems and AI
I am excited to introduce our next speaker. Our theme has been how systems are fueling future disruptions, and yesterday we had Mark Russinovich come in, as the CTO of our Azure business, to give you a view of how systems are impacting our cloud business. Today we have the CTO from our AI and Research group coming in, David Ku. David is also the corporate vice president for our AI Core business. This is the group that powers the AI capabilities behind Bing, Cortana, Office, and Azure. It's the group that designs, develops, and deploys the Bing graph, the Bing Ads marketplace, and the Office knowledge graph and substrate. David has always been a big supporter of research as well, and of our engagement with academia. He's the executive sponsor for our relationship with Stanford, and he is very engaged with a number of communities from an entrepreneurship standpoint, connecting high-tech communities into Silicon Valley. And he's an all-around good guy. So with that, let me introduce David Ku.

All right. I'm an all-around good guy. Well, good morning. I'm very excited to be here. I would say that in my many years of working on products, I have really had the deep appreciation and pleasure of working with researchers and academia. I think we're all working in this very advanced field of technology and products, and that collaboration I just deeply appreciate. The fact that Microsoft is so deeply invested in research and academia, I think, is a blessing, and I'm certainly honored to be here. What I want to do is focus the theme on knowledge. This comes from my experience working in search, in advertising, and, as Sandy said, in Office, productivity, and Azure. There's a theme, which is around the ability to semantically model, and we just call that, broadly, knowledge. I want to share a little bit of the journey, but also encourage and hopefully frame that for AI transformations,
both in terms of business impact and in terms of experience impact. I personally think knowledge is one of the core capabilities, and it's a rich area that we're still at the beginning of understanding. So I enlist all your creativity to help us make advances in this area.

So let me start out by talking a little bit about the promise of AI. There's a number that comes up, which is 1.2 trillion, that's trillion with a T, and that's the estimate IDC has of the new incremental revenue that's going to be created in three years with AI. Now, for this optimism to really come forth, there's really this belief that there is unlocked potential in the data assets within an enterprise or a company, which they can bring to bear to gain new insights, to change the way they interact with customers, and to grow their business. So that's a massive, massive expectation. In the context of our engagements with enterprises that are looking to tap into the cloud and tap into AI in their desire to transform themselves, we've seen a pattern, and I want to list out the pattern of what they're looking for in their progression of applying AI to their business. It starts out with applications that are intelligent: being able to take applications, point of sale, specific line-of-business applications, productivity, and make them intelligent, feedback driven, predictive, analytical. But beyond this, there's also this desire to use AI to change the way that you interact with people, and that could be employees or that could be customers. This is the wave of conversational AI, where it's not just any set menu or interaction metaphor; it's really language as the interaction metaphor. And again, we're still at the early stages of exploring that trend. And
then there's process transformation: being able to understand, not just analytically and retrospectively. The difference between BI and AI is, in some sense, the ability to predict, anticipate, and optimize going forward, not just looking backwards. And then there's a general desire: the transformation of a company is complete when every aspect of the system of the company is somehow modeled in a way that allows us to reason, optimize, and drive. That wave of value requires a company to really look deeply at what it takes to be an intelligent organization. So let's dig into that a little bit. There's no doubt that the first step to becoming an intelligent organization, or for a company to be effectively embracing AI, is to start with the data. Start with the data and unsilo it. Many, many companies, starting with product companies but also enterprise companies, need to start looking at tapping the value of their data. So there's lots of work in understanding the quality of the data, being
able to connect to all the pulses and the inputs, both in terms of real interactions with customers and internal processes and interactions, and to have the ability to model. That requires us to really have a foundation for rapid iteration and experimentation with all sorts of different modeling techniques. So this modeling agility, together with the ability to unsilo data, are table stakes. But when you start looking at what it takes for a company to truly embrace AI, I would posit that there's a third phase, which is the ability to start shifting the mindset: that data is in fact at the core, that there's a flywheel where everything revolves not around the existing interactions with customers, but around a knowledge, a model of the business, where even software and experiences are there to help us understand that data and model better, as opposed to the other way around. And that requires really fundamental thinking about what the core asset of a company is.

Now, when you look at all this, you might say, hey, this seems fairly abstract. How do we bring this to life? Is this really something that happens in practice? I want to share a little bit of the journey that we went through at Microsoft. Certainly we're not alone, so this is not saying that this is unique, but I think it is indicative of this general understanding and the evolution of capabilities and mindset that is in fact changing the way we look at the world. So let's start with Bing. Bing is a search engine, a web search engine. You've got lots of documents, you've got queries coming in, and in fact there's a beautiful, beautiful interaction model of ranking and feedback through clicks, from which you start building out a richer and richer index of the web, of the documents and the concepts within. So as part of that, over the years, as
we did several years back, probably ten years back, we started working on a knowledge graph. Because, hey, why stop at just going to links? Why not bring in that information, start creating that information or that action that can be directly engaged, without being one hop away? So we started building this knowledge graph. It turns out to be a pretty big graph: two billion plus entities, and it continues to evolve across different domains. It's open domain, built from many, many web pages, with lots of techniques on it. But the thing that is interesting is that around a couple of years ago we started shifting from thinking, oh, that's just the Bing graph, it's only useful for Bing, and we realized that what underlies it is in fact a model of the digital world. These are the people, places, and things that happen on the web. These are digital artifacts on celebrities, on people, on locations,
on stores. And when you look at it from that standpoint, we're starting to create an understanding of both the facets and the relationships of things in the world, the public world, and we can start joining it with structured data. As a result, that asset took on a life of its own. It's now valuable not just in Bing; it's valuable whenever you need a model of the world, which could be in Office, which could be in the cloud. And we're seeing this general pattern: you start out with an application and feedback, but you start creating valuable assets and knowledge models that have a life beyond that scenario.

Let's go on to the next one. A little bit later my colleague Kuansan will come and talk about the Microsoft Academic Graph. Think of this one as a graph that models research and technology innovation, and really that process of who communicates, who published what, in collaboration with whom, from what institution, and where the flow of new ideas starts to come up from the minds of people into broader adoption, inspiring whole fields and domains. That's another example where you can use that aggregated signal from many, many different places, the publications, the fields of study, and start to create an ontology that gives you predictive power on the impact of new articles or new fields. Is that a hot field? Is it going to gain traction? This is, again, a model that takes on a life of its own, that you can extend to other areas.

And in fact it's not just the public domain; it's also in enterprises. In the case of Office, you would say, hey, Office is just a bunch of Word, Outlook, email, SharePoint. But in fact Outlook is at the core of how people in a company interact
and communicate and collaborate. That flow of signal allows us to start creating a model of digital work: how collaboration happens and on what topic, how people and organization impact that, how we understand the activities that people do, and how we understand the topics, the customers, and the interactions, in order to predict the effectiveness of engagements with companies, collaboration across teams, and the likelihood of progress, from information that is usually scattered in an unstructured way. So we have massive systems where we've taken, in some sense, that journey from Bing, our technology understanding and modeling capabilities, and brought it to Office. We're still at the beginning of this journey, but this is a case where we're now saying, hey, maybe Office in fact represents a deeper understanding of how work is done, digital work. Can that create more delightful experiences? And can that also be bolstered to help companies transform the way that they interact with their employees and customers? So these are just examples where knowledge starts out from, hey, let's make an application smarter, and goes on to creating data assets that can inspire and connect to new scenarios.

Now, once you have knowledge, it doesn't mean that it's readily accessible by people. Knowledge is semantically organized and understandable, but humans' ability to interact, engage, and conflate is in fact unstructured. Let's take, for example, search and Q&A. You can ask anything in any natural language, but you have to somehow map it to what's known and modeled in the knowledge. Well, that could require technology
for understanding both how to model language and the information and knowledge within. You can also look at enrichment, recommendations, and the relatedness of concepts, so that you can start to connect the dots. In the world view that we have at Microsoft, which Satya talked about, this intelligent cloud, intelligent
edge, we talk about this ability to connect the dots. So it's not just disparate interactions with different devices; it's a multi-device, multi-sensory, coordinated world. And that coordination, that glue, is in fact that connective fabric of people, of things, of the environments, of the context. That is the glue that starts to tie the different pieces together into something coherent.

So I want to now give some examples of the power of these experiences when you have knowledge. This is in fact the thing that gets us excited, but again, we're at the early stages of it. So let's start with, nice animation, let's go with Bing. Starting with Bing, clearly we have the ability to handle reasonable and interesting questions around the knowledge graph. In this case, what's the size of Ritalin? There's the knowledge graph, there are different nodes, there are different facets, and based on that we can derive the computations to answer that in a much more direct fashion. You can imagine this across a large number of domains. But we've also recognized that that doesn't capture all the information. There's still lots of information in the billions and billions of documents that may not be explicitly represented. So there are lots of advances in machine reading comprehension, neural modeling, and deep Q&A, and these are things that we have in Bing and also in Maluuba, as well as in many, many academic and research systems that are out there. This is the case where we now semantically model the knowledge within snippets, different documents, warranties, and are able to have Q&A around it. In fact, we've taken it one step further and said it's not just one answer; there are different perspectives. Because when you start looking at information, it is not facts only; it is opinions, it is perspectives,
and the ability to surface that, and to recognize that, is also critical. So we have these multi-perspective answers. Now, these are just examples that go beyond the ranking of the ten blue links to understanding the inherent intent and need that people may have, and how we start to surface the knowledge within the web. A lot of that requires investing in technologies that elevate and go beyond the indexing and the posting lists, and start to look at the organization of information and to be able to reason around it.

Now, in Office, it turns out we're also starting to bring that technology and that knowledge infusion into the experiences that you all know, like Word. In this case, imagine you're writing some article, you're writing a paper, and you want to say, hey, I want to be inspired, or tell me a little bit of contextual information.
In this case, we're systematically bringing what's available on the web, from the knowledge graph, and from inside the company, to be contextually relevant to what's happening in your workspace, in the thing that you're currently working on. So again, it's a different way of bringing in that connective tissue of context that we think allows you to stay in the flow. One way to define productivity is staying in the flow, where you're most productive, as long as you can. Bringing that contextual knowledge in a relevant, integrated way, to anticipate that next step, is in fact a hallmark of good knowledge capabilities.

And it doesn't even stop at Word or the flow. Imagine you're in Excel. You've got lots of different ways of describing data and values, and there's information, there are references, that may be valuable for you to contextualize. In this case, for example, let's say you have the word United, but it appears in different contexts. This is just to illustrate the complexity and the richness of language, but also the ambiguity. Here, United in the last context is in fact a bunch of movies, but we wouldn't know that unless we actually understood the other elements and were able to conflate. Likewise, in other contexts it could be an airline, it could be companies, it could be European football clubs. It could be many, many different things, but again, context matters, and the ability to model that is in fact rich. So we're only beginning to scratch the surface of bringing information and knowledge into the context of user needs, in context or through explicit queries and intent.

But even beyond this, the one thing that we've discovered as we started working on agents, assistants, and bots is that, whereas knowledge may be considered
to be valuable, it's still a nice-to-have. You can still get your work done with ten blue links. You can still get your Word document done without being inspired by contextual search. But in the case of conversations, especially conversations that go multi-turn, you kind of need knowledge. The reason is that the context and the information from the interactions that you have in one turn need to be passed and transferred to the next one, so that you can start to reason. That fluidity, that sharing of context across turns, where the turns and the actions taken on each turn may be from different vendors, different applications, and different knowledge bases, is in fact one of the hallmark challenges and opportunities in conversation design.

So in this case, imagine that you're going through this flow. In fact, this is inspired by our Semantic Machines acquisition; we were very excited to get Dan Klein and Percy to be part of our family. Imagine you say, hey, I want to go two days before Thanksgiving. What is two days before Thanksgiving? What does it mean for New York? Where are the airports? How do I understand it? How do I reflect it? How do I understand the facets for elaboration? These are all things that actually require you to orchestrate across many, many back-end systems and APIs. Each one has an ontology, in some sense, of the values and of the capabilities. So it's really about this ability to use knowledge to connect the dots across turns, to be able to reason and contextualize, and to guide that discussion. Take travel to New York: being able to recognize that the nearby airports of New York are JFK and certain other locations. Likewise, you can imagine each of the APIs and each of the systems having its own variant. We need to learn the association and do a kind of dynamic conflation across these.
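To make the multi-turn example a little more concrete, here is a minimal sketch of carrying context from one turn to the next while grounding "two days before Thanksgiving" and "New York" against known facts. Everything named here is an assumption for illustration: the `NEARBY_AIRPORTS` table, the `resolve_turn` helper, and the slot names are hypothetical and are not the system described in the talk. A real assistant would query a knowledge graph and several back-end APIs instead of a hard-coded dictionary.

```python
from datetime import date, timedelta

# Hypothetical lookup table standing in for a knowledge graph query.
NEARBY_AIRPORTS = {"new york": ["JFK", "LGA", "EWR"]}


def thanksgiving(year: int) -> date:
    """Return US Thanksgiving: the fourth Thursday of November."""
    first = date(year, 11, 1)
    # date.weekday() counts Monday as 0, so Thursday is 3.
    days_to_first_thursday = (3 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_thursday + 21)


def resolve_turn(context: dict, slots: dict, year: int) -> dict:
    """Merge one turn's slots into the running context, grounding known phrases."""
    merged = dict(context)  # keep everything learned in earlier turns
    for name, value in slots.items():
        if name == "date" and value == "two days before thanksgiving":
            merged["date"] = thanksgiving(year) - timedelta(days=2)
        elif name == "destination":
            merged["destination"] = value
            merged["airports"] = NEARBY_AIRPORTS.get(value.lower(), [])
        else:
            merged[name] = value
    return merged


# Two turns: the date from turn one is still available after turn two.
context = {}
context = resolve_turn(context, {"date": "two days before thanksgiving"}, 2018)
context = resolve_turn(context, {"destination": "New York"}, 2018)
print(context)
```

The point of the sketch is only that the second turn can reason over what the first turn established; the hard part the talk describes, learning associations across many differently shaped APIs, is not attempted here.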
All that said, with knowledge there are many things you can do to start to bring that knowledge into the context of interactions and flows. But how do we bring this to life? This is a rich area of research and product efforts, so I'm not going to go through the details on this, because I think you're all kind of world experts. But I do want to share a little bit of how we're looking at the
dimensions of knowledge systems, the quality and what that means, and also some of the challenges that we have, which we think are really pushing us to the limit, but which are certainly an area where we'd like to invite your creativity and your collaboration.

So to bring knowledge systems to life, we start with the fact that, hey, data can be chaotic. It can come from structured and unstructured sources. There's lots of effort across different pieces, but bringing that together in some coherent knowledge production process, to be something that is fresh, highly structured, and semantically understood, that's a challenge. That's the flow from data chaos to structured semantics. And there are different approaches. It's not that one approach is better; in fact a number of these work in concert. Starting with people: you have Wikipedia and DBpedia. There are certain efforts that really do require domain experts to start capturing that knowledge in some sort of way, and to create an incentive that keeps it fresh. Wikipedia is the great example. Almost every search engine looks at Wikipedia and says, that's great, let's use that to seed our understanding, because that's probably the best articulation of the basic, shareable knowledge that people interact with. And that's one of the facets that I think is important for knowledge: it's not just important for systems to understand; it is something that needs to be understandable and explainable to people.

But with people, at some point you're going to run out of steam, because at some point either the willingness or the capacity of people is going to hit a limit. This is where systems start to come in. So this theme of systems for AI and AI for systems is in fact one of the hallmarks where I think it applies beautifully to knowledge systems. In this case we can imagine implicit modeling. We
talked about machine reading comprehension. We're still at the early stages of it, but again, this is a case where we're trying to understand the shape of the language, the shape of the information, and the shape of the retrieval. So there are lots of efforts on that. But there are also knowledge representations that are explicit. These are the triples, the RDF, the different graph structures. There, I would say, we have lots of research and lots of production systems. Bing, Google, Facebook, LinkedIn: these are all systems that are creating these knowledge representations, some of which are proprietary, some of which are public. And there are a lot of research systems, from Cyc in the early days all the way to the research systems that exist today. So this is an active area. Now, what I want to do is just talk a little bit about how, regardless
of the approach, there are different dimensions along which we evaluate and assess knowledge. Is a knowledge system correct? What's the degree of correctness? What's the degree of freshness? What's the degree of coverage? That's standard. But when you start looking at it, they kind of work against each other when you hit extreme scale. In fact, that's where we end up: it all sounds good for 1 million documents, it sounds good for 10 million, but it gets really hard if you have hundreds of billions. And that's the chaotic web. And it's not just the chaotic web; it's chaotic enterprise systems and real-life situations.

So with that, let me just give some examples. In the case of correctness, precision really does matter, because once you create that knowledge, you have to stand by the quality. If something is wrong, you can't just point to the source and say, well, I don't know, I think I extracted it right. This is fake news, it's fake knowledge. Bad. You've got to look at authority. You've got to look at the synthesis. How do I look at voting? If I don't have an authority, how do I judge? How do I get user feedback? That's a challenge, especially when you have unorchestrated, multi-party sources of information.

Freshness: speed matters. Things are constantly changing, and in many cases we don't even know what changed, not to mention the ability to propagate that update through the system in a way where we understand that some of the updates may in fact be incorrect. You can actually start propagating lots of problems. With one mistake you can ripple through and destroy pretty much everything, and that's a lesson we've learned. And then coverage: size really does matter. At some point people will say, great, this all sounds good, but is it complete? Does it capture that domain?
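As one illustration of the authority and voting questions raised above, here is a minimal sketch of authority-weighted voting over conflicting claims from several sources. This is an assumed, simplified technique chosen for illustration, not the method used in Bing: the `weighted_vote` function, the source names, the authority weights, and the default weight for unknown sources are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(claims, authority, default_weight=0.1):
    """Pick the value with the most authority-weighted support.

    claims: list of (source, value) pairs asserting a value for one fact.
    authority: mapping from source name to a positive trust weight.
    Returns (winning_value, confidence), where confidence is the winner's
    share of the total weight.
    """
    if not claims:
        raise ValueError("need at least one claim to vote on")
    scores = defaultdict(float)
    for source, value in claims:
        # Sources without a known authority still count, but only a little.
        scores[value] += authority.get(source, default_weight)
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Three hypothetical sources disagree about one attribute of an entity.
claims = [("encyclopedia", "1912"), ("fan_site", "1913"), ("news", "1912")]
authority = {"encyclopedia": 0.5, "news": 0.25, "fan_site": 0.25}

value, confidence = weighted_vote(claims, authority)
print(value, confidence)
```

A confidence score like this could then be treated as a hypothesis score rather than a final verdict, so that low-confidence facts are held back or sent for feedback instead of being propagated through the graph.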
All of these work as different forces, and there are lots of systems and lots of efforts that are in fact at the frontier of this. I want to list out some of these to
both acknowledge that lots of problems still remain unsolved, and also the importance of these problems for us to get ahead in this world where I think knowledge is increasingly important. Unsupervised, autonomous: these are all keywords. Basically, we're saying we can't depend on humans to do the final verification in all cases. Unstructured: when I look at a web page, it's not just the head sites, where I can create templates and do rapid induction and site understanding. Can I understand the unstructured nature of information? Can I understand and cluster? Can I extract facts? Can I test hypotheses through verification and validation? That's a whole body of work that pushes from the head to, in fact, the torso and tail.

Knowledge and semantic embedding: you've got lots of explicit knowledge that's present today through years and decades of human aggregation or capabilities aggregation, and yet we're building neural networks. How do we seed that? How do we anchor on that knowledge so that you build upon it, as opposed to restarting from scratch? And I'll just pick another one: multilingual. Language is fundamentally multilingual, and information and knowledge may in fact be multilingual and multicultural. How do you even represent that in a way in which you don't get too precise, because once you get too precise you get too brittle and there's no generalization, all the way to where you're overly bucketing things and it's not that useful?

So there's lots of active research, both within technology and product companies, in research, as well as in academia, and I encourage continuing innovation. We will look to partner in any and all of these. But let me share a little bit of our learnings. There are many things we did throughout all the systems that you heard about, but I'll just pick a couple that I think are really
hard problems, ones we've taken steps towards, but I don't think they're problems that are going to be easily solved. I'll use them, hopefully, to tee up the areas where I think there's productive research to be done. One is the inherent complexity of the real world. We talked a little bit about it; I will share some of the learnings that we had addressing that. The second is the symbolic versus neural approach, and whether you can get the best of both worlds: both the ability to explore, but also to anchor on the knowledge that exists, and the ability to make it understandable and explainable. And the last one is something where we're seeing a huge swell of interest, which is how enterprises can start to take control, to understand their unstructured digital information assets in a way that makes them available to change the way they interact with customers and employees. So I'll just list out some of these. Each one has its own challenges. We've made some progress on them, but I wouldn't say that we've cracked the nut.

So let's start with the inherent complexity. Just to motivate this a little bit: working in web search is beautiful. It's chaotic, nobody has control over anything, and things change completely at breakneck pace. But there are three dimensions that really put pressure on all the things we do. One is just the logic, which is: hey, there's lots of detailed information describing different ontologies across different sites. At what point does it make sense to generalize? At what point do they become common patterns across domains? How do I enlist domain experts, and how do I recognize that domain experts don't exist in only one company, that expertise is in fact decentralized and distributed? So it's being able to tap into that and being able to reason on both the domain-specific and the domain-general. These are the common sense; these
are the basic things on units, on basic understanding of distance. There are lots of things that I think are shareable and fundamental. Then there's the time dimension. Everything changes; it's just that nobody tells us that it's changing. So that means coming up with the ability to detect the status updates, but also seeing whether you can create incentives so that sources are willing to tell you that things are changing, and looking at system-to-system integrations. And then, in the space dimension, you can imagine that the category, the domain, and the scope continue to expand, across domains, across
facets, across entities. So in Bing, I'd say that there are a couple of pivotal points at which we've shifted, which really grew our horsepower to tackle some of these, all three of them. The first one is a shift from batch-based conflation to streaming-based conflation. That's a fundamental shift. Imagine it's no longer that, hey, on Monday of every week you take all the data sources, you do a big job, and you publish a big blob of a graph. Is it correct? I hope so. But it takes time; it used to be a couple of weeks. But the web doesn't work like that. It's not a discretized sequence of events. So we shifted to a streaming-based system, which changes everything: the notion of having an incremental base, the ability to start looking at updates in a different way, the ability to look at incremental conflation and evolution, and even to reason about correctness in a different way.

The second shift is to start going from the head to the tail. That requires us to dramatically scale our ability to both manage the ontology and also go into deeper site and domain understanding in an automated way. Again, that's an area where we're pushing more and more, but clearly there's lots more to be done. And the last one that I think is notable is our view of correctness. It's not like everything out there is correct. In fact there are different shades of correctness, there are different confidence factors, and yet nobody's there to be the ultimate arbiter. How do we deal with that? How do we make sure that the inputs and changes that are proposed are hypotheses, which allows us to reason, score, and understand the likelihood of churn before they go into the rest of the system, and to be able to get that feedback? So these are just examples in which the current system is continuously
evolving. My colleague Yuqing Gao, who's in the audience and who's running the Satori team, would say that knowledge is always a live system. It's not like you build it and it's done; it's a constant, constant evolution. So, lots
of challenges, where you can imagine both systems and algorithmic advances that allow us to tackle some of these, so that the system can continue to grow and evolve with the complexity of the web.

The second area is something we've talked about already, which is that knowledge can be represented explicitly or implicitly. We've seen great advances and great promise in the neural modeling of information. We've also seen advances on the symbolic side. But there are pros and cons. At some point, the discretized representation is probably too big and too gnarly, too hard to manage in the context of joining against the intent or context, and vice versa. So there are benefits of both. One is understandable to humans but computationally not efficient when you start looking at the myriad ways of understanding relatedness and all that stuff. And yet all-neural is efficient, but sometimes you're not quite sure what's happening. So we've made some efforts. I'll just list out one example, but this is clearly an active area of research where we hope we'll see some good advances and breakthroughs. We're seeing the need to actually have both techniques applied systematically. One example: let's say you do have a neural system that allows you to model against some embedding space, and you have a knowledge graph, and the question is, how do I use the knowledge graph to bootstrap my understanding? The way you do that, in order to do this mapping, is that we first understand the hypotheses that are available in terms of the questions and answers in the knowledge graph. We apply that through the system so it learns and maps it against an embedding space. Oh, sorry, I went too fast. Maps it against an embedding space. So over time you can imagine that through techniques like this or similar, we can hopefully start to bootstrap and connect,
And hopefully even do this in a much more integrated way, but this is an active area of research that I encourage all of you to look at and see what we can do together. Then, on the enterprise: going back again to this enthusiasm that people have around enterprises and their knowledge, there's lots of unstructured information in people's communications and in the documents people write. People don't write in structured facets and fields, and certainly they write with varying degrees of quality. There's a study from Gartner that says over 80% of enterprises' knowledge is unavailable for broader use, and that's a staggering number. You talk to IT, you talk to the companies, and they're saying, you know, I don't know, it's all kind of a black box to me. And the data continues to grow. So in a world in which data is growing massively, they're seeing tremendous value, but they can't make sense of it because it's stored in different ways, written in different formats, and not conflated. What do they do? That's a real challenge, but that's also a real opportunity. So let me give you one example of how this is coming to bear. About late last year and early this year we engaged with Publicis, which is an advertising agency. They have lots of agencies: they made like 1,200 acquisitions, they have 200 sub-agencies, they have 80,000 people, each one, you can imagine, having their understanding of clients, their understanding of the work, the advertising campaigns, their deep understanding of brands, their talent profile. So it's lots of information. In their case, what they're saying is they want to see the benefits of scale, right? Here's an example of a digital transformation where a company is really asking, at the very fundamental level: given that I have 80,000 people and 1,200 acquisitions, how do I get scale advantage? If you're going to get scale, you've got to bring that together and reinforce it.
So what we were working with them on is to start to bring together their understanding of their talent, their understanding of their accounts, and their understanding of the work they've done, and start to create really
the ability to now reason, map, and correlate, right? This is something where I'd say that if you're an online company or a commerce company, you just do this, but for enterprises this is all new stuff. In fact, it requires a deeper understanding and appreciation of how you reason with knowledge to get that flying. But in the course of these engagements we also recognized that enterprise knowledge is unique. It has unique challenges in two dimensions. One is that data in enterprises is not readily accessible to everybody. It's not like public web documents that anybody can read. This is private email, right? These are documents that are sensitive; these are legal or ACP documents. There's lots of information that you just cannot even see. And enterprises have obligations, both in terms of regulatory compliance, right, certain financial data and certain HR data can't be shared. There are certain encumbrance commitments: if you get data from third parties, for whatever reason, you have obligations to limit its scope of use. You have security considerations: different people have different access rights. And you have privacy: with GDPR, that ability for end users, both employees and customers, to have control and to influence that data is critical. So all of this is creating a case where in many cases we're not quite sure: how do you build an AI model when you can't reliably see the data? That's a real challenge. So within Microsoft Office, for example, we have different practices. We keep to the highest standard, and we in fact enlist the users, in this case Microsoft employees, to contribute a portion of their email for
us to build the models. But you can imagine new techniques and new approaches are needed for us to get beyond this hump, and it's not necessarily the case that the same algorithms and approaches that work on the web and on public data can work in the enterprise setting. So there are lots of innovations around multi-party secure computation and private computation that have potential here. But perhaps the most challenging is that the data within an enterprise is of mixed degrees of quality and completeness, right? It's a bootstrapping problem: hey, I'm interested in doing all this, but the data may not be clean. It has varying degrees of quality. It may not be complete. In fact, the domain expertise you need to bring is typically diffuse. No one organization has that depth, and certainly the talent that knows the domain doesn't necessarily understand the systems. So with that, I would say we're at this stage where we are seeing this desire to transform, and knowledge is increasingly this fundamental capability that is starting to reshape the way we think about information and interactions. And so I would just say: imagine, fast-forward a couple of years from now, that you're living in a knowledge-rich world. Every person has behind him or her this rich knowledge graph of their activities, interests, interactions, relationships. Every object, every physical space has its own knowledge graph that's created by different organizations for different purposes, and every service has its own associated knowledge. Bringing all that together in some coherent way, where there are multiple parties that aren't working together and yet
you need to now attest, navigate, and interact in a fluid way across them, is a real opportunity, but it's a real challenge. And so with that, these are some of the questions and domains, and it is hopefully there to inspire. I believe we are heading in a direction where we are going to see a knowledge ecosystem; we're going to see knowledge technologies for both the production and the consumption of knowledge. It will require different teams and different people to work together, and we're still early in this research and technology innovation wave. So with that, let me bring in my colleague Kuansan to come talk to us about one example of how we want to push on this process. Great, thank you. Thank you, David. Good morning. It is my pleasure to be here to talk to you about knowledge systems and AI. After all, it was eight years ago in this very venue, at the Microsoft Research Faculty Summit in 2010, that we first described our ambition to teach the machine to acquire knowledge from the web by itself. In the ensuing years, in addition to the continuing investment from Microsoft that David has just described, we are glad to see that the little idea presented eight years ago has received industry-wide adoption, including Google's Knowledge Graph effort two years after our Faculty Summit and Baidu's announcement at SIGIR 2014. So today it is my pleasure to be here to share with you the next evolution, which is for us to share the resources we have from the industry research lab. They consist of a data set and open-source tools that can facilitate research, with the hope that more of you can join us on the journey to advance the state of the art of knowledge systems and research. So let me start with the data set. This is, as David alluded, the Microsoft Academic Graph. Like the Bing knowledge graph, it
is built by extracting from the entire web, and so we have the scale to cover all the scholarly communications published in the past one and a half centuries. So, speaking of an ecosystem, this is a knowledge graph that every one of you is already on as a node. So is every student that you have supervised, every institution that has sponsored your work, all the journals that you have cited, and the conferences that you have gone to to present your work. So as you see, the number is massive and it's growing rapidly. Right. So one of the benefits of sitting on top of Bing is that we have broad coverage. As you can see, we have taught our machine to go beyond computer science and to cover more than 200,000 fields and subfields, including medicine, art, history, and so on. This broad coverage turns out to be very important for research managers, and maybe your provosts and deans, to understand the broader impacts of our research, and for many decision makers to determine the investment in research. So I hope that this gives you some incentive to take a look at the data, but for us, it gives us a strong incentive to push the technology to make sure the data quality is high. Being researchers ourselves, we naturally have published a paper to describe how this graph is created and what we envision it can do. The easiest way to find this paper is to remember that it was published at WWW 2015, and so in this system you can just type WWW 2015, and right there you can see all the papers published in that year, and our paper is actually currently ranked number two. From the search results there are two things you can notice immediately. The first is the query. By recognizing the conference as a first-class citizen in the knowledge graph, we can actually deliver a
better search experience by not just keyword-matching the query terms to the title. I understand that many of my colleagues have been using this feature when they are running the test-of-time award committees to find
papers and their citations for a particular venue, and this you can do with a single query in this tool. I can also report that for the past two years, since I discovered this new usage, I've been watching it, and our ranking system has been able to predict the test-of-time award winners quite accurately. That's actually the second thing I'd like to talk about: the ranking here is the full-blown search ranking, and it's not just based on citation counts. As you can see, the third search result has actually received more citations than us. But the ranking algorithm here is also estimating the reputation of the citing parties, not just counting citations. I'll be more than happy to talk more about it with you afterwards. Right. So if this trend continues, I'm predicting that by WWW 2025 this paper about a neural representation of a knowledge system is likely the winner of the test-of-time award, and this paper, coincidentally, is from my colleague in MSRA, Jian Tang. Now, speaking of Jian Tang, he unfortunately does have a very common name, and for many of you who have a common name too, getting your research results aggregated correctly on a search engine has been a challenge, hasn't it? So let's try it for my colleague Jian. If you try this with your favorite keyword-based search engine, you will see that, yes, there are quite a few Jian Tangs in the system, and the results are actually intertwined together. But because this system is sitting on top of Bing, we are able to use all sorts of knowledge, including your CVs and résumés published on the web, and actually learn how to disambiguate different authors. So for example, in this system, when you hover
over different names, internally at the back end we actually understand which Jian Tang is which, not to mention all sorts of other variations. Our physics colleagues like to publish their papers with only first initial and last name, and in many publications the authors observe the Asian convention by putting the family name first rather than last, and so these are all different hypotheses we put here. As I alluded, and as you've already seen, you can just hover over it; in this case this is our colleague, the Jian Tang that I want, and with just one additional click, voilà, all his work is aggregated correctly and shown here. So I hope this is giving you a quick understanding of what the knowledge system can help us with and how you can enrich the experience we have. So let's now switch back to the PowerPoint. So if you find this useful or interesting, these are the resources we're making freely available to everyone. The first one is the graph: the data set is available through Azure, again freely available to everyone. And in the same folder you will find a lot of open-source tools that hopefully can help you jump-start all
these operations to deal with large data sets and knowledge systems. We encourage you to take a look. And as you see, we already have more than 100 systems that are already published based on this data set. I invite you to take a look and tell us what you can build. More information is at the URL aka.ms/msracad. With that, thank you for your attention, and let me welcome David to take it back. All right, thank you. Let me wrap it up. If you roll back to the beginning, what we're seeing is this emergence of an appreciation for knowledge. Knowledge is going to push the limits of systems and AI, but it has a transformative effect on the way we think about products and the way we think about companies. This is also a rich area where we're still at the forefront of that technology race and that technology push. With that, I want to really thank all of you for your attention and your participation in the Faculty Summit. I'm excited that Microsoft is such a strong advocate for academia and for research, and we are looking forward to seeing amazing things from all of you and from our collaboration. So thank you.