Hi, my name is William Wood Harter, I go by Wood, and I'm a solutions architect for Luminoso. I'm going to go through how to use the API. Daylight has a really robust user interface where we can explore our text and find all kinds of amazing insights, but it's only half as powerful as the API. Anything you do in the UI you can do in the API, and there are many things you can do with the API that you can't do in the UI, so it's worth exploring. You need some minor Python skills to actually do this.

First we're going to talk about how to set up and what the API documentation looks like. Then we're going to install a virtual environment with Python, talk about using the API, and create a little notebook to do that.

First up is the setup. There are two forms of documentation. The first is the core API documentation, which is available to anyone at daylight.luminoso.com/api/v5. Here are the topics, and this part has the endpoints you'll use: projects, creating projects, documents, how to get documents, how to get the terms, how to get the concepts, and the drivers and sentiment. The rest of it is a lot of user-related material; I won't talk about that today, maybe in another episode. The most important topics on this page are filters and concept selectors, and I'll probably have a whole session on those as well.

If you look at how to create a project: again, this is just a RESTful API, so you can call it from any language. We have a Python library that I'm going to talk about next, but in general I've used this API with JavaScript, Java, and of course Python. To create a project you call POST on api/v5/projects. You give it a token, which we'll talk about next, plus the name of the project and the language of the project, and that creates the project. Then you call upload to upload a bunch of documents, and then you tell it to build the project. We'll go through that process at some point. Today we're just going to look at an existing project.
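The create-project call described above can be sketched with only the standard library. This is a minimal sketch, assuming the Authorization header format covered later in the talk (Token, a space, then your token) and a trailing slash on endpoint paths; build_request and create_project_request are hypothetical helper names, and the body holds only the name and language fields mentioned here. It builds the request without sending it.

```python
import json
import urllib.request

# Root URL from the talk; on-site or regional installs use a different host.
API_ROOT = "https://daylight.luminoso.com/api/v5"


def build_request(path, token, method="GET", body=None):
    """Build (but do not send) an authorized request to the v5 REST API."""
    # Trailing slash is an assumption, matching how the v5 docs write paths.
    url = API_ROOT + "/" + path.strip("/") + "/"
    headers = {"Authorization": "Token " + token}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def create_project_request(token, name, language):
    # Name and language are the fields mentioned in the talk; check the API
    # documentation for any other required fields before sending this.
    return build_request("projects", token, method="POST",
                         body={"name": name, "language": language})


if __name__ == "__main__":
    req = create_project_request("YOUR_TOKEN", "API training", "en")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Upload and build would be further POST requests on the project's own path; take the exact endpoint names and bodies from the API documentation rather than from this sketch.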
The other form of documentation is for the Python client, which you can find on PyPI under luminoso_api; you install it with pip install luminoso_api.

Next is the installation. You need a Python between 2.7 and 3.9. The way you do it is to create a virtual environment and activate that environment: this is how you do it on a Mac, and this is how you do it on Windows. Then we install a couple of modules, luminoso_api and urllib3. There have been some compatibility issues on Macs with an SSL library, so I pinned urllib3 back to version 1.26.6; version 2.0 has some issues on Macs, and this will probably fix itself eventually.

Let me show you how to do that. I'm going to create a virtual environment with python -m venv venv, and there it goes. This creates a folder with a whole Python installation inside it. To activate it I run source ./venv/bin/activate (on Windows you run the activate script in venv\Scripts instead), and now I can see I'm running in that virtual environment. Then pip install luminoso_api, then pip install urllib3==1.26.6, and that installs.

The next step is to get set up on authentication: how do we talk to the API with our account, how do we get in, and what do we give it? That is going to be with tokens. If I go over to Daylight, I'm looking at a project here. I choose my name in the upper right, choose Settings, and go to Tokens. I'm going to generate a new token: I create the token, give it a name, API training, type in my password, and click Create token. This shows me the token one time only, so I copy it to the clipboard, and I have to save it. That's the important part.

The easiest way to save it is with lumi-save-token, a command that came with the luminoso_api module. I run lumi-save-token, give it the token I just copied off the clipboard, and press Enter, and it saves that token in your home directory under ~/.luminoso, in a file called tokens.json. All your tokens will be saved in there. The Python library will actually use that file, so you don't have to put the token in your code or set an environment variable.
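As a sketch of what "the Python library will use that file" means in practice, the function below looks for a token the way a script might: first in an environment variable, then in ~/.luminoso/tokens.json. Both the variable name LUMINOSO_TOKEN and the file layout (a JSON object keyed by host name) are assumptions for illustration, not documented behavior of luminoso_api; the client already does its own lookup when you connect without a token.

```python
import json
import os
import tempfile
from urllib.parse import urlsplit


def default_token_file():
    # lumi-save-token writes to ~/.luminoso/tokens.json, as described above.
    return os.path.join(os.path.expanduser("~"), ".luminoso", "tokens.json")


def find_token(api_url, token_file=None, env_var="LUMINOSO_TOKEN"):
    """Return a token for this API host, or None if none is available.

    Assumptions: LUMINOSO_TOKEN is a hypothetical variable name, and
    tokens.json maps a host name like "daylight.luminoso.com" to its token.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    path = token_file or default_token_file()
    if not os.path.exists(path):
        return None
    with open(path) as f:
        tokens = json.load(f)
    # Look the token up by host name, e.g. "daylight.luminoso.com".
    return tokens.get(urlsplit(api_url).netloc)


if __name__ == "__main__":
    # Demo on a throwaway file so the real ~/.luminoso stays untouched.
    with tempfile.TemporaryDirectory() as tmp:
        demo_file = os.path.join(tmp, "tokens.json")
        with open(demo_file, "w") as f:
            json.dump({"daylight.luminoso.com": "EXAMPLE_TOKEN"}, f)
        print(find_token("https://daylight.luminoso.com/api/v5/", demo_file))
```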
There are other ways to give it the token, such as environment variables, but this is probably the easiest way to go. If you don't want to go through that whole token-generation process, you can simply run lumi-save-token and it will ask for your username and password; at that point it generates a token and saves it in that file. That's the other way to do it.

Once you've got the token saved, you can start talking with the API. First things first, this is one way to connect if I want to put my token in code: I call LuminosoClient.connect and give it a URL and the token. The URL has to start with https. If you leave off the s, you get some strange errors that can take a while to figure out: it just says there's no endpoint here, and you think you put in the right thing, but it isn't talking to the right web server. So https://daylight.luminoso.com/api/v5/ is the root URL for the API, and you pass that to LuminosoClient.connect along with the token. If you don't give it the token, it goes and looks in that tokens.json file. Again, here's how you save the token, or you can use your username and password.

Something I do a lot of is parse the project URL, and I have a little function for that which I'm going to copy and paste into our first notebook. First things first, though: I'm going to copy this connection code, go over and create a new Python notebook, call it API training, and paste it in: from luminoso_api import LuminosoClient. Next, I go back over to my slides and find the code for the project URL, which splits the URL and prints out all of those values. You could write your own parser. I'm going to put the project URL in there. I usually do these in separate functions.
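Because a missing s in https produces such confusing errors, it can help to check the root URL before connecting. check_api_root is a hypothetical helper, a small sketch that only enforces the two things the talk mentions: an https scheme and a path ending in /api/v5.

```python
from urllib.parse import urlsplit


def check_api_root(url):
    """Validate and normalize a v5 API root URL before handing it to connect.

    Rejects http:// URLs (the talk warns they produce confusing
    "no endpoint" errors) and paths that do not end in /api/v5.
    """
    parts = urlsplit(url.strip())
    if parts.scheme != "https":
        raise ValueError("API root must start with https://, got %r" % url)
    path = parts.path.rstrip("/")
    if not path.endswith("/api/v5"):
        raise ValueError("API root should end with /api/v5/, got %r" % url)
    # Always return the root with a single trailing slash.
    return "https://%s%s/" % (parts.netloc, path)


if __name__ == "__main__":
    print(check_api_root("https://daylight.luminoso.com/api/v5"))
    try:
        check_api_root("http://daylight.luminoso.com/api/v5/")
    except ValueError as err:
        print("rejected:", err)
```

The normalized URL would then go to the client, for example LuminosoClient.connect(url=root, token=token), or with no token so that it reads tokens.json.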
In general I'm going to use a different project here, because I don't know what that one is. What we're going to use today is the vitamin gummies project, the one we were looking at before we went and got that token. Going back a bit, look at its URL: it doesn't have api in it, it has the app path, because that's the user interface. After projects, the first ID is your workspace ID. Every user can be in multiple workspaces, and every user typically has their own specific workspace, so this is the workspace the project is saved under, and the next one is the project ID. Typically you only need the project ID, but since I split apart the URL, it's nice to just grab the whole thing and paste it in as my project URL. The URL can end with highlights or another page; the function starts at the beginning and counts slashes instead of taking something from the end, so that doesn't matter.

I set the project URL value, split it up, and at this point I can see what the API URL is. There it is: it took all of this and created the API URL. You do this because you may not be on the default host. We have a lot of customers with on-site versions, which have a different name here, like yourcompanyname.luminoso.com. If you're in Europe it's an EU Daylight host, in Australia an AU one, and in Japan a JP one, I'm pretty sure. So there are a lot of different hosts you can use, and you need that URL. We've also got the workspace ID and the project ID, so all those values are saved now just by using that simple split URL function.
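Here is a sketch of the split-URL idea: count path segments from the start of a Daylight UI project URL, so a trailing page such as highlights is ignored, and rebuild the API root on the same host. The path layout (one segment before projects, then the workspace ID, then the project ID) is an assumption based on the URL shown in the talk, and split_project_url is a hypothetical name, not the function from the slides. The IDs in the example are made up.

```python
from urllib.parse import urlsplit


def split_project_url(project_url):
    """Split a Daylight UI project URL into (api_url, workspace_id, project_id).

    Assumes a path like /<app segment>/projects/<workspace_id>/<project_id>/...
    Segments are counted from the start, so anything after the project ID
    (for example /highlights) does not matter.
    """
    parts = urlsplit(project_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 4 or segments[1] != "projects":
        raise ValueError("Unexpected project URL: %r" % project_url)
    workspace_id, project_id = segments[2], segments[3]
    # Same host as the UI, so on-site and regional installs work too.
    api_url = "%s://%s/api/v5/" % (parts.scheme, parts.netloc)
    return api_url, workspace_id, project_id


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    url = "https://daylight.luminoso.com/app/projects/w1234567/pr89abcd/highlights"
    api_url, workspace_id, project_id = split_project_url(url)
    print(api_url, workspace_id, project_id)
```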
So the next thing I'm going to talk about is, again, the standard client connection, which looks like this: I give it the project ID and the URL, and I connect. Once I've connected with that base URL, I create a new client with client_for_path, giving it the path projects/ followed by the project ID. Remember, I haven't done anything yet, so let's go get both of those pieces of code. I connect with LuminosoClient.connect using that root URL, and then I have a client for the project and a client for the root. I can print out what these are: it prints the v5 URL, which is the API URL. Let's run that.

Remember, we connected to our project ID: we created a client for the path from the root client, so the project client's root is projects/<project_id>, and any calls we make with it start there. Calling get on it returns the information about that project in JSON: the workspace, what time it was created, who created it (that's me), any description that was given, and how many documents are in it. This vitamin project has 6,797 documents. It also shows what language it's in, the last time the metadata was updated, and the build info. If a build was kicked off there will be build info here, and you can't go and get concepts if the build hasn't completed; we can talk about that in another session. The build info includes the science version, when the build started, and when it finished. Sentiment is a separate build, and it's here too, since we didn't skip sentiment: the start time of the sentiment build, the end time, and whether it was successful. These times are all Unix epoch timestamps, counted from January 1, 1970. It also shows whether the core build succeeded (remember, the core build and the sentiment build are different builds), the last successful build, the project name, the project ID, and the permissions you have on the project. So that's how you get information about a project.
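The project information is plain JSON, so a few lines of Python can turn it into something readable, including converting the epoch timestamps. This sketch assumes the timestamps are in seconds and uses placeholder key names (name, document_count, language, and last_build_info with success and start_time); compare them with a real response from the project client's get call before relying on them.

```python
from datetime import datetime, timezone


def epoch_to_utc(timestamp):
    """Convert a Unix epoch timestamp (seconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def summarize_project(info):
    """Pull a few fields out of the project JSON.

    Key names are placeholders for illustration; check them against a real
    response. Missing keys come back as None instead of raising.
    """
    build = info.get("last_build_info") or {}
    summary = {
        "name": info.get("name"),
        "documents": info.get("document_count"),
        "language": info.get("language"),
        "build_succeeded": build.get("success"),
    }
    if build.get("start_time") is not None:
        summary["build_started"] = epoch_to_utc(build["start_time"]).isoformat()
    return summary
```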
We've already got our API going here. If you want to connect with a token from an environment variable, this might be how you do it: if there is a token in os.environ, go get it, and pass that token to connect. So there are a couple of different ways to connect, depending on your environment. You might be running on some kind of back-end system where nobody is going to run lumi-save-token; you just want to read an environment variable and pass it in. We do that a lot too. If you're using another language, you pass the token in a header: the Authorization header, with the value Token, a space, and then your token. So in JavaScript or Java, you'd set that header like that.

I'm just going to do a couple of quick things: get a couple of documents, maybe get a couple of concepts, and then we're done. You'll know how to use the API, and we'll start digging in on other sessions from there.

A document basically has three values: the text, the title, and the metadata. The text is what we analyze. The title is just for display in the user interface; we don't do any analysis on it. Metadata values can be strings, numbers, dates, or scores, and this is what they look like: each metadata field has a type, the name of the field, and the value for this specific document. Numbers work the same way, with a name and a value, and so do dates and scores. Those are our different types of metadata.
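The document shape described above (text, title, and a list of typed metadata fields) maps directly onto a Python dict. make_document is a hypothetical helper that builds one and rejects types other than the four named in the talk; the field names and values in the example are illustrative, and the accepted format for date values is something to check in the API documentation.

```python
import json

# The four metadata types named in the talk.
METADATA_TYPES = {"string", "number", "date", "score"}


def make_document(text, title="", metadata=None):
    """Build a document dict with text, title, and typed metadata.

    metadata is a list of (type, name, value) tuples.
    """
    fields = []
    for field_type, name, value in metadata or []:
        if field_type not in METADATA_TYPES:
            raise ValueError("Unknown metadata type: %r" % field_type)
        fields.append({"type": field_type, "name": name, "value": value})
    return {"text": text, "title": title, "metadata": fields}


if __name__ == "__main__":
    doc = make_document(
        "It came half empty.",
        "three stars",
        [("score", "Rating", 3), ("string", "Product", "One A Day Women's")],
    )
    print(json.dumps(doc, indent=2))
```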
So let's just jump right in. This section shows the docs endpoint. To get some documents out of this, I call client_project.get. The path starts from the project client's root, projects/<project_id>, so I add /docs, plus a limit (how many you want) and an offset. That way I can call this in a batch system: if I have a lot of documents, you don't want to say download two million documents right now; you have to do it in batches. That's not a Luminoso issue, it's an HTTP issue and common web service practice.

So let's get a document: docs = client_project.get('/docs', ...). Remember, this client is already at projects/<project_id>, so I just add /docs, which is what the documentation outlines under getting documents: projects/<project_id>/docs. There are lots of different parameters we can use there, such as filters, concept selectors, limits, and offsets, and we'll talk about those in another session. For now I just want a document, so I set limit=1 to get one document, and print it out.

The result of this call is a JSON object, a dictionary. Its result is a list of documents, and here it only has one document in it. The document has the text that was originally given, "It came half empty," and the title, "three stars." Then there's all the metadata, the metadata fields in this project: things like the date, the rating (they gave it a score of three), the count (this was a 150-count bottle of vitamins), and the type of vitamin, which is One A Day Women's; there are lots of those in this project, I'm pretty sure. I've added some other fields here too; we can talk about how those get added.
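The batching advice can be sketched as a generator that keeps asking for the next page until a short page comes back. It is written against a fetch_page function you supply, so it is not tied to any client; the assumption is that each page is the parsed docs response with its list of documents under result, as in the output shown here. iter_documents is a hypothetical name, and the default batch size of 1000 is arbitrary.

```python
def iter_documents(fetch_page, batch_size=1000):
    """Yield every document by requesting pages with limit and offset.

    fetch_page(limit=..., offset=...) must return one parsed docs response,
    assumed to hold its documents in a list under "result".
    """
    offset = 0
    while True:
        page = fetch_page(limit=batch_size, offset=offset)
        docs = page.get("result", [])
        for doc in docs:
            yield doc
        # A short (or empty) page means there is nothing left to fetch.
        if len(docs) < batch_size:
            break
        offset += batch_size
```

With the Python client, fetch_page could be something like lambda limit, offset: client_project.get('/docs', limit=limit, offset=offset).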
We have some other scripts that can add sentiment filters to these projects. We have plenty of sentiment, but I added some other things as well. Then there are the actual terms on this document. The text is "It came half empty"; remember that "it" is a stop word, so it doesn't show up as a term, but the other terms do, along with where they occur in the text. Fragments are the non-collocation versions, in case any of these terms are collocations. Every document and every concept has a vector; that's how we understand the relationships between all of the concepts. This wasn't done with a search, so there's no match score. The document also has a UID. So there's a lot of information here, including that long vector, but mostly it's just the data that we put into it.

There's one more piece of information I want to get. If I look at getting documents in the documentation, there's a flag called include_sentiment_on_concepts. This one is fairly new and very interesting. I say docs_sentiment = client_project.get('/docs', limit=1, include_sentiment_on_concepts=True) and print out those docs. Now there's a bit more information on the terms: the term "come" is negative with a confidence of 99, "half" is negative at 99, and "empty" is negative at 99. So that's how you get the sentiment and confidence for each term, and you can do that on every document once the sentiment build is done. We can build really interesting things with that; it's one of the ways I built out this other metadata field. We can talk about that as well.
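Once include_sentiment_on_concepts is on, the per-term sentiment can be tallied per document. This is a sketch only: the key names term_id, sentiment, and sentiment_confidence are placeholders to adjust against a real response, and since the talk reads the confidence as "99" it is unclear whether the API reports 0.99 or 99, so the threshold is a parameter. It is not the method used to build the extra metadata field in the video.

```python
from collections import Counter

# Placeholder key names; check them against one real response.
TERM_KEY = "term_id"
SENTIMENT_KEY = "sentiment"
CONFIDENCE_KEY = "sentiment_confidence"


def term_sentiments(doc):
    """List (term, sentiment, confidence) for each term that carries sentiment."""
    rows = []
    for term in doc.get("terms", []):
        if SENTIMENT_KEY in term:
            rows.append((term.get(TERM_KEY), term[SENTIMENT_KEY], term.get(CONFIDENCE_KEY)))
    return rows


def sentiment_counts(doc, min_confidence=0.0):
    """Count sentiment labels on one document, skipping low-confidence terms.

    Terms with no confidence value are counted rather than skipped.
    """
    counts = Counter()
    for _, sentiment, confidence in term_sentiments(doc):
        if confidence is None or confidence >= min_confidence:
            counts[sentiment] += 1
    return counts
```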
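I'll do one last thing, on concepts. If I want some concepts, I can just say concepts = client_project.get('/concepts', limit=10). Oops, I have to type client_project right.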
It says there's no limit parameter on this endpoint, so I'll take that off and see what happens. I think the limit goes into what we call the concept selector. Oh, I didn't print it out. There are the concepts: these are the top concepts within this project. Again, vitamins is the top concept. Relevance is how they're sorted when they come back, and each one has its texts, its terms, and a relevance score. The next one is gummies, whose term is "gummy," with its vector and relevance, and the next is taste. So we have the top concepts in this project, and that's how we get set up and use the API.

The next step is probably to start talking about filters. I can get all the top concepts for a filter on Women's One A Day, or the top concepts for reviews rated one and two versus four and five: I want to know what the top concepts are around the worst reviews and the best reviews. I can start getting that information out using filters.

The other way is with concept selectors, which are interesting. I can say return me the top concepts, which is the default, or I can do it based on a concept list. Remember, in the Daylight UI and with the API you can create concept lists. Say I want to watch packaging, transportation, cost, taste, and flavor: I put all those concepts in a shared concept list and pass it here to get information on just those concepts. I can specify a particular concept I want information on. I can get all the related concepts to something, say vitamins: what are all the related concepts to that? You can see that if I have "excellent" here and look at all its related concepts, I get good, great, and perfect, and I can get that through the API using related concepts. There are also suggested concepts, which are concept clusters, plus sentiment suggested, unique to filter, and drivers suggested. We could talk for a whole session on each of those, and we probably will.
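The limit error above comes from the concept selector: rather than a separate limit parameter, the limit goes inside the selector. Structured parameters like selectors and filters travel as JSON text, so this sketch encodes them into a query string for a concepts request. The selector and filter shapes, the parameter name filter, the Rating field name, and concepts_query itself are assumptions for illustration and should be checked against the filters and concept selectors sections of the documentation.

```python
import json
from urllib.parse import parse_qs, urlencode


def concepts_query(concept_selector, filters=None):
    """Encode parameters for a projects/<project_id>/concepts request.

    Structured values are sent as JSON text inside the query string.
    """
    params = {"concept_selector": json.dumps(concept_selector)}
    if filters:
        params["filter"] = json.dumps(filters)
    return urlencode(params)


# Top 10 concepts: the limit lives inside the selector (shape assumed).
TOP_10 = {"type": "top", "limit": 10}

# Concepts related to "vitamins" (selector shape assumed).
RELATED_TO_VITAMINS = {"type": "related", "search_concept": {"texts": ["vitamins"]}, "limit": 10}

# Reviews rated 1 or 2, assuming a numeric "Rating" field and a maximum filter.
LOW_RATINGS = [{"name": "Rating", "maximum": 2}]


if __name__ == "__main__":
    query = concepts_query(TOP_10, LOW_RATINGS)
    print(query)
    print(parse_qs(query))
```

With the Python client you would normally pass the dicts directly, for example client_project.get('/concepts', concept_selector=TOP_10), and the client is expected to handle the JSON encoding.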
But that is the API training for now: how to get set up and how to pull some quick information out of the API around documents and concepts. I really appreciate your time, and thank you for listening.
2023-06-14