Okay, I think we can start. Thank you for joining this first session of 2025 in our AI Jam series. Audrey will be presenting a new panel discussion today. Before we do that, I'd like to quickly introduce the HPE Developer Community with a couple of quick slides, mostly to tell you about some of the things we do to skill people up; we have an entire section of the website dedicated to that at developer.hpe.com/skillup.

As you know, this is the AI Jam series, or more precisely the "Keeping IT Real with AI" Jam series, and we have a session planned for February 12th. We don't have the registration link yet, but we know the title, and we're working on the abstract and the rest of the content: we're going to be talking about AI inferencing at the edge, from HST to space, so I don't think you want to miss that.

We also operate two additional series as part of the Developer Community. One is called Munch and Learn; these are thought-leadership, vendor- and product-agnostic talks, and we have two planned, one in February and one in March. The first, on February 19th, is about unlocking private AI power, with a use case on insurance fraud detection, presented by one of our experts, Jordan Nanos. The second, on March 19th, is about how we implemented our own kind of ChatGPT within HPE, called ChatHPE; one of the main architects of the solution will talk about how we implemented it using Private Cloud AI.

We have another type of talk called meetups, which go into a bit more detail about technology, and sometimes open source. On January 29th, at the end of this month, we have one called "Unleashing AI Innovation: a deep dive on the HPE Private Cloud AI software stack." We've had a few sessions about the hardware part of the solution, but not so many about the software stack, so that should be an interesting one. In February we'll be talking about GreenLake webhooks, a technology we are introducing to automate and integrate with HPE GreenLake. And finally we have a topic whose title isn't 100% defined yet, but it's going to be on eBPF; if you're interested in how you can build those kinds of sandboxes in the Linux kernel, join us on March 26th. I think it's going to be interesting, and it's delivered by Neil, one of our experts on the OpsRamp team.

One last point on the list: we have another way to learn new technology, called Workshops-on-Demand. We started this initiative back in the pandemic days, when we couldn't deliver workshops in person, so we built a solution that we've kept since then as an on-demand offering. We have a catalog of about 30 workshops, delivered through Jupyter Notebooks, available 24/7 for free to anyone over the Internet. Take a look at the catalog; you'll find open source topics like
getting started with Kubernetes, with Docker, with REST APIs, with Git. You'll also find programming languages like Python or Rust, and some HPE products, like the GreenLake APIs, the Sustainability Insight Center, iLO, OneView, you name it. Give it a try, and if you do, send us some feedback, because we can grow this catalog over time and we'd be happy to hear about any gaps.

You can reach the Developer Community through different touch points. First of all there's the website; remember, developer.hpe.com, that's pretty easy. We have an external Slack you can register for at hpedev.slack.com. For HPE employees we have the internal Ask HPE Dev community, and for partners we have an external HPE Developer Community, which replaces the Yammer group we had before. We have a newsletter that we send at the beginning of every month, which you can subscribe to; we have an email address, an X account, and a link to the Workshops-on-Demand. You can scan this code to find all of this in one place. That's it for me, and with that I'll turn it over to Audrey for the rest of the session.

Okay, thanks! So, is everyone able to see the lineup we have for today on the screen? Okay, great. All right, welcome everybody. Today's conversation is about building ethical and trustworthy AI systems, and this is a very large topic. We say that we're in this era of AI, or sometimes we say "AI everywhere," and as we consider how these technologies will be incorporated into our lives, and how we interact with them as employees, as technologists, and as organizations, we begin to ask: with these new tools, what kinds of responsibilities come with harnessing them? In the spirit of a happy IT new year, since it's still January, we thought we'd kick off with a topic along the lines of the "eat the frog" methodology: do the hard thing, the one with the most value, first. I'd consider this to be one of those topics.

We have a wonderful panel of guests today who really span our whole organization. We're going to look at our HPE AI ethics principles: what does it mean, at an organization level, to be using these technologies, and how does this align with the values we have as a corporation? Then we'll go into what this looks like within the business and how we operationalize it. And then we'll take you into HPE Labs to learn more about the model selection process, with a loan application use case, showing how we're using AI while also considering all the attributes of building a trustworthy AI system. Trustworthy just means: is it reliable, does it consider biases, are there security and governance? There's a whole number of things that go into this topic, so hopefully we can bring you a taste of it today.

I want to introduce today's guests. We have Pam, Jeff, and Sahad here, and I'm going to turn it over to each of them to introduce themselves, their role, and what really gets them up in the morning. Pam, over to you.

Hi, thank you, Audrey. I'm Pam Wood. My team and I conduct human rights due diligence across many aspects of our business, looking
for risks of harm to people, ensuring that we're protecting everyone's human rights, and partnering to identify and support mitigation strategies. This includes due diligence in our supply chain, in our operations, sales, acquisitions, and of course AI solutions.

And what gets you up in the morning, Pam?

I would say the ability to make an impact: making sure that technology is used for good, in line with our company's mission, and not misused by any actors, and of course inspiring and enabling others to do the same.

That's a wonderful life mission; you have a wonderful job and role at HPE.

I really do.

Jeff?

Hey folks, Jeff Oxenberg. I lead machine learning engineering for a team within HPE called Sales and Solutions Engineering. Our primary role is to work directly with customers to implement proofs of concept and scale those out to repeatable solutions. I've really enjoyed the work here with HPE's AI ethics working group, which I've been a part of for the last several years. My research interests are mostly around the security and robustness of machine learning systems, and the intersection of people and technology in socio-technical systems. What gets me up in the morning? I think our customers; I'm in a customer-focused role, and the time I spend with the AI ethics working group, which I'll share more about later, working with our sales teams and with our customers to ensure that all of the AI projects we as HPE implement are robust, secure, trustworthy, and responsible for our customers, really gets me up.

Wonderful, thank you for all the work that you do with our customers. And Sahad, over to you.

Yeah, I'm Sahad, part of the AI research lab, which is part of HPE Labs. Our team works on different AI topics such as robustness, safety, and ethical AI, and when HPE customers have an AI solution, we also try to address and solve their problems, so we work on a wide variety of problems. I would say safety, robustness, and ethical AI are at the core of every AI solution; that's HPE's strategy. What gets me up in the morning is the desire to learn. Our field is evolving every second; when you wake up every morning you see a new publication. So every morning I wake up wanting to learn what's going on in AI, so that I can be more helpful to HPE customers and to my teammates, and my first priority is to keep evolving as a person. That's what gets me up in the morning.

Yeah, and I think so many of us, and probably a lot of our listeners, echo that. There's so much to learn in this space right now, and it's evolving really quickly: take a week off, and the summary of what's been released from research has already moved on. So thank you for having that spirit of continuous learning.

Okay, here we go. We're going to start with a poll for our audience. We're just going to ask whether your organization has AI principles, or anything you're aware of that helps guide product development decisions, and what that's like in your organization today. I'm curious to see what people have to say. We know that, as technologists, sometimes we don't necessarily look up and ask, okay, what is the business developing and putting out, and
what should I be following? But in talking with some of our professional services organizations, who host workshops around AI strategy and development, we have seen that when organizations adopt a new technology, as we're seeing with the advancement of machine learning, deep learning, and now AI, it starts to broaden the conversation: what is the mission and objective of the organization, what are we trying to deliver as products and services to our customers, and how is this technology enabling a digital transformation and evolving the experience for the customer, the patient, the consumer? So the conversation is becoming a lot broader now.

All right, let's see what we have. Oh, interesting. Okay, this is great to see; it's actually better than I anticipated, which is a good thing. 52% said yes, their organization has AI principles that help guide development decisions; very few people said no, and the others are just not sure. I think "not sure" is an interesting category, because it means we're potentially all in the early stages. So many organizations say, hey, we want to get started with AI, let's just jump into it, try out some use cases, get some data, build some models; and then the organization asks, well, what's the business problem we're trying to solve, how are we trying to help our customers, and how can we be thoughtful and considerate in the choices we're making? So this is interesting; thank you, everybody. Okay, I'm going to hand it over to Pam to walk you through how we've approached AI ethics at HPE.

Great, thanks so much, all. As a little background, HPE's AI ethics work emerged from our human rights work here in the Ethics and Compliance Office, in partnership with Labs. In 2019 we wrapped up a company-wide human rights impact assessment, which identified some new salient human rights risks for us at HPE, including responsible product development and responsible use of technology. The third-party experts who conducted it with us recommended that we focus on AI specifically, as perhaps the single solution or tool we had with the greatest potential for harm, and of course the greatest opportunity as well. So we went around the business at that time, explaining this opportunity to establish AI ethics governance, and explaining the risk, to almost every business unit and function within our company, and we had full interest, without hesitation, from everyone.

In April of 2020 we established our AI governance structure, which includes two main components. The first is an AI Ethics Advisory Board, now called the AI Ethics Responsibility Committee, with pan-HPE executive-level representation guiding our work in this space; it responds systemically to the risks we identify in the working group. The board provides oversight, makes important decisions that bridge both business and ethical considerations, and signs off on much of the work of the working group. The second component, the working group, is where Jeff and I on this call sit and have been active; it's where we roll up our sleeves to dig through tough challenges and apply multi-disciplinary expertise to drive and operationalize our principles. That includes building our processes, conducting training and awareness, and of course assessing all AI solutions. If we flip to the
next slide, we can share that one of our first tasks was simply defining our principles. It's great to hear that so many of you also have principles in place, are aware of them, or think you might have them; I'd encourage you to really explore that. We at HPE knew back in 2020 that our principles would be the foundation for all of our work on AI ethics, and they are extremely beneficial: they frame all of the training we do, our engagements, and, most importantly, how we assess risk to people for AI that we develop, that we source to use, or that we integrate into our solutions. When coupled with guidance, principles also give our team members the ability to consider ethics by design.

A little history on how we landed on these principles. In HPE's AI ethics working group we established a subcommittee to develop them. First we researched principles that were already in the public domain and sought to learn from others: why they chose them and what was important. Then we took a lot of time to carefully consider which principles were most relevant to our business and our partners, consulted with external human rights experts and folks internally, and fleshed out and really debated each word of each principle before seeking approval from our executive-level committee.

I'll briefly take you through the principles. First, privacy-enabled and secure, which is of course extremely important: making sure all the appropriate safeguards are in place to protect personal information and, especially with the rise of generative AI, to protect sensitive business information as well. We want to protect it against bad actors, but also against just sloppy practices, if you will.

The second is human-focused. We really believe that respect for human rights must be in place, and that AI solutions should be designed with mechanisms and safeguards to support human oversight and prevent misuse. Human oversight is really needed for the deployment and use of AI; AI should be augmenting human activities, not vice versa. Technical oversight of an AI solution is also required to allow for continuous improvement. What we've seen all too often is an AI solution that is developed but not maintained and properly nurtured to remain fully effective, and fully ethical, as time goes on. And misuse can be intentional or, what I think is most common, due to a user's lack of awareness of the risks and the safeguards that need to be maintained over time.

Our third principle, inclusive, is really about minimizing harmful bias and ensuring fair and equal treatment and access for all. That means, for example, ensuring unbiased inputs into the solution, checking for unbiased outputs, making it accessible in a fair way to those who need it, and building in diverse representation as well.

Then we have responsible, which actually contains about four components that we usually dig into. One is transparency: making users aware that AI is in use, and not trying to hide that in any way. Another is explainability, which can be for users, so they know why AI is in use, for what purpose, and what information is collected, but also for the folks managing the AI solution, so they can understand at some level how the AI came to an output; that goes back to my earlier point about continuous improvement. It's also about
accountability. For HPE, at least, most of our recommendations on the AI we assess ensure there is some channel for users, or those impacted by the AI, to raise concerns, not just about how well it functions but also about any potential negative impact on people; in other words, having a grievance mechanism in place. And it includes sustainability, which is a tough one: really thinking through how to use AI in a way that isn't too energy-intensive, and being responsible with our use of it in that way.

Finally, robust, a really important one. It's about building in quality testing, including safeguards to maintain functionality, minimizing misuse and the impact of failure. There are three elements we typically talk about under robust. Initial model performance: select the appropriate metrics and evaluate accuracy from the get-go. Ongoing performance: have a designated person responsible for overseeing ongoing monitoring and continuously improving that performance. You'll see a common thread here about continuous improvement; the better an AI solution works, the more ethically it tends to behave as well. And resiliency: adversarial attacks could compromise significant amounts of data, while system downtime could impact a significant number of people if the solution serves a critical function.

So that, in a nutshell, is our set of principles. They may sound pretty similar to some that you have in place as well, and we feel they align quite well with many industry principles, such as the OECD's, with whom we actively engage.

Yeah, when I think about the various AI use cases I hear about across industries, some of these principles become very important depending on the type of use case you're building toward. I think a lot about healthcare and life sciences: for anything affecting human health, especially in a real-time manner, like what robotics could do for people, these principles become so important that they're really core. Actually, I'm curious how these hold up over time. You were thinking about this back in 2020 and now it's 2025: how have they evolved or held up, and how do you, as an organization, decide whether you need to revisit them?

Yeah. When we drafted them we had in mind that it would be a bit of a living document and we'd continuously update it, but in practice we have found that our principles have remained fit for purpose, and they proved expressly important when generative AI rolled out. What really changed for us was the degree of internal collaboration we've had since generative AI was introduced. Internal partners such as legal and cybersecurity have more at stake with generative AI than with more traditional forms of AI, and generative AI often requires stronger scrutiny and controls. That has allowed us to build out a broader AI governance structure across the company, of which AI ethics is now just one part, not the only part, and to share our ways of working and together introduce more efficient ways of assessing and supporting all of our folks who plan to use AI internally or who are developing AI solutions or products.

Wonderful. So
we're going to pass it on now, let me make sure, to Jeff. Tell us more! Pam just walked us through what we look at from the organization level and gave us a set of values and principles to align to. Now, how are we taking that into our day-to-day jobs and roles? What does that look like?

Awesome, yeah. I think this is probably my favorite part of my job: seeing the transformation from 2020, when, as Pam mentioned, we formed a committee to develop these principles, to operationalizing them over the past five years. Some of the lessons we've learned have been awesome, and I'm here to share those with you folks today.

So how do we operationalize these principles within HPE? Every AI technology we touch, whether a product we intend to develop, a partnership we intend to form, a process we intend to augment with AI, or even a customer engagement, gets funneled through the same intake process. A member of our working group, Jeff F. (shout out, Jeff, if you're watching), did a lot of amazing automation to develop an intake form and a process to triage and automate much of this work.

When we started out in 2020, trying to operationalize these principles, things were fairly easy to maintain manually. HPE has been an AI company for a while now, but there wasn't yet the generative AI boom that has since led to so many AI projects. We convened an assessment panel composed of members of our working group, which is comprised of people from around the company with very diverse backgrounds. If there's one tip or takeaway for everyone here today, it's this: when you're developing a working group in your own company, build one with a very diverse set of experiences and stakeholders from around the business. We have people from security, privacy, human rights, and HR, of course, but also from engineering, Labs, and sales. Actually, over half of our working group is comprised of technologists, people who deal with technology in an engineering capacity day to day.

In 2020, 2021, and 2022, this manual process worked very well: we would convene an assessment team, schedule a workshop, and go through a one-hour meeting where we provided risks and recommendations after asking tough, spicy questions about the project under assessment. What could possibly go wrong here? What's the risk to people? What could you envision the harms being? What we found is that this process works very well, but it was tough to scale. So we built out a lot of really robust automation to do automated triage, identifying high-, medium-, and low-risk projects and activities and routing them appropriately, so assessment teams get assigned quicker.
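To make the triage idea concrete, here is a minimal Python sketch of what an automated intake-triage step might look like. The intake fields, weights, and thresholds are illustrative assumptions for this write-up, not the actual form or scoring logic Jeff's team built.

```python
from dataclasses import dataclass

@dataclass
class IntakeForm:
    """Hypothetical intake record for a proposed AI project."""
    uses_personal_data: bool   # does the solution process personal information?
    affects_individuals: bool  # can outputs change outcomes for real people?
    is_generative: bool        # generative AI tends to need stronger scrutiny
    human_in_the_loop: bool    # is there meaningful human oversight?

def triage(form: IntakeForm) -> str:
    """Assign a coarse risk tier so high-risk projects reach a full
    assessment panel first. The weights and cutoffs are made up."""
    score = 0
    score += 2 if form.affects_individuals else 0
    score += 1 if form.uses_personal_data else 0
    score += 1 if form.is_generative else 0
    score -= 1 if form.human_in_the_loop else 0
    if score >= 3:
        return "high"    # route to a full assessment workshop
    if score >= 1:
        return "medium"  # lighter-weight review
    return "low"         # log and spot-check periodically

# A generative tool that scores loan applicants with no human review
# would land in the high-risk queue.
print(triage(IntakeForm(uses_personal_data=True, affects_individuals=True,
                        is_generative=True, human_in_the_loop=False)))
```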
Really, the only barrier we hit (I came from a sales engineering role, so I interact with a lot of our sales teams) was that our sales teams wanted to make sure we weren't adding more bureaucracy to the sales cycle. What we found is that by expediting the process, having a really robust triage step, and working closely with our sales teams, they've become a huge ally of ours. We work directly with our customers, and I've been on the phone with customers, speaking to them about this process and how we ensure that the work we do for them is robust, trustworthy, and ethical. So instead of being perceived as a roadblock, our sales teams, and our customers by extension, really see value in us working through this process with them. It's been a huge journey from 2020, doing a lot of manual work, to five years later having this very robust process; I even informally consult with customers on how they can build a similar trustworthy, robust, and responsible AI practice internally, which has been really awesome. And like I mentioned, my interest throughout my career has been socio-technical systems: how the systems we build day to day (my background is machine learning engineering and data processing systems) actually interact to affect people's lives. It's been an amazing experience to see this transformation and the operationalization of our principles.

Yeah, I think that's such a great realization, Jeff. It's fun to work with technology, it's fun to develop something new and interesting and say, oh, look what we can do, this is amazing; and then you realize that so much of what we do in technology affects people. So what you just said really resonates: what am I developing, and how does it impact my customer, or even the employees in my company if it's an internal tool?

Absolutely, yeah.

And on this topic: at first you might think, well, if we do this it's going to slow down innovation, so let's not put up barriers to our big thinking. But in the case you just described, it was the opposite: you're developing intentionally, in a direction you're confident in, and by automating this you can still move quickly.

Exactly. I think doing things responsibly ultimately enables innovation. That's definitely become clear over the past several years of working in this group: the more responsibly you can do things, the easier it is for innovation to happen.

Sure, absolutely. Okay, so speaking of responsible, we're going to turn it over to Sahad to walk us through what it looks like to adopt generative AI models, in this case for a loan use case, and to look at the model selection process: as a developer, how can you go about assessing which models to use? Sahad, over to you to explain a little more about this use case.

Thank you so much. In the AI lab we aim to position HPE as a company providing AI solutions, and we see significant potential in large language models and in robust, safe, ethical AI; this aligns with HPE's strategic interests and addresses both internal needs and customer demands. With that potential in mind, there are still a lot of challenges when choosing a model for your task. Take a loan assessment use case: given a prompt, or context, containing a client's financial attributes, which could also include the client's demographic background, the large
language model is tasked with giving you the loan status: either approved, meaning the client has a good loan status, or denied, meaning the client has a bad loan status.

In this diagram, we are working on three components. First, a recommendation engine that, given a description of your task, finds and recommends the best-suited large language model for it. Second, we further evaluate the model on some other criteria, which I'll talk about later. Third, there's an opportunity to refine the output of the chosen, recommended model even further. We're trying to make all three components available to the user, so they can save a lot of time by not going through all these challenging phases themselves: coming up with a large language model for a given task can take a long time; evaluation takes a lot of time and resources, GPUs, if you want to evaluate them all; and refinement adds additional value on top of the whole process. I should also stress that the ethical part plays a role in all three phases; as I'll show in the next slides, we address ethical concerns in both the recommendation and refinement phases. Here is an example prompt: given a client's financial attributes, the LLM is asked whether the loan status is good or bad. Next, please.

So why is recommending an LLM important, or challenging? Say you just pick a popular LLM. With the loan status use case the accuracy is pretty low: with a popular LLM like Llama 3 1B Instruct, the accuracy you achieve is 33%, which is awful, so going with a popular LLM is probably not the best choice. Next, please. What if you go with a larger LLM, say 70 billion parameters; does that help? A little: your accuracy improves to 53%, but that's still not good and not what we want. Next, please. This is where a recommendation engine comes into play: given the same prompt with the client's attributes, the LLM recommended by our engine achieves an accuracy of 82%, a significant improvement, and you can see the financial attributes have been assessed appropriately, reflecting the real loan status.
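To ground those numbers, here is a minimal sketch of the kind of harness that could produce such a comparison: format each labeled loan application into a prompt, ask each candidate model for a one-word verdict, and compute accuracy. The prompt fields, the tiny labeled set, and the `query_llm` stand-in are all invented for illustration; they are not the benchmark or models used in the talk.

```python
PROMPT_TEMPLATE = (
    "Loan type: {loan_type}\n"
    "Annual income: {income}\n"
    "Debt-to-income ratio: {dti}\n"
    "Years of credit history: {history}\n"
    "Answer with one word, good or bad: what is the loan status?"
)

LABELED_EXAMPLES = [  # tiny fabricated evaluation set
    ({"loan_type": "car", "income": 55000, "dti": 0.18, "history": 4}, "good"),
    ({"loan_type": "home", "income": 48000, "dti": 0.55, "history": 2}, "bad"),
]

def query_llm(model: str, prompt: str) -> str:
    """Toy stand-in so the sketch runs end to end; in practice this would
    call an inference endpoint for `model`."""
    return "bad" if "0.55" in prompt else "good"  # fake heuristic, not a model

def accuracy(model: str) -> float:
    correct = 0
    for features, label in LABELED_EXAMPLES:
        answer = query_llm(model, PROMPT_TEMPLATE.format(**features))
        correct += answer.strip().lower() == label
    return correct / len(LABELED_EXAMPLES)

# Compare candidates the way the talk compares a 1B, a 70B, and a recommended model.
for model in ["llama-3-1b-instruct", "llama-3-70b", "recommended-model"]:
    print(f"{model}: {accuracy(model):.0%}")
```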
Quick question. I think this is an important insight: larger isn't necessarily better, and it depends on your use case. If you're dealing with a foundation model that's tailored to the problem you're trying to solve, is it, in your experience, easier to understand the training data set, easier to understand what goes into it, so you can be sure of what comes out?

Definitely. For example, this mortgage-loan model is a small large language model trained on a financial, mortgage-loan data set, and that helps, but it can still have biases or fail to assess the financial metrics properly. Let me give you an example of what we observed. With a car loan versus a home loan, even the model trained on mortgage loan data can make mistakes; it cannot differentiate between the types of loan, and a car loan needs to be assessed differently: the financial metrics are a little different compared to a home loan. When you're applying for a car loan, what usually matters is short-term payment affordability, because car loans usually run three to five years and what matters is that you can pay them off quickly in the short term. When you're applying for a home loan, your long-term financial stability plays a much bigger role. The large language model needs to differentiate between these types of loan, so just training on mortgage loans doesn't mean it's going to do well on loan status. In this case, though, you can see the recommended LLM achieves 82%, and as I mentioned, we also have a refinement component that can refine it further; the biases, the ethical part, can be addressed there as well. I hope that answers your question.

Yes, you did.

So why is it challenging to choose the right model for your task? There are different criteria when it comes to choosing the right LLM. How good they are, as we discussed. The hardware requirements: not everyone has the same hardware resources, so that can affect which model you choose. The cost: they can be costly, and running on an Amazon server each has a different cost. The time to assess each loan: these LLMs have different speeds based on their sizes. The license constraints. And the fairness and ethics part, which is our topic here: are they fair, are they ethical, do they align with HPE's strategy when it comes to fairness and ethical AI? And are they good at explaining the decision? That's very important, because this is assisting a loan agent. So these are the filters, or criteria, when it comes to choosing a large language model for your task. You want all these options, and we provide all these filters for the customer, for the user, which makes it easier to choose the right LLM. Imagine not having this recommendation engine: it's very challenging and very time-consuming to choose the right model.
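As a rough illustration of those filters, here is a sketch that screens candidate models against hard constraints (hardware, license, latency) and then ranks the survivors on accuracy and fairness. The model entries, the numbers, and the 70/30 weighting are invented assumptions, not Realm's actual scoring.

```python
# Candidate models with made-up metadata for the criteria mentioned above.
candidates = [
    {"name": "model-a", "gpu_gb": 8,  "license_ok": True,  "latency_s": 0.4,
     "cost_per_1k": 0.02, "accuracy": 0.82, "fairness": 0.91},
    {"name": "model-b", "gpu_gb": 80, "license_ok": True,  "latency_s": 2.1,
     "cost_per_1k": 0.30, "accuracy": 0.53, "fairness": 0.78},
    {"name": "model-c", "gpu_gb": 16, "license_ok": False, "latency_s": 0.6,
     "cost_per_1k": 0.05, "accuracy": 0.73, "fairness": 0.85},
]

def shortlist(models, max_gpu_gb=24, max_latency_s=1.0):
    """Drop models that violate hard constraints, then sort the rest by a
    blended score. The 70/30 accuracy/fairness weighting is an assumption."""
    feasible = [m for m in models
                if m["gpu_gb"] <= max_gpu_gb
                and m["license_ok"]
                and m["latency_s"] <= max_latency_s]
    return sorted(feasible,
                  key=lambda m: 0.7 * m["accuracy"] + 0.3 * m["fairness"],
                  reverse=True)

# model-b is too big and slow for this budget; model-c fails the license check.
for m in shortlist(candidates):
    print(m["name"])
```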
It sounds like a lot of work, in a good way: this is the good work, where you're asking yourself these questions. I think it's important for development, to help you truly understand what you're creating. One thing I keep hearing, and this goes back to basic data analytics, is garbage in, garbage out. It's the same with AI: do I know what went into it, do I know how to classify what was generated, or why it came up with what it generated, and can we explain that? Just asking the questions is important to the comprehensiveness of our solutions.

Yeah, definitely. You can get frustrated when choosing the right LLM. We presented a demo of this work at Discover Barcelona, and a lot of people approached us; they were frustrated, asking how we do this, and they were interested in our work, and we got in touch with some of them. You really can get frustrated choosing the right LLM, so having this tool available makes a lot of people's jobs easier.

Yeah, that's very reassuring for anyone in the audience going through this work right now.

Yes; this is a work in progress and a team effort, so it needs a bit of patience to evolve in a better, more accurate, and more mature way. It improves with practice.

So, in a nutshell, this is what we are offering. Realm is the recommendation engine. We specify some domain-specific benchmarks, and Realm comes up with an initial list of models, which get evaluated on those domain-specific benchmarks. They are then passed to Herd, the evaluation engine, which evaluates that initial list of models on the domain-specific benchmarks, scores them, and ranks them. The user can choose among them, have the model evaluated on their own task, and see how it performs there. That's basically what we offer with Realm and Herd, the recommendation and evaluation engines.

This, for example, is a result out of Realm; you can see how it scores the initial list of models. We offer three stages, because users can have different concerns, and these three stages are probably the most important factors in a user's mind. Stage two is accuracy, for example on the loan use case: accuracy against the ground truth. You can see Mistral achieved 82%, while some larger language models like the 70-billion-parameter Llama achieved only 53%, and Mistral Large achieved 73%. We also provide explainability scores, if you're interested in how good the models are at explaining their decisions and the financial factors. And, most importantly, ethics and fairness: as part of HPE's AI solutions we're always concerned about ethics, fairness, and safety; not creating harmful content is at the core of every AI solution we provide. In this image, under stage three, you can see the ethics and fairness scores. You're given all these scores and can then choose the right LLM for your task; there are some trade-offs, but it makes the job much easier. For ethics and fairness, during evaluation we further assess the chosen LLM on ethical benchmarks, probably five or six different ones, and rank the models on them; DecodingTrust is one of those benchmarks designed for ethical AI.
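Here is a schematic sketch of that Realm-to-Herd flow: propose candidates for a task description, score each on accuracy, explainability, and ethics, and rank the results. The function names and interfaces are assumptions, not HPE's actual APIs, and the stub scores merely echo the numbers quoted in the talk.

```python
def realm_recommend(task_description: str) -> list[str]:
    """Stub: a real recommendation engine would match the task description
    against model metadata and domain-specific benchmarks."""
    return ["mistral-7b", "llama-3-70b", "mistral-large"]

def herd_evaluate(model: str) -> dict:
    """Stub: a real evaluation engine would run domain and ethics benchmarks
    (e.g. DecodingTrust). Scores below are fabricated for illustration."""
    fabricated = {
        "mistral-7b":    {"accuracy": 0.82, "explainability": 0.80, "ethics": 0.88},
        "llama-3-70b":   {"accuracy": 0.53, "explainability": 0.75, "ethics": 0.82},
        "mistral-large": {"accuracy": 0.73, "explainability": 0.85, "ethics": 0.84},
    }
    return fabricated[model]

def rank_models(task_description: str) -> list[tuple[str, dict]]:
    """Realm proposes, Herd scores, and the user gets a ranked shortlist."""
    scored = [(m, herd_evaluate(m)) for m in realm_recommend(task_description)]
    return sorted(scored, key=lambda pair: pair[1]["accuracy"], reverse=True)

for model, scores in rank_models("classify loan applications as good or bad"):
    print(model, scores)
```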
So, you may have just answered this, but a question in the chat asks: how does Realm assess these factors, for example accuracy for car loans versus accuracy for home loans?

In the given prompt you define the loan application: you say this is a car loan, or this is a home loan, so you have the type of loan in your prompt, along with the client's financial attributes. That way the language model can differentiate between the car and the home loan. But what we observe is that most large language models cannot differentiate; they may weigh those financial metrics the same way for both. So when choosing a language model it's very important that it understands the financial metrics properly, as they relate to the loan type. That's one part; next I'll talk about how we also make sure the model understands the differentiation between those two types.

This is where we can address that aspect. If, for example, you're concerned that your LLM doesn't understand, or cannot differentiate between, different types of loan, we address those concerns at the third stage, the refinement stage. The loan-agent LLM is your chosen, recommended LLM, and we have a number of critic LLMs acting as judges. You also add different context to your recommended LLM, such as guardrail prompts, chain of thought, few-shot learning, and workflow prompts. We make sure the recommended LLM understands, or is directed through, the proper thought process for your use case; for example, here we make sure it can differentiate between the two types of loan, and between short-term payment affordability and long-term financial stability. The refinement framework we're proposing has several components ensuring the recommended LLM agent is aligned properly with your use case, and these are adjustable.

Is this all part of a fine-tuning process?

No; we don't want to go down the fine-tuning path, because it can be time-consuming for the customer and needs extra resources, so it could be costly. It remains an option, I would say, but here we're trying to work with what we have: take the model that worked best on the benchmarks and the user's data, and improve it further, not by fine-tuning, but by giving few-shot examples and chain of thought, teaching the LLM how to think and how to reason, plus guardrail prompts to ensure safety and robustness; the ethical part can also be addressed here, as I show on my next slide. I hope that answers your question.

Yes.
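A minimal sketch of that critic loop, under stated assumptions: the loan-agent model emits per-attribute contribution scores, one critic zeroes out protected attributes, another reweights affordability versus long-term stability by loan type, and the adjusted scores drive the verdict. Attribute names, weights, and the decision rule are all illustrative, not the actual critic prompts or scoring.

```python
PROTECTED = {"race", "gender", "education"}

def critic_protected_attributes(scores: dict) -> dict:
    """Critic 1: protected attributes must not influence the decision."""
    return {k: (0.0 if k in PROTECTED else v) for k, v in scores.items()}

def critic_loan_type(scores: dict, loan_type: str) -> dict:
    """Critic 2: short-term affordability matters more for a 3-5 year car
    loan; long-term financial stability matters more for a home loan."""
    adjusted = dict(scores)
    if loan_type == "car":
        adjusted["payment_affordability"] *= 1.5
        adjusted["long_term_stability"] *= 0.5
    elif loan_type == "home":
        adjusted["payment_affordability"] *= 0.5
        adjusted["long_term_stability"] *= 1.5
    return adjusted

def refine(raw_scores: dict, loan_type: str) -> str:
    """Apply the critics in sequence, then decide from the adjusted scores."""
    scores = critic_loan_type(critic_protected_attributes(raw_scores), loan_type)
    return "good" if sum(scores.values()) > 0 else "bad"

# Biased raw output: negative race/education terms drag the loan score down.
raw = {"race": -0.4, "education": -0.3,
       "payment_affordability": 0.6, "long_term_stability": -0.2}
print(refine(raw, "car"))  # protected terms removed, affordability boosted: "good"
```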
So, for example, here is a car loan example of how we refine the output. You can see there are some ethical concerns: the demographic outputs in the left column show some biases, with negative scores for race and education, which could affect the overall loan score, and the payment affordability hasn't been aligned properly with the type of loan. If this were a home loan, that affordability assessment could reflect the actual value, but for a car loan it's pretty biased.

Next slide. Oh, I guess that was my last slide. So, as I showed you, we have multiple critics. The critics are given the output of the recommended LLM; they assess and analyze it and say, okay, these metrics are not reflected properly, these are not aligned properly with the loan type, and these scores need to be adjusted. In the final refinement you can see those scores have been adjusted properly: the bias, the ethical aspect, has been addressed, and the race, gender, and education biases are removed, which is very important. As we've observed, a lot of large language models show biases, and they may show prejudice, usually picked up from the historical context of the data they were trained on, so we try to address those concerns. The financial metrics have also been adjusted properly based on the type of loan: the affordability score has been changed to reflect the loan status properly, and you can see that the loan status for this car loan has changed from bad to good. That's very important: having the right LLM give you the actual value of your loan application. That's what we're working on, and it can be applied to any use case, not just loan use cases. There's a lot of traction for this; it can definitely help a lot of HPE's customers, and it's definitely on HPE's strategic list, I would say.

Well, we are at the top of the hour already; this hour always goes really quickly, and our panelists always have so much interesting information, knowledge, and expertise to share. If you want to learn more, the URL for our AI ethics page is on the screen; there are blogs, white papers, and videos out there that talk through these principles, and you can use them, if you like, as a model for developing your own. The second URL, if you want to learn more about what's happening in HPE Labs, walks through some of the experimentation and research going on. And lastly, if you want to get more engaged in these types of conversations, our Advisory and Professional Services organization offers an AI transformation workshop that goes into strategy building and the ethical considerations while you're developing your use case. You can always reach out to me on LinkedIn; feel free to get in touch with us, please attend our next sessions, and thank you, everybody, for joining today!