The Illusion of Grandeur: Trust and Belief in Cultural Heritage Linked Open Data
Aengus Ward: Well, good evening everyone, and welcome to this evening's lecture. I will just let a few moments pass for a few more attendees to arrive, and then we will begin. Good evening, you're all very welcome to the first of our series of lectures in the University of Birmingham and Trinity College Dublin digital research partnership. My name is Aengus Ward, and I am a member of the University of Birmingham side of this partnership.
Aengus Ward: Just to let you all know, by the way, the three lectures that we have will all be recorded, and the lecturers will in each case be given the opportunity to come back and comment on or amend what they said, if they wish, in the future. A little bit of background: our partnership between Trinity College Dublin and the University of Birmingham began in 2019. A group of us here at the University of Birmingham who work predominantly on textual scholarship and textual editing, and a group of scholars in Trinity College Dublin, Jennifer Edmond and Michelle Doran in particular, who work on digital humanities more generally, came together to discuss and to research questions of trust and authority in the digital age. It seemed to us to be something which was quite central, not just to the research that we carry out, but also, I think, as a central question of the early 21st century.
Aengus Ward: We've had a series of meetings in different places. These of course were interrupted by the pandemic. We had planned a conference and a series of lectures, and now, a year later, we are finally getting around to this. We have invited three very distinguished speakers to address this topic from different disciplinary perspectives over the next four weeks, so there will be two further lectures, in two weeks' time and in four weeks' time.
Aengus Ward: So this is the first, and we're very grateful to have Rob Sanderson from Yale University speak to us today. I'm going to draw to a halt now, and my colleague Rebecca Mitchell, also from the University of Birmingham, will present our lecturer.
Rebecca Mitchell (she/her): He's the Director for Cultural Heritage Metadata Projects at Yale, in the Office of the Provost. He has played leadership roles in several key initiatives around linked data and cultural heritage, including as editor and chair for standards such as IIIF and Web Annotation, and most recently Linked Art, which is an application profile focused on usability and ease of implementation for cultural heritage linked data. He is a semantic data and systems architect, and was previously an information science researcher and medieval historian. Thanks, Rob, we will turn it over to you.
Robert Sanderson: Thank you, Rebecca. I see some familiar names in the attendees list, so you probably know some of the things that I'm going to say.
Robert Sanderson: So, a quick overview. For those who are less familiar with what linked open data is, I'll give a very brief introduction of exactly what you need to know and no more, and then look at the three primary questions that we have before us about trust and linked open data in the cultural heritage sector. First, how do we trust what was said, which is divided into three topics of sustainability, diversity and usability. Second, how do we trust what was sent; I will look only briefly at this, but in particular at the distinction between digital signatures and web archiving. Third, how do we trust what it means, which of course is back to my area of expertise, being ontologies, vocabularies and profiles. And then the conclusion, that trust is SHARED; you'll know full well what the acronym is by the time we get to the end.
Robert Sanderson: But first, some disclaimers. I am in no way a sociologist, psychologist or philosopher. I do know about linked data, but I do feel somewhat of a charlatan in talking about the psychological and sociological side. So this talk is really reflections on what I have observed in the community and in usage of the data generally, and it's more in the context of information theory and practice than of sociology. By which I mean it's all probably quite wrong, or at the very least naive, so I would very much welcome questions throughout in the question and answer tab, and comments and discussion at the end.
Robert Sanderson: So without further ado: linked open data, what do you need to know? The basics are that linked open data is simply the web, which we all know and love, but the web of data rather than HTML. Institutions publish data on the web just like everyone publishes web pages, so the records are at URLs under the institution's domain name, which becomes important when we talk about trust. They publish the data using shared standards, which we will come back to at the end. And importantly, the data has references to other bits of data also published on the web. Those can be published by the same institution, so one data set pointing at another, or published by other institutions, and then we end up with trust issues as well. I will not be talking about inference, ontologies, OWL or anything like that. I may use the word graph; that is just the connections between the bits of data. So now we know everything there is to know, essentially, about linked open data.
Robert Sanderson: So who is involved? I think it's important for trust to think about who is doing the trusting: who is the trustor and who is the trustee.
Robert Sanderson: We'll fill out these gray actors by the end of the presentation, but we start more simply with the researcher, the user of the data, and the publisher, the institution making the data available. Here is a pretty basic interaction model for linked open data. On the left, we have the publisher. They make the data available online, and that data describes some real-world cultural object or activity. The consumer has some information need, which I represent with this question mark thought bubble. They then retrieve the data via the network and are hopefully able to use that information to answer their question, to fulfill their information need.
Robert Sanderson: So what do we mean by trust when it comes to the data? I think we can split it up into three separate aspects. First of all, there is accuracy: does the data correctly represent the state of the real world, rather than the digital world, for the things that it describes? While there is of course always opinion in accuracy, the degree to which the data is accurate is reasonably objective; it's about one thing and another thing in the real world. So it's less about trust and interaction between people and the data, and more just about a quality of the data itself. Then there is certainty, which is the belief of the publisher as to the extent of that accuracy. You'll see this with people using maybe, possibly or probably in data. This is reasonably subjective: it's the belief of the publisher about their own ability to publish accurate data, for whatever reason. And then utility is at the other end: it is the belief of the researcher that the data is useful for fulfilling their current information need. This is also quite subjective; it's a belief, and it's time and question specific. It's about the current need rather than whether the data is useful generally; that's interesting, but it's not the point of utility. The notion of utility also comes from economics, in which I am also absolutely not in any way an expert, but it is useful here: utility is the desiredness of a particular good or, in this case, service, rather than its value, which could be quantified. That seems to fit.
Robert Sanderson: So how can we add those into the diagram? Again from the left, the institution has some degree of certainty about the accuracy of how well the data describes the object. Then the user, with respect to their question, has some belief about the usefulness, the utility, of the data to answer that particular question. In this case, hopefully, the question is about the object; otherwise the data will simply not be useful.
Robert Sanderson: So, a couple of examples, just to get us thinking about this. Here's an example from the art gallery here at Yale, which I think is possibly the most comprehensive example of uncertainty, or of varying degrees of certainty, in the cultural heritage sector. From least to most certain: the artist is completely unknown, but the work was formerly attributed to Charles Willson Peale, and so we know that it is not by Charles Willson Peale. It possibly depicts Mrs. James Gittings. It was made around 1790, but we're not sure. It was probably made in the United States. And it's definitely watercolor on ivory, in a case with plaited hair and a gold cipher on the reverse side.
Robert Sanderson: Here's another example, in this case of inaccurate data. The previous one we found through looking for human remains in our data, and the hair was what found it, so in that case the data was actually useful. This record we found in a somewhat related experiment, looking for people who were still alive, and thereby the oldest people and so on.
Robert Sanderson: According to this record, Blake Edwards died in the year 39 trillion, which is somewhere about the time when the universe runs out of hydrogen, to give some sense of scale. And indeed here is the picture of Mr. Edwards, looking quite amused, probably, at how he is described. He actually died in 2010, and the year 39 trillion is actually the barcode of the laserdisc, in completely the wrong place in the record. So here, what is presented as certain information is entirely inaccurate, and for the purpose that we had it was not very useful, though it did mean that we could correct the record and move the barcode to the right place.
Robert Sanderson: Okay, so trust. Then we have inter-actor trust: we can further split the relations into those of actors to the data and of actors to other actors. Here we can split trust into confidence and trust. Confidence is the belief of the consumer in the current and past competence of the publisher. How accurate is the data? The accuracy is not about anything which is going to come in the future; it's about things which the publisher has already done. Were they proficient in publishing data accurately? This is the flip side of certainty: how confident is the consumer in the data, rather than how certain is the publisher.
Robert Sanderson: Trust, then, is the flip side of that. It's the belief of the consumer in the current and future benevolence of the publisher, that is, that the publisher will continue to make accurate data available. There's a further step, which is dependence, which is when one actor relies on another for the successful outcome of some function. We'll see that come up a few times, but it's not the focus of this talk. So then, adding this relationship into the diagram: the consumer has confidence in the past competence of the institution, and trusts in its future benevolence in continuing to publish accurate data.
Robert Sanderson: But do we really mean the certainty of the institution? Surely, if I were to walk down our ivy-covered halls and ask the ivy-covered professors what they think about the death date of Mr. Blake Edwards, or whether it was really Mrs. James Gittings depicted, they would have absolutely no idea what I was talking about, let alone any degree of certainty about that relationship. Instead, of course, what we mean is that the content specialists who worked on the data and published it have some certainty about that accuracy. But did they publish it? Well, there are other people involved, because there are the data and software specialists who, working with databases and other technical systems, actually do the publishing, whereas the content specialist works with that database to create the information in the first place. So this link, still going to the institution, perhaps should be related somehow to these people, because it's their abilities that we should be confident in and trust.
Robert Sanderson: So should we be thinking about personal trust or institutional trust? Trust in the institution of course comes by and large from the reputation of the institution, including, but not exclusively, by considering confidence in its abilities over time: if it has been good in the past, it is likely that it will continue to be good. The data, though, is created by the actions, and expresses the beliefs, of many individuals, assuming that the data continues to be maintained and updated over time.
Robert Sanderson: So now we come to the first trust paradox, which is that linked open data is relatively new. It's not that we have a long history of institutions publishing linked open data for many decades, so that we can say this institution has done this well for 100 years and will continue to for another 100 years. That's clearly false, but trust in the institution will take into account factors that have absolutely nothing to do with linked open data and predate it.
Robert Sanderson: So, should our trust really be in the people who are responsible for the open data, rather than the institution? This is the first topic I want to explore. The benevolent action that I think we are trusting in is that the institution will continue to make accurate data openly available. In order for this to happen, I think sustainability is one of the key factors. The first aspect of that is that the data has to be a product and not a project: it's something that the institution will continue to invest in, in terms of people and technology and the data itself, rather than something which is time bound and can go away. If we want the institution to continue to make the accurate data openly available, essentially in perpetuity, this is a big ask. This is something that the institution needs to treat as a core product, rather than something that it can ignore after three years, when funding ends. In order to do that, the data must have both internal and external impact. That is, it must be demonstrably useful to the organization; otherwise, why is the organization investing this time and effort? And it has to be useful to others; otherwise, why invest the time and effort in publishing it, which is not zero cost?
Robert Sanderson: Further, institutional longevity will actually play a part in our trust in the institution and thereby the data. An established organization is more likely than a startup or newcomer to the scene to continue to exist, to maintain its focus and its ability to publish the data, and also to improve it. Institutional resources, of course, are important: a well-funded organization is more likely to be able to continue to invest in the product when hard times come, because it has more resources. And lastly, and most importantly, there needs to be appropriate governance for the product, especially in linked open data. There is a need to balance internal and external participation. It's much easier if you don't need to rely on or depend on others. However, to the last point, it's much more valuable to have a broader range of participation, and to have more data available, with further experience and expertise invested in that. There is also the balancing of the recognition of the institution versus the individuals that have worked on it, and of quantity versus quality: given a finite amount of resources, you can have lots of things, or you can have fewer excellent things, but you can't have lots of excellent things. And there is balancing accuracy versus usability, which we'll get to in the next section, and ensuring the diversity of the work.
Robert Sanderson: So, to talk further about diversity in terms of trust.
Robert Sanderson: People tend to view the actions of other members of groups with which they identify, which we call in-groups, more favorably than those of members of groups with which they do not. I apologize for reading my slides, but this is an important one to get right. And the key factor is: even when the action is identical. If the Getty Trust publishes some linked data about anything, I would believe it, because I identify with the Getty as a group, having been employed there and being in the cultural heritage space, more than, say, Google, even though Google demonstrably has vastly more resources, vastly more people and vastly more technology experience. So if Google were to publish a description of the Mona Lisa, say, and the Getty were to publish a description of the Mona Lisa, and they were the same, we would still be more likely to trust the Getty, with which we identify, than Google, even though the data is identical. We're more likely to trust organizations that seem to share similar missions, constituencies or worldviews to ours, so we trust things that are somewhat like us. And this is in-group favoritism.
Robert Sanderson: But the paradox is that cultural heritage is necessarily diverse, because it spans all human activity. So being able to understand cultural heritage surely requires systemic diversity. We shouldn't be trusting those organizations that are most similar to us; we should be trusting organizations that are the most diverse, because they have the most likelihood of understanding, and being able to publish descriptions of, those objects and that heritage. This applies at both the institutional and the individual level. At the institutional level, that means that we need diverse institutions, not just single, centralized, well-funded white institutions. For example, in Los Angeles there's also the Autry Museum of the Southwest, a lovely museum dedicated to Native American arts and culture, but they are not well funded; they're interested in open data, but it's hard for them to participate. And at the individual level, we need to have curators, content specialists and other folks engaged with the data from diverse backgrounds, in order to fully understand and appreciate it.
Robert Sanderson: And I see Aaron in the participants list. I have gratefully stolen a slide from a wonderful conference about a month ago for the LINCS project, to try to explain the importance of this: structuring data is a political act with ethical implications. By this I mean, and I believe Aaron means, that information systems uphold existing systems of power, because when we choose how to structure that data, we have made some choice. That choice comes from our background, our biases and our understandings of the models, the ontologies, the vocabularies and so on and so forth. All of these existing structures, standards and understandings have come from somewhere, and that somewhere is contextual and contingent upon the biases and level of engagement of the different individuals and institutions in their work. So the ethical implications of choosing how to structure something, not just whether to publish it and whether it's correct or not, are critical, and we need to engage with them.
Robert Sanderson: So finally, you might be thinking, I come to my title, illusions of grandeur. The aspects of trust I have talked about include similarity with ourselves, the external impact of the institution and its work, the longevity and resourcing of the institution, and the reputation of the institution, by which we trust it. However, the aspects more related to the data, namely the governance of the product, the internal impact which enables it to be used and sustained, the diversity of the people and institutions that contribute to its creation and maintenance, and the people directly responsible for its creation and management, are by and large invisible.
Robert Sanderson: So here is my thesis: that we base our usage of linked open data on the illusory grandeur of the organization, and that has pretty much nothing to do with the data itself. But don't worry, we can add in some more actors, and it gets worse. The next role is the client software engineer, because in a digital humanities or similar digital community experience, you don't interact directly with the data. You see it via some application that's been built by a developer and projected onto the screen. That developer, who could come from a completely separate organization from the data, relies on the data being usable. So utility is how good the data is for answering this question; usability of the data is how easy it is for the developer to use the data for their task of building an application for the researcher to use.
Robert Sanderson: And just as an intuition point, tongue in cheek: were you thinking about this versus this? Didn't you intuitively distrust this symbol because it had dollars, and intuitively trust this one because it looks academic? That's in-group similarity.
Robert Sanderson: Okay, usability. The data has to be usable by the software engineer in order to build the interface that is then presented to the researcher. So the user of the data isn't the researcher, it's the application developer. The researcher experiences the data only through the lens, and thereby the biases, of the developer's application. But doesn't that mean that the utility, which is the belief of the researcher about the data, is not actually in the data but in the presentation of it? The perceived utility of the data then directly depends on the usability of the data: just as the researcher is dependent on the client developer, their belief about the utility is dependent upon the belief of the client developer about its usability.
Robert Sanderson: So here, for example, is an application developed at the Getty and built on linked open data, which presents archives. It's attractive, seems reasonable, probably interesting. And here's the data on which it is built. It's linked open data, with links both internal and external, custom and internal but authorized vocabularies, JSON-LD and so on. This structure has been designed to be usable in order to build user interfaces like this, in order to allow the researcher to do useful, interesting research upon that information. So if the utility depends on the usability, then surely that utility also contributes to the external impact: the external impact of the application and data is the aggregate perceived utility of the users.
Robert Sanderson: But as we've seen, external impact is a factor in trust, so now our trust depends, to some degree, on the utility of the data, which depends on its usability. So really, we should be considering how usable our data is and, given that usability, how accurate we can make it without reducing usability to the point that the utility goes down and the trust goes down. In the IIIF realm and in Linked Art there are 13 design patterns that we've identified as being particularly useful for usability.
Robert Sanderson: I'll go through them briefly. The first is to scope design through shared use cases: if there aren't use cases in the community, then it's unlikely that you need it. Design for international use, because if you're limiting your audience to only one language, then one geographic region, you are thereby limiting your external impact. Make it as simple as possible, but no simpler; this goes directly to the experience of the developer with the data. Make it easy to use, so make the easy things easy but, for accuracy, make the complex things possible. Avoid dependency on specific technologies, to make it easier to migrate; that makes it easier on the data and software specialists as well. Don't break the web, so don't do things which would not work with the rest of the web. Design for JSON-LD, the JSON form of linked data. And then, down at nine, follow existing standards and best practices, and, at 13, define success rather than failure. So again, these are the IIIF design patterns, which are based around this notion of usability.
Robert Sanderson: So, the next question: how do we trust what was sent, rather than how do we trust that the data is useful? We've got this big cloud, a black box, in the middle of the diagram. Essentially everyone needs to trust the network for this diagram to even begin to work. So what do we mean by this trust? As someone from the Smithsonian said at one of the Linked Art meetings, don't fear the network, and this is one of Linked Art's design principles.
Robert Sanderson: From cyber security there is an interesting acronym here, CIA: the cyber security triad of confidentiality, integrity and availability. Now, we don't really care about confidentiality, because the data is open; we don't need to limit who can access it, so that one we can ignore. But both availability and integrity are important. Data availability is essentially that we trust that the data will be delivered successfully, in a timely fashion, when it is requested. So the trust is that the institution will continue to make it available and the network will continue to transfer it. I think we've talked about availability already. So the interesting one is integrity: how do we trust that the data will not be modified on its way from the publisher to the consumer? How do we know that what left is what arrived at the other end? Here we have the baseline assumptions of the data being delivered from the institution's own domain, whereby we know that this institution is responsible for it, and of it being delivered via HTTPS, so that we do not need to worry about proxies altering the content.
Robert Sanderson: And okay, I see people smiling and essentially raising their hands, saying, I know, we should use a blockchain. Do you need a blockchain for this? No. But if you happen to have a spare $69.3 million, I do have a bunch of linked data about images that I'd be happy to sell. There is, however, one actually important aspect of blockchain for this purpose, which is that blockchains are built on digital signatures of the data. A digital signature is mathematical proof that the data has not changed, even when both the signature and the data are transferred over an untrusted communication channel. It does this via public and private keys, which we don't need to worry about, but essentially the takeaway is that it is a relatively small string where any change to the data will change the signature of it, so you can validate the data against the signature and know that it hasn't changed.
Robert Sanderson: This becomes trickier in the linked open data space, because the data is a graph, with blank nodes that don't have identity. In fact, it's a particularly challenging mathematical problem to solve. There has been a lot of research done, and linked data signatures are coming in the W3C. If you happen to be a W3C member institution and do have your AC vote, vote for the signatures charter. But, thankfully, that's also unnecessary for our use cases, because the data is open and we don't need to ensure that it has not changed en route by signing it, because it's just as easy to archive it.
Robert Sanderson: With linked open data, we can use a trusted third party, which doesn't work with blockchain, that's one of the blockchain tenets, to ensure both the integrity and the availability of the data. You can validate that the data you received is the same as the data in the archive, so you know that that's what was sent. And if you try to retrieve it from the institution and you can't, well, there is the archive. So now we have both integrity and availability with one process. The other benefit, of course, is that cultural heritage organizations are often already engaged with archiving for digital preservation purposes, so this is not a new technology or a new sector to have to deal with. It also allows us to essentially time travel through different versions: by rolling back the time in the archive to see the previous versions, we can do queries in the past and see what the data would have said. For example, for the artist who was previously attributed as Peale: at some point, clearly, we did attribute the object to Peale and we no longer do, so in the past the data would have said Peale.
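As an editorial illustration of the archive idea just described, here is a minimal sketch in Python. Everything in it is an assumption made for the example rather than anything shown in the talk: the `Archive` class, the `fingerprint` helper, the example URL, the field names and the capture dates are all hypothetical. It also sidesteps the graph problem mentioned above by hashing a sorted-key JSON form of each record, which is only a stand-in for real linked data canonicalization.

```python
import hashlib
import json
from bisect import bisect_right
from datetime import datetime, timezone


def fingerprint(record: dict) -> str:
    """SHA-256 digest of a record serialized as JSON with sorted keys."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Archive:
    """Hypothetical third-party archive of timestamped snapshots per URL."""

    def __init__(self):
        self._snapshots = {}  # url -> list of (captured_at, record), oldest first

    def capture(self, url, record, captured_at):
        """Store a snapshot and keep the list ordered by capture time."""
        self._snapshots.setdefault(url, []).append((captured_at, record))
        self._snapshots[url].sort(key=lambda snap: snap[0])

    def as_of(self, url, when):
        """Time travel: return the snapshot that was current at `when`, or None."""
        snaps = self._snapshots.get(url, [])
        times = [captured_at for captured_at, _ in snaps]
        index = bisect_right(times, when)
        return snaps[index - 1][1] if index else None

    def matches_latest(self, url, received):
        """Integrity: does the received record equal the latest archived copy?"""
        snaps = self._snapshots.get(url, [])
        return bool(snaps) and fingerprint(snaps[-1][1]) == fingerprint(received)


# Illustrative data only: the URL, field names and dates are invented.
URL = "https://example.org/object/1"
archive = Archive()
archive.capture(URL, {"id": URL, "attributed_to": "Peale"},
                datetime(2015, 1, 1, tzinfo=timezone.utc))
archive.capture(URL, {"id": URL, "attributed_to": "Unknown"},
                datetime(2020, 1, 1, tzinfo=timezone.utc))

received = {"attributed_to": "Unknown", "id": URL}     # what the consumer got
tampered = {"attributed_to": "Someone else", "id": URL}

print(archive.matches_latest(URL, received))   # same content as the archive copy
print(archive.matches_latest(URL, tampered))   # differs from the archive copy
print(archive.as_of(URL, datetime(2017, 6, 1, tzinfo=timezone.utc)))
```

The one archive serves both purposes in this sketch: `matches_latest` is the integrity check against what was sent, and `as_of` is the query in the past, and the same stored copy would remain available if the publisher's copy disappeared.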
Robert Sanderson: This can work across multiple archives, including personal ones, via Memento, which is also known as RFC 7089. But for this to be useful, we need to know that all of the data is archived and available. In order to do that, we need to know two things. First, what data exists: where are all of these open data records? And second, when does it change, i.e. when do we need to re-archive it?
Robert Sanderson: There are various solutions in this space. OAI-PMH is pretty easy to implement but aging; now some 22 years old, it could have a PhD. ResourceSync is newer, based on Google's Sitemaps, but less well implemented. And then there is Activity Streams, which is a W3C specification, customized for this purpose in the IIIF community as the IIIF Change Discovery API, which is the URL right there. This is about to hit 1.0. It's been through two stages, and we have all the implementations needed to pull the trigger on it, so we are hopeful that this will be an official IIIF specification in the next month or quarter.
Robert Sanderson: Here's another advantage, to go back to trust: we can separate the notions of confidence and trust by adding the archive. If we don't trust that the institution publishing the data will continue to make it available, but we do trust that they published accurate data currently or in the past, then by having this archived copy available at a third party, we can move that trust in future availability to a belief in the benevolence of this other institution, say the Internet Archive, or a national library, or some other more data- and archive-specific organization.
Robert Sanderson: That gives us an advantage and opens the door to more institutions playing this role, I believe. Robert Sanderson: Okay, so what about what the data means? How can the researcher be sure that their understanding of the information is what was intended by the content specialist? Robert Sanderson: Essentially, if I have a question as a researcher, how do I know that there is any relationship at all between my worldview, my question, Robert Sanderson: and the worldview, the understanding of the object, of the content specialist over here? Because if these two people could just sit down with a whiteboard and the object, I'm sure it would be very easy to answer this question. Robert Sanderson: However, they are at opposite ends of this network.
Robert Sanderson: Seems like a problem. Robert Sanderson: We can reduce that, though, because the data modeler, the last entry in our almost endless list of actors, Robert Sanderson: at the institution can work directly with the content specialist. The content specialist has the understanding of the real world, the art history; the data modeler needs to have a reasonable understanding of that and be able to translate it into the data.
Robert Sanderson: The application developer needs to have a reasonable understanding of the data and be able to translate it into the application for the researcher's question. Robert Sanderson: The application developer needs to understand the model in the same way that the data modeler does, but these two don't communicate directly. Robert Sanderson: So how do we solve this problem?
Robert Sanderson: With, of course, a shared standard. So if the data modeler uses a standard, and the application developer from a different institution uses the same standard, Robert Sanderson: then, assuming they've both applied it accurately, any number of these datasets can work with any number of these applications, so long as they've implemented it well. Robert Sanderson: However, standards need the same considerations in terms of trust as data. If the standard conveys the meaning and is the central point for how all this works, how do we trust the standards? Robert Sanderson: Standards need to be sustainable products. They need to have governance which has all of those features of diversity and inclusion, Robert Sanderson: and they need to be published by ongoing, well-resourced institutions. Robert Sanderson: They should be archived, and different versions should be available. Any machine-accessible information should be available
Robert Sanderson: alongside the representations for humans. They need to be usable by data modelers and developers, as the people who are most directly affected by them. Robert Sanderson: They need to provide utility to the content specialists and researchers, who need to be able to communicate essentially via the work of the data modelers and developers. Robert Sanderson: And critically, there needs to be this diversity of institutions and people in every role; otherwise we are looking at the centralization and marginalization of knowledge, to the exclusion Robert Sanderson: of those diverse communities who are not traditional participants in white, middle-aged, male-developed standards. Robert Sanderson: Standards is plural: there are a few standards and distinctions that are worth describing, because for that diversity question, we can make it easier to participate
Robert Sanderson: when we understand how the standards ecosystem works and how it plays into the work of the rest of the larger ecosystem. Robert Sanderson: So there are three tiers of abstraction standard. There's the conceptual model, which is just the abstract way you're thinking about the world, and this is where we absolutely require diversity. Robert Sanderson: It needs to be consistent and coherent, unlike, say, a crowdsourced, constantly changing conceptual model.
Robert Sanderson: It needs to be encoded in an ontology, which is a set of terms that encode the thinking about the concepts in a logical and, importantly, machine-actionable way, so that we can then create knowledge graphs. Robert Sanderson: And then there is the vocabulary: a curated set of sub-domain-specific terms, such as art museum terms as opposed to Robert Sanderson: natural history museum or library or archive terms, which then make the ontology more concrete. So the distinction is between a physical object generally and a painting: while not every domain might have a use for the notion of a painting, it certainly needs the notion of a physical object.
Robert Sanderson: And then these are refined or specialized by implementation standards. Robert Sanderson: There are two of those. A profile is a selection of the appropriate abstractions: the appropriate model, the appropriate ontologies, the appropriate vocabularies Robert Sanderson: that encode the scope of what should be able to be described when you use this profile. So it doesn't create new things; it merely selects
Robert Sanderson: aspects from the previous layer. Robert Sanderson: That is then made available via an API, an application programming interface, or agreement preceding interaction, which is a selection of appropriate technologies to give access to the data described by the profile. Robert Sanderson: So as an example, in Linked Art we use the CIDOC Conceptual Reference Model (CIDOC is the international museums standards organization), and we use an RDFS encoding of CRM version 7.1 plus a few extensions. Robert Sanderson: For vocabulary, we use AAT and then some very minimal extensions for things that haven't made it into AAT. The profile is Linked Art, and the scope is art-museum oriented.
Robert Sanderson: But we serve the adjacent domains, libraries and archives, people, places, subjects and so on, because without those connection points there's no way that we can be part of the wider ecosystem. Robert Sanderson: And the API uses JSON-LD, just like IIIF does, with primary divisions of the content. Robert Sanderson: So how do we connect all the dots, connect the Lego bricks? By shared standards. The application developer needs absolutely to understand the API, because that's what Robert Sanderson: their application is going to use to get access to the data. They need to understand the profile well enough that they can create an appropriate interface, so that the user knows how to engage with data in this profile. Robert Sanderson: The data modeler doesn't really need to know how the API works; that would be the software engineers who publish the API on top of the data. But they definitely also need to know how the profile works, so that they're not selecting Robert Sanderson: ontology terms or vocabulary terms, which they need to describe the world, that are not part of the profile.
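Editorial sketch: the idea that a profile creates nothing new and only selects from the ontology and vocabulary, and that a data modeler should not use terms outside it, can be illustrated as a simple membership check. Everything here is hypothetical and chosen for illustration: the `Profile` class, the record keys `type` and `classified_as`, and the example term names are not taken from any real profile or API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Profile:
    """A profile only selects existing terms; it defines none itself."""
    name: str
    ontology_classes: FrozenSet[str]   # selected from the ontology
    vocabulary_terms: FrozenSet[str]   # selected from the vocabulary


def out_of_profile(record: Dict[str, str], profile: Profile) -> List[str]:
    """List the ways a record steps outside the profile's selection."""
    problems: List[str] = []
    if record.get("type") not in profile.ontology_classes:
        problems.append(f"class not in profile: {record.get('type')}")
    if record.get("classified_as") not in profile.vocabulary_terms:
        problems.append(f"term not in profile: {record.get('classified_as')}")
    return problems


# Hypothetical art-museum-oriented selection, for illustration only.
example_profile = Profile(
    name="example-art-profile",
    ontology_classes=frozenset({"PhysicalObject", "Person", "Place"}),
    vocabulary_terms=frozenset({"painting", "sculpture"}),
)
```

An application developer could rely on the same selection from the other side: any record that passes this check uses only terms the interface already knows how to present.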
Robert Sanderson: So then, Robert Sanderson: the question and the understanding of the object are part of the overall conceptual model Robert Sanderson: of places and communication and digitization and books and people, and so on. This is where we need the diversity: domain theorists, logicians, ontologists, technical specialists, content specialists, researchers, modelers,
Robert Sanderson: everyone. As many people as can contribute to the understanding of the model make this better. Robert Sanderson: So we can trust what is modeled in this space, because the model has diversity and rigor, and the ontology and vocabulary that encode it are specialized by the profile for the use cases. Robert Sanderson: This sub-domain is made available via a documented API so that this person can interact with the data easily.
Robert Sanderson: And because of these shared connections, the researcher can then ask the same question that the content specialist has the answer to, without needing to talk directly, essentially through this encoding. Robert Sanderson: So. Robert Sanderson: Now we have gone through pretty much all of what I mean by trust. It is shared. Trusted institutions need to be sustainable, which is necessary for ongoing use, with all of the
Robert Sanderson: features that we were talking about. Availability: again, it's the ease of access and the backup plan, to allow confidence in the publishing institution and trust in the archive. Robert Sanderson: It needs to be available online with clear usage licenses, and reconciled: this is the linking factor across the systems and how we should incorporate others' knowledge, not just our own. And especially diverse, when it comes to institutions, people and practices. Robert Sanderson: So, thank you very much for your attention, and I do look forward to any comments, criticism, questions or any other engagement, either now or later. There's my email address and, if you want to look at the slides, they're on SlideShare. Do reach out or ask questions in the Q&A. Robert Sanderson: Thank you.
Rebecca Mitchell (she/her): Thanks so much, that was absolutely fascinating. We do have one question in the Q&A which came mid-talk; I think you might actually have addressed this. Rebecca Mitchell (she/her): This is from Matthew Lincoln, and I'm not sure if he's here, Matt, or if we're empowering people to Rebecca Mitchell (she/her): speak or not, but basically he's asking about the closeness of the data engineer and the client developer, which you did address, but could you say maybe more about that, if you've seen it in practice or how that might work? Robert Sanderson: Yeah, and going back to the slide, this is what I think as well. So if the client developer actually sat over here,
Robert Sanderson: working directly with these Robert Sanderson: people, it's initially very beneficial, of course, because then you have the Robert Sanderson: same close connection. Robert Sanderson: The disadvantage, which I think is where the question was heading, is
Robert Sanderson: that when this becomes essentially groupthink, or otherwise the focus is on getting it done quickly and cheaply rather than correctly, Robert Sanderson: then you can cut corners very easily when this person, this person and this person or people Robert Sanderson: are sitting in the same room and essentially colluding on the answer. However, while that collusion is valuable for this one piece of data, for the other people who aren't part of the group, working with the data becomes much harder, because we don't have Robert Sanderson: this. Robert Sanderson: So if there isn't a shared standard that's being built to, Robert Sanderson: then this person essentially has no chance of working with this data. It's unusable to them, because they don't have the frame of reference; they can't understand how to get from here to here to here to here.
Robert Sanderson: The other advantage of having the standard model is that when you have many institutions publishing data and many applications built, if they all use the same standard in the middle, then you can swap them in and out. You can use Robert Sanderson: one application on a totally different dataset than the one it was initially built for. I've seen this work really well in the IIIF space, where there are multiple, probably a dozen or more, Robert Sanderson: applications that can consume IIIF Presentation and Image API resources, and several dozen backend systems, all of which communicate via the shared standard, and you can mix them and swap them out with relative ease. Robert Sanderson: So there's an initial advantage because you get to this more quickly, but a long-term disadvantage because you're limited in the number of applications and Robert Sanderson: interoperability.
Rebecca Mitchell (she/her): And Jennifer, did you want to ask? Jennifer Edmond: Yeah, I'd love to, thanks Rebecca, and thanks so much, Rob, for that incredibly clear and well-organized talk. I think I will be looking for your image in order Jennifer Edmond: to unpack it and use it as a way of explaining this very complex ecosystem of
Jennifer Edmond: researchers and technical developers and content specialists, because I think sometimes it's a bit difficult to grasp the complexity of all the different actors who work together. Jennifer Edmond: But I did want to ask you, because I find your ideas around in-group and out-group quite interesting, but Jennifer Edmond: in some ways I'm not sure if I agree. When I think about why I do not trust Google, it's not because my role is different from theirs or my institution is different from theirs; it's because I perceive Jennifer Edmond: their goals and their values as being different. And in fact I would much sooner trust a library or an archive with data management than I would myself as a researcher, because I recognize that that's not what I'm built for, as it were.
Jennifer Edmond: So, in some ways I'd like to just ask you what you think about that idea of it not being so much about in-group and out-group, but about what we perceive institutions as being for. Jennifer Edmond: And then maybe, as a further follow-up to that, the question of: does that mean that, in fact, if we're talking about linked open data, well, maybe we don't Jennifer Edmond: want to trust institutions with this, we don't want to trust these institutions now, because maybe we don't necessarily see Jennifer Edmond: them having the resources to hire in the developers to be at the technical cutting edge. But actually I know that that's the set of players in the ecosystem that I want to be at that cutting edge. Jennifer Edmond: In five years, 10 years, 20 years, we will really need that value system of preservation and completeness and checking and authenticity and protection, all of those values that we look towards these institutions for. Jennifer Edmond: So again, this is what was rattling around in my head as I was absorbing what you were saying, and I was just wondering about your response to it.
Robert Sanderson: Yeah, definitely. Robert Sanderson: So yes, I agree, I think. Robert Sanderson: I'm absolutely not a psychologist or a sociologist, so when Rebecca asked me to talk about trust and linked open data, my response, and she can Robert Sanderson: assure the accuracy of this statement, was: I'm happy to do that, but you should probably find someone who knows more about what they're talking about than I do. Robert Sanderson: So, however, I think there's
Robert Sanderson: a degree of Robert Sanderson: in-group favoritism Robert Sanderson: because of the shared worldviews, as opposed to them being completely separate. But it's somewhat tied to diversity, and it's not that we need Robert Sanderson: a completely diverse set of folks looking at every single Robert Sanderson: object and coming to a universal understanding of it. We need appropriate people with appropriately different backgrounds to work with the object, or the cultural heritage. Robert Sanderson: And I think we need the same
Robert Sanderson: structure for the groups, the organizations, the people that are publishing the data. It's the similarity between the question and the ethics and mission, the worldview of the organization: how do we get from the question to the understanding of the people? The similarity is important. Robert Sanderson: That is communicated via appropriate standards, but if it Robert Sanderson: was a commercial organization publishing data for commercial purposes, they will have a different worldview than one which a cultural or academic institution might have. Robert Sanderson: And hence they'll probably publish different data, and we might trust them a different amount. So yeah, it's all entangled like a web. Robert Sanderson: I've tried to pick out some of the threads, but Robert Sanderson: it's absolutely a beginning of an understanding, rather than anything definitive.
Robert Sanderson: I hope that mostly non-answer Robert Sanderson: was somewhat satisfying. Jennifer Edmond: It did, it did. And I mean, I would just say, I guess, continuing to think along with you as best I can:
Jennifer Edmond: you're bringing up the web archives. Obviously there's some hesitancy about Jennifer Edmond: using web archives for research, sometimes precisely because of things like Memento, because of the sampling methods. Jennifer Edmond: But what you're presenting, which I think is really compelling, is that Jennifer Edmond: there's this ecosystem of actors, this ecosystem of roles, and that's what's evolving. And, you know, having the web archives
Jennifer Edmond: doesn't mean we don't need the library; it means that having both of them gives us an even stronger future support for not only the access we need to heritage but also the diversity. Am I kind of coming along behind you? Robert Sanderson: Excellent, absolutely. Jennifer Edmond: Fantastic, thanks so much for the answer. Robert Sanderson: Thank you.
Rebecca Mitchell (she/her): We are coming up on time, but I think there's one question in the Q&A that I'll just flag. Rebecca Mitchell (she/her): Well, there are two comments. One was noting the question of environment as a determining factor in the way we understand the process of trust, Rebecca Mitchell (she/her): and some praise also for the shared model. But to come back to the person's question,
Rebecca Mitchell (she/her): which I'm going to try to summarize briefly: is there a space to maybe combine crowdsourcing with Rebecca Mitchell (she/her): the kind of well-curated or authoritative data models that you've described here? Is there a way of doing a hybrid model that would bring those both together, and maybe provide some support for the less well-funded or less Rebecca Mitchell (she/her): structurally supported?
Robert Sanderson: Yeah, yes and Robert Sanderson: yes, and Robert Sanderson: no. Let me go to the usability one. Robert Sanderson: My opinion Robert Sanderson: is that it's this first point, the usability, which is where we need the crowdsourcing.
Robert Sanderson: So we need as Robert Sanderson: global, as consistent and coherent, and as complete an understanding of the use cases as possible in order to do all the rest of these. So this is not just a seed; it's more than just Robert Sanderson: a start, with scoping your design through those shared use cases. In terms, though, of building ontologies and vocabularies, if everyone did that then we wouldn't have any standards, and there's a lot of
Robert Sanderson: expertise, not necessarily needed but useful, in terms of just practical experience as well as theoretical understanding of what works and what doesn't. And Robert Sanderson: having folks engaged with the modeling side makes it consistent and coherent, which is important for usability, whereas a completely crowdsourced, open-ended Robert Sanderson: "let's just make whatever relationships and classes are currently useful to me" is not likely to cohere into something that you would easily use in other environments. Essentially, it would be very hard to turn into a standard. So I agree about the agility. Robert Sanderson: And indeed,
Robert Sanderson: all of the evolution of standards needs to co-evolve with the data in order to stay current and useful. The APIs and profiles need to keep up with the use cases and the projects, so it's Robert Sanderson: this constant ebb and flow within an ecosystem, of a change here requiring a change there with respect to a change in the data. And I think the question is Robert Sanderson: mostly which butterflies need to flap their wings in order to get the most beneficial changes for everyone involved, Robert Sanderson: rather than which bulls need to run around in the china shop and break things and pick up the pieces and make something good for themselves personally or for that one institution. Rebecca Mitchell (she/her): Thank you so much. We have run out of time, but I think this has just been an extraordinary starting point for what should be an ongoing series of discussions. Aengus, did you want to say something?
Aengus Ward: I expect this discussion could go on for a very long time. Many, many questions came to me as well, Robert, so thank you so much for so detailed a mapping of an ecosystem which I think will be the object of discussion amongst us all for quite some time. Aengus Ward: The lecture was recorded, as I mentioned, and will be available for anybody to come back to, and I think I will be coming back to it. But just to say Aengus Ward: once again, thank you so much for being the opening speaker in our lecture series, and thank you again for the clarity of your presentation. Aengus Ward: Just to say to all of you, for those of you who are interested: in two weeks' time, on the eighth of June, Aengus Ward: our second speaker will be Elaine Treharne from Stanford University, discussing uncertainty in manuscript technologies and the potential of computational tools. You can find the sign-up in the same place again, and we will be advertising that widely again. Aengus Ward: And then, just to bring it to a close, once again, thank you very much, Rob, for a truly excellent lecture. We're not sure that
Aengus Ward: this format allows for public clapping of any type, but I'm sure there is widespread applause. Thank you very much again. Robert Sanderson: Thank you, thank you. I look forward to Elaine's presentation, Robert Sanderson: which I'm sure will be Robert Sanderson: even more on topic than mine. Aengus Ward: And Caroline Bassett from Cambridge, on the automation of expertise, in four weeks' time.
Aengus Ward: Thank you all very much.