IASSIST GVC 2021 Research Reproducibility 2021-05-17

... are the only options for people to choose, so people can kind of ...

Florio Arguillas: I'm the coordinator of this session. I am a research associate of the Cornell Center for Social Sciences. I manage our center's Results Reproduction Service, which verifies and certifies the reproducibility of our researchers' studies prior to submitting them for publication. I'm also the co-founder of the Curating for Reproducibility consortium, or CURE, along with Limor Peer of Yale and Thu-Mai Christian of the University of North Carolina. Together we co-chair the RDA's CURE-FAIR Working Group. Thus this session is very dear and near to me, and you are here because you are either practicing, interested,
or curious about research reproducibility. Thank you to the program committee for the excellent job in selecting the papers for this session. Not only are the papers on point, they also reflect the "I" in IASSIST, which stands for "international". From the Netherlands, we have a presentation about a reproducibility hackathon where researchers learn and improve their reproducibility skills and get feedback for their efforts. From Canada, we have a presentation about concrete and actionable steps to help improve our understanding and practice of computational reproducibility. And from the United States, we have a presentation about errors that authors commonly make in replication packages
and challenges faced by authors in making their research reproducible.

Before we begin, let me just remind you to mute your mics. Please type your questions in the chat or Q&A box. At the end of each presentation, I will read a question about the presentation from the audience while the next presenter is setting up. Then, after the last presentation, I will open the floor for Q&A. We will end the session around 1:55 p.m. Eastern Time, so you have time for a quick break before hopping off to the next session. So let's begin.

Our first presenter is Dr. Christina Hettne, digital scholarship librarian, Leiden University Libraries. The title of her presentation is
"ReproHack NL 2019: Enhancing Research Reproducibility at Dutch Universities". Christina is an expert on FAIR data management and open access, with a background in data science, research reproducibility, and bioinformatics. She has facilitated many workshops related to reproducible science, such as the creation of FAIR metadata, regularly advises researchers on open science, and is an active participant in international networks such as the RDA and GO FAIR. She is an author of more than 40 research publications. Christina?

Christina Hettne: Let me see, I will start by sharing my screen. Let me say that it's really great to be here today. It's my first IASSIST, and I haven't been in the library world so long, just about two
and a half years, and, yeah, as a supporter, so to say. So I thought it was really a lot of fun, the first hour of this conference, so I'm looking forward to it.

So indeed I will talk about the ReproHack, a workshop that we held actually quite a while ago now. But first I would like to talk about what computational reproducibility actually is, just to get the definitions that I will use and to get us on the same page. I use this figure from The Turing Way, a handbook for reproducible data science that I guess many of you might know already, where we talk about a reproducible study when
the analysis and the data are the same, and a replicable study when the analysis is the same but the data is different. Robustness comes from a different analysis on the same data, and we generalize when you have a different analysis on a different data set.

So, reproducibility: just a sort of overview checklist to think about, also coming from The Turing Way and from my own experience in reproducible science. Well, you need to have, of course, a research question or an abstract describing what you did, your input data, metadata, the tooling, the code, code documentation, and the results.

But what actually happens in practice? Focusing a bit on the code here, I found this very interesting blog by the PLOS director of open research solutions that was published quite
recently, and they asked the users, you know, "Why have you not shared your code publicly in the past? Select all that apply." The top reason being that it takes too much time to prepare code for sharing. The second would be the concerns with my ability to prepare the code for sharing, and then, oh sorry, the second is the software and system dependencies, and the third is concerns with my ability to prepare the code for sharing. So we can see that, yeah, these are things that researchers merely need to practice upon, or not only that.

So how to really counter this? Well, reward researchers for their reproducibility efforts. I guess it will
be talked about today maybe as well, that researchers are of course judged mainly upon their outputs in terms of publications and not really on, you know, making code available, for example. Another point: start on time. Data management plans are becoming common practice, and research software plans are also starting to pop up, which is a really good thing, I think, since research software is not really the same as production software. It's usually made for a specific purpose but still needs planning.

And then you come to the core of my talk, which is about, you know, educate, educate, educate, and an example of that: the ReproHack.
So how does a ReproHack work? Well, researchers try to reproduce results reported by papers that have been submitted to the ReproHack, and the idea is that it is really a low-pressure sandbox environment for practicing reproducible research practices. As a participant, you learn, practice, and try things out yourself. As an author submitting your paper, you get feedback and acknowledgment for your efforts. And we of course hope that we help further science by increasing the skills for reviewing and producing reproducible research.

Here I put down just an example ReproHack schedule to make it more concrete. We held a ReproHack in November 2019 in the library. We started with some, you know, coffee
and tea and a welcome. I really took some time for that, and we really wanted to create a sort of warm, welcoming atmosphere, not having a judgmental idea about how to reproduce a paper, but just trying to help each other in a learning environment. We then started with a presentation about tools for reproducible research, by Dr. Anna Krystalli this time. We then started forming groups and started hacking, and had a very important time for lunch, again, you know, getting your blood sugar up to help you concentrate in front of the computer and also get to know people. We would then continue hacking and have a break for another
presentation, this time on a vision of open science beyond the reproducibility crisis, by Dr. John Boy. Then we continued hacking and reported back with some drinks and bites.

A short summary of the event is that the setting was the Leiden University Library, and the organization was the ReproHack core team of researchers and students, with active involvement from library support staff. We had 44 participants, and they were from diverse backgrounds: psychology, engineering, biomedicine, computer science. We had 31 papers submitted to the ReproHack, 19 were reprohacked, and 11 were successfully, or almost, reproduced. I put down the link to the paper here at the end, if
you want to check out some more statistics.

Now, one important thing coming out of this ReproHack was the top 10 tips from the participants that tried to reproduce the papers. They all had to fill out a form with questions on, for example, what did you think about the documentation, how do you think it could be improved, and so forth. So the top 10 tips that were mentioned were, you know: package data so that it's easy and fast to download; provide non-platform-specific code that is written using open software; use a codebook explaining the data structure; include a README text file to explain the context of data collection; comment the code generously;
perform a typo check; report on the time needed to run the code; explain which parts of the code correspond to which results in the paper; attach a permissive license to code; and a permissive license to data.

And then I actually wanted to read this out. This is from the paper that's in the IASSIST Quarterly, because I try to sort of transfer the feeling of why libraries are such a great place to organize these things. So I would say, you know, the library as a place to meet and exchange ideas. Libraries have always been a place for people to meet and exchange ideas. In a sense, they often are a neutral space outside
research institutes for researchers to focus on sub-parts of work. ReproHacks and other grassroots initiatives need exactly that: a place to meet, work, think, and discuss. Libraries are connected with the faculties, and they can use their network to reach researchers throughout the university. For this first ReproHack in the Netherlands, the CDS, that's the Centre for Digital Scholarship in the library that I'm connected to, contributed greatly by offering its infrastructure and enhancing the organizers' outreach through posters, flyers, and their Twitter account, but also by informing faculty liaisons to give them the opportunity to spread the word
via their channels as well. And next time, maybe some more directed advertisements can be made by informing participants taking part in other workshops organized by the Centre for Digital Scholarship.

This ReproHack sparked discussions at other Dutch universities around organizing their own ReproHack. There are continuously ReproHacks being organized in different places, definitely, that I know of, in Europe. And I wanted to say: if you would like to organize your own ReproHack, where can you then actually find help, inspiration, and support? There is a GitHub about the ReproHack headquarters and organizing instructions, so there's like a template that you can
follow for all the things you need to think about: the practical things, but also, maybe, do you need funding, how do you reach participants, how do you get people to submit their papers, and so forth. There's a Twitter account, there is a Slack invite, the ReproHack team has a Gmail address, and there is also, like I said, this paper that we published last year in IASSIST Quarterly that you can read for some inspiration.

Also to say that the researchers are actually really the ones that are organizing these ReproHacks, and we supported; we are actively involved. It's a really nice collaboration that I hope more people would take up.
And that was actually what I wanted to say about this today, so thank you very much for your attention. I think I will just stop sharing. Thank you so much, Christina. If you have a question for the presenter, just type it in the chat box. Now, I just have a quick question for Christina while Sandra is setting up. I noticed that there's a lunch buffet in the schedule. Is that provided by your team? No, no, sorry, it's not, but it's a great spirit. Yeah, in the spirit of the ReproHack. All right, and then let's see, there's another question. Yes, these slides and links will be shared; I'm just answering that question. All right, so our next presenter is Sandra Sawchuk. Sandra is the user experience and engagement librarian and
data services librarian at Mount Saint Vincent University in Halifax, Nova Scotia, Canada. The title of her presentation is "Computational Reproducibility: A Practical Framework for Data Curators". Sandra is passionate about research data management, digital humanities, and emerging technologies. She is currently involved in a grant-funded research project to digitize and make accessible Canada's historic census records. Sandra? Hello, thank you, and thank you, Christina. Presentations like yours are exactly why I attend IASSIST. I think that the ReproHack is such a neat idea, so thank you so much for your presentation. You've given me lots of ideas about my own practice, so this is great.
I'm here to continue the conversation about computational reproducibility. My co-presenter, Shahira Khair, who is presenting at another session in IASSIST, I think, is wonderful; she's from the University of Victoria. She and I presented this framework at the Canadian Data Curation Forum that was held what feels like a lifetime ago in Hamilton, and I honestly can't remember if it was 2018 or 2019. All of the years are the same to me now. But, you know, through a Portage expert group called the Curation Expert... I don't know. We're not seeing yours. Okay, what are you seeing? This? I feel like I practiced this, so let me see here. Is this the
screen that you are seeing, in slides? Okay, wonderful. I wanted to use my notes, but we're just going to go without the notes. So you can see there the name of my co-presenter, who is presenting at another session, I think. We created this workshop for the Canadian Data Curation Forum a couple of years ago, and we have this webpage that we created as well to support some of the work and some of the experiments that we did. So there are actually some reproducible data sets that we have put into our GitHub, there is a bigger version of the slides there, compared with what I have to present today, and there's a reference list. So I highly recommend, if you
are interested in any of this, that you visit the page, because it's really an expanded version of what I'm going to talk about today. We have already talked a little bit about what reproducibility is, but what I want to talk to you about first is what reproducibility isn't, and Christina mentioned this in her presentation. There's a difference between reproducible and replicable. Replicable is using new data with the same methods to get the same results. Replicability is great, but it's more difficult than reproducibility, because the experiment needs to be arranged in such a way that new data can be, you know, kind of
injected in there, and for everything to work in the same way. So I'm not talking today about replicability; rather, I'm talking about reproducibility. Reproducibility is using the same data and the same methods and obtaining the same results, and it would be great if we could inspire more people to design experiments to have reproducible results. Now, I'm speaking to you today as a librarian, because librarians are going to be tasked more and more with assessing data sets and assessing entire curatorial packages for deposit into repositories. And the thing with us as librarians is that we're not necessarily going to have subject matter expertise
in all of the areas in which we may need to review. So it's important for us, I think, to understand concepts in computation and reproducibility, but not necessarily to be experts, not even coding experts. I mean, I really enjoy it, but not everybody needs to have this intense coding literacy in order to be able to act as a curator for research projects, and I really want to stress that throughout the presentation. One of the reasons why reproducibility can be difficult is that there are different types of reproducibility. So while I'm talking today about computation, we can see here that there are other forms of reproducibility that are just as important, but
today we're talking about the computational. Basically, computational reproducibility involves the ability to reuse the assets used to derive the hypothesis and the results. This includes the input data; the source code that was used to create the results, maybe that's R or Python, or maybe that's a software environment like SPSS; and then the computing environment, which is the entire machine that you use to run the experiment. For example, I'm using a Mac right now; you might be using a PC, or you might be running something on Linux. You might be doing your experiments in a cloud computing situation like we have in Canada, called Compute Canada, where we have
the shared infrastructure for computing. So this computing environment is also really crucial to know about when we're talking about reproducing experiments. It's not enough just to have the code or to have the data; there are so many other details that we need. I love to share this quote about computational science. Really, when we're using computational methods and we publish a paper, we're kind of advertising the scholarship, and we're not really sharing the scholarship itself, because it involves so much complexity. There's so much going on in the back end that gets us to where we're going. So it's
very interesting to wrap your head around this notion that when we publish, we're just kind of describing the scholarship in some ways, and some people may disagree with this. But the fact is that a lot of researchers are not formally trained as programmers, and we saw this great chart in Christina's presentation sharing some of the reasons why people don't share their code and share their experiments. Many of the reasons have to do with researchers feeling that the code maybe isn't good enough, or cleaned up enough, or polished enough to share. And, you know, this is true; I find that this is true for myself, and probably a lot of
researchers feel that way. If you are acting as a researcher and using code and maybe feeling like not an expert, I think that most of us are feeling that way. How many times do I look at Stack Overflow to do the most simple thing that I've done 100 times, when I'm using Python or even using Excel? I would love to know, and you can put it in the chat, if you have ever tried to reproduce your own or someone else's results. I can speak to that very clearly for the exercises that I created for the Canadian Data Curation Forum, and it was very difficult. But we have to understand that reproducibility is a spectrum, and perfection is the enemy of good here. In fact,
studies can be kind of reproducible, you know, in the middle, where the code and data and the computational environment are there and reusers are able to put those things together and reuse and re-run the code with a little bit of work. And depending on whether you like that kind of thing or not, it can be really, really fun, and I'm going to share a little bit of that with you when we take a look at some of the extra resources that we have available. But it's important to understand that sharing does not mean reproducible. Just sharing your code in a repository does not mean that it is reproducible. There are many things to consider.
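
One of the many things to consider beyond the code and the data is the computing environment described earlier in the talk. As a minimal sketch, assuming a Python-based project (the talk does not prescribe any tool, and the function names here are hypothetical), a researcher or curator could record the operating system, Python version, and installed packages next to the code:

```python
import json
import platform
from importlib import metadata


def installed_packages():
    """Return a sorted list of 'name==version' strings for installed packages."""
    found = set()
    for dist in metadata.distributions():
        info = dist.metadata
        name = info.get("Name") if info else None
        if name:  # skip distributions with missing or broken metadata
            found.add(f"{name}=={info.get('Version')}")
    return sorted(found)


def describe_environment():
    """Collect basic facts about the machine and Python setup running an analysis."""
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "packages": installed_packages(),
    }


if __name__ == "__main__":
    # Print the description so it can be saved alongside the code and data.
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output with a deposit gives a reuser a starting point for rebuilding a similar environment; it does not by itself guarantee that the results will reproduce.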
And the reason why I'm presenting to you today, and the reason that I love to present at IASSIST, is because I have something to share with you about reproducibility. My colleague Shahira and I have created a framework for reproducibility, and this framework is available for you to use. We've put a CC0 license on it, so you can take it and not credit us and make it better and use it in your practice, and that's totally great. But this framework can be used by a curator with perhaps a lower understanding of computation and coding to really go through and make some decisions about what is in the research project, maybe what needs to be done or what can
be improved upon by the researcher, if the curator has a chance to see the researcher. And going through the framework itself is actually a really good way to learn about reproducibility, because that's the thing: learning about reproducibility is helping researchers to become better at reproducibility. As we all go through this together and develop our skills, things will become easier, and things will get more and more reproducible. So I want to say not to be afraid of looking at things and different kinds of data types that one might not be too familiar with, because, speaking to the librarians in the audience, we're librarians and we always
just figure it out, and this is just one of those things that we can figure out. So the first area of creating data sets that can be more reproducible is in the organization of the files: looking at file names, directory structures, and versioning. And of course the readme file is really key to enhancing reproducibility, so having almost a literary description of what is happening with the project can be really, really helpful when that project needs to be deposited. Then documentation, such as information about dependencies, relative paths, code execution, and decisions around cleaning. I mean, how many times
in your own practice do you do a number of cleaning steps on a data set without documenting that? I can speak to a ton of work that I've done recently with no documentation at all on the data cleaning. Luckily I used OpenRefine, which gives me a history of that, but still, those kinds of decisions can be really helpful. Then data documentation: information about the raw data itself, information about the formats of the raw data and the types of programs, data dictionaries, et cetera. And finally, licenses. Licenses are so important in this area of computational reproducibility. Is there a license for the software? Is there a license for the
data? All of these things are tremendously important. So I invite you to share that reproducibility framework with your colleagues. Please use it, build on it, help it grow, and use it as a learning tool to further your practice in
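
The areas of the framework described above (file organization, documentation, data documentation, and licenses) lend themselves to a simple first-pass check. The sketch below is only an illustration, not part of the framework itself: the file names it looks for, the script extensions, and the pattern for hard-coded absolute paths are all assumptions chosen for the example.

```python
import re
from pathlib import Path

# Hypothetical file-name conventions; the framework does not prescribe these.
EXPECTED_FILES = {
    "readme": ("README", "README.md", "README.txt"),
    "license": ("LICENSE", "LICENSE.md", "LICENSE.txt"),
    "dependencies": ("requirements.txt", "environment.yml", "renv.lock"),
    "data dictionary": ("codebook.csv", "data_dictionary.csv"),
}

# Assumed set of script types worth scanning (Python, R, Stata, SPSS syntax).
SCRIPT_SUFFIXES = {".py", ".r", ".do", ".sps"}

# Rough pattern for quoted absolute paths such as "C:\..." or "/home/...".
ABSOLUTE_PATH = re.compile(r"""["'](?:[A-Za-z]:[\\/]|/(?:home|Users)/)""")


def review_project(root):
    """Report which expected files exist and which scripts use absolute paths."""
    root = Path(root)
    report = {
        label: any((root / name).is_file() for name in candidates)
        for label, candidates in EXPECTED_FILES.items()
    }
    flagged = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SCRIPT_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if ABSOLUTE_PATH.search(text):
                flagged.append(str(path.relative_to(root)))
    report["scripts with absolute paths"] = flagged
    return report
```

A curator with little coding experience could run something like `review_project("path/to/deposit")` and use the resulting dictionary as a prompt for questions to the researcher; a `False` entry only means the file was not found under one of the assumed names.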

2021-06-27
