HILT 2017 Conference: Emerging approaches to evaluating learning
Welcome to Emerging Approaches to Evaluating Learning. My name is Evan Sanders. I am the associate director of curriculum services for teaching and learning technologies at Harvard Medical School, and with me is Josh Bookin, the associate director, or rather assistant director (I just gave him a promotion), of instructional support and development at the Teaching and Learning Lab at HGSE.

We're really excited to have this hour to think about how evaluating learning is really a critical part of the feedback loop, during a day where we're mostly talking about evaluating teaching. Of course, when we think about the efficacy of what we're doing, we have to keep evaluating learning very much in mind. So when Josh and I thought about how we were going to approach this session, we were thinking: what really qualifies as an emerging approach? What is inventive in this context? Is it high technology? Is it big data? Is it novel pedagogy? Of course it can be all of the above. But rather than have us stand up here and do some boring literature review about all of those things, we thought the best approach would be to highlight a couple of people here at the university who are really wrestling with a lot of the things that you were just reflecting upon on that intro polling slide. So with that, I think it would be good to actually just reflect a bit together on what some of those things were that you said.

Can we see that? We can. Okay. It would be pretty hard to summarize this or to pick specific themes, but I think a general theme that you can see just by glancing at this quickly is that these are higher-order considerations. These are things that we want to assess about the efficacy of teaching and learning that are definitely hard to do with what we might consider to be classical
methods of assessment, and that's precisely why we're here today.

We didn't want to spend much time on the intro, so that we could get into the case studies, which we have individually found very rich and thought-provoking, and we hope you will as well. But we did want to do two things in this intro. One was to think a little bit about the landscape of evaluating learning, and the other is to give a bit of a framework: we have these specific cases, and we want to be able to abstract some general principles that can be applicable to our own contexts and our own practice. So that's what we hope to do in short order.

First, we wanted to start with terminology. With the idea of evaluation, we were trying to figure out how that translates into assessment. For this session, we're thinking about evaluating learning as similar or synonymous to summative assessment, and a common definition of summative assessment is evaluating student learning at
the end of some instructional unit. We feel that that's a good bite-sized chunk for us to think about in our hour together. We're thus not going to be talking about formative assessment, which is also very important: the ongoing checks for understanding of what your students know and are able to do.

With this idea of thinking about summative assessment, Understanding by Design is a framework that we, as we talked about these cases, and I, in my own experience as an instructor and instructional coach, have found to be foundational in how I think about designing, and help people think about designing, units of study, whole courses, or even entire programs. Understanding by Design is a simple framework, but it's not simplistic, and it has three main steps. The idea is that you start at the end. It's backwards planning: identify your desired results, then move to determining what the acceptable evidence is, and then, only then, go to the question of what learning experiences I'm going to engage my learners in.

For our time together, we're going to focus on the second step, determining acceptable evidence. In doing this, in thinking like an assessor, there are some foundational questions that we all need to consider. So we're going to throw them out there, and I invite you to think about them throughout these case studies, and to think about how these three steps, and the three questions I'm about to go through, have informed what these two different groups of folks have thought through as they've tried to figure out something that can really fit their context and their student needs.

In terms of thinking like an assessor: one is, what kind of evidence do we need to actually see that our goals are being accomplished? A second is, what specific characteristics in student responses, products, and performances should we examine? This one stood out to me as the
idea that, if we have rich assessments, there's a lot of data being produced, and we need to think about what parts of that picture are important for us to be able to take stock of and evaluate against some criteria. Then, with the third one, we can drill down to the individual students producing this data and ask: does this proposed evidence enable us to infer the knowledge, skill, or understanding that we're hoping to instantiate in our students?

So, with these questions carried forward as a light framework into the case studies, the one other thing we wanted to accomplish in the intro was some typologies of emerging approaches to evaluating learning. There are a lot of different ways in which we can and do evaluate learning, and what we tried to sample here were things that, in the higher ed context, are gaining importance and that are in line with research-based best practices from what we know from the learning sciences.

The first one that I wanted to highlight is portfolio-based assessment. The idea is not just a single artifact, but compiling a set of artifacts from a given student or group of students. Often it allows us to get a more holistic picture of what that student knows, and you can see a progression over time. This type of assessment started primarily in the arts and in design schools, but it's now used much more widely in a variety of disciplines, especially with the advent of electronic portfolios, or e-portfolios, where it's a lot easier to compile student work.

A second emerging approach that is gaining significance is competency-based
assessment. This idea is about mastery, and divorcing mastery from seat time, if you will. For those of you familiar with Boy Scout merit badges, that's sort of an original type of competency-based assessment. If I'm doing knot tying, it's not that I have to go through an entire course. Whenever I think that I am proficient in tying knots, I can go for an evaluation and get my badge. There are similar things in higher ed, especially with online courses: you can go through the material at your own pace, or even skip over the material, and see whether or not you actually have mastery.

The third of the four that I'm going to touch on is peer and self-assessment. This has been growing in uptake in higher education for at least two significant reasons. One, the learning sciences are showing how important metacognition is: having students know what they know and know what they don't know, and being able to build those skills in them as lifelong learners, given the rapid rate of change and the learning that they'll need to do as professionals. The second is the growth of group work, and being able, as an instructor, to disaggregate all these different folks who came together and created a product. We still have individual grades, so who was responsible for what part of the experience?

The last one I want to talk about is performance assessment, also known as authentic assessment. This is the idea of having some problem-based task that usually approximates a real-world skill. When we looked over the Harvard landscape for case studies we wanted to highlight, we saw a wide variety of performance assessments being used, so both of our case studies actually focus on performance assessments. We
think it's particularly important here, as we look at the types of students we have and our goals in higher ed, in undergraduate and graduate education, to be able not just to get people to know and understand things, but to use that knowledge flexibly and creatively out in the world. So our idea with performance assessment is that, with two different stories, different contexts, and really interesting adventures about how they've gotten to where they've gotten to, we can learn some things that we can take away for our own practice. So with that, I will hand it over.

My name is Teddy Svoronos. I'm a lecturer in public policy at the Kennedy School of Government, where I teach statistics and econometrics in the core curriculum to our master's students. So the people I teach are not going to become statisticians, as much as I'd like them to, but instead are hopefully going to use the tools of data analysis in their policy careers down the road. This is some work that I did with Mae Klinger, who's not currently here, and Dan Levy, who is here, on the use of two-stage exams in a classroom setting for this quantitative course. For those of you who don't teach quantitative courses, I think there's still plenty here, so please don't write me off just because I said I do statistics. Hopefully there's plenty that can lead to some interesting discussions.

I'm going to start with the punchline, so if you want to be surprised: eyes closed, ears closed for the next 30 seconds. This started as an effort to promote learning, not evaluate learning, and in the process we came across a metric that actually helps us understand the ability of our students to collaborate. So in the process of looking for a way to promote learning, we came across a way to evaluate it. And in the process, this has led me to question my own teaching methods and to evaluate my own teaching, to understand the extent to which I can improve it over time. So
before I get into the nitty-gritty, I want to thank the inspiration for this: Eric Mazur and Carl Wieman, who are in
the physics departments of Harvard and Stanford and are kind of the originators of this two-stage exam concept that I'll explain in a second; funding from VPAL; excellent research assistants; and excellent colleagues at the Kennedy School who provided some guidance.

I'm going to start by explaining why I think two-stage exams are important. Then I'm going to talk about how they work, give you a sense of the data that we have and the process we went through to analyze it, and then say what I think this implies.

So first, why do two-stage exams even matter? The thing that has always bothered me is this. If you think about a student's experience in most classrooms, they work with the course material throughout the semester, in class, out of class, together. They spend a bunch of time studying for the exam. They take the exam, and then they get their grades. Hopefully this is a non-controversial sequence. The thing that frustrated me is that if I thought about when students were actually learning, it felt like it happened throughout the semester, and it happened while they were studying, but then the exam was basically just an assessment. The exam was: can you do the things that you've been studying for, or not? And then we posted solutions, and very few students looked at them. How few? Very few: 38%. In a midterm exam, we tracked how many people actually looked at the solutions, downloaded them: 30%. For a midterm in which how well you did matters for the final, which is cumulative, still less than half of people even looked at it.

So part of this was to try to figure out how we could bring learning into these other two parts of the process, and that's what two-stage exams do. The idea is that a standard exam, let's say, is three hours long, from 9:00 to noon. It's an individual exam; you take it, and that's it. In a two-stage exam, we instead divide it into a two-hour individual exam and then add an additional hour of group
work. And when I say group work, I mean we take the questions from the individual exam, we take the most challenging or complex ones that we think could benefit from group interaction, and then the students redo a subset of those questions together in a group of four or five. We structure it so that it can only help their grade, it can't hurt their grade, so people don't freak out. But essentially, what we're trying to say to students is: when you hand in your first-stage exam, don't just walk out of the room and never think about this again. Instead, engage with your peers, talk through what's happening, and see what you can get from it.

So in the process, oops, we hope that using two-stage exams integrates active learning into the exam setting. It forces people to instruct one another, it forces people to be critical about how they're thinking, and it forces them to engage with the material even after the initial assessment, which hopefully leads to longer-term gains. And it develops skills of debate and collaboration related to course topics. There may be a specific set of skills, related to working in a group on a quantitative question, that isn't captured in a single-stage exam.

So what that looks like: stage one looks like kind of a normal exam (sorry for the lighting), and the
second stage is this kind of cacophonous group of people working in little groups, talking through different questions and issues they had, dividing up work sometimes, and really working together to submit a second stage that represents their collective knowledge. There are some groans and sadness in the room when people realize that they did a question wrong, but that's by design: a feature, not a bug.

Generally, people seem to like it. 84% of the learners that we've done this with reported that a two-stage exam was more helpful for their learning than a traditional exam. But now I hear people grumbling under their breath that this is not a measure of actual learning. I didn't actually hear you guys grumbling, but this is not a measure of learning; it's a self-report, a self-assessment of whether or not learning is actually happening. So we were trying to take the data that we had, parse through it, and figure out what we could do with it.

Just so you know, this data is from three statistics courses from 2013 to 2017, taught multiple times: 11 total exams and about 900 students in five different cohorts. When we had the second stage, they were randomized into 225 groups of four to five. So there's quite a bit of data to work with, and the question that we kept hitting up against is: what metrics do we use to try to evaluate this data and see what it can tell us?

Here are the ones we came up with. Imagine, let's say, just three people in a group instead of four to five. In stage one, this is how they did. There were three questions, let's say, and they each got this grade. Afterwards, they were randomized into the stage two group, did the same subset of three questions, and got 25. So on one hand, we have the stage two grade to compare things to. We also just take the average of the three and compare
that to their stage two grade. Sure, straightforward. We could also say: well, what about the highest-performing student? How did this group do relative to that highest-performing student? I'm going to call that the top student grade, so that's 25 here. And then there's this third one, or fourth one, I guess, that we thought was an interesting thing to do, which is to take the best grade that any student got on any question. So for question one, the first student did the best: they got nine, whereas the other two got eight and seven. On the
second one, the second student did the best, and on the third one, the third student did the best. So what we have here is a synthetic, what we call a super student. The super student is as well as anyone in the group could do on each one of the questions, pooled together.

So if we look at these four metrics that we came up with: the stage one average is just the average student performance in the group. Stage two is the collective performance of the group after collaboration, what they were able to do together. The top student is the highest student performance in the group. And then this super student is the collective performance before collaboration. One way to think of this is as the raw knowledge that the group had before they even interacted with each other. It is the most that all the people in the group knew about each question, pooled together, before they went into stage two together.

So these are the four metrics we used. I'm going to show you results standardized around the stage one result. Here is the distribution of stage one groups, centered around zero (that's the stage one group average), just to give you a sense of the spread. In stage two, they did quite a bit better. In fact, they did about one standard deviation better as a group in stage two than they did in stage one. Now, I'm not trying to make causal claims that we improved learning by one standard deviation, because they did have extra time to work and all of that. But it is worth noting that this one standard deviation change is a huge change. We're usually excited in educational interventions if we can get 0.2 or 0.3 standard deviation increases, and here we have one.

If we compare that to the top student, it's actually quite similar: the top student distribution is also about one standard deviation better. And for the super student, the
distribution is like this, which is about one and a half standard deviations. So the super student, the maximum amount of knowledge in the group before collaboration, maybe, is one and a half standard deviations over the stage one average.

So here's the metric that I think is quite interesting. If we take these two gains and figure out what percentage of the super student gain was reclaimed by the stage two group, we get a measure of the extent to which the group was able to harvest the knowledge from one another and contribute it to their stage two group. This 66 percent number we're calling collaborative efficiency. This is in contrast to a term used in psychology called collaborative inhibition, which is how much knowledge is lost when people work together. But we're trying to be positive, so we went with collaborative efficiency. The question is how efficiently these groups worked: how much of the maximum amount of knowledge that was spread across them before they interacted were they able to reclaim?

Here's what the distribution looks like. A one means a hundred percent efficiency: the stage two group did as well as the collective knowledge going into it, so they were able to reclaim a hundred percent of the knowledge in the group. What's kind of interesting about this is that there are groups that did collectively better than the pooled amount of knowledge in the group. That's at least suggestive evidence that collaborating actually helped to generate knowledge, not just spread the knowledge around in a way that was useful.
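The four metrics and the collaborative efficiency ratio described in the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speakers' actual analysis code: the function names are hypothetical, and apart from the question-one scores (9, 8, 7) and the stage two grade of 25 mentioned in the talk, the scores are made up. The ratio is taken here as the stage two gain over the stage one average, divided by the super student gain over the stage one average. The talk does not say whether its 66 percent figure is a ratio of average gains or an average of per-group ratios.

```python
def collaborative_efficiency(stage2, stage1_average, super_student):
    """Share of the super student gain that the stage two group reclaimed.

    Returns None when the super student is no better than the stage one
    average, because the ratio is undefined in that case.
    """
    possible_gain = super_student - stage1_average
    if possible_gain == 0:
        return None
    return (stage2 - stage1_average) / possible_gain


def group_metrics(stage1_scores, stage2_score):
    """Compute the four metrics for one group.

    stage1_scores: one list of per-question scores per student
    stage2_score:  the group's total score on the same questions
    """
    totals = [sum(student) for student in stage1_scores]
    stage1_average = sum(totals) / len(totals)
    top_student = max(totals)
    # Super student: the best score anyone got on each question, summed.
    super_student = sum(max(question) for question in zip(*stage1_scores))
    return {
        "stage1_average": stage1_average,
        "stage2": stage2_score,
        "top_student": top_student,
        "super_student": super_student,
        "collaborative_efficiency": collaborative_efficiency(
            stage2_score, stage1_average, super_student
        ),
    }


# Hypothetical group of three students answering three questions.
# Only the first column (9, 8, 7) and the stage two grade come from the talk.
scores = [
    [9, 5, 6],  # student 1, best on question 1
    [8, 9, 5],  # student 2, best on question 2
    [7, 6, 8],  # student 3, best on question 3
]
metrics = group_metrics(scores, stage2_score=25)
print(metrics)
```

With these made-up scores, the stage one average is 21, the top student has 22, the super student has 26, and the group reclaims 4 of the 5 available points, an efficiency of 0.8. Plugging in the standardized gains quoted in the talk, `collaborative_efficiency(1.0, 0.0, 1.5)` gives about 0.67, in line with the 66 percent figure. A value above 1 corresponds to a group that outperformed its own pooled knowledge.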
So the thing about collaborative efficiency that I think is really worth noting is that at the Kennedy School, and I think in a lot of professional schools, a thing we actually want to stimulate in our students is the ability to reclaim this information in a group context. Usually at the Kennedy School we say: well, if you work in a group, that's negotiations; if you're trying to lead a group, that's leadership. But we teach quantitative classes, Dan and I, at the Kennedy School, and we want people to be able to collaborate and interact on quantitative questions just as much as they do when trying to negotiate something. So that's a skill that I actually want to develop: articulating the reason why you hold a position, understanding the positions of your other group members, being able to assess the strengths and shortcomings of your colleagues' positions, and then trying to convince them otherwise. Knowing when you're wrong, understanding why they might be right or wrong, trying to communicate that: that's a crucial skill that we hope graduates of the Kennedy School will have, and it's something that I think we're getting toward measuring using this two-stage exam.

So reaching this collective position led to these lessons that I want to talk about. Trying to improve learning, just experimenting with ways to make learning better, has led us to find a way to measure a skill that I didn't even realize I really cared about, and that I didn't even realize I was trying very hard to generate in my students. The experience has led me to think: we do a lot of active learning in our classes, we do group work, we do things like that, but I treated that as a means to get more knowledge, not necessarily as an end in itself. And I'm starting to think maybe this is a thing that I actually want to foster in my students, develop, and maybe even evaluate: the
extent to which they're able to collaborate. So that has been our process with two-stage exams. We've gotten to this new measure, and in the process it's had quite a bit of an effect on how I approach teaching. Thanks.