Smart Money: Education Investments in Adolescents Earn Higher Returns
[MUSIC] Anyway welcome back everyone and thank you very much for attending today's lecture, whether in-person or on Zoom. I'm Rebecca McLennan, I'm on the faculty of the History Department and I'm also serving on the Tanner Lectures Organizing Committee this year. Today, Professor Caroline Hoxby will be presenting the second lecture in her Tanner lecture series, The Fork in the Road, the imperative of investing in adolescent education. Now yesterday, Professor Hoxby was very persuasive in arguing for the existence of a fork in the road of children's cognitive development.
One that accelerates not so much in the first three or four years of a child's life, as most laity assume (though it does a little), but around the age of 10.5. Now Professor Hoxby, I wish I had heard your wonderful lecture two years ago, before I started writing preposterously large checks for my toddler's San Francisco pre-school. Building on this theme, today's lecture is entitled Smart Money: Educational Investments in Adolescents Earn Higher Returns. We'll take a quick break after Professor Hoxby has further enlightened us, then I'll introduce today's distinguished commentators, and after that we'll move to discussion. Thank you. Please welcome Professor Hoxby. [APPLAUSE]
Thank you very much, Professor McLennan, and thank you to the commentators who are here today and to anyone who is in attendance, here or on Zoom. I'm going to see whether I can do better with the clicker today [LAUGHTER] than I managed to do yesterday. Again, I'm just showing you some synapses to get started on the neuroscience. Mostly today we're going to be talking about tests of whether adolescents are able to learn more than students at other ages, and so that's the reason for the title of the talk today: Smart Money: Educational Investments in Adolescents Earn Higher Returns.
I want to start with a very brief recap. I'm going to try to keep it brief, but just so people who weren't here at the lecture yesterday know where we are. The key points were, first, that increasing the share of Americans with advanced cognitive skills is crucial, in my opinion, for social cohesion, for advancing economic opportunity, for reducing inequality, and for decreasing geographic and political polarization, and possibly also for other things like reducing crime or improving health. Now, recent neuroscience and neuropsychology suggest that adolescence is a key period for frontal cortex development. Since it is this frontal part of the brain that is thought to be responsible for advanced cognitive skills, at least disproportionately, we should expect some of these skills to really take off during adolescence.
In fact, I showed you this chart yesterday of measures of children's cognition, and you can see it in early adolescence, this period right here, starting at about 10.5 and going to about 15.5. Generally speaking, in that period you can really see that some children start gaining advanced cognitive skills and their skills take off on a high trajectory, while other children have their skills really stagnate during this period of time, and then they tend to flatline thereafter and gain very little, if any, skill in their late teen years. In other words, early adolescence offers both great opportunities to learn and also a risk of not learning. After early adolescence, I showed you some charts that look like this, which show that students' skill trajectories start to harden. This is consistent with what we would expect from neuroscience, specifically the pruning and myelination processes that occur in adolescence, which harden up these trajectories.
I also showed you a couple of maps of the United States; this is one that I happen to like a lot. I think it expresses political frustration and resentment of elites that may stem from the economic fatalism that some people feel when they realize that they have attained young adulthood without acquiring advanced cognitive skills and that they may be shut out of remunerative and fulfilling careers. This one shows you what people think scientists think about climate change. It's important to say it's not what they themselves think about climate change. I concluded yesterday's lecture by saying that it's one thing to try to identify early adolescence as a key time when students should start developing advanced cognitive skills. It's one thing to conclude that as a matter of logic, or by looking at the neuroscience.
But it's another thing to demonstrate that we actually have realistic educational interventions that would be especially productive if they were implemented in early adolescence. Today's lecture is all about those demonstrations. I'm going to leave this up for right now and tell you a little story about how I became interested in adolescence, now that we know where we are going. I teach an undergraduate class called the Economics of Education, and in it we regularly examine recent research on early childhood education: everything from the very famous Abecedarian and Perry Preschool randomized controlled trials, which are now almost 50 years old, to recent research on the federal Head Start program. Every year I teach my students the theory of endogenous skill growth, which posits that children who acquire skills sufficiently early benefit more from each later educational experience, creating trajectories that diverge further and further apart with age.
Now this theory predicts that positive early childhood interventions such as Abecedarian, which believe it or not enrolled children at four months of age on average, or Perry Preschool, another very small but intensive program, or Head Start, the big federal program, should launch children on a permanently steeper trajectory of skill growth. In fact, though, the evidence looks quite different. The effects of these programs are evident during the intervention period itself, when you're enrolled in Abecedarian, or Perry Preschool, or Head Start, but then they tend to die out fairly quickly, by second grade typically, sometimes even first grade. The students and I would always find the theory of endogenous skill growth compelling and intuitive.
We would think to ourselves, this makes sense, this comports with our own personal experiences. We would finish up the lectures on early childhood education with a sense of class-wide frustration, including my own. After class, I would inevitably field many questions along the lines of, if the theory is so compelling and intuitive and we all think it makes sense, why isn't there stronger evidence for it? The other experience that forced me to pay attention to early adolescence was my research on charter schools in New York City. In this research, some of which we will see later but very briefly, I studied more than 100 charter schools using an experimental design that makes use of application lotteries. Essentially, the experiment compared students who were and were not given seats in charter schools purely because of their randomly assigned lottery numbers; I'll talk more about that a little later. Now, some of the charter schools start with pre-kindergarten or kindergarten, and then they end with an elementary grade, let's say Grade 5 or Grade 6.
Some charter schools start with Grade 5 and end with Grade 8, so they're charter middle schools, as it were. Some charter schools are more like charter high schools; they start typically with Grade 9. Now the many economists who work on early childhood had led me to strongly expect that if a charter school was going to be a success, it would be more successful if it started with the earliest possible grade, thereby launching its pupils on a higher endogenous growth trajectory of skills. I was really baffled by the people who wanted to lead charter schools that started at Grade 5.
I thought, you're just setting yourself up for failure; you should have started your charter school with kindergarten or pre-kindergarten. Why are you starting with Grade 5? I was even more concerned because the data showed very clearly that the schools that started with fifth graders had applicants who were already struggling and already falling behind in the regular public schools at the time that they applied to the charter school. I thought, this is too late for these students; they're never going to get the full benefit of a successful charter school, if the charter school is successful. When I computed the results in this charter school study, it was immediately obvious that these Grade 5-8 schools, the middle schools, were not only generating greater skill gains than other charter schools, their gains were often much greater. I reported the results, but I really just didn't know what to make of them or how to explain the differences.
I also became aware that there was a reasonably large study by Mathematica, a non-profit research organization. It was conducting a large randomized controlled trial on charter schools, and it was coming up with very similar results about the charter middle schools doing an unusually good job. One day I was walking back to my office after having taught my early childhood classes and having heard all of my students' frustration, and I thought about the charter school evidence and I thought about the early childhood evidence. I realized that when I looked at education data, which is of course something I do ordinarily in life, the fork in the road did not appear as early as I had expected it to occur, something like kindergarten or pre-kindergarten, but rather seemed to evidence itself mainly in middle school.
I also recalled that a sizable share of students appear to get stuck at the middle school level of skills. This is typified by the many students who are still struggling to learn Algebra 1 in their freshman year of college, even though they have been taking it in variously named math classes since the seventh grade. Indeed, Algebra 1 is the most commonly taken college course in the United States, and freshmen who fail to master it often have to drop out of college, not just because it's a prerequisite for majors like engineering or science or something along those lines, but because it's also a prerequisite for many remunerative community college majors, such as dental hygiene, for instance, or a lot of nursing majors. Anyway, this concatenation of thoughts about the charter middle schools, the early childhood education, and these people who get stalled at middle school levels of skill made me rush off to the library, which luckily is very close to my office, and grab a whole stack of books and articles on neuroscience and adolescence, and that was the genesis of these lectures. I just grabbed the books, and I had a wonderful time taking a month or so off from economics and reading [LAUGHTER] neuroscience and neuropsychology instead. If adolescence is indeed an "age of opportunity," a phrase coined by Laurence Steinberg, then it also has a big advantage vis-a-vis early childhood.
Early childhood interventions are plagued by logistics. Many adults are needed to provide infants or toddlers with adequate care. Children of that age, think of this, four-month-olds, need help with basic tasks such as eating, toileting, and dressing.
They usually have to be dropped off and picked up, they cannot just ride on a school bus with other, older children, and many parents are just uncomfortable with handing their toddler over to others for six or more hours a day, unless the setting is intimate and familiar to them. Adolescents, on the other hand, are quite different in terms of logistics. They're fairly independent people, and they are already in school for six hours a day. In other words, if we can find educational interventions that improve their cognitive skills, they are a captive audience. Now the next thing that I want to talk about is the current neglect of adolescent education, which might surprise you very much.
Given the fact that this is a key time of opportunity, and also a risky time, it's unfortunate that adolescent grades are by far the most neglected in terms of teaching. The neglect, in my opinion, is unintentional, and it derives from some basic phenomena. Class size is the largest in the middle school grades. That's about grades 6-8, so when I say middle school grades, keep thinking grades 6-8, usually. This is partly because of the fascination with early childhood education, combined with the fact that the logistics for smaller kids are just more difficult, so schools tend to put them in smaller classes.
High school students are also in small classes, but for a very different reason. It's really because the curriculum starts to break up in high school, so it's not just that there's science, but there's chemistry, and there's physics, and there's earth science, and biology, so classes get split up into smaller and smaller units. The same thing could be true of history: there's US history, European history, world history, and so forth.
Evidence on class size, sorry, this is from one of my cognitive skill graphs showing you that early adolescence seems to be a time of divergence. This is average class size by grade for the state of North Carolina, which happens to have amazing administrative data, so those are the data that I'm using for this, and I'm going to keep using the North Carolina data for a little while for the rest of the lecture, so you'll see more North Carolina data. But if we look at average class size by grade, you can see that pre-kindergarten has about 14 children in each class, very small. The rest of the elementary grades from kindergarten to grade 5, it's usually 20 or fewer students in a class. Then we have the middle school grades 6, 7, and 8, and they are the middle school grades in the state of North Carolina. You'll notice that all the class sizes are 27 or 30 even, so much bigger class sizes, and then we have a decline in class size as we get higher and higher into the high school grades, with the smallest class sizes being in 11th and 12th grade.
From now on I'm going to cluster some of those grades together, both to keep the graphs a little bit neater, and also because there are some data reasons for doing that, into which I will not go. This is basically the same chart as my last chart, and I'll keep using the same colors all the time. Elementary will always be green, middle school will always be pink, and high school will always be blue. It's just a way of collapsing the data to make it a little bit easier to read.
Not only is class size the largest for middle school teachers, but their compensation is lower than that of elementary and high school teachers. You can see again green, pink, blue, and the pink bar is the lowest, and that's because middle school teachers are paid less than elementary and high school teachers. This is not intentional, but it appears to be due to the fact that teachers find it taxing and difficult to work with adolescents, probably due to the brain development that makes them especially plastic cognitively at that age, or it could just be puberty and the social [LAUGHTER] transformation that they're experiencing.
In any case, teachers who teach middle school grades are more likely to report problems such as student apathy, student mental health issues, or students being belligerent. They're less likely to say that they are satisfied with their current job or satisfied with teaching. It should not come as a surprise that teachers often move to an elementary or high school as soon as a vacancy arises in one of them.
As a result it's middle schools that are always seeing vacancies occur and having a tougher time filling them. Now most public school teachers, as you may know, are paid in the United States according to a scale that depends only on their years of seniority and their highest educational degree, like whether you have a bachelor's or a master's degree. This is called lock-step pay.
Thus middle schools have lower-paid teachers because they constantly need to fill their vacancies with more first-year teachers or rookies, or teachers who have little or no seniority in the district, or teachers who have been taking years off. That's what's shown on this chart. This is the percentage of teachers who are rookies in elementary, middle, and high schools, and you can see that you have more rookies in the middle schools. This is the percentage of teachers with no experience in this district, so they are completely new to the curriculum, they may not know things as well, and you can see again, that's higher in middle schools, and it's because more vacancies arise in middle schools. It's not just a problem for having very little experience with teaching, it's also a problem for teacher quality.
Although teacher value-added research, which I am going to talk about in a moment, finds little increase in teachers' effectiveness after they have three or four years of experience, rookies are consistently found to be less effective, and so are teachers who have no experience in their district. In other words, middle school students not only encounter larger classes and less well-paid teachers, they also face teachers who are more likely to be struggling with instruction, and who are dissatisfied with their current job. Middle school teachers are also less likely to have a graduate degree. As shown here, you can see that the graduate degrees are quite disproportionately higher for people who are high school teachers.
This is not a particular surprise. Remember that the middle school teachers are more likely to be rookies, and so many teachers in the United States actually get their master's degree while they're already teaching, and they do it part-time. They're just less likely to have a graduate degree. Well that means they're going to be paid less because of the lock-step pay, which gives you an automatic pay promotion if you get a master's degree.
All of these phenomena add up to this figure, which I think is very dramatic. What this is showing you is the amount spent per pupil on teacher compensation. You can see that it is much lower for middle school students than it is for elementary school students or for high school students. In fact, it's less than half of what it is for high school students. Why is this? It's all of these forces combining.
It's the bigger classes, it's the less experienced teachers who aren't getting the seniority pay, it's teachers who don't have a graduate degree, so they're not getting that bump in pay, and that means that what we have is middle school students actually being unintentionally neglected in terms of the resources that they get. At this crucial point in time, where their brains are plastic and they're trying to make transitions to more advanced cognitive skills, they're also being comparatively neglected. Regardless of the reasons why this happens, it's clearly counterproductive in the sense that I tend to think that middle school teachers really may need combat pay of some type, not the other way around because they're dealing with more risky, harder to teach children. Now that we have established that early adolescents are relatively neglected people, I want to return to the central goal of this lecture.
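The way these forces combine can be made concrete with a little arithmetic. The sketch below assumes, as a simplification not stated in the lecture, that per-pupil spending on teacher compensation is roughly the average teacher's compensation divided by the average class size. The function names and the dollar figures are hypothetical; they are not North Carolina values.

```python
def per_pupil_compensation(avg_teacher_compensation, avg_class_size):
    """Teacher compensation spread over the students in one class (simplifying assumption)."""
    return avg_teacher_compensation / avg_class_size


def spending_ratio(comp_a, size_a, comp_b, size_b):
    """Per-pupil spending at level A relative to level B.

    Equals (comp_a / comp_b) * (size_b / size_a), so lower pay and
    larger classes multiply together rather than simply adding up.
    """
    return per_pupil_compensation(comp_a, size_a) / per_pupil_compensation(comp_b, size_b)


# Made-up inputs for illustration only.
print(per_pupil_compensation(50_000, 25))      # 2000.0
print(spending_ratio(45_000, 30, 60_000, 20))  # 0.5
```

Under this simplification, a pay gap and a class-size gap that each look moderate on their own can compound into a large gap in resources per student.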
That is, conducting tests that can generate plausibly causal evidence of whether early adolescents are especially affected by successful educational interventions. I had four criteria for the natural or policy experiments that I employ. I'm going to use those two terms interchangeably. I want to test interventions that, one, are specific to a grade or an age so that I can compare results across ages.
I can compare what happens to an adolescent versus someone who's in elementary or high school. Two, my second criterion is that the intervention should be something that you could apply at any grade. Again, this is so that I can compare results across grades.
In other words, if it's some trigonometry intervention, you obviously are not going to apply that to a third grader. It can't be an intervention of that type. Three, my third criterion was that I wanted to test interventions that have typically been found in the past to have statistically significant effects. That's because there's just no point in comparing results across grades if all of the results are null or extremely noisy. My fourth criterion was that I wanted to test interventions that allowed me to examine later effects such as a student's achievement at the end of high school or going to college. After all, age only moves in one direction; we all keep getting older, sad but true.
I can obviously not see the effects of an intervention that is applied to ninth graders and then see now that I've affected them as ninth graders, what would have happened to their third-grade test scores? They're already past the third grade. But I can test the same intervention if I apply it to third graders and ninth graders and then look at both of them at the very end of high school and say, where did it make the most difference? Now, it turns out that I had to rack my brain to come up with interventions that fulfilled all of these criteria. I actually only came up with three that I think are at all good.
The first one is what I think of as the most important one. It's about a teacher's value-added. The idea is quite simple. It's having a teacher with high versus low value-added, in other words, the ability to teach students more material, the ability to teach students more skills.
Would I want to have a teacher with high value-added when I was in middle school, versus elementary school, versus high school? I'm not going to get a high-value-added teacher every year. Where would I like to concentrate them, then, if I had the choice? My second, which I hope you will find relatively fun, is being exposed to a curriculum that is more challenging cognitively. For instance, if a school district is going to introduce a new testing or curricular regime that has richer cognitive content and it's going to do that across all grades, should students want to see that happen when they're still in middle school as opposed to, say, high school? Then I'm going to talk about attending a successful charter school, going back to the story with which I began. If students are only going to be able to secure a charter school seat in one lottery, maybe they could win in elementary school, maybe they could win in middle school, maybe they could win in high school. Where would you want to win the most? I'll show you those results very briefly. I'm going to spend most of my time on value-added because I think it's especially important, partly because of who teachers are and partly because it's really the best policy experiment, just to be clear.
In my first natural experiment, I'm going to test whether having a teacher with relatively high value-added is especially important in the middle school grades. The way you want to think about this here is, we could conjecture, what if we could just concentrate all of the most effective teachers in the middle school grades. Perhaps we could pay them more to teach in those grades, or maybe we could reduce class size in those grades to make those jobs more appealing. We could also have other amenities for them. For instance, occasional sabbaticals to attend graduate school or do something like that. Something to make them want to take those jobs and stay in those jobs.
Compared to the curricular changes and the charter schools, I'm going to focus more on the teacher value-added experiment because it's not only the most informative, but in my opinion, it's the most relevant to policy. There is now a very extensive body of research that shows that individual teachers matter and that they differ substantially in their value-added, even within the same school and the same grade. Therefore, any policymaker could propose to pay them or hire them differently, as mentioned earlier, to induce them to teach particular grades.
Now, I'm going to use the most validated method of estimating teachers' value-added. It may seem a little complicated when I describe it. But in fact, it is intuitive if we compare the problem to a simple randomized controlled trial, which is what the method mimics.
Suppose that there are two third-grade teachers, Smith and Jones. They have classrooms located right next to one another. Every year the school principal randomly assigns 20 students to Smith and 20 students to Jones, something like that.
Then Smith and Jones are essentially conducting a very small experiment every year, 20 getting the Smith treatment, 20 getting the Jones treatment. If we did this for a while, we could answer the question, do students systematically gain more skills in Smith's classroom than in Jones's classroom? Now, of course, in any one given school year, Smith or Jones might get an unlucky draw of students for whom learning is difficult. Even if we try to control for each student's characteristics such as poverty, race, ethnicity, and gender, and try to control for their prior achievement before coming into Smith's or Jones's classroom, there will probably be some unobservable differences in their two classes' tendency to learn. But if you repeat the experiment for several school years, it will tend to iron out this problem simply because of the randomization.
It's not going to be that Smith is always lucky or that Jones is always unlucky. As long as you have multiple years of data (think of them as multiple little experiments on the same teachers), and you have students' prior achievement and their characteristics, and you have an accurate measure of achievement for each year, then you can calculate a teacher's value-added with reasonable precision. Now, those who enjoy statistics can listen to this brief description of the method and others can tune it out. [LAUGHTER] I'm going to give you what the method is.
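The Smith and Jones thought experiment can be simulated to show why repeating the small experiment over several years irons out luck. This is only a sketch: the true gap between the two teachers, the noise level, and the function names are all hypothetical, and the simulation ignores the controls for student characteristics mentioned above.

```python
import random
import statistics


def simulate_gap(n_years, true_gap=0.10, class_size=20, noise_sd=1.0, seed=0):
    """Estimated Smith-minus-Jones gap in mean student gains, pooled over n_years.

    Each year, class_size students are randomly assigned to each teacher.
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    smith, jones = [], []
    for _ in range(n_years):
        smith += [true_gap + rng.gauss(0, noise_sd) for _ in range(class_size)]
        jones += [rng.gauss(0, noise_sd) for _ in range(class_size)]
    return statistics.mean(smith) - statistics.mean(jones)


def spread_of_estimates(n_years, n_sims=300):
    """Standard deviation of the estimated gap across many simulated schools."""
    estimates = [simulate_gap(n_years, seed=s) for s in range(n_sims)]
    return statistics.pstdev(estimates)


one_year = spread_of_estimates(1)
ten_years = spread_of_estimates(10)
print(f"spread of the estimate with 1 year of data:   {one_year:.2f}")
print(f"spread of the estimate with 10 years of data: {ten_years:.2f}")
```

With a single year, the luck of the draw can easily swamp a modest true difference between the teachers; pooling ten years shrinks that noise considerably.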
But if you don't like statistics, you can just put your fingers in your ears. As for calculating the value-added of teachers, I told you the method is more complicated than my Smith and Jones example. You regress a student's test score on his or her prior test scores. You could use a different outcome, by the way; it does not have to be test scores.
It just happens that people usually describe it in terms of test scores. You also regress on their predetermined characteristics such as poverty, race, and gender. Actually, regressing on the predetermined characteristics often doesn't matter very much as long as you have some prior test scores. You take the residuals from this regression and you sum them at the level of the teacher by year, by class. Then you take each sum of residuals and regress it on the other sums of residuals for the same teacher but in different years. The prediction from this last regression is that teacher's value-added.
This is the method that is most validated in actual randomized control trials where researchers are able to assign students to teachers. It also produces something called the correct shrinkage, which really just means that luck is not going to dominate your estimate of teachers' value-added. Smith always being lucky or Jones always being unlucky, something like that.
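The three steps just described can be sketched in code. This is a simplified illustration under stated assumptions, not the exact estimator used in the lecture: it controls for a single prior score only, it averages residuals within each class rather than summing them (so class size does not scale the result), and it predicts each year from the simple mean of the same teacher's other years. The data are simulated, and every name and parameter value is hypothetical.

```python
import random
import statistics
from collections import defaultdict


def ols(x, y):
    """Return (slope, intercept) of a one-regressor least-squares fit."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_value_added(records):
    """records: (teacher, year, prior_score, score) tuples, one per student.

    Returns ({(teacher, year): estimate}, shrinkage_slope).
    """
    # Step 1: regress scores on prior scores and keep the residuals.
    slope, intercept = ols([r[2] for r in records], [r[3] for r in records])
    cells = defaultdict(list)
    for teacher, year, prior, score in records:
        cells[(teacher, year)].append(score - (intercept + slope * prior))

    # Step 2: average the residuals within each teacher-by-year class.
    by_teacher = defaultdict(dict)
    for (teacher, year), resid in cells.items():
        by_teacher[teacher][year] = statistics.mean(resid)

    # Step 3: regress each class average on the same teacher's average
    # in OTHER years; the fitted value is the value-added estimate.
    keys, x, y = [], [], []
    for teacher, years in by_teacher.items():
        if len(years) < 2:
            continue  # needs at least one other year to predict from
        for year, m in years.items():
            others = [v for yr, v in years.items() if yr != year]
            keys.append((teacher, year))
            x.append(statistics.mean(others))
            y.append(m)
    shrink, const = ols(x, y)
    return {k: const + shrink * xi for k, xi in zip(keys, x)}, shrink


def simulate_records(n_teachers=100, n_years=4, class_size=20, seed=1):
    """Hypothetical data: score = 0.7 * prior + teacher effect + noise."""
    rng = random.Random(seed)
    true_va = {t: rng.gauss(0, 0.2) for t in range(n_teachers)}
    records = []
    for t in range(n_teachers):
        for yr in range(n_years):
            for _ in range(class_size):
                prior = rng.gauss(0, 1)
                score = 0.7 * prior + true_va[t] + rng.gauss(0, 1)
                records.append((t, yr, prior, score))
    return records, true_va


records, true_va = simulate_records()
va, shrink = estimate_value_added(records)
print(f"shrinkage slope: {shrink:.2f}")
```

The slope from the last regression plays the role of the shrinkage factor: the noisier a single class average is relative to true differences among teachers, the further below one the slope tends to fall, and the more each estimate is pulled toward the middle.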
I'm going to apply this method to the North Carolina data that I've already mentioned. Then having obtained an estimate of each teacher's value-added, I'm going to do a pretty simple exercise. What I do is I'm going to plot the distribution of teacher value-added. I'm going to do it differently for math and reading by grade. I left out some grades to keep this chart from getting too busy.
I made a mistake, by the way, on the legend of the chart. It should say grade 3 is in the light blue or cyan. Then it should be grades 6, 7, and 8, not 5, 6, and 7. Keep in mind, it's always 6-8 that are the middle school grades.
They're all plotted in an orangey color. It doesn't matter. You can hardly tell the three lines apart anyway.
I just plotted them all in the same color spectrum. Then I'm also going to plot the distribution for grade 12. What do we see when we look at this chart? I should say, by the way, it's a mechanical fact that teacher value-added is always pretty much centered around zero. We know that Smith may be better than Jones, but teacher value-added is about their relative ability to teach students skills and get students to learn. So the numbers on the bottom are less important than the shapes of the distributions. That's how maybe you should think about it.
These distributions are smoothed a little bit; my fellow economists will recognize that these graphs come from a program that we all like to use a lot. If we look at the distribution for third-grade teacher value-added, you'll see it's centered on zero and a lot of the mass or the density is piled up right around zero. It's quite a narrow distribution. What's the interpretation of this? It means that if I get an unusually good third-grade teacher, that is better than getting one who's unusually bad. But the difference between an unusually good teacher and an unusually bad teacher in the third grade is not all that great in terms of the value-added that I'm going to get out of her classroom.
Now if you look at grades 6, 7, and 8 in these orangey colors, you can see that these distributions are much more spread out. They're still centered around zero, but the density or the mass around zero is much lower than it is in the light blue distribution for the third graders, and they're just more spread-out distributions. Now if I happen to get unlucky with, say, a grade 6 teacher or a grade 7 teacher, I would be way down here.
If I get lucky, I could be way up here. Middle school teachers have bigger differences in value-added than third-grade teachers do. Your luck in being assigned to a good teacher or a less good teacher [LAUGHTER] as a student matters more. One standard deviation below the mean is just much more meaningful.
Then finally, the 12th-grade distribution; that's the one that's in purple. The 12th-grade distribution is more spread out than the third-grade distribution, but it's between the distribution for the third grade and the distributions for grades 6, 7, and 8. This is consistent with some hardening of students' trajectories of cognitive skills in later adolescence, as I discussed in yesterday's lecture. I would put it this way. It's just harder for a 12th-grade teacher to change a student's skill much in either a positive or negative direction than it is to change the cognitive skill of a sixth, seventh, or eighth-grader.
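To see why the width of each distribution matters, the sketch below compares the gap between a teacher one standard deviation above the mean and one a standard deviation below it. The standard deviations are invented solely to mimic the ordering described here (grade 3 narrowest, grades 6-8 widest, grade 12 in between); they are not the lecture's estimates, and the function names are hypothetical.

```python
import random
import statistics

# Hypothetical standard deviations of teacher value-added, chosen only to
# mimic the ordering described in the lecture (grade 3 < grade 12 < grades 6-8).
ASSUMED_SD = {"grade 3": 0.05, "grades 6-8": 0.15, "grade 12": 0.10}


def simulate_value_added(sd, n_teachers=2000, seed=0):
    """Draw value-added estimates centered on zero with the given spread."""
    rng = random.Random(seed)
    return [rng.gauss(0, sd) for _ in range(n_teachers)]


def good_minus_bad(values):
    """Gap between a teacher one SD above and one SD below the mean."""
    return 2 * statistics.pstdev(values)


gaps = {g: good_minus_bad(simulate_value_added(sd)) for g, sd in ASSUMED_SD.items()}
for grade, gap in gaps.items():
    print(f"{grade}: gap between +1 SD and -1 SD teacher = {gap:.2f}")
```

All three distributions are centered on zero, so the comparison is entirely about spread: the wider the distribution, the more a student's outcome depends on which teacher she happens to draw.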
Now, one might worry that teachers are assigned to positions in such a way that middle school teachers are simply less similar, for some reason, in their efficacy. For instance, if you think about all of these rookie teachers, you might say, well, they're just less similar than teachers who have greater seniority. To test this possibility, this next figure, which looks just like the previous one, so don't worry about that, is based on by-grade differences in value-added for the same teacher. The way to think about it is this. If I teach sometimes in the sixth grade and then sometimes I'm teaching in the fourth grade or something like that, then I'm only looking within that same teacher. It is not about assignment of some teachers to the sixth grade or some teachers to the fourth grade; a teacher has to be moving among the different grades to be included in this figure.
You can see it looks tremendously like the previous one. I have to look carefully so I can tell them apart from one another. But it is actually making a different use of the data. It is looking within teachers, so we don't have to worry about that potential assignment problem. In this figure, of course, what we can see is that the third-grade distribution is much more narrow and has a lot of mass or density around zero. The teacher value-added distribution for grades six through eight is the most spread out, and grade 12 is someplace in between.
The next two figures, I will show them just briefly, are for reading. This is between different teachers and this is within different teachers. It's just to let you know, it's not just all about math. Teachers also differ significantly in their value-added across reading. Now the next figures that I'm going to show you are going to do another exercise and they differ only in the outcome that I'm going to study. For instance, I'm going to study outcomes at the end of high school, like how well you do on college aptitude tests as a student, your SAT scores, or your ACT scores.
The next figures are going to show the result of this other exercise that's fairly different. They're just going to differ in which outcome I look at, at the end of high school. For instance, your SAT or ACT scores, or whether you enroll in a four-year college. Then what I'm going to do is I'm going to say, well, does it matter when you got the teacher with the high value-added for your outcomes at the end of high school? The figures are a little bit complex, so it's worthwhile explaining the first one carefully and then once you understand it, the others are all the same. The outcome just changes.
Here's the figure. What we have is grades going across the bottom. Grade 3, 4, 5, 6, 7, 8, in North Carolina for reasons that I don't need to explain, there is really no Grade 9 testing, grades 10 and 11.
What I want you to see here is I'm regressing a student's math SAT or ACT scores, and they're all converted to the SAT scale so don't worry about that, on her third-grade teacher's value-added in math, then her fourth-grade teacher's value-added in math, and so on. This first one is saying if I had a teacher who was better than average in value-added in the third grade, how much does that make me have higher SAT or ACT scores at the end of high school. Same exercise, but for the fourth grade, fifth grade, sixth, seventh, and eighth, and then 10 and 11.
Now, if a teacher's value-added had the same effect on you regardless of the grade in which you encountered that teacher, then all the bars would be of the same height. They're obviously not all of the same height. Most notably, the ones for grades 6, 7, and 8 are significantly taller than the ones for the other grades, the elementary grades, but also the two high-school grades. That really looks like having a middle school teacher who has high value-added is what's going to have the most effect.
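The by-grade exercise can be sketched in a few lines. This is only a minimal illustration on simulated data, not the lecture's actual specification: it assumes a single joint least-squares regression with one value-added column per grade, and the function name `grade_effects`, the sample size, and every "true" effect below are made up for the example.

```python
import numpy as np

def grade_effects(scores, value_added):
    """Regress end-of-high-school scores on teacher value-added in each grade.

    scores: shape (n_students,)
    value_added: shape (n_students, n_grades), in teacher standard-deviation units
    Returns one slope per grade (the bar heights in a figure of this kind).
    """
    n = value_added.shape[0]
    X = np.column_stack([np.ones(n), value_added])  # intercept + one column per grade
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef[1:]  # drop the intercept

# Simulated data only: these effects are hypothetical, not the lecture's estimates.
rng = np.random.default_rng(0)
grades = [3, 4, 5, 6, 7, 8, 10, 11]
true_effects = np.array([5.0, 5.0, 5.0, 25.0, 30.0, 25.0, 10.0, 10.0])
va = rng.normal(size=(20000, len(grades)))
sat = 500 + va @ true_effects + rng.normal(scale=50, size=20000)

estimated = grade_effects(sat, va)
for g, b in zip(grades, estimated):
    print(f"grade {g}: {b:.1f} points per teacher standard deviation")
```

If value-added mattered equally in every grade, the printed slopes would all be about the same; unequal slopes are what unequal bar heights represent.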
It's not just that you take the SAT at the end of high school, in which case high school would matter more than middle school, because you can see high school actually matters less. The figure demonstrates that a middle school teacher can change a student's later outcomes more than elementary or high school teachers and this evidence confirms the idea, at least to me, that students are especially plastic when developing advanced cognitive skills in middle school. Having an effective teacher in those grades sets a student on a steeper trajectory with endogenous skill growth, ultimately producing substantially higher SAT or ACT scores. Now I want to note the role of endogenous skill growth here, something I emphasized a lot yesterday. Let's look at the number 32.6 for seventh grade.
It does not mean that having a teacher who's plus 1 in terms of a standard deviation raises students' SAT scores by 32.6 points in any immediate way. Rather, having a teacher who's better in the seventh grade allows you to learn a little bit more in the eighth grade, and then you learn a little bit more in the ninth grade because you learned a little bit more in the eighth grade, and the whole effect of that is the 32.6 points. It's like compound interest. That's how I like to think about it. You can see how it adds up over time.
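The compound-interest analogy can be made concrete with a toy calculation. This is a sketch under stated assumptions, not the lecture's model: the function `compounded_effect`, the initial boost of 10 points, and the 20 percent yearly growth rate are all hypothetical.

```python
def compounded_effect(initial_boost, growth_rate, years):
    """Total effect after `years` if each year adds growth_rate times the effect so far."""
    effect = initial_boost
    for _ in range(years):
        effect += growth_rate * effect  # learning more because you already know more
    return effect

# Hypothetical numbers: a 10-point immediate boost in grade 7, followed five
# years later (grade 12), under two different assumed growth rates.
no_growth = compounded_effect(10.0, 0.0, 5)
with_growth = compounded_effect(10.0, 0.2, 5)
print(f"no endogenous growth: {no_growth:.1f} points")
print(f"20% growth per year:  {with_growth:.1f} points")
```

The point of the sketch is only that a modest immediate effect can become a much larger cumulative one when each year's learning builds on the last.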
The next figure is similar to the last one except that it looks at teachers' value-added in reading. Actually, the effect of their value-added in reading on your verbal SAT or ACT scores is even bigger than the effect of value-added in math on math SAT or ACT scores. Otherwise, it's not very different. Again, I think endogenous growth is really playing a role here because those numbers are quite big. It couldn't really all happen in one year.
It needs a few years for the compound interest to build up. Now, SAT and ACT scores are loved by some people and hated by others. There are other ways to measure whether students are taking on cognitive tasks that are truly challenging. In the next couple of figures, I'm going to look at College Board Advanced Placement tests in important subjects.
Because the College Board Advanced Placement tests have the same curriculum and the same test across all the high schools in the United States, I don't have to worry about some grading standards being easier than others. For instance, this figure is showing that if you have a teacher who's a plus one in terms of standard deviation in seventh grade, it raises your probability of taking AP calculus by 6.7 percent. Or if you have that teacher in eighth grade, it raises your probability of taking AP calculus by 8.8 percent. Now, obviously, they're not taking AP calculus in seventh and eighth grade.
It's that later on, having had this teacher allows you to end up being more likely to do that. There's a very similar result for AP science classes. I'm not going to show you the figure. It looks somewhat similar to that.
They're not taking AP chemistry in the seventh or eighth grade either. But having had a better science teacher or better math teacher in the seventh or eighth grade makes a big difference. This next one is teachers' value-added in reading and its effect on taking AP English, by the grade in which the teacher taught students. Again, it's actually bigger than it is for math, which I find fascinating. For instance, having a high value-added teacher in the seventh or eighth grade raises the probability that you take AP English, probably in your 11th or 12th-grade year, by a little more than 10 percent. Finally, oh, I have two more.
Finally, not finally. Let's look at GPAs as an outcome. This is the effect of teacher value-added in reading on your final GPA in high school by the grade in which the teacher taught students. Again, we see this disproportionate effect of the middle school grades on students' final GPAs in high school and the math version is similar, but I won't show it to you. Then finally, I wanted to consider the effect of having a high value-added teacher on enrolling in a four-year college. That's because I think four-year college is a very authentic and important outcome. For many people, going to college is one of the things that change their lives.
So this figure is showing you that having a teacher who's one standard deviation better in math in grades 6, 7, or 8 raises your probability of enrolling in a four-year college by 6.3 to 8.1 percentage points. By the way, the alternatives to enrolling in a four-year college are employment, the military, enrolling at a one or two-year technical or community college, or just plain inactivity, which is actually quite common among young people. Interestingly, encountering a teacher who had value-added in reading of plus 1 has an even bigger effect on college enrollment than a math teacher of plus 1. Students who had such seventh and eighth grade reading teachers are more than 16 percentage points more likely to go to a four-year college.
That's an impressive amount. Again, probably due to endogenous skill growth. I think the impressive effects of value-added in reading on later outcomes demonstrate that the critical reading skills that we develop through reading and writing are just as important, if not more important, for a person's long-term outcomes like college attainment. This is consistent with the fact that an inability to process college-level materials or read college-level texts is a consistent stumbling block for the many students who, regardless of their preferred nature, find it hard to absorb material and end up dropping out of college for that reason. Now I'm going to switch gears.
We're going to stop looking at teacher value-added, and instead we're going to talk about introducing a cognitively more challenging curriculum. Now the State of Texas has a longstanding and fairly high-stakes accountability system. It's based on students' test scores. Not only do the schools themselves get graded, but students need to pass an exit exam in order to get a high-school diploma in Texas, and also promotion to a higher grade is somewhat dependent on having done well on the tests at the end of that school year, although there's more discretion apart from the exit exam, the final one. In addition, Texas is one of only two states that select the textbooks for their schools, so it has an unusual degree of influence over curriculum.
In practice though, I think that curriculum is more often influenced by tests than by textbooks. A teacher can ignore the new textbook that is put into her classroom and just decide, they're always giving me new textbooks. I don't want to have to learn how to do this with this new material. I don't want to have to learn new problem sets. I don't want to have to learn new readings. I'm just going to ignore the new textbook because I know in two or three years there will be another new textbook that will come down the road.
But a teacher cannot ignore the skills that are being tested by the exam that her students will take at the end of the school year. A testing regime typically lasts at least a decade, and the one that we're going to be talking about in Texas has been in place for more than a decade at this point. Therefore, schools and teachers paid a great deal of attention when they were switched, starting in 2011, from the testing program known as the Texas Assessment of Knowledge and Skills, I'm just going to call it TAKS from now on, so I don't have to keep saying that long phrase, to a new program, which is known as the State of Texas Assessments of Academic Readiness, or STAAR. So TAKS is the earlier one, and STAAR is the later one. Now the switch is interesting because making the tests more cognitively challenging was the explicit motivation for the introduction of STAAR. To make the tests more cognitively challenging, the creators of STAAR designed questions that require critical thinking more often than the TAKS questions did.
The STAAR questions were deliberately constructed to make it hard for teachers to teach to the test, or coach students in test-taking strategies that tend to work well on a multiple-choice exam. STAAR questions often ask students to provide an answer and not just guess or choose among multiple choices. There are three short essays on the STAAR exam, each in a different format, whereas TAKS had no essay requirement. To assess whether a student truly understands the concept, and this is a subtle thing, the STAAR exams are timed. Why? It's because time pressure makes it hard for students to simply try out each answer on a multiple-choice test.
But on the TAKS exam, you could take as much time as you liked, so you could just try out each answer. You didn't have to really deeply understand how the problem worked, and we'll see an example of that in a moment. STAAR tests also focus on the content that should have been taught in the grade in question, thereby assessing whether students are making progress vis-à-vis an ever more challenging curriculum. TAKS tested content that was often way below the grade level of the students. An eighth-grade student could possibly feel like he or she was doing quite well on the test, even though he or she was only really answering the questions that were for fourth grade or something like that.
I've made all these claims about the difference between these tests, so it's going to help to examine questions from released prior-year tests. The Texas State Department of Education releases prior-year tests, not quite all the questions, but you can download them pretty easily from their website. I'm just going to show you a couple here.
The first one is a TAKS math question for seventh graders, and I'm going to keep showing you seventh-grade questions because they seem most relevant. Which list of integers is in order from least to greatest? Is it negative 42, negative 39, negative 4, 40, 41, or is it 41, 40, negative 4, negative 39, negative 42? So this is just not a seventh-grade level question. It's a very easy question that a lot of fourth or fifth graders should be able to get right.
Or here's another example of a TAKS math question for seventh graders. This is really an arithmetic question, and it has nothing to do with algebra. It simply requires you to divide 9 by 3, and that's obviously 3. Then square it, and you get 9. Multiply that by 5, you get 45, and add 5, so you come up with 50 as the answer to the question. But these are easy math questions for a seventh-grader.
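Written out, the steps just described look like the following. The exact expression printed on the test is not shown in this transcript, so the form below is an assumption reconstructed from the spoken steps.

```latex
(9 \div 3)^2 \times 5 + 5 = 3^2 \times 5 + 5 = 9 \times 5 + 5 = 45 + 5 = 50
```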
In contrast, if you look at this STAAR question for seventh-graders, you can see that this is basically an algebra question. It also requires you to understand that this is a Cartesian diagram. You have to understand negative and positive numbers, two different axes, x and y.
You have to understand how to think about the slope of a line. You can see that the answers you have to choose among are multiple-choice. But they are four different algebraic equations that might represent that line, so it's a significantly harder question that requires more critical thinking. Now I told you that I hoped we would get to something that's rather funny. So I want to show you a TAKS seventh-grade reading question, so this is, again, the older test that's less demanding. It focuses on an Austin, Texas festival known as Spamarama.
That's after that canned processed meat that we all know, SPAM, the not-loved meat, but we all know it. The contest has two divisions: one for professional chefs and restaurant owners, and one for amateur cooks. In the amateur division, everyone is welcome to show their stuff.
One contestant entered the contest, Spamarama, with a dish that was a mixture of cheddar cheese, mayonnaise, SPAM, and raisins. The dish's poor rating at the contest did not deter this stubborn individual. Hoping to find a more accepting panel of judges the next year, he froze his entry and brought it back the following year. In keeping with the spirit of the event, the judges decided to create a last-place award, last even if there were 100 entries, just for him.
That's the paragraph that students have to read on the TAKS test, and the question was: one contestant in Spamarama froze his food entry because he planned to carve it, he missed the entry deadline, he wanted it to be eaten cold, or he thought he deserved to win. So this is a very easy reading comprehension question for a seventh-grader. Now if you look at STAAR's question, it's obviously much harder. It's on a more serious subject as well.
It's about Sainsbury's, which is a big grocery store chain in the United Kingdom, the Safeway or A&P of the United Kingdom. They're talking about food waste. This is a longer article that is about three pages long and relatively complicated. I've picked out just one paragraph and I hope you get the basic idea from just the one paragraph. It would be better to read the whole thing, but that would take too much time. In spite of Sainsbury's efforts, a large volume of waste, in other words, food waste, remains.
A machine grinds the waste into slushy goo, which is then poured into giant silos called anaerobic digesters. These giant silos act like artificial stomachs. Inside, microbes digest organic waste and produce methane bubbles. The same thing happens to organic waste in landfills and waste treatment plants. The difference is that these anaerobic digester silos are tightly sealed so they can capture and store the methane.
The resulting biofuel can power vehicles, or it can be burned to produce electricity. Sainsbury's management estimates that its ADs, those are anaerobic digesters, can produce enough energy to power 2,500 homes for one year, or it can make enough electricity to completely remove one of its stores from the public power grid. So this just requires a level of understanding. There are actually a few hard vocabulary words like anaerobic or biofuel in the reading example.
What is the most likely reason that the author wrote this selection? They're trying to get the student to think about, what's the structure of the argument? What argument is being made? You can read the answers for yourself. I think the correct answer is to demonstrate that creativity can help to solve environmental problems. But it's undoubtedly a significantly harder question than the Spamarama question, I think we would all agree. To test whether the switch from TAKS to STAAR affected students' cognitive skill development, I'm going to focus on two cohorts. The first is the students who entered the ninth grade in 2011, which was the first year of STAAR. They experienced STAAR, the heavier, more difficult exam, throughout high school.
However, this cohort did not experience any STAAR-driven, more cognitively challenging curriculum in any of their middle school grades. The second cohort that interests me is composed of the students who entered fifth grade in 2011, so they experienced the more cognitively challenging curriculum both in middle school and in high school. The difference between these two cohorts is not their experience of the STAAR-driven curriculum in high school because they both had that, but only the second cohort experienced STAAR-driven cognitively challenging curriculum in the middle school. So by comparing their later performance on the SAT, we can test whether it is crucial to come up against challenging material in middle school.
Unfortunately, a simple comparison between the two cohorts is not all that straightforward, and that's because the College Board undertook a major redesign of the SAT starting in 2017, just with the worst possible timing. Not only did the redesign change the scoring, but also fear of taking a new test for which they had not prepared caused many students to take the exam in 2016 when they ordinarily would have taken it in 2017. In addition, of course, testing in 2020 was affected by the pandemic. I have done my absolute best, using the concordance tables, to make sure that everything is in the right scores. They are rationalized with one another, but I still need to have a control for Texas that did not experience the TAKS-to-STAAR transition of curriculum, but did experience the same external events like the redesign of the SAT. To find a control, I'm going to adopt the method called synthetic controls.
This method combines all of the possible controls for Texas. In other words, all of the possible states that could be controls for Texas are combined to create a synthetic Texas. The weights on these other possible controls are optimized to match Texas's time pattern in SAT scores as well as possible in the pre-STAAR period, in other words, in the TAKS period, and the weights and the controls are then validated outside of the optimization period.
It also has to do well outside the period which is used to generate the weights. The weights have to do well when you go out of sample and see that they can still be valid. That's a very important part of using synthetic controls. This all sounds a little complicated, but it's actually very intuitive. I love synthetic controls.
Texas is a big state with a variety of cities and landscapes and industries. Some of Texas is a lot like Oklahoma, its neighbor to the north. They share oil drilling and farming and that sort of thing. Some of Texas is a lot like Louisiana, with similar refineries and ports on the Gulf Coast.
Some of Texas is a lot like New Mexico. They share a desert, they share a lot of cattle ranching, and so forth and so on. Austin, Texas has a lot of similarities to other cities that are dominated by a state house and a flagship public university. Dallas has some similarities with other financial hubs.
I can go on and on. What's the point here? The point is that none of these other states would be a good control for Texas in and of itself, but if we weight them all optimally, we can create a synthetic Texas that walks, talks, looks, and acts like Texas. We're going to be comparing Texas and synthetic Texas. A properly constructed synthetic control should show how Texas school students would have done in the absence of the switch from the TAKS-based curriculum to the STAAR-based curriculum.
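The weighting idea can be sketched as a small constrained least-squares problem. This is a simplified illustration, not the estimator used in the lecture: it assumes the weights are non-negative, sum to one, and are fit on outcomes alone; it solves the problem with projected gradient descent; and all data, the four "donor states", the true weights, and the 5-point post-reform effect are simulated.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_weights(donors_pre, treated_pre, steps=5000):
    """Weights on donor units that best reproduce the treated unit's pre-period path.

    donors_pre: (n_periods, n_donors); treated_pre: (n_periods,)
    """
    n_donors = donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)  # start from equal weights
    # Step size 1/L, where L is the largest eigenvalue of X'X.
    lr = 1.0 / np.linalg.norm(donors_pre, 2) ** 2
    for _ in range(steps):
        grad = donors_pre.T @ (donors_pre @ w - treated_pre)
        w = project_to_simplex(w - lr * grad)
    return w

# Simulated score deviations for 20 periods and 4 hypothetical donor states.
rng = np.random.default_rng(1)
donors = rng.normal(scale=10.0, size=(20, 4))
true_w = np.array([0.6, 0.4, 0.0, 0.0])
treated = donors @ true_w
treated[16:] += 5.0  # made-up effect of the reform in the last 4 periods

w = synthetic_weights(donors[:12], treated[:12])     # weight-construction period
validation_gap = treated[12:16] - donors[12:16] @ w  # held-out pre-reform periods
post_gap = treated[16:] - donors[16:] @ w            # treated minus synthetic, post-reform
print("weights:", np.round(w, 3))
print("max validation gap:", np.abs(validation_gap).max())
print("mean post-reform gap:", post_gap.mean())
```

A small gap in the held-out pre-reform periods is what justifies reading the post-reform gap as the effect of the reform.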
I'm just going to show you one figure here. It's a little bit complicated. First of all, this is the TAKS period, before STAAR was put in.
This is the period when the weights are constructed for the synthetic control, that is, synthetic Texas, which is the dashed purple line, and Texas itself is the green line. You can see both in the weight construction period and in the validation period before the switch of tests, the synthetic Texas is doing a very good job of looking a lot like Texas. They have the same things happening to their students' test scores on the SAT. The pink line is the last cohort that had no early adolescent Grade 5-8 exposure to the new STAAR exam.
Then the red line is the first cohort with full early adolescent exposure to the new STAAR exam. What you'll notice is that up until you get to the pink line, the two lines, the synthetic Texas and Texas, track one another very well, but then after the introduction of the STAAR exam, Texas is doing significantly better than synthetic Texas, and that's the idea: synthetic Texas is telling us what would have happened in Texas if they had not switched tests. One thing I wanted to mention about tests like STAAR is that they're a very manageable policy reform.
Of course, it's not that easy to construct a good test. That's why psychometricians and a great many educators have to get involved, but it's actually quite cheap to construct a test. Testing is not expensive by the standards of school finance in the United States.
Finally, I told you the story early in the talk about charter schools. One set of facts that motivated me was the strong performance of charter schools that used Grade 5 as their entry year and Grade 8 as their exit year. These schools' students often made annual test gains that were 2-3 times the gains of schools with kindergarten entry or Grade 9 entry. At the time, as I said, these results struck me as very counter-intuitive because I expected that the schools that started with kindergarten would produce the most positive gains in student achievement. I really had low expectations for schools that took in negatively selected applicants at Grade 5 because I thought of them as students who had missed the boat of strong early childhood education and were already struggling in the regular public schools. This was all based on my idea, which was then popular in economics and continues to be very popular among economists, that children are only very plastic in early childhood.
To my mind, the struggling fifth graders had already been fated to have a lower level of cognitive skills. My study used more than 100 New York City charter schools and it used lottery-based methods. I mentioned these before. They're very simple.
Students apply to one or more charter schools simply by filling out a one-pager with their name, their address, their parent's contact information, and so on. It's called an application, but it's really just basic information. Once the charter school has received all the applications, it holds a lottery in which every child is assigned a lottery number. For instance, if they want to admit 60 kindergarteners, the first 60 kids would get seats in the charter school and if you are below the cutoff for your lottery number, you would typically not get a seat. There's a bit more to it than that, because we only want to use the lottery information, the lotteried-in and the lotteried-out students, to come up with estimates.
We don't want to use students who might have been picked off a waiting list or something like that. There are some bells and whistles, but I won't go into the details. This method is generally considered to be the gold standard for evaluating charter schools and lottery based studies tend to have similar findings across urban settings where the application lotteries are oversubscribed.
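The core of a lottery-based comparison can be sketched as a difference in mean outcomes between applicants who won seats and applicants who did not. This is a bare-bones illustration on simulated data that leaves out the "bells and whistles" (waiting lists, multiple applications, attrition); the function `lottery_effect`, the number of applicants and seats, and the 0.3 effect are all hypothetical.

```python
import numpy as np

def lottery_effect(outcomes, won_lottery):
    """Mean outcome of lottery winners minus mean outcome of lottery losers."""
    outcomes = np.asarray(outcomes, dtype=float)
    won = np.asarray(won_lottery, dtype=bool)
    return outcomes[won].mean() - outcomes[~won].mean()

# Simulated applicants: every number here is made up for illustration.
rng = np.random.default_rng(2)
n_applicants, n_seats = 20000, 10000
lottery_number = rng.permutation(n_applicants)   # each child gets a random number
won = lottery_number < n_seats                    # the first n_seats numbers get seats
prior_skill = rng.normal(size=n_applicants)       # unrelated to the draw by design
true_effect = 0.3                                 # hypothetical effect of winning a seat
scores = prior_skill + true_effect * won + rng.normal(scale=0.5, size=n_applicants)

estimate = lottery_effect(scores, won)
print(f"estimated effect of winning the lottery: {estimate:.2f}")
```

Because the draw is random, winners and losers have the same prior skills on average, so the difference in means is not contaminated by who chose to apply.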
It's not as easy in rural areas because you might just not have too many students apply. This next figure shows you the effects of a year in charter school on math scores. It doesn't matter whether you look at the blue bars or the orange bars, that's something about attrition.
I didn't just use my own study on New York City charter schools, I used all of the lottery-based studies that I could find where it was clear what grade was being shown for achievement. For instance, KIPP is a charter management organization that ordinarily starts charter middle schools with the fifth grade, and so its schools are covered by a lot of those middle bars. You can see that kids are just learning at a faster rate in the middle charter schools than they are learning in the high schools or the elementary schools.
For New York City, where I can do it, I know long-term outcomes. It is also true that if you look at long-term outcomes like post-secondary attainment, whether you go to college or not, the effects are significantly bigger for students who are lotteried in in the fifth grade. In fact, the effects for them are almost four times as large as they are for students who were lotteried in in kindergarten. At the beginning of this lecture, I made the case that early adolescents are relatively neglected. They have a lot less spent on their instruction, they're in larger classes, they have teachers who are rookies, and they have teachers who are eager to stop teaching