Job Postings and Patents
[MUSIC] This is about job postings and patents, and I'll largely talk about a paper with Nick on that subject. So the idea is to construct text-based measures of exposure to specific disruptive technologies at the firm, patent, and job level going back to 2002. And then to use these novel data to study the spread of new technologies across firms, regions, occupations, and skill-levels. So you'll see a lot of the stuff that we're doing here, is using the fact that texts can be used to generate multi-dimensional data and then aggregating it in interesting ways. Okay, so this paper is after measuring the development and spread of disruptive technologies that are key to economic growth, inequality, entrepreneurship, and firm dynamics.
Lots and lots of questions that we deal with as economists really hinge on, how does technological progress transmit itself? So for example, when you think about inequality, one question is, does new technology generate jobs only for college grads? And it's a very important question, because if the answer is yes, then really we probably need to do something. If the answer's no, then there's hope that people with low skills also benefit from technical progress. Politicians in kind of towns throughout the world are trying to generate the next Silicon Valley. One research question is, is that a good idea and what does it buy you.
So what we do in this project is we develop a text-based methodology to determine which new technologies affect businesses. And then trace the spread of these new technologies to the locations and firms where they emerge and track their diffusion through regions, occupations, and industries over time. This is kind of an ongoing project, so I'm just going to give you five preliminary insights from this. The first is that the development and initial employment in disruptive technologies is geographically very highly concentrated. So we know that research activity is very highly concentrated. And if you go and look at where the research happens that then later generates disruptive technologies that change the way that businesses operate, that's even more concentrated than research itself.
And around where Nick sits, there's a lot of it. The second kind of big stylized fact is that over time, hiring associated with new technologies gradually spreads across space. And over time, again, the skill level in technology jobs declines sharply, which we call skill broadening. So the way to think about this is that it does seem that the use of new disruptive technology does trickle down the skill ladder. So even people without college degrees do use new technologies over time. The kind of maybe unfortunate fact about this is that low-skilled jobs associated with a given technology spread out from where they were invented significantly faster than high-skilled jobs.
So that means that the places where the new disruptive technologies are invented retain an advantage in high-skill hiring for a longer period of time. Okay, so let me kind of focus here on the measuring part because this is why you guys are here, you want to learn the methods. And then I'm going to go briefly through the main findings of that paper and, I think, it's kind of interesting mainly because of, again, its multi-dimensional nature. And I'm sure there's going to be many papers that use this text-based data in this sort of multi-dimensional way. So, we're going to be ambitious and use full text from three big sources here.
And you kind of want a big computer under your desk to facilitate this. So number one is, we're going to have the full text of all USPTO patents from 1976. We're going to have the earnings call transcripts that I've already talked about. And then the new thing is, we're going to have the full text of 200 million online job postings from Burning Glass. I'm sure you guys have seen a lot of papers using Burning Glass data already, that's kind of all the rage in macro at the moment. Most of those papers use basically the data that Burning Glass coded from those job postings.
What we're going to be using is the full text, the full underlying text, and that's been used much less so far. So, for each job posting, we know where the job is and which occupation the job is in, and I'm going to use that in various ways. All right, so kind of let me give you step one. So step one is we first need to define, basically, what is the technology? So I'm going to kind of use a text based way of doing this.
So step one is we're going to identify bigrams, what I'm going to call technical bigrams, from patents. So we're going to identify two-word combinations that are indicative of discussion of new technologies. The way we're going to do this is we're going to extract all 17 million two-word combinations that we can find in US patents since 1976. And then this is kind of a fun trick. There's 17 million of these, okay? You don't want to deal with 17 million. So how do you reduce dimensionality here? We're going to remove any bigrams that were in common use prior to 1970.
So there's this beautiful thing called the Corpus of Historical American English. Which is basically just linguists kind of collecting all kinds of texts, newspaper articles, speeches, people buying coffee and so forth, for each year. Cut that off in 1970 and remove all the two-word combinations that kind of existed in 1970 from the US patent text. And then you're going to end up with 35,000 of what we call technical bigrams, and we're actually going to use only those that appear in top patents that are highly cited.
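The filtering step just described can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: the function names, the toy texts, and the rule "drop a bigram if it occurs at all in the pre-1970 corpus" are assumptions (the talk says "in common use"), and the restriction to highly cited patents is left out.

```python
import re
from collections import Counter

def bigrams(text):
    """Return all adjacent two-word combinations in a text (lowercased)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return list(zip(words, words[1:]))

def technical_bigrams(patent_texts, pre1970_texts):
    """Count patent bigrams, dropping any bigram seen in the pre-1970 corpus.

    Assumption: a single pre-1970 occurrence is enough to drop a bigram.
    """
    old = set()
    for text in pre1970_texts:
        old.update(bigrams(text))
    counts = Counter()
    for text in patent_texts:
        counts.update(bigrams(text))
    return {bg: n for bg, n in counts.items() if bg not in old}

# Toy stand-ins for patent text and the historical reference corpus.
patents = ["A method for machine learning on mobile devices.",
           "The method uses machine learning."]
pre1970 = ["a method for the machine", "the method uses steam"]
tech = technical_bigrams(patents, pre1970)
```

With real data the reference corpus would be the Corpus of Historical American English cut off at 1970, and a frequency threshold would likely replace the simple set membership test.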
And so, 35,000 is still a lot. >> [CROSSTALK] >> Can I ask you a question? Why do you cut off the Corpus of Historical American English in the 1970s? Would it be of interest if we had it up to 2010? Would you use it, or is it just that we don't have it? >> No, I think the idea is to cut off before, well, so by doing this, you're giving any invention that happened since 1970 a chance, basically. That's how I think about it. So you're literally removing anything that existed in 1970. You could go up to 1980 or 2000. But if you went up to 2020, you would end up with nothing. But you're right, we could go up to 2002 or something.
>> So that's why words like climate change show up, because they're not common language, they wouldn't have shown up in 1970? Okay, thanks. >> So, okay, step 2 is we're now going to take our 35,000 two-word combinations from influential patents, that are kind of new to patents since 1976. And then cross-reference that with the earnings conference calls.
And now we want to know, which patent-related language is then also used a lot by firm executives and investors when they talk about the firm? And the idea is that this step is going to isolate what we call disruptive technologies. So disruptive technologies that change, in some way, the way that firms operate. In particular, to make sure that I find innovations that change the way that firms operate, I'm going to keep only two-word combinations that have an increase in frequency during our sample of at least tenfold. Yeah, so the top ones of these, let's have a look at them. This is kind of fun.
Mobile devices, machine learning, cloud computing are the top three, what we call the top three bigrams associated with disruptive technologies. Interestingly, out of our 35,000, now we end up only with 305.
And this seems pretty robust. There's not all that many two-word combinations that come from patents and then explode in their use in firm language. Now, so even if you kind of change this threshold around, you're going to get maybe 270 or 360, but not that much more. And so you really kind of reduced by two orders of magnitude by looking at which innovations end up changing business.
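The tenfold-growth filter can be sketched as follows. The talk does not say exactly how the increase is measured, so this assumes a simple ratio of the last year's frequency to the first year's frequency in the sample; the function name and the toy frequencies are hypothetical.

```python
def grows_at_least(series, factor=10.0):
    """True if frequency in the last sample year is >= factor times the first.

    series maps year -> mentions per earnings call (assumed unit).
    A bigram absent at the start and present at the end counts as growing.
    """
    years = sorted(series)
    start, end = series[years[0]], series[years[-1]]
    if start == 0:
        return end > 0
    return end / start >= factor

# Hypothetical frequencies, for illustration only.
freq = {
    "cloud computing": {2002: 0.01, 2019: 0.50},
    "cash flow": {2002: 2.00, 2019: 2.20},
    "machine learning": {2002: 0.00, 2019: 0.30},
}
disruptive = sorted(bg for bg, s in freq.items() if grows_at_least(s))
```

Changing `factor` corresponds to moving the threshold around, which per the talk shifts the count only modestly (roughly 270 to 360 bigrams).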
All right, so let me kind of go through this a little bit quickly. So I'm now going to measure technology exposure at the patent, earnings call, and job level by just looking at whether the two-word combination is mentioned. Sorry, I should say, you have these 305 bigrams, we're now going to group them into technologies. Yeah, acknowledging that mobile devices, smartphone, tablet, and android phones are all kind of talking about the same technology, which we define in kind of a lot of steps. But just think about it as drawing circles around the 305 bigrams and figuring out which belong to which technology. I go to Sam.
>> Hi, I have a kind of silly question on the bigrams. So should we follow the list? >> Yes. >> So both cloud service and cloud services appear in the list? >> Yes. >> So is that a kind of problem for the matrix that you're using, and how do you identify whether it's a plural or singular word? >> No, there's a very easy way, there is a Python package that will allow you to include plural and singular of all words.
So yes, we allow that. So you're right, so that happens here. Let me kind of talk a little bit more about, sorry, I guess this should be about method, so let me kind of just spend time on the methods. So it's actually kind of a tricky question of grouping these 305 bigrams into technologies. So we did a lot of kind of manual work on this. Part of this is, if you have a group that you want to group into a technology like these mobile devices, smartphone, tablet, android phones, you can use a technique called embedding vectors.
And particularly, you can train embedding vectors. Embedding vectors, basically, you can think of this like an eigenvalue-eigenvector decomposition of the bigram space. And what it does is it basically finds you two-word combinations that are often used in conjunction with, or as synonyms for, the two-word combinations that you feed into it. So there's kind of a smart Python package where you can say, here's the text I want to train you on. In this text, if you hear mobile devices, smartphone, tablet, or android phone, what are the top synonyms for these? And so if you're worried about kind of missing technical bigrams, that's a way of addressing false negatives. Similarly, false positives you can address by just reading the damn thing and seeing how often mobile devices actually refers to a band as opposed to a gadget. Does that make sense? So we've described this a lot in the paper. I don't want to spend too much time on it. Actually, I have no idea how much time I have, so, okay.
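To make the "find me the closest terms" idea concrete, here is a crude stand-in for trained embedding vectors: plain co-occurrence counts compared by cosine similarity. This is not the package or the model used in the paper; the functions, the substring matching, and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def context_vectors(docs, terms):
    """For each term, count the other tokens that share a document with it."""
    vecs = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        text = " ".join(tokens)
        for term in terms:
            if term in text:  # simple substring match (assumption)
                own = set(term.split())
                vecs[term].update(t for t in tokens if t not in own)
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def nearest(term, vecs):
    """Other terms ranked by similarity to `term`, most similar first."""
    others = [t for t in vecs if t != term]
    return sorted(others, key=lambda t: cosine(vecs[term], vecs[t]), reverse=True)

docs = [
    "our app runs on every smartphone and tablet screen",
    "customers use the app on a tablet or smartphone screen",
    "the drilling rig uses fracking fluid and sand",
    "fracking wells need sand and fluid",
]
vecs = context_vectors(docs, ["smartphone", "tablet", "fracking"])
ranked = nearest("smartphone", vecs)
```

A trained embedding model would replace the raw counts with dense vectors, but the ranking step works the same way.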
All right, so now we have our definition of each technology, we actually ended up with 29 of them, but you could make more or less if you wanted to. And now we want to measure the exposure of each job to each technology at each point in time. So when there's a job posting and it says something about mobile devices, then we say, aha, it has something to do with smart devices, this job in fact, okay.
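The job-level exposure measure is a dummy per technology, which can be sketched as below. The grouping in `TECHNOLOGIES` is a made-up two-entry example (the paper ends up with 29 technologies), and the plural handling is a naive strip-the-final-s rule standing in for the unnamed Python package mentioned earlier.

```python
import re

# Hypothetical grouping of bigrams into technologies, for illustration only.
TECHNOLOGIES = {
    "smart devices": ["mobile device", "smart device", "smart tv", "android phone"],
    "cloud computing": ["cloud computing", "cloud service"],
}

def normalize(text):
    """Lowercase, tokenize, and naively strip a trailing 's' from each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t[:-1] if t.endswith("s") and len(t) > 2 else t for t in tokens)

def exposure(posting, technologies=TECHNOLOGIES):
    """Return {technology: 1 or 0} depending on whether any of its bigrams appears."""
    text = f" {normalize(posting)} "
    return {
        tech: int(any(f" {normalize(bg)} " in text for bg in bgs))
        for tech, bgs in technologies.items()
    }

e1 = exposure("Validation of new chips in the growing smart TV market.")
e2 = exposure("Use technology on a smart device to engage with consumers.")
e3 = exposure("We sell cloud services and support mobile devices.")
```

Because the phrases and the posting go through the same `normalize`, "cloud service" and "cloud services" are treated as one bigram.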
So here's an example. Here's a job posting, where they talk about as a member of the digital entertainment business unit, you will play a key role in the development of testing and validation of new chips in the growing smart TV market. So this is an example of a job posting that mentions smart TVs because you're producing the smart TVs. Similarly, there's going to be other job postings that talk about use of the smart device in your job.
Use third channel technology on a smart device to collect crucial data and engage with consumers, this is for a sales representative. So this is the same technology, one job posting is about producing it, the other one is about using it. And in fact, if you look at mentions of these technologies in job postings, you're going to find that the vast majority is about either producing or using that new technology. So here's a last thing that we kind of need for our analysis, which is we need to kind of figure out when did the technology start? It kind of sounds like a silly question. But it's actually kind of hard to have a systematic answer for it, turns out there's many different ways of doing it, here's one way. So what I'm showing here is the time series of mentions of machine learning or AI in earnings conference calls, you see, it's exploding here.
What we're going to say is we're going to take the time series of mentions of each technology, and we're going to look at the point in time where it's at 10% of its max. That's what we call the year of emergence of the technology, and the idea is that it captures the year of the commercial breakthrough of the technology. Meaning lots of firms are now talking about it.
You can do this in different ways, but that's one way of pinpointing a year, and in this case, it's 2015 as the year of emergence of AI as a disruptive technology. So here's a picture for the other technologies. So you get production technologies like 3D printing, products like autonomous cars, medical advances like bispecific antibodies. We have GPS, which is already kind of on its way to becoming less disruptive over time, mobile payments, social networking, Wi-Fi, and so on.
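The year-of-emergence rule is short enough to write down directly. This sketch assumes "the point where it's at 10% of its max" means the first year in which mentions reach at least 10% of the series maximum; the mention counts are invented so that the answer lands on 2015, as in the AI example.

```python
def year_of_emergence(mentions_by_year, threshold=0.10):
    """First year in which mentions reach `threshold` times the series maximum."""
    peak = max(mentions_by_year.values())
    for year in sorted(mentions_by_year):
        if mentions_by_year[year] >= threshold * peak:
            return year

# Hypothetical counts of earnings-call mentions, for illustration only.
ai_mentions = {2010: 1, 2012: 3, 2014: 8, 2015: 45, 2017: 200, 2019: 400}
emergence = year_of_emergence(ai_mentions)
```

Raising `threshold` pushes the emergence year later, which is one of the "different ways" of pinpointing it.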
First thing I want you to see, there's a high correlation between the frequency of mentions of these disruptive technologies in earnings calls by firms and in job postings. So in other words, when technologies are talked about a lot by executives, that's also when lots and lots of new jobs are going to be relating to this new disruptive technology. So here's kind of the first main finding, and what I want you to see here is that we have this text data, it has many, many dimensions.
And all we're doing is aggregating it in different dimensions and running regressions, okay? So the first thing I want to do is I want to measure the average skill associated with a given new technology at a given point in time. The way I will do that is, for example, I will take all the job postings that mention smart devices, and I will look at what is the occupation of this job posting. Then there's other data sets that tell you what are kind of typically the education requirements for that occupation.
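The occupation-to-education step can be sketched as a simple aggregation. The posting records, the occupation labels, and the binary "requires college" lookup are hypothetical; the real crosswalk comes from a separate data set and may well be a share rather than a yes/no flag.

```python
from collections import defaultdict

def college_share(postings, college_occupations):
    """Share of postings per (technology, year) in occupations that require college."""
    total = defaultdict(int)
    college = defaultdict(int)
    for p in postings:
        key = (p["technology"], p["year"])
        total[key] += 1
        college[key] += p["occupation"] in college_occupations
    return {key: college[key] / total[key] for key in total}

# Hypothetical lookup and postings, for illustration only.
COLLEGE_OCCUPATIONS = {"software engineer", "hardware engineer"}
postings = [
    {"technology": "smart devices", "year": 2008, "occupation": "hardware engineer"},
    {"technology": "smart devices", "year": 2008, "occupation": "software engineer"},
    {"technology": "smart devices", "year": 2016, "occupation": "software engineer"},
    {"technology": "smart devices", "year": 2016, "occupation": "sales representative"},
    {"technology": "smart devices", "year": 2016, "occupation": "retail clerk"},
    {"technology": "smart devices", "year": 2016, "occupation": "delivery driver"},
]
shares = college_share(postings, COLLEGE_OCCUPATIONS)
```

The result is one time series per technology, ready to be lined up against years since emergence.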
And then you can collapse that back and just have for each technology one time series, for example, the share of job postings in that technology at that point in time that require a college education. Run a regression of that on the years since the emergence of the technology, and you get a very strong negative slope. Meaning as technologies mature, education requirements of jobs in that technology fall. And this is actually kind of a central question, more so in industrial skills: whether it would happen or not. The answer is resoundingly yes. The second main finding is region broadening. So for each region, this is like a CBSA.
It's like a metro area. We're going to measure the share of jobs in that CBSA that are in that technology. This is going to be the share of jobs in the CBSA that are using the disruptive technology, divided by the share of overall jobs in the country that are related to the technology.
Then calculate the coefficient of variation across regions. And then that gives you a measure of dispersion, of how concentrated the use of this new technology is across space. Okay, so oops, [LAUGH] we're missing a crucial label here, I'm sorry. So, the right-hand-side variable here is years since emergence of the technology. This is what happens, I guess, when you PDF your PowerPoint.
So, okay, what I want you to see is again a negative and significant slope. So what does this mean? It means the regional concentration of the use of a new technology is falling as the technology matures. Again, this is the number of years that have elapsed since the start year of the technology.
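The region-broadening measure can be sketched in three small steps: normalized regional shares, the coefficient of variation across regions, and the slope on years since emergence. The region names and job counts are invented, the population standard deviation is an assumption, and the bivariate least-squares slope stands in for whatever fuller specification the paper runs.

```python
import statistics

def normalized_shares(tech_jobs, all_jobs):
    """Regional share of jobs in the technology divided by the national share."""
    national = sum(tech_jobs.values()) / sum(all_jobs.values())
    return {r: (tech_jobs.get(r, 0) / all_jobs[r]) / national for r in all_jobs}

def coefficient_of_variation(values):
    """Standard deviation over mean (population standard deviation assumed)."""
    values = list(values)
    return statistics.pstdev(values) / statistics.mean(values)

def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Hypothetical postings in three equally sized regions, 0 and 5 years after emergence.
all_jobs = {"A": 1000, "B": 1000, "C": 1000}
tech_jobs_by_age = {0: {"A": 90, "B": 5, "C": 5}, 5: {"A": 50, "B": 25, "C": 25}}

ages = sorted(tech_jobs_by_age)
cvs = [coefficient_of_variation(normalized_shares(tech_jobs_by_age[a], all_jobs).values())
       for a in ages]
slope = ols_slope(ages, cvs)
```

A negative `slope` is the region-broadening pattern: concentration falls as the technology matures.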
So as technologies mature, they spread across space. We have the pictures, here's a nice way of looking at this. So, the bubbles here are the places, the regions that were responsible for at least 50%, sorry, the regions that were responsible for patenting in a given new technology. And I want you to see several things in this plot. The first thing I want to, just one second, sorry.
So the first thing I want you to see is that the places where new technologies are developed that end up being disruptive are very highly concentrated around where Nick is sitting, okay? So something like 44%, do I have this down here? Yeah, so 40% of pioneer locations are in California. Then another 20 or 21% are in the Northeast Corridor. This is like extremely highly skewed.
These are the places where technologies that later become disruptive for business and are mentioned in conference calls emerge, where the patents relating to those technologies are developed. Then over time you see the spread. This is what I call region broadening. So over time, centered on pioneer locations, you have the spread across space. And that's what this kind of half-disappearing regression table was supposed to tell you. Any questions so far? [INAUDIBLE] >> So yeah, I have the following question.
So, correct me if I'm wrong, but so far you are associating a technology to workers and regions that have a positive effect from it, those that are using it. And so, if you think about solar, for example, solar is great for x types of workers and x types of locations. But here we're not capturing negative effects that you may have on coal workers, for example, is that correct? >> No, no, so you're completely right, and there's a big research question, which is who's getting booted out by the new technology? So far, we have nothing to say about that. And we're much more modest in the sense that what we're measuring is, where are the jobs posted? Okay, so these are all about job postings.
So where are there open positions that say, we want somebody using or producing this technology, this new disruptive technology. So this is a modest step towards trying to answer the kind of big macro question that you're after, which is who actually gets displaced and who gets disrupted by these technologies. There are, in the paper, a number of interesting tables that we haven't really explored all that much. So again, this is a multi-dimensional thing. What I'm taking here is the location of where the job is posted.
Using the same exposure measure, using my dummy that's one at the job level if the technology is mentioned in the job ad, I can also look at which occupation that job is in at which point in time. And there are technologies that penetrate very few occupations. And there are other technologies that penetrate many occupations. And there's some hope that that other aggregation of this data will tell you at least which jobs changed the most, which occupations changed as a result of this new technology. But I have nothing definitive to say about that.
>> It's a great question. It is almost like the reverse of the literature on trade. So, with new technologies, it's easy to measure where the new jobs are. It's much harder to measure where jobs are lost. The trade story is almost the exact inverse: it's very easy to measure where the jobs are lost to China, and it's much harder to measure where they're gained from trade. So, you know, those two literatures, it feels like, have biases in different directions, because generally it's like looking under the lamppost for the light.
Economists, I mean, for good reason, focus on what can be measured, but therefore you tend to focus on what you can measure accurately. So we focused a lot on job creation in technology, but there's equally, I'm sure, a lot of jobs that are destroyed; it's just harder to measure. So it's really hard as an economist to kind of avoid that. Either way, measurement really matters, because measurement drives a lot of what the policy questions focus on. And I think it's why technology policy is focused on job creation and trade policy on job destruction. I think they're both a bit more balanced, but it's just driven by measurement.
>> Right, but so, [LAUGH] I guess that's the million dollar question. Like, what's the answer? Like, how do you measure the flip side of this? I guess I want to show you the region broadening result in some more detail. You can look at individual technologies and see which technologies spread out across space the most. So anything to do with mobile or smart devices is pretty strongly downward sloping, as, interestingly, is machine learning. This is the kind of stuff to look at in the paper, and I have nothing specific to say about it.
Let me kind of show you, also, and again this data is online on disruptivetech.net, I think. Zack models, if you go to Nick's website or my website, you can follow a link and then that data is online. So this I'm kind of very excited about, because I showed you a map of this before, which is where the locations are to which the early patents, ten years before the emergence of the technology, are assigned. So if you look across these, like machine learning, it's very heavily San Jose, San Francisco, New York, Seattle, Boston, the ones that you might think.
Fracking, Houston. Hybrid vehicles, Detroit. Autonomous cars, also like the tech hubs and Detroit. So I think that's kind of an interesting sort of thing to look at. Speaking to the question of why we care, you'll also see here that the places where the early patents were, basically where the technologies were invented, are also the places where employment in that technology is focused early on.
So if you go for each technology and you look at where do the patents come from, and where's the early employment, you get almost the same answer. So if I made this table here based on purely kind of like job posting data, I would get almost the same answer. So a very high correlation here between the red dots and the blue circles.
My dear lord. Okay, so I'm glad to have pictures because they render better on the PDF. So, yeah Namrata you have a question? >> I have a question about firm size here. If a large firm is innovating some technology, whether it is medical or whatever.
Is it using it in its establishments across the country, or is it only using it in headquarters? How do we think about it? >> This is a great question. And I think that's what's really exciting about this approach, because this is showing you where the new technology is being used. We don't really distinguish between production and use of the technology. But you see, this is the early employment graph. And what this is showing you is that early on, the places where the technologies are invented are also where they're produced and used. And over time, then that kind of attenuates.
Your question now is, is this true within firm? My guess is yes, but we haven't looked at that. So you can run these regressions within firm, there's nothing keeping you from doing that. There's some tricky questions, like in the Burning Glass data, figuring out what is a firm? We have an answer to that. That's not super simple, because basically what you need to do is you need to take the name of the firm, and then you need to figure out who belongs to what firm, and sometimes the firm name is not mentioned, and so forth, so it's a bit tricky. But you could in principle look within firm at where the new tech jobs are. How are they spreading within technology? But I can tell you, and I'm not sure if I have this here.
Yeah, I don't have it here, so what we have done is you can look at very specific examples, so look at GM and Ford, and jobs relating to autonomous cars and self-driving stuff. And you see that basically both GM and Ford after the emergence of autonomous cars as a technology, shift high skilled jobs in that technology to the places where the technology was invented. And specifically in that example, that means they create hundreds of jobs in Silicon Valley that were not there before in that firm.
Yeah, so I think there's exciting things to be done, but we've only kind of looked at very basic stuff so far. >> Do you have a working theory of why this is? Is it returns to scale on using new technology in a group, or sort of why the dispersion is the way you're observing it? >> I might have to consult my lawyer. I'm not sure. >> How do you mean? You mean why they're locally concentrated and move out over time? >> So, if I have a task-based kind of model in my head and I'm working on a computer, I'm a coder sitting in Boston even though the technology is invented in Silicon Valley. Why the spatial concentration of people who are allowed to work on a technology, even within a large firm? >> Well, okay, I interpreted your question slightly differently.
So just to make sure, the job ad just has the location of where you're working. So if it's General Electric, General Electric is across the country, but it would say like GE Bakersfield, California, and that's what we're using. So that's the upside of the job ads, because you need to know where to report to work.
It's true. There are a very few working-from-home jobs that are fully remote, but they're so rare that basically, so that's the useful thing about it. We have the location, so the firm isn't that critical for us; it's primarily where the job is. Now I think if your question is, take Apple. Apple has its labs, obviously, for the cell phone in Silicon Valley. We all carry these things now, but in 2007, they launched it. They were working on it for three, four, five years beforehand.
And those engineers are all in Silicon Valley. So not surprisingly, early on around the launch of the whatever, PDA, like the smartphone, all the engineering jobs tended to be around Silicon Valley. Now as the smartphone takes off, you get a lot more low-skilled jobs that are like selling them in AT&T and Verizon shops across the country. But it's still the case that a lot of Apple labs and research and people making apps for the phone are still located in Silicon Valley. So maybe, if I understood the question correctly, basically high-skill jobs stay local because the labs and researchers and these people tend to cluster, with a share of like 40% in Silicon Valley, 20% in Boston.
That's kind of astounding. That's the most striking thing. Two-thirds of big innovative creations in the US over the last 30 years have come from two small regions, Silicon Valley and Boston, and it's kind of amazing.
It's not that amazing if you read the news and think about it. It kind of makes sense; most big deals do come from there. There are some, like fracking, that don't, but most of the rest are from there. And those high-skilled jobs tend to stick there, and the low-skilled jobs move out pretty fast.
>> So that's this picture here. So this is the geographic concentration, just the same regression I showed you earlier, just run separately for high-skill and low-skill jobs. And you see here the high-skill jobs have a much flatter slope, meaning they tend to stick around where they were initially, which is where the technology was invented, whereas the low-skill stuff is much faster to spread around.
So the consequence of this is that the pioneer locations where the technology was invented have a long-term advantage in high-skill employment in that technology. So, again this is >> Five minutes left. >> Yeah, so I'm going to wrap up now.
So this is like again, a terribly butchered table, but what this is supposed to show is that there's very high persistence in high-skill jobs, they tend to stick around the locations where the technology originated. You see here the spread, this is what you showed graphically. There's a much lower kind of gradient in how high-skill jobs spread than in how low-skilled jobs spread. So one way to think about this, Nick, correct me if I'm wrong. So this is like a half-life of 20 years for low-skilled jobs in the coefficient of variation and 40 years for high-skilled jobs.
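To see what a half-life implies for the regression slope, here is a small worked calculation. It assumes concentration decays exponentially, so that the log of the coefficient of variation falls linearly with years since emergence; the talk does not say how the half-lives were computed, so treat this purely as a reading aid.

```python
import math

def half_life(log_slope):
    """Years for concentration to halve, given the slope of log(CV) on years."""
    return math.log(2) / abs(log_slope)

def slope_for_half_life(years):
    """Slope of log(CV) on years that corresponds to a given half-life."""
    return -math.log(2) / years

def remaining(years_elapsed, half_life_years):
    """Fraction of the initial concentration left after `years_elapsed`."""
    return 0.5 ** (years_elapsed / half_life_years)

low_skill_slope = slope_for_half_life(20)   # about -0.035 per year
high_skill_slope = slope_for_half_life(40)  # about -0.017 per year
```

Under this assumption, after 20 years low-skill concentration is down to half its starting level while high-skill concentration is still at about 71%.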
All right, so I've already mentioned that part of this persistent advantage of pioneer locations is due to rehoming, where established firms shift jobs to pioneer locations after the emergence of a technology, where autonomous cars is the one example that we've worked out in the paper. Pioneer locations are more likely to arise in urban areas with universities and an educated population, which is just a function of the fact that somebody needs to invent something in order for that invention to become disruptive later on. You can also kind of look at this across other dimensions that are interesting. Like the diffusion of technologies across firms and industries and occupations, I've mentioned this already. Maybe the most interesting finding there is that firms that originally developed a technology retain a persistent advantage in hiring in that technology.
With a half-life of about 11 years. So that means if you've invented something, if it becomes disruptive, you also tend to like post more jobs in that technology for a long period of time. So the data here is on this website, techdiffusion.net.
We've tried to kind of basically post the most granular version of this data possible. So you can aggregate it in a variety of different ways and maybe kind of figure some stuff out that we haven't looked at at all. [MUSIC]