At Luohan Academy, our daily work is a race between data and models: we find patterns in big data, and develop theories to establish causality. This work requires us to use microeconomic theory (micro theory) to guide our applied microeconomics (applied micro) work. Hence, when reading the paper by Currie, Kleven, and Zwiers, we were glad to see that this type of work is also trending in the applied micro community. From a practitioner's perspective, we first briefly introduce the data, method, and results of the paper, and then share our thoughts on the results.
Data and method
The paper uses two plain-text datasets. The first contains 10,324 applied micro papers in the National Bureau of Economic Research (NBER) working paper series between January 1, 1980 and June 30, 2018, and the second contains 2,830 papers published in the "top five" economics journals (American Economic Review, Econometrica, Journal of Political Economy, Quarterly Journal of Economics, and Review of Economic Studies) between January 1, 2004 and August 2019.
The text-mining method is regular-expression (regex) matching in Python. For each keyword category, the authors develop a set of trigger phrases. If a paper contains at least one trigger phrase, it is assigned to that phrase's category, and a single paper can be counted in multiple categories. For example, if a paper contains the term "causal identification" and the term "administrative data", it is counted in both the identification category and the administrative data category, because the first term is a trigger phrase for the former and the second is a trigger phrase for the latter.
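The trigger-phrase approach can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual code: the category names and phrases below are hypothetical placeholders, and the paper's real dictionaries are far larger.

```python
import re

# Hypothetical trigger phrases for two of the paper's categories.
# The actual dictionaries in Currie et al. are much more extensive.
CATEGORIES = {
    "identification": [r"causal identification", r"identification strategy"],
    "administrative data": [r"administrative data", r"register data"],
}

def categorize(text):
    """Return every category with at least one trigger phrase in the text."""
    text = text.lower()
    matched = set()
    for category, phrases in CATEGORIES.items():
        for phrase in phrases:
            if re.search(phrase, text):
                matched.add(category)
                break  # one hit is enough; move on to the next category
    return matched

abstract = ("Our causal identification strategy exploits "
            "administrative data from tax records.")
print(sorted(categorize(abstract)))  # ['administrative data', 'identification']
```

Note that a paper matching phrases from several categories lands in all of them, which is exactly why the paper's category counts need not sum to the number of papers.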
The authors first highlight four dimensions of the "credibility revolution" in economics: applied micro papers show an increasing focus on identification, on experimental and quasi-experimental methods, and on administrative data, as well as a rising ratio of figure words to table words. Within identification, concerns about omitted variables and simultaneity are declining, while concerns about selection and reverse causation are rising. This focus on cleaner identification is accompanied by a rise in experimental and quasi-experimental methods such as RCTs, lab experiments, difference-in-differences, regression discontinuity, event studies, and bunching. The growth in data use mainly reflects increases in survey data, internet data, and big data, alongside a stable share of proprietary data.
The rise of experimental and quasi-experimental methods brings rising concern about external validity and mechanisms. Discussion of external validity began in the late 1990s and rose sharply in both the NBER and top-five series after 2005. There is also an impressive rise in the fraction of applied micro papers discussing mechanisms, suggesting that editors either select papers that provide evidence on mechanisms or push authors to add such evidence during the editorial process.
The authors then show that the rise of the new methods has not come at the expense of older empirical methods such as instrumental variables and fixed effects. The fraction of papers mentioning instrumental variables is stable, and the fractions mentioning fixed effects, matching, and synthetic control are increasing. They also show that economists have become increasingly concerned with whether their estimates are precise, and not merely with whether they are significantly different from zero in a statistical sense. This is because big data significantly increases the power of a test, making it more likely to detect significance if there is any effect to detect.
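The shift from significance to precision has a simple mechanical core: the standard error of a sample mean shrinks like 1/sqrt(n), so with big data even tiny effects become "statistically significant." The sketch below is our own illustration, not from the paper; the effect size and noise level are arbitrary assumptions.

```python
import math
import random

random.seed(0)

def mean_and_se(effect, sd, n):
    """Simulate n noisy draws around a tiny true effect and return the
    sample mean, its standard error, and the resulting t-statistic."""
    draws = [random.gauss(effect, sd) for _ in range(n)]
    mean = sum(draws) / n
    se = sd / math.sqrt(n)          # standard error shrinks like 1/sqrt(n)
    return mean, se, mean / se

# A tiny true effect (0.01) swamped by noise (sd = 1): invisible in a small
# sample, but routinely "significant" once n gets large enough.
for n in (100, 10_000, 1_000_000):
    mean, se, t = mean_and_se(effect=0.01, sd=1.0, n=n)
    print(f"n={n:>9,}  se={se:.4f}  t={t:+.2f}")
```

With the standard error driven toward zero, the interesting question is no longer whether t clears 1.96 but whether the estimated magnitude is precise and economically meaningful, which is the shift the authors document.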
The trend in applying structural models differs across sub-fields of applied micro. In industrial organization, the use of structural models, general equilibrium, and functional-form assumptions is increasing. The same trend is flat in labor economics and reversed, i.e., decreasing, in public economics.
The authors reach their final conclusion by appealing to history: in the late 1970s, the emergence of new data set off a spiral of new data and new methods. They therefore conclude that "we may be at a similar turning point in the field today, with a proliferation of new data and methods." In addition, based on the evidence of a rising emphasis on credibility and transparency, the authors close with the remark that "the larger trend toward demanding greater credibility and transparency from researchers in applied economics and a "collage" approach to assembling evidence will likely continue."
Reflections and Comments
When I was in grad school, I once asked one of my micro theory professors why he chose micro theory when he was a graduate student. Among the reasons he listed, one was that there was no data available to answer the questions that intrigued him, and the cost of gathering the data on his own was too high. From this narrow perspective, answering the questions of interest with assumptions and logic was the second-best option when the first best was not feasible. Nowadays, however, those constraints have largely been relaxed. As digital technology has been commercialized and has prospered, the granularity of the data we gather has increased substantially. For example, when Ackerberg published his paper on distinguishing the informative and prestige effects of advertising, the Nielsen scanner dataset he used captured only the purchase action; researchers had no idea what a consumer browsed in the supermarket before she decided to buy a product. Twenty years later, anonymized datasets at Luohan Academy tell us what a consumer browsed, and where possible what she asked, before she made the purchase decision. Livestream technology has also reduced the time a consumer needs to make a purchase decision, and hence reduces the uncertainty between an ad exposure and a purchase.
This modern big-data practice would suggest a better solution to my micro theory professor if he were to graduate this year: do both micro theory and applied micro. Find patterns in the big data. Develop a theory to explain, if not rationalize, the patterns. Use big data and applied micro methods to test the theory. Modify the theory and test again, and so on. This practice is also documented by Currie et al. (2020). The upward trend in structural models and the rising focus on mechanisms that their paper documents both suggest a proposition: big data narrows the gap between micro theory and applied micro.
As researchers at Luohan Academy, we echo this proposition. Moreover, we think this shrinking gap gives researchers the chance to do work with greater real-world impact. In our own work, for example, we can push the aforementioned practice one step further, into real business. Once a positive theory explaining an observed pattern is validated, we engineer it into a normative version and share it with business units: we show them what is happening, derive implications, and suggest what to do next. These are our practices and thoughts. Please let us know if you would like to share yours.
Ackerberg, Daniel A. "Empirically distinguishing informative and prestige effects of advertising." RAND Journal of Economics (2001): 316-333.
Currie, Janet, Henrik Kleven, and Esmée Zwiers. "Technology and big data are changing economics: Mining text to track methods." AEA Papers and Proceedings 110 (2020): 42-48.
[i] Since the two datasets used in the paper cover different time spans, we have left all specific ratios out of this article to keep it concise. Interested readers can find them in the figures in the paper.