- Observe regularities in the data.
- Formulate a theory.
- Generate predictions from the theory (hypotheses).
- Test your theory (is it consistent with data?)
David Hendry just published a paper about the scientific method in Economics that appears to fly in the face of what I just described. Here is an attempt to summarize his stand, and I apologize for quoting quite liberally:
- Specify the object for modeling, usually based on a prior theoretical analysis in Economics. An example of such an object is y=f(z).
- Define the target for modeling by the choice of the variables to analyze, y and z, again usually based on prior theory. This is about deriving the data-generating process of the variables of interest, or fitting an equation with some statistical procedure.
- Embed that target in a general unrestricted model (GUM), to attenuate the unrealistic assumptions that the initial theory is correct and complete. The idea is to add other variables, lags, dummies, shift variables and functional forms to improve the empirical accuracy of the initial model.
- Search for the simplest acceptable representation of the information in that GUM. Or, now that the model has become huge (and may contain more variables than data points), let us get rid of some of them without losing too much in accuracy.
- Rigorously evaluate the final selection: (a) by going outside the initial GUM in step three, using standard mis-specification tests for the ‘goodness’ of its specification; (b) applying tests not used during the selection process; and (c) by testing the underlying theory in terms of which of its features remained significant after selection.
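To make the middle steps concrete, here is a minimal sketch of what "embed the target in a GUM, then search for the simplest acceptable representation" can look like in code. It is not Hendry's actual selection algorithm: it uses plain backward elimination on t-statistics as a stand-in, the data are simulated, and every name in it (`ols_t_stats`, `general_to_specific`, `t_crit`, `keep`, the `noise` regressors) is made up for illustration. It also assumes fewer regressors than observations, so it does not cover the case where the GUM has more variables than data points.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients and their t-statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))

def general_to_specific(X, y, names, t_crit=2.0, keep=()):
    """Backward elimination (an illustrative stand-in for the search step):
    repeatedly drop the least significant regressor until every remaining
    one has |t| >= t_crit. Names listed in `keep` are never dropped."""
    active = list(range(X.shape[1]))
    while True:
        _, t = ols_t_stats(X[:, active], y)
        candidates = [(abs(t[i]), j) for i, j in enumerate(active)
                      if names[j] not in keep]
        if not candidates:
            break
        weakest_t, weakest = min(candidates)
        if weakest_t >= t_crit:
            break
        active.remove(weakest)
    beta, _ = ols_t_stats(X[:, active], y)
    return [names[j] for j in active], beta

# Simulated data: the "theory" says y depends on z alone.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
noise = rng.normal(size=(n, 10))              # irrelevant candidate regressors
y = 1.0 + 2.0 * z + rng.normal(size=n)

# The GUM: constant, the theory variable, and everything else we can think of.
X = np.column_stack([np.ones(n), z, noise])
names = ["const", "z"] + [f"noise{i}" for i in range(10)]

selected, coefs = general_to_specific(X, y, names, keep=("const",))
print(dict(zip(selected, np.round(coefs, 3))))
```

With a cutoff of roughly 2, each irrelevant regressor has about a 5% chance of surviving by luck, so a few spurious variables can remain in the final model even when the theory variable is correctly retained.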
Apart from the fact that this is really the blueprint for an automated data mining exercise that is not driven in any way by the need to answer a particular policy question, this procedure not only disregards the scientific method, but also Occam's Razor and the Lucas Critique. What use is it to learn that the CPI follows a polynomial of degree five with three lags on exports of cabbage, the number of sunny days, 25 other variables and three structural breaks (not an actual example used by Hendry, but it could be)? If you want to make some very short-term forecasts, that may be accurate, and this method is abundantly used in the City or Wall Street by neural networks "experts." But when it comes to advising policymakers, you need to have some Economics, and by that I mean economic theory, to explain why economic agents behave in such a way and what an intervention would lead to.
The scientific method starts with the observation of the data. Hendry dismisses this with a sleight of hand, stating that stylized facts are "an oxymoron in the non-constant world of economic data." What if there are constants in economic data? In fact there are plenty, and this is what theories are trying to explain. Has Hendry never observed something in his surroundings that he then tried to explain? Or does he really spend his days feeding linear equations into his computer to see what it can come up with from his database?
Such papers, especially by people who enjoy respect like Hendry does in the UK, deeply upset me. To top it off, there are 33 self-citations.