Basic linear regression
Econometricians spend a lot of time thinking about causality, whereas data science generally focuses more on prediction and classification. But is there something to be learned from economists' fixation on causal relationships?
Variable selection is often left up to an algorithm. However, deliberately controlling for the right variables can improve measurement accuracy, and thus overall model performance. On the other hand, certain "bad" controls can block pathways between variables that we want to preserve, or they can create spurious correlations. Using real and simulated data, I explain when to reconsider your controls, and why doing so may significantly improve model accuracy.
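To make the two kinds of "bad" control concrete, here is a minimal simulated sketch. It is not the data or code from the talk. The variable names, effect sizes, and the `ols` helper are all assumptions chosen for illustration. The first case controls for a mediator, which blocks the pathway from `x` to `y`. The second controls for a collider, which creates a correlation between two independent variables.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) using least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000

# Case 1 (assumed setup): mediator. x -> m -> y, so the total effect
# of x on y is 2.0 * 1.5 = 3.0 by construction.
x = rng.normal(size=n)
m = 2.0 * x + rng.normal(size=n)
y = 1.5 * m + rng.normal(size=n)
total_effect = ols(y, x)[1]       # close to 3.0
blocked_effect = ols(y, x, m)[1]  # close to 0: the pathway is blocked

# Case 2 (assumed setup): collider. a and b are independent,
# but both cause c.
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + rng.normal(size=n)
no_control = ols(b, a)[1]         # close to 0: no relationship
with_collider = ols(b, a, c)[1]   # close to -0.5: spurious relationship

print(f"mediator: without control {total_effect:.2f}, with control {blocked_effect:.2f}")
print(f"collider: without control {no_control:.2f}, with control {with_collider:.2f}")
```

In both cases the regression that includes more variables gives the more misleading coefficient. An automated selection procedure that judges candidates only by fit would still tend to keep `m` and `c`, since each one is strongly related to the outcome.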
Empirical economist, data nerd, erstwhile nomad, and coffee junkie. After living and working on a few continents, I have settled in Germany and immersed myself in econometrics, which led me to all things data. Specifically, monetary policy led to Bayesian models, which led to questioning everything I ever learned. I'm still following those and other questions that bother me so.