This list comes from the Coursera Data Analysis Course.

Linear and Logistic Regression are some of the most common techniques applied in data analysis. Here is a list of possible problems with regression in the real world.

- **Confounders** – variable that is correlated with both the outcome and other variables in the model
- **Complicated Interactions** – how do the covariates interact
- **Skewness** – is the data not evenly distributed, heavy to one side or the other
- **Outliers** – data points that don’t fit the pattern
- **Non-linear Patterns** – not all datasets can be fit with a straight line
- **Variance Changes**
- **Units/Scale issues** – make sure the units are standard across the model
- **Overloading Regression** – too much complexity
- **Correlation does not imply Causation**
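To make the first item concrete, here is a minimal sketch of how a confounder can mislead a regression. It is not from the course; the simulated data, the variable names `x`, `y`, `z`, and the helper `fit_ols` are all assumptions made for illustration. The outcome `y` depends only on the confounder `z`, yet a regression of `y` on `x` alone reports a sizable slope, because `x` is also driven by `z`.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients for y ~ intercept + columns of X.

    Hypothetical helper for this sketch; index 0 is the intercept.
    """
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Simulated data (an assumption for illustration, not real measurements).
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)             # confounder
x = 0.8 * z + rng.normal(size=n)   # covariate, partly driven by z
y = 2.0 * z + rng.normal(size=n)   # outcome depends on z only, not on x

# Regression that omits the confounder: x picks up z's effect.
naive = fit_ols(x, y)

# Regression that includes the confounder: x's coefficient shrinks toward 0.
adjusted = fit_ols(np.column_stack([x, z]), y)

print("slope on x, z omitted: ", naive[1])
print("slope on x, z included:", adjusted[1])
```

Under these assumptions the slope on `x` should be far from zero when `z` is left out and close to zero once `z` is included. Real data rarely comes with the confounder labeled, so this only shows the mechanism, not a way to detect it.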

What other problems do you run into when using regression on real-world data?

Do you know of other problems that are missing from this list?

Scott (@ScottOrz), March 10, 2013 at 11:35 am

Small Sample – absence of sufficient data to fit a regression model.

Ryan Swanstrom, March 11, 2013 at 10:18 pm

That is a common problem. Professor Jeff Leek did not add it to his list. I wonder whether that is because the problem is not specific to regression: every statistical or machine learning model suffers when too little data is available. I agree with you, though; a small sample size can be a problem in any data analysis.

Thanks for commenting,

Ryan
