Can you use regression with outliers?
Christopher Pierce
Updated on February 27, 2026
Outliers are abnormal values in a dataset that do not fit the regular distribution and can significantly distort a regression model. If the outliers are genuine observations, you can either keep them in the regression model or drop them to obtain a better fit.
How do you deal with outliers in regression?
In linear regression, outliers can be handled with the following steps:
- Using the training data, find the line or hyperplane that best fits the data.
- Find the points that are far away from the line or hyperplane.
- Remove the points that are very far from the hyperplane, treating them as outliers.
- Retrain the model.
- Go back to step one and repeat until no further points are removed.
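The steps above can be sketched in Python for the one-predictor case. This is a minimal illustration, not a standard routine: the function names `fit_line` and `fit_with_trimming`, the cutoff of 3 residual standard deviations for "very far away", and the `max_rounds` safeguard are all assumptions, since the steps do not specify them.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def fit_with_trimming(xs, ys, threshold=3.0, max_rounds=10):
    """Fit, drop points far from the line, refit; stop when nothing is dropped."""
    xs, ys = list(xs), list(ys)
    for _ in range(max_rounds):
        slope, intercept = fit_line(xs, ys)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        spread = pstdev(residuals)  # typical distance from the line
        # Assumed rule: "very far away" means more than threshold * spread.
        keep = [abs(r) <= threshold * spread for r in residuals]
        if all(keep):
            break
        xs = [x for x, k in zip(xs, keep) if k]
        ys = [y for y, k in zip(ys, keep) if k]
    return fit_line(xs, ys), list(zip(xs, ys))

# Made-up data: y = 2x + 1 with small alternating noise and one planted outlier.
xs = list(range(20))
ys = [2 * x + 1 + (0.1 if x % 2 == 0 else -0.1) for x in xs]
ys[10] += 50
(slope, intercept), kept = fit_with_trimming(xs, ys)
print(slope, intercept, len(kept))
```

Because every round removes data, it is worth recording which points were dropped and why, rather than reporting only the final fit.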
How do you identify outliers in regression?
The good thing about standardized residuals is that they quantify how large the residuals are in standard deviation units, and therefore can be easily used to identify outliers: An observation with a standardized residual that is larger than 3 (in absolute value) is deemed by some to be an outlier.
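As a rough illustration of the rule above, the following Python sketch computes standardized residuals for a simple linear regression and flags those larger than 3 in absolute value. It assumes the leverage-adjusted definition (each residual divided by its estimated standard deviation); some texts divide by the residual standard error only. The function name and the data are made up for the example.

```python
from math import sqrt
from statistics import mean

def standardized_residuals(xs, ys):
    """Residuals of a simple linear regression in standard deviation units."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each point in simple regression.
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverages)]

# Made-up data: a clean linear trend with one planted outlier at index 10.
xs = list(range(20))
ys = [2 * x + 1 + (0.1 if x % 2 == 0 else -0.1) for x in xs]
ys[10] += 50
z = standardized_residuals(xs, ys)
flagged = [i for i, r in enumerate(z) if abs(r) > 3]
print(flagged)
```

Note that with very small samples no residual can reach 3 under this definition, so the cutoff is only meaningful when there are enough observations.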
When should outliers be excluded from a regression analysis?
If the outlier creates a relationship where there isn’t one otherwise, either delete the outlier or don’t use those results. In general, an outlier shouldn’t be the basis for your results.
Should outliers be excluded?
Removing outliers is legitimate only for specific reasons. Outliers can be very informative about the subject-area and data collection process. Outliers increase the variability in your data, which decreases statistical power. Consequently, excluding outliers can cause your results to become statistically significant.
How do you solve outliers in SPSS?
How to Remove Outliers in SPSS
- Click on “Analyze.” Select “Descriptive Statistics” followed by “Explore.”
- Drag and drop the columns containing the dependent variable data into the box labeled “Dependent List.” Click “OK.”
How do you treat outliers in SPSS?
SPSS has no specific command for removing outliers from an analysis or from the active dataset. You first have to find out which observations are outliers and then remove them using case selection (Select Cases). Note that Select Cases lets you choose which observations to include in the analysis.
How do you check for outliers in multiple regression SPSS?
In SPSS, how do I find outliers in my regression?
- From the Analyze menu, select Regression, and then Linear.
- In the dialog box that appears, click Save.
- In the next dialog box that appears, check Leverage values.
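To give a sense of what a leverage value measures, here is a minimal Python sketch for a regression with a single predictor. It computes the uncentered leverage h_i = 1/n + (x_i - mean)^2 / Sxx; SPSS is generally documented as saving the centered version (h_i minus 1/n), which is treated here as an assumption. The 2p/n cutoff is a common rule of thumb and not something taken from the steps above, and the function name and data are hypothetical.

```python
from statistics import mean

def leverage_values(xs):
    """Uncentered leverage of each observation in a one-predictor regression."""
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

# Made-up predictor values: the last one sits far from the rest.
xs = [1, 2, 3, 4, 5, 20]
h = leverage_values(xs)

# Assumed rule of thumb: flag leverage above 2 * p / n, with p = 2 parameters.
cutoff = 2 * 2 / len(xs)
high_leverage = [i for i, value in enumerate(h) if value > cutoff]
print(high_leverage)
```

A high-leverage point is unusual in its predictor values, which is a different question from whether its residual is large, so leverage and residual checks are usually read together.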
How do you avoid outliers in regression?
Here are four approaches:
- Drop the outlier records. In the case of Bill Gates, or another true outlier, sometimes it’s best to completely remove that record from your dataset to keep that person or event from skewing your analysis.
- Cap your outliers data.
- Assign a new value.
- Try a transformation.
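Two of the approaches above, capping and transforming, can be sketched in a few lines of Python. The 5th and 95th percentile caps and the choice of a log transform are illustrative assumptions, not recommendations from the list; the function names and the income figures are made up.

```python
from math import log1p
from statistics import quantiles

def cap_values(values, lower_pct=5, upper_pct=95):
    """Clamp values to the given lower and upper percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

def log_transform(values):
    """Compress large values with log(1 + v); requires v > -1."""
    return [log1p(v) for v in values]

# Made-up incomes in thousands: 99 ordinary values and one extreme value.
incomes = [30 + i for i in range(99)] + [100000]
capped = cap_values(incomes)
logged = log_transform(incomes)
print(max(incomes), max(capped), max(logged))
```

Capping keeps every record but changes the extreme values, while a transformation keeps the ordering of all values and only shrinks the gaps between them.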
What is an example of an extreme outlier in SPSS?
SPSS also considers any data value to be an extreme outlier if it lies above or below the following bounds:
- Upper bound: 3rd quartile + 3*interquartile range
- Lower bound: 1st quartile – 3*interquartile range
Thus, any values outside of the following ranges would be considered extreme outliers in this example:
How can I check the assumptions of the regression in SPSS?
To fully check the assumptions of the regression using a normal P-P plot, a scatterplot of the residuals, and VIF values, bring up your data in SPSS and select Analyze –> Regression –> Linear. Set up your regression as if you were going to run it by putting your outcome (dependent)…
Can I use SPSS Statistics to run multiple regression studies?
Outliers, high leverage points and highly influential points can change the output that SPSS Statistics produces and reduce the predictive accuracy of your results as well as the statistical significance. Fortunately, when using SPSS Statistics to run multiple regression on your data, you can detect possible outliers, high leverage points and highly influential points.
How do you find outliers in SPSS box plots?
If there are no circles or asterisks on either end of the box plot, this is an indication that no outliers are present. SPSS considers any data value to be an outlier if it lies above or below the following bounds:
- Upper bound: 3rd quartile + 1.5*interquartile range
- Lower bound: 1st quartile – 1.5*interquartile range
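The interquartile range rule can be reproduced outside SPSS with a short Python sketch. The function name `iqr_outliers` and the sample data are hypothetical, and the quartiles are computed with Python's "inclusive" method, which may differ slightly from the quartile definition SPSS uses, so borderline values may be classified differently.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up sample with one mild and one extreme high value.
data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 100]
outliers = iqr_outliers(data)         # 1.5 * IQR rule
extreme = iqr_outliers(data, k=3)     # 3 * IQR rule for extreme outliers
print(outliers, extreme)
```

Setting `k=3` applies the wider bounds used for extreme outliers, so the same function covers both rules.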