
Multiple Linear Regression In Pandas Statsmodels: ValueError

Data: https://courses.edx.org/c4x/MITx/15.071x_2/asset/NBA_train.csv I know how to fit these data to a multiple linear regression model using statsmodels.formula.api: import pandas

Solution 1:

In sm.OLS(y, X), the first argument y is the dependent variable and the second argument X holds the independent variables.

In the formula W ~ PTS + oppPTS, W is the dependent variable and PTS and oppPTS are the independent variables.

Therefore, use

y = NBA['W']
X = NBA[['PTS', 'oppPTS']]

instead of

X = NBA['W']
y = NBA[['PTS', 'oppPTS']]
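A quick way to catch swapped arguments is to look at the shapes before fitting. The sketch below uses a few made-up rows in place of NBA_train.csv (only the column names are taken from the question), so treat the numbers as placeholders:

```python
import pandas as pd

# Made-up rows standing in for NBA_train.csv; only the column names match.
NBA = pd.DataFrame({
    "W": [50, 41, 30, 62],
    "PTS": [8573, 8400, 8100, 8900],
    "oppPTS": [8334, 8410, 8500, 8300],
})

y = NBA["W"]                 # single brackets -> Series, one column
X = NBA[["PTS", "oppPTS"]]   # double brackets -> DataFrame, two columns

print(y.shape)  # (4,)
print(X.shape)  # (4, 2)
```

The dependent variable should be the one-dimensional object and the independent variables the two-dimensional one.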

import pandas as pd
import statsmodels.api as sm

NBA = pd.read_csv("NBA_train.csv")
y = NBA['W']                   # dependent variable
X = NBA[['PTS', 'oppPTS']]     # independent variables
X = sm.add_constant(X)         # sm.OLS does not add an intercept on its own
model11 = sm.OLS(y, X).fit()
print(model11.summary())

yields

                            OLS Regression Results
==============================================================================
Dep. Variable:                      W   R-squared:                       0.942
Model:                            OLS   Adj. R-squared:                  0.942
Method:                 Least Squares   F-statistic:                     6799.
Date:                Sat, 21 Mar 2015   Prob (F-statistic):               0.00
Time:                        14:58:05   Log-Likelihood:                -2118.0
No. Observations:                 835   AIC:                             4242.
Df Residuals:                     832   BIC:                             4256.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         41.3048      1.610     25.652      0.000        38.144    44.465
PTS            0.0326      0.000    109.600      0.000         0.032     0.033
oppPTS        -0.0326      0.000   -110.951      0.000        -0.033    -0.032
==============================================================================
Omnibus:                        1.026   Durbin-Watson:                   2.238
Prob(Omnibus):                  0.599   Jarque-Bera (JB):                0.984
Skew:                           0.084   Prob(JB):                        0.612
Kurtosis:                       3.009   Cond. No.                     1.80e+05
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.8e+05. This might indicate that there are strong multicollinearity or other numerical problems.
