Linear regression - lm function in R does not give coefficients for all factor levels in categorical data


I am trying out linear regression in R using categorical attributes, and I observe that I don't get a coefficient value for each of the different factor levels I have.

Please see my code below. I have 5 factor levels for states, but I see only 4 coefficient values besides the intercept.

```r
> states = c("wa","te","ge","la","sf")
> population = c(0.5,0.2,0.6,0.7,0.9)
> df = data.frame(states,population)
> df
  states population
1     wa        0.5
2     te        0.2
3     ge        0.6
4     la        0.7
5     sf        0.9
> states=NULL
> population=NULL
> lm(formula=population~states,data=df)

Call:
lm(formula = population ~ states, data = df)

Coefficients:
(Intercept)     statesla     statessf     stateste     stateswa  
        0.6          0.1          0.3         -0.4         -0.1  
```

I tried a larger data set by doing the following, and I still see the same behavior:

```r
for (i in 1:10) {
  # doubling df 10 times gives 5 * 2^10 = 5120 rows
  df = rbind(df, df)
}
```

Edit: Thanks for the responses, eipi10, MrFlick and economy. I understand that one of the levels is being used as the reference level. But when I have new test data whose state value is "ge", how do I substitute it into the equation y = m1x1 + m2x2 + ... + c?

I tried flattening out the data so that each of these factor levels gets its own separate column, but again one of the columns gets an NA coefficient. If I have new test data with state 'wa', how can I get the population value? Should I substitute its coefficient?

```r
> df1
  population ge mi te wa
1          1  0  0  0  1
2          2  1  0  0  0
3          2  0  0  1  0
4          1  0  1  0  0

> lm(formula = population ~ (ge+mi+te+wa),data=df1)

Call:
lm(formula = population ~ (ge + mi + te + wa), data = df1)

Coefficients:
(Intercept)           ge           mi           te           wa  
          1            1            0            1           NA  
```

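In your first example, ge does not get its own coefficient: because it comes first alphabetically, R uses it as the reference level, and its effect is absorbed into the intercept term. As eipi10 stated, you can interpret the coefficients of the other levels in states with ge as the baseline (statesla = 0.1 means that la is, on average, 0.1 higher than ge).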
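To see where the baseline comes from, here is a minimal sketch that rebuilds the data from the question, with `states` converted explicitly to a factor (an assumption that matches how `lm()` treats a character column). `factor()` sorts the levels alphabetically, so "ge" becomes the reference level; base R's `relevel()` can pick a different baseline. The variable names `fit` and `fit_wa`, and the choice of "wa" as the alternative baseline, are purely illustrative.

```r
states <- c("wa", "te", "ge", "la", "sf")
population <- c(0.5, 0.2, 0.6, 0.7, 0.9)
df <- data.frame(states = factor(states), population = population)

# factor() sorts the levels alphabetically, so "ge" is the first (reference) level
print(levels(df$states))  # "ge" "la" "sf" "te" "wa"

fit <- lm(population ~ states, data = df)
# The intercept is the value for "ge"; statesla is value(la) - value(ge) = 0.7 - 0.6
print(coef(fit))

# relevel() makes "wa" the reference level instead
df$states <- relevel(df$states, ref = "wa")
fit_wa <- lm(population ~ states, data = df)
print(coef(fit_wa))  # intercept is now 0.5, the value for "wa"
```

With "wa" as the baseline every coefficient is measured relative to 0.5 instead of 0.6, but the fitted value for each state is unchanged; only the parameterization differs.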

Edit:

To respond to the updated question:

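If you include all of the levels in a linear regression, you're going to have a situation called perfect collinearity, which is responsible for the strange results you're seeing when you force each category into its own variable. I won't go into a full explanation (you can find one on Wikipedia), but in short: the dummy columns for all the levels add up to 1 in every row, which is exactly the intercept column, so the coefficients cannot be uniquely estimated when all of them are represented and you're also expecting an intercept term. R handles this by reporting NA for one of the columns. If you want to see all of the levels in the regression, you can perform the regression without an intercept term, as suggested in the comments, but again, this is ill-advised unless you have a specific reason to.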
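The collinearity can be checked directly on the flattened data from the question. This sketch re-creates `df1` from the printed table (assuming the four rows shown there are the whole data set) and shows that the four dummy columns add up to the intercept column, that `lm()` reports `NA` for the redundant column, and that dropping the intercept with `- 1` gives every column a coefficient. The names `fit1` and `fit1_no_int` are illustrative.

```r
# Data from the question's flattened example (df1)
df1 <- data.frame(
  population = c(1, 2, 2, 1),
  ge = c(0, 1, 0, 0),
  mi = c(0, 0, 0, 1),
  te = c(0, 0, 1, 0),
  wa = c(1, 0, 0, 0)
)

# Each row has exactly one dummy set to 1, so the dummies sum to the
# intercept column: this is perfect collinearity.
print(rowSums(df1[, c("ge", "mi", "te", "wa")]))  # all 1

fit1 <- lm(population ~ ge + mi + te + wa, data = df1)
print(coef(fit1))  # wa is NA: lm() dropped the redundant column

# Without an intercept ("- 1"), every dummy gets its own coefficient,
# which here is simply the population value for that state.
fit1_no_int <- lm(population ~ ge + mi + te + wa - 1, data = df1)
print(coef(fit1_no_int))
```

In the no-intercept fit the coefficients are no longer differences from a baseline, and R computes summary statistics such as R² differently for models without an intercept, which is one reason this form is usually avoided.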

As for the interpretation of ge in the y = mx + c equation: you can calculate the expected y by knowing that the dummy variables for the other states are binary (zero or one), and if the state is ge, they are all zero.

For example:

$$
\begin{aligned}
y &= x_1 b_1 + x_2 b_2 + x_3 b_3 + c \\
  &= b_1(0) + b_2(0) + b_3(0) + c \\
  &= c
\end{aligned}
$$

If you don't have other variables, as in your first example, the effect of ge is equal to the intercept term (0.6).
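For new test data the substitution works the same way: a state's dummy is 1 only in its own column, so "ge" gets just the intercept and "wa" gets the intercept plus `stateswa`. A minimal sketch using the first example's data; base R's `predict()` applies the same dummy coding automatically, and the names `y_ge`, `y_wa` and `new_data` are illustrative.

```r
states <- c("wa", "te", "ge", "la", "sf")
population <- c(0.5, 0.2, 0.6, 0.7, 0.9)
df <- data.frame(states = factor(states), population = population)
fit <- lm(population ~ states, data = df)

b <- coef(fit)
# "ge" is the reference level: every dummy is 0, so y = intercept
y_ge <- unname(b["(Intercept)"])
# "wa" has its own dummy set to 1: y = intercept + coefficient of stateswa
y_wa <- unname(b["(Intercept)"] + b["stateswa"])
print(c(ge = y_ge, wa = y_wa))

# predict() builds the same dummy coding from the levels stored in the fit
new_data <- data.frame(states = c("ge", "wa"))
print(predict(fit, newdata = new_data))  # about 0.6 and 0.5
```

In the flattened `df1` model, the column whose coefficient is `NA` (wa) plays the role of the baseline in the same way: treat its coefficient as zero, so the expected population for wa is just the intercept (1).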
