probability - Multinomial regression using the multinom function in R
I was thinking about posting my question on Cross Validated, but decided to come here. I am using the multinom() function from the nnet package to estimate the odds of becoming employed, unemployed, or out of the labor force, conditioned on age and education. I need some help with the interpretation.
I have the following dataset with one dependent categorical variable, employment status (empst), and two independent categorical variables: age (age) and education level (education).
> head(df)
               empst   age                         education
1           employed   61+          less high school diploma
2           employed 50-60 high school graduates, no college
3 not in labor force 50-60          less high school diploma
4           employed 30-39       bachelor's degree or higher
5           employed 20-29       college or associate degree
6           employed 20-29       college or associate degree
Here is a summary of the levels:
> summary(df)
                empst          age                                    education
 not in universe   :    0   16-19: 6530   less high school diploma         :14686
 employed          :61478   20-29:16031   high school graduates, no college:30716
 unemployed        : 3940   30-39:16520   college or associate degree      :28525
 not in labor force:38508   40-49:17403   bachelor's degree or higher      :29999
                            50-60:20779
                            61+  :26663
- First, what is the estimation equation (model)?
I want to determine the estimation equation (model) for the call
df$empst <- relevel(df$empst, ref = "employed")
multinom(empst ~ age + education, data = df)
so that I can write it down in my research paper. In my understanding, employed is the base level and the logit model for this call is:
[equation image not preserved]
where the indices (one of them n) run over the categories of the variables age and education, respectively (sorry for the confusing notation). Please correct me if my understanding of the logistic model produced by multinom() is incorrect. I am not going to include the summary of test because it is a lot of output, so below I include only the output of the call test:
> test
Call:
multinom(formula = empst ~ age + education, data = ml)

Coefficients:
                   (Intercept)   age20-29   age30-39   age40-49   age50-60     age61+
unemployed           -1.334734 -0.3395987 -0.7104361 -0.8848517 -0.9358338 -0.9319822
not in labor force    1.180028 -1.2531405 -1.6711616 -1.6579095 -1.2579600  0.8197373
                   educationhigh school graduates, no college educationsome college or associate degree
unemployed                                         -0.4255369                                 -0.781474
not in labor force                                 -0.8125016                                 -1.004423
                   educationbachelor's degree or higher
unemployed                                    -1.351119
not in labor force                            -1.580418

Residual Deviance: 137662.6
AIC: 137698.6
Given that my understanding of the logit model produced by multinom() is correct, the coefficients are the log odds relative to the base level employed. To get the actual odds I take the antilog by calling exp(coef(test)), which gives me:
> exp(coef(test))
                   (Intercept)  age20-29  age30-39  age40-49  age50-60    age61+
unemployed           0.2632281 0.7120560 0.4914298 0.4127754 0.3922587 0.3937724
not in labor force   3.2544655 0.2856064 0.1880285 0.1905369 0.2842333 2.2699035
                   educationhigh school graduates, no college educationsome college or associate degree
unemployed                                          0.6534189                                 0.4577308
not in labor force                                  0.4437466                                 0.3662560
                   educationbachelor's degree or higher
unemployed                                    0.2589504
not in labor force                            0.2058891
Which brings me to the next question.
- Second, probabilities
I wonder if there is a way to get the actual probabilities of being unemployed vs. employed based on a combination of age and education, e.g. what is the probability of being unemployed if I am 22 and have a high school diploma? Sorry for the lengthy question. Thanks for your help. Let me know if additional clarification is needed.
About the first question, I'm also having some doubts about multinom with categorical variables (here is my question: multinom with a matrix of counts as the response).
From what a user replied in that question and the output of test that you posted, I guess the math you wrote is partially right: a multinomial model should indeed only work if the predictor variables are continuous or dichotomous (i.e., taking only the values 0 or 1), and it seems that when multinom gets categorical variables as predictors, as in your example, R automatically converts them into dummy variables (only 0 or 1).
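The dummy coding can be inspected directly with base R's model.matrix(), which applies the default treatment contrasts that the formula interface uses for unordered factors. The sketch below is only illustrative: the three rows are made up, and the level names are copied as they appear in the summary(df) output of the question.

```r
# Level names copied from the summary(df) output in the question.
age_levels <- c("16-19", "20-29", "30-39", "40-49", "50-60", "61+")
edu_levels <- c("less high school diploma",
                "high school graduates, no college",
                "college or associate degree",
                "bachelor's degree or higher")

# Three made-up rows, purely to show how factors are expanded.
toy <- data.frame(
  age = factor(c("16-19", "20-29", "61+"), levels = age_levels),
  education = factor(c("less high school diploma",
                       "high school graduates, no college",
                       "bachelor's degree or higher"),
                     levels = edu_levels)
)

# Each factor becomes (number of levels - 1) columns of 0/1 indicators;
# the first level of each factor is absorbed into the intercept.
X <- model.matrix(~ age + education, data = toy)
print(colnames(X))
print(X)
```

The first row (16-19, less high school diploma) has zeros in every column except the intercept, which is why those two levels do not get their own coefficients in the multinom() output.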
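With reference to your example, considering only age as a predictor, we should have
\ln\left(\frac{\Pr(\text{unemployed})}{\Pr(\text{employed})}\right) = \beta_0 + \beta_1 \cdot \text{age20-29} + \beta_2 \cdot \text{age30-39} + \dots
and an analogous formula for Pr(not in labor force), with different \beta coefficients. Here age20-29, age30-39, and so on are the 0/1 dummy variables.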
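Because employed is the reference level with a linear predictor of 0, the two log-odds equations can be inverted by hand to get probabilities. Note that exp() of the intercept is the odds for the reference profile (16-19, less high school diploma), while exp() of any other coefficient is an odds ratio that multiplies those odds, not an odds on its own. The sketch below uses the coefficients posted in the question and assumes the person of interest falls in the 20-29 age band with education "high school graduates, no college".

```r
# Coefficients copied from the multinom() output posted in the question.
# Each vector holds: intercept, age20-29, "high school graduates, no college".
b_unemployed <- c(-1.334734, -0.3395987, -0.4255369)
b_nilf       <- c( 1.180028, -1.2531405, -0.8125016)

# Linear predictors (log-odds versus the reference level "employed").
# The reference level has a linear predictor of 0 by construction.
eta <- c(employed             = 0,
         unemployed           = sum(b_unemployed),
         "not in labor force" = sum(b_nilf))

# Softmax: exponentiate and normalise so the three probabilities sum to 1.
probs <- exp(eta) / sum(exp(eta))
print(round(probs, 3))
```

With the full model both the age and the education dummies enter the sum, which is why three terms are added per outcome here.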
About the second question: yes, there is a way. Use predict(test, newdata, "probs"), where newdata is a data frame whose age and education entries are 20-29 and high school graduates, no college (given your example).
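A minimal sketch of that predict() call follows. The survey data from the question is not available here, so the model is fitted to randomly generated data (the sim data frame and its sampling probabilities are assumptions made only for illustration), and the printed probabilities therefore say nothing about real employment rates. The level names are taken from the summary(df) output in the question.

```r
library(nnet)  # provides multinom() and its predict() method

set.seed(1)    # synthetic data only; not the survey data from the question
n <- 600
age_levels <- c("16-19", "20-29", "30-39", "40-49", "50-60", "61+")
edu_levels <- c("less high school diploma",
                "high school graduates, no college",
                "college or associate degree",
                "bachelor's degree or higher")

sim <- data.frame(
  # "employed" is listed first, so it is the reference level
  # (the question achieves the same with relevel()).
  empst = factor(sample(c("employed", "unemployed", "not in labor force"),
                        n, replace = TRUE, prob = c(0.60, 0.05, 0.35)),
                 levels = c("employed", "unemployed", "not in labor force")),
  age = factor(sample(age_levels, n, replace = TRUE), levels = age_levels),
  education = factor(sample(edu_levels, n, replace = TRUE), levels = edu_levels)
)

fit <- multinom(empst ~ age + education, data = sim, trace = FALSE)

# newdata needs columns named like the predictors in the formula,
# holding factor levels (e.g. "20-29"), not dummy names (e.g. "age20-29").
newdata <- data.frame(
  age = factor("20-29", levels = age_levels),
  education = factor("high school graduates, no college", levels = edu_levels)
)

p <- predict(fit, newdata = newdata, type = "probs")
print(p)  # one probability per employment status
```

Passing several rows in newdata returns one row of probabilities per combination, which is convenient for tabulating every age and education pair.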