probability - Multinomial regression using multinom function in R


I was thinking of posting this question on Cross Validated, but decided to come here. I am using the multinom() function from the nnet package to estimate the odds of being employed, unemployed, or out of the labor force, conditional on age and education. I need some help with the interpretation.

I have the following dataset with one dependent categorical variable, employment status (empst), and two independent categorical variables: age (age) and education level (education).

> head(df)
               empst   age                         education
1           employed   61+          less high school diploma
2           employed 50-60 high school graduates, no college
3 not in labor force 50-60          less high school diploma
4           employed 30-39       bachelor's degree or higher
5           employed 20-29  some college or associate degree
6           employed 20-29  some college or associate degree

Here is a summary of the levels:

> summary(df)
                empst          age                                     education
 not in universe   :    0   16-19: 6530   less high school diploma         :14686
 employed          :61478   20-29:16031   high school graduates, no college:30716
 unemployed        : 3940   30-39:16520   some college or associate degree :28525
 not in labor force:38508   40-49:17403   bachelor's degree or higher      :29999
                            50-60:20779
                            61+  :26663

  • First, what is the estimation equation (model)?

I want to determine the estimation equation (model) behind the call

df$empst <- relevel(df$empst, ref = "employed")
multinom(empst ~ age + education, data = df)

so that I can write it down in a research paper. In my understanding, employed is the base level and the logit model for this call is:

[The two equation images from the original post are not available.]

Here the subscripts, n among them, index the categories of the variables age and education, respectively (sorry for the confusing notation). Please correct me if my understanding of the logistic model produced by multinom() is incorrect. I am not going to include the summary of test because it is a lot of output, but below I include the output of the call > test:

> test
Call:
multinom(formula = empst ~ age + education, data = ml)

Coefficients:
                   (Intercept)   age20-29   age30-39   age40-49   age50-60     age61+
unemployed           -1.334734 -0.3395987 -0.7104361 -0.8848517 -0.9358338 -0.9319822
not in labor force    1.180028 -1.2531405 -1.6711616 -1.6579095 -1.2579600  0.8197373
                   educationhigh school graduates, no college educationsome college or associate degree
unemployed                                         -0.4255369                                 -0.781474
not in labor force                                 -0.8125016                                 -1.004423
                   educationbachelor's degree or higher
unemployed                                    -1.351119
not in labor force                            -1.580418

Residual Deviance: 137662.6
AIC: 137698.6

Assuming my understanding of the logit model produced by multinom() is correct, the coefficients are log odds relative to the base level, employed. To get the actual odds I take the antilog by calling exp(coef(test)), which gives me:

> exp(coef(test))
                   (Intercept)  age20-29  age30-39  age40-49  age50-60    age61+
unemployed           0.2632281 0.7120560 0.4914298 0.4127754 0.3922587 0.3937724
not in labor force   3.2544655 0.2856064 0.1880285 0.1905369 0.2842333 2.2699035
                   educationhigh school graduates, no college educationsome college or associate degree
unemployed                                          0.6534189                                 0.4577308
not in labor force                                  0.4437466                                 0.3662560
                   educationbachelor's degree or higher
unemployed                                    0.2589504
not in labor force                            0.2058891

Which brings me to my next question.

  • Second, probabilities

I wonder if there is a way to get the actual probabilities of being unemployed vs. employed based on a combination of age and education, e.g. what is the probability of being unemployed if I am 22 and have a high school diploma? Sorry for the lengthy question, and thanks for your help. Let me know if additional clarification is needed.

About the first question: I am also having some doubts about multinom with categorical variables (here is my question: multinom with matrix of counts as response).

From what a user replied to that question, and from the output of > test that you posted, I guess the math you wrote is partially right. Indeed, a multinomial model should work only if the predictor variables are continuous or dichotomous (i.e., taking only the values 0 or 1), and it seems that when multinom gets categorical variables as predictors, as in your example, R automatically converts them into dummy variables (only 0 or 1).
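The dummy coding described above can be inspected directly with base R. This is a minimal sketch on made-up toy rows: only the level names mirror the question, and it assumes R's default treatment contrasts. As far as I know, multinom() builds its design matrix through the same formula machinery, so the column names match the coefficient names in the output of test.

```r
# Toy data: the level names mirror the question, the rows are invented.
toy <- data.frame(
  age = factor(c("16-19", "20-29", "30-39", "20-29"),
               levels = c("16-19", "20-29", "30-39"))
)

# model.matrix() returns the design matrix built from the factor.
# The first level ("16-19") is the reference and gets no column of its own.
X <- model.matrix(~ age, data = toy)
print(X)
```

Each row has a 1 in the intercept column and at most one further 1, in the column for its own age group; a row in the reference group has zeros in all dummy columns.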

With reference to your example, and considering only age as a predictor, you should have \ln\left(\frac{\Pr(\text{unemployed})}{\Pr(\text{employed})}\right) = \beta_0 + \beta_1 \cdot \text{age20-29} + \beta_2 \cdot \text{age30-39} + \dots, and an analogous formula for \Pr(\text{not in labor force}), with different \beta coefficients.
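Written out for both predictors, the model would look roughly as follows. This is a sketch under assumptions: the outcome index j, the dummy indicators A_a and E_e, and the symbols \gamma and \eta are my own notation, not taken from the question, and the reference levels (age 16-19, the lowest education level) are inferred from which coefficients are missing in the output of test.

```latex
% j ranges over the two non-base outcomes: unemployed, not in labor force.
% A_a = 1 if the person is in age group a (20-29, ..., 61+), else 0.
% E_e = 1 if the person has education level e (non-reference levels), else 0.
\eta_j = \ln\frac{\Pr(Y = j)}{\Pr(Y = \text{employed})}
       = \beta_{0j} + \sum_{a} \beta_{aj} A_a + \sum_{e} \gamma_{ej} E_e

% Solving for the probabilities, with the base outcome having \eta = 0:
\Pr(Y = j) = \frac{\exp(\eta_j)}{1 + \sum_{k} \exp(\eta_k)},
\qquad
\Pr(Y = \text{employed}) = \frac{1}{1 + \sum_{k} \exp(\eta_k)}
```

Each row of the coefficient table in the question corresponds to one value of j, and each column to one of the \beta or \gamma terms.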

About the second question: yes, there is a way. Use predict(test, newdata, "probs"), where newdata is a data frame with age set to 20-29 and education set to high school graduates, no college (given your example).
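To see where such probabilities come from, they can also be computed by hand from the coefficients printed in the question. The sketch below does this for the 20-29, high school graduates, no college profile. The short column labels (age20_29, edu_hs) are my own, and the numbers are copied from the posted output rather than taken from a fitted model.

```r
# Coefficients copied from the printed output of `test` in the question.
# Only the terms that are "switched on" for this profile are needed:
# the intercept, the age20-29 dummy, and the high-school-graduates dummy.
b <- rbind(
  "unemployed"         = c(intercept = -1.334734, age20_29 = -0.3395987, edu_hs = -0.4255369),
  "not in labor force" = c(intercept =  1.180028, age20_29 = -1.2531405, edu_hs = -0.8125016)
)

# Linear predictor per outcome: log odds relative to the base level, employed.
eta <- rowSums(b)

# Odds relative to employed; the base level itself has odds 1.
odds <- exp(eta)

# Normalise so that the three probabilities sum to 1.
probs <- c(employed = 1, odds) / (1 + sum(odds))
print(round(probs, 4))
```

With the fitted model at hand, a call along the lines of predict(test, newdata = data.frame(age = "20-29", education = "high school graduates, no college"), type = "probs") should give the same three numbers up to rounding of the printed coefficients.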

