Estimation of multivariate t distribution in R
I would like to know if there is a function in R that allows me to estimate the degrees of freedom (df) of a multivariate t distribution.
The problem is easy: I have a matrix of 5 variables (columns) and 75 observations (rows). I want to estimate the df of a multivariate t distribution on this sample.
Thanks,
Juan.
Edit: after Fabian's suggestions, I implemented the dmvt() formula:
# "residuals" is the matrix of residuals of my model. I want to estimate the df of
# this sample, assuming it is multivariate t.
sigma <- cor(residuals, use = "pairwise.complete.obs", method = "pearson")
my_means <- vector(length = 8)
for (i in 1:8) {
  my_means[i] <- mean(my_matrix[, i])
}
residuals.scaled <- scale(residuals)
df.1 <- dmvt(residuals.scaled, my_means, sigma, log = FALSE, type = "shifted", df = 1)
I have doubts regarding:
1) Scaling: I'm centering the data. I don't know if this is correct.
2) I'm using log = FALSE. I don't know why the densities should be given as log(d) in this case.
3) Here I should estimate the likelihood of the sample data for each df. Thus, more lines of code for df.2, df.3, etc. should be added to calculate the likelihood for each one, and then I choose the df with the highest likelihood. Is that correct?
The package mvtnorm supplies the density of the (shifted) multivariate t-distribution in the function dmvt. Enter your (scaled) data and the sample correlation, and compute the likelihood of the data for different values of df. Pick the value of df that maximizes the likelihood of the data.
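For reference, the quantity being maximized can be written out explicitly. The block below is the standard log-likelihood of a centred multivariate t distribution, under the assumption that the data have already been centred and that the scale matrix is held fixed at its estimate. The symbols are notation introduced here, not taken from the post: n observations x_i of dimension p, scale matrix \Sigma, and degrees of freedom \nu.

```latex
\ell(\nu) = \sum_{i=1}^{n} \left[
    \log\Gamma\!\left(\frac{\nu + p}{2}\right)
  - \log\Gamma\!\left(\frac{\nu}{2}\right)
  - \frac{p}{2}\,\log(\nu\pi)
  - \frac{1}{2}\,\log\lvert\Sigma\rvert
  - \frac{\nu + p}{2}\,\log\!\left(1 + \frac{x_i^{\top}\Sigma^{-1}x_i}{\nu}\right)
\right]
```

Only \nu varies when the candidate values of df are compared, so the estimate is the value of \nu at which \ell(\nu) is largest.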
Edit:
library(mvtnorm)
set.seed(12121212)

################################################################################
## simulate n vectors of p-dim. t-distributed data in matrix x:
n <- 300
p <- 8
# draw random column means
means <- 10 * rnorm(p)
# correlation is AR(1) correlation with rho=.8
rho <- 0.8
sigma <- rho ^ abs(outer(1:p, 1:p, "-"))
# column s.d.s are sqrt(1:8)
df <- 3
x <- t(t(rmvt(n, sigma=sigma, delta=means, df=df)) * sqrt(1:8))

################################################################################
# evaluate t-likelihood for scaled x:
x_scale <- scale(x)
sigma_est <- cor(x_scale)
df_candidates <- seq(1, 20, by=2)
loglik <- numeric(length(df_candidates))
names(loglik) <- df_candidates
for(df in df_candidates){
  # no need for delta since we're working on scaled & centered data.
  # use sum(log(likelihood)), not prod(likelihood), to avoid numeric over/underflow
  loglik[as.character(df)] <- sum(dmvt(x=x_scale, sigma=sigma_est, df=df, log=TRUE))
}
loglik
#         1         3         5         7         9        11        13
# -1788.219 -1756.301 -1768.885 -1783.724 -1797.386 -1809.556 -1820.382
#        15        17        19
# -1830.066 -1838.788 -1846.698
## --> maximal for df=3, the value used in the simulation.

## verify that the mean shift can be incorporated in pre-processing as above:
dmvt(x[1,], delta=means) == dmvt(x[1,] - means)
#[1] TRUE
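The grid search above only tries odd integer values of df. As a possible refinement, here is a minimal sketch that searches df on a continuous scale with base R's optimize(). To keep it self-contained it does not call mvtnorm: log_dmvt_sketch() is a hypothetical helper that writes out the standard centred multivariate t log-density by hand, and the simulated data, the seed and the search interval c(1, 50) are illustrative assumptions rather than anything from the original post.

```r
# Hypothetical helper (not part of mvtnorm): log-density of a centred
# p-variate t distribution with scale matrix `sigma` and `df` degrees of freedom.
log_dmvt_sketch <- function(x, sigma, df) {
  x <- as.matrix(x)
  p <- ncol(x)
  # squared Mahalanobis distances x_i' sigma^{-1} x_i, one per row
  maha <- mahalanobis(x, center = rep(0, p), cov = sigma)
  log_det <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  lgamma((df + p) / 2) - lgamma(df / 2) - (p / 2) * log(df * pi) -
    0.5 * log_det - ((df + p) / 2) * log1p(maha / df)
}

# Simulate t-distributed rows as normal / sqrt(chi-square / df) (assumed setup).
set.seed(1)
n <- 300
p <- 5
true_df <- 3
rho <- 0.8
Sigma <- rho ^ abs(outer(1:p, 1:p, "-"))
z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)  # rows ~ N(0, Sigma)
w <- rchisq(n, df = true_df) / true_df           # one mixing weight per row
x <- z / sqrt(w)                                 # row i is divided by sqrt(w[i])

# Same pre-processing as in the grid search: centre/scale, then sample correlation.
x_scale <- scale(x)
sigma_est <- cor(x_scale)

# Total log-likelihood as a function of df, maximized over a continuous interval.
loglik <- function(df) sum(log_dmvt_sketch(x_scale, sigma_est, df))
fit <- optimize(loglik, interval = c(1, 50), maximum = TRUE)
df_hat <- fit$maximum
df_hat
```

optimize() assumes the objective has a single peak on the interval, so it is worth plotting loglik over a few values of df to check that this holds for your data before trusting df_hat.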