scikit-learn - Python sklearn: how to calculate p-values
This is a simple question, but I am trying to calculate p-values for features, either using classifiers for a classification problem or regressors for regression. Could someone suggest the best method for each case and provide sample code? I want to see the p-value for each feature, rather than keep the k best / percentile of features etc. as explained in the documentation.
Thank you.
Just run the significance test on x, y directly. Example using the 20 newsgroups data and chi2:
>>> from sklearn.datasets import fetch_20newsgroups_vectorized
>>> from sklearn.feature_selection import chi2
>>> data = fetch_20newsgroups_vectorized()
>>> x, y = data.data, data.target
>>> scores, pvalues = chi2(x, y)
>>> pvalues
array([ 4.10171798e-17, 4.34003018e-01, 9.99999996e-01, ...,
        9.99999995e-01, 9.99999869e-01, 9.99981414e-01])
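chi2 only accepts non-negative features (counts, tf-idf and the like). For general continuous features, the same pattern works with other univariate tests in sklearn.feature_selection: f_classif (ANOVA F-test) for classification and f_regression (univariate linear-regression F-test) for regression. Each returns one score and one p-value per feature, with every feature tested on its own against y. These are not p-values taken from a fitted multivariate model. Below is a minimal sketch, assuming a reasonably recent scikit-learn. It uses synthetic data from make_classification / make_regression in place of a real dataset, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.feature_selection import f_classif, f_regression

# Synthetic classification data: 5 features, 2 of them informative (illustrative only).
X_clf, y_clf = make_classification(n_samples=200, n_features=5, n_informative=2,
                                   n_redundant=0, n_clusters_per_class=1,
                                   random_state=0)
# ANOVA F-test of each feature against the class labels.
f_clf, p_clf = f_classif(X_clf, y_clf)

# Synthetic regression data: 5 features, 2 of them informative (illustrative only).
X_reg, y_reg = make_regression(n_samples=200, n_features=5, n_informative=2,
                               noise=10.0, random_state=0)
# Univariate linear-regression F-test of each feature against the target.
f_reg, p_reg = f_regression(X_reg, y_reg)

# One p-value per feature, no k-best / percentile selection involved.
for i, p in enumerate(p_clf):
    print(f"classification feature {i}: p = {p:.3g}")
for i, p in enumerate(p_reg):
    print(f"regression feature {i}: p = {p:.3g}")
```

Because each test looks at one feature at a time, a feature that is only useful in combination with others can still get a large p-value.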