machine learning - Standardize/decompose training/test together or separately?


Two common ML preprocessing steps on the X data are standardization (e.g., scaling to unit variance) and decomposition (mapping the features to a new space, as I understand it).
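For concreteness, standardization rescales each feature column to zero mean and unit variance. A minimal NumPy sketch (not part of the original question, just an illustration) of that computation:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Standardization: subtract the per-feature mean and divide by the
# per-feature standard deviation, so each column has mean 0, variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.mean(axis=0))  # ~[0, 0]
print(Z.std(axis=0))   # [1, 1]
```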

There are two possible ways of implementing these steps in an ML pipeline that includes training/test/validation sets:

i) Standardize/decompose the entire training/test/validation X data set, then break it into training/test sets and make predictions on the validation set using the lowest-error model.

ii) Break the data into training/test sets first, then standardize/decompose the training and test sets separately, and make predictions on the validation set using the lowest-error model (after standardizing/decomposing the validation set as well).

Is one of these approaches preferable to the other, and why?

I think a third option is valid:

Split the data into training and test sets, compute the parameters for standardization/decomposition on the training set only (e.g., the mean and variance of the training set for standardization), and then apply those same parameters to the test set.

Note that when the training-set mean and variance are used for standardization, the test set will not have exactly 0 mean / unit variance.
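One way to implement this third option is with scikit-learn's StandardScaler and PCA (the question does not name a library, so this is only an illustrative sketch): every transformation parameter is estimated on the training set and then reused, unchanged, on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))     # toy feature matrix
y = rng.integers(0, 2, size=200)  # toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit all preprocessing parameters on the training set only.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=3).fit(scaler.transform(X_train))

# Apply the same fitted parameters to both sets.
X_train_t = pca.transform(scaler.transform(X_train))
X_test_t = pca.transform(scaler.transform(X_test))

# The standardized test set is close to, but not exactly,
# zero mean / unit variance.
print(scaler.transform(X_test).mean(axis=0))
print(scaler.transform(X_test).std(axis=0))
```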

Looking at the test set in order to transform the training set is considered bad practice, except in the special case of transductive learning, where you are given the inputs of the test set in advance.

Your second option is dangerous, since the test set may contain outliers that severely affect the parameters of the standardization. Hence, you should have a single set of transformation parameters, estimated on the training set.
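To see why, here is a hypothetical numeric sketch: a single outlier in a small test set inflates its separately estimated standard deviation, so standardizing the test set with its own statistics (option ii) distorts every value, while reusing the training-set parameters (the third option) stays stable.

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(loc=0.0, scale=1.0, size=1000)
test = rng.normal(loc=0.0, scale=1.0, size=20)
test[0] = 50.0  # one outlier in the small test set

# Option ii: standardize the test set with its own statistics;
# the mean and std are dominated by the single outlier.
own = (test - test.mean()) / test.std()

# Third option: reuse the training-set statistics.
reused = (test - train.mean()) / train.std()

print(test.std())   # inflated by the outlier (roughly 10x too large)
print(own[1:5])     # ordinary points squashed toward zero
print(reused[1:5])  # approximately standard normal, as intended
```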

