scikit-learn - What is the effect of the zero_based parameter in the dump_svmlight_file() method in sklearn.datasets?
I'm running classification experiments using sklearn. During the experiments, I build csr_matrix objects to store the data, and I use a LogisticRegression classifier on these objects to get some results. I dump the data using dump_svmlight_file and the model using joblib. But when I then load the data using load_svmlight_file and load the model, I obtain (very) different results.
I realized that if I dump the data with the zero_based parameter set to False, I retrieve the original results. What is the effect of this parameter? Is it usual to get different results when modifying the value of this parameter?
zero_based : boolean or “auto”, optional, default “auto”
Whether column indices in f are zero-based (True) or one-based (False). If column indices are one-based, they are transformed to zero-based to match Python/NumPy conventions. If set to “auto”, a heuristic check is applied to determine this from the file contents. Both kinds of files occur “in the wild”, but they are unfortunately not self-identifying. Using “auto” or True should always be safe.
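To make the index convention concrete, here is a minimal pure-Python sketch of how the same line of an svmlight file can be read under each setting. The helper parse_svmlight_line and the sample line are made up for illustration. This is not scikit-learn's actual parser, only a toy model of the shift described above.

```python
def parse_svmlight_line(line, zero_based):
    """Toy parser (hypothetical helper, not part of scikit-learn).

    Returns the label and a dict mapping zero-based column index to value.
    """
    label, *pairs = line.split()
    # One-based files need every index shifted down by one to match NumPy.
    shift = 0 if zero_based else 1
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index) - shift] = float(value)
    return float(label), features


line = "1 1:0.5 3:2.0"  # made-up sample line: label, then index:value pairs

as_zero_based = parse_svmlight_line(line, zero_based=True)
as_one_based = parse_svmlight_line(line, zero_based=False)

print(as_zero_based)  # values land in columns 1 and 3
print(as_one_based)   # values land in columns 0 and 2
```

The same text therefore yields two different matrices depending on which convention the reader assumes, which is why the flag used when loading has to match the one used when dumping.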
Your observation seems odd, though. If you dump with zero_based=False and load with zero_based='auto', the heuristic should be able to detect the right format. Also, if the wrong format had been detected, the number of features would have changed, so there should have been an error from the classifier.
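A quick round trip is one way to check this on your own data. The sketch below assumes scikit-learn, SciPy and NumPy are installed and uses a made-up 2x3 matrix, not your data. It dumps with zero_based=False into an in-memory buffer, then loads it back three ways.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Made-up toy data: 2 samples, 3 features.
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
y = np.array([0, 1])

# Dump with one-based column indices into an in-memory binary buffer.
buf = io.BytesIO()
dump_svmlight_file(X, y, buf, zero_based=False)


def reload(zero_based):
    """Rewind the buffer and load it with the given zero_based setting."""
    buf.seek(0)
    X_loaded, _ = load_svmlight_file(buf, zero_based=zero_based)
    return X_loaded


X_matching = reload(False)    # same convention as the dump
X_auto = reload("auto")       # let the heuristic decide
X_mismatched = reload(True)   # wrong convention on purpose

print(X_matching.shape, X_auto.shape, X_mismatched.shape)
```

With the matching flag (and here with "auto" as well) the loaded matrix equals the original. With the wrong flag every column is shifted by one and the matrix gains an extra column, so a model fitted on 3 features should refuse it rather than silently return different predictions. Passing the same explicit zero_based value to both dump and load avoids relying on the heuristic at all.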