scikit-learn - What is the effect of the zero_based parameter in the dump_svmlight_file() method in sklearn.datasets?


I'm running classification experiments using sklearn. During the experiments, I build csr_matrix objects to store the data, then use a LogisticRegression classifier on these objects to get results. I dump the data using dump_svmlight_file and the model using joblib. But when I load the data using load_svmlight_file, together with the model, I obtain (very) different results.

I realized that if I dump the data with the zero_based parameter set to False, I retrieve my original results. What is the effect of this parameter? Is it usual to get different results when modifying its value?

The docs are pretty explicit:

zero_based : boolean or “auto”, optional, default “auto”

Whether column indices in f are zero-based (True) or one-based (False). If column indices are one-based, they are transformed to zero-based to match Python/NumPy conventions. If set to “auto”, a heuristic check is applied to determine this from the file contents. Both kinds of files occur “in the wild”, but they are unfortunately not self-identifying. Using “auto” or True should always be safe.
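To make the two conventions concrete, here is a minimal sketch that dumps the same tiny matrix both ways and extracts the column indices that end up in the file. The toy data and the helper names dump_to_text and column_indices are made up for illustration; only dump_svmlight_file comes from sklearn.

```python
from io import BytesIO

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file

# Toy data (hypothetical): 2 samples, 3 features.
X = csr_matrix(np.array([[0.0, 2.0, 0.0],
                         [1.0, 0.0, 3.0]]))
y = np.array([0, 1])


def dump_to_text(zero_based):
    """Dump X, y to an in-memory buffer and return the file contents as text."""
    buf = BytesIO()
    dump_svmlight_file(X, y, buf, zero_based=zero_based)
    return buf.getvalue().decode("ascii")


def column_indices(text):
    """Collect the column index of every 'index:value' token, in file order."""
    found = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        for token in line.split()[1:]:  # first token is the label
            found.append(int(token.split(":")[0]))
    return found


zero_based_text = dump_to_text(True)
one_based_text = dump_to_text(False)

print(column_indices(zero_based_text))  # [1, 0, 2]
print(column_indices(one_based_text))   # [2, 1, 3]
```

The values are identical in both files; only the index written in front of each value is shifted by one, which is why the loader has to know (or guess) which convention was used.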

Your observation seems odd, though. If you dump with zero_based=False and load with zero_based='auto', the heuristic should be able to detect the right format. Also, if the wrong format had been detected, the number of features would have changed, and there would be an error from the classifier.
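A quick way to check this on your own data is to round-trip a small matrix and compare shapes. The sketch below uses made-up toy data in which the first column is never non-zero, which is one case where a zero-based file contains no index 0 and so looks one-based to the "auto" heuristic. It is an illustration under that assumption, not a diagnosis of your setup.

```python
from io import BytesIO

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy data (hypothetical): column 0 is never non-zero.
X = csr_matrix(np.array([[0.0, 2.0, 0.0],
                         [0.0, 0.0, 3.0]]))
y = np.array([0, 1])

# Dump with zero-based column indices.
buf = BytesIO()
dump_svmlight_file(X, y, buf, zero_based=True)
raw = buf.getvalue()

# Load with the heuristic: no index 0 appears in the file, so the
# indices are treated as one-based and shifted down by one.
X_auto, y_auto = load_svmlight_file(BytesIO(raw), zero_based="auto")

# Load with the same convention used for dumping, and state the
# number of features explicitly.
X_explicit, y_explicit = load_svmlight_file(
    BytesIO(raw), n_features=X.shape[1], zero_based=True
)

print(X_auto.shape)      # (2, 2): the feature count changed
print(X_explicit.shape)  # (2, 3): matches the original
print(np.array_equal(X_explicit.toarray(), X.toarray()))  # True
```

Passing the same zero_based value to both dump_svmlight_file and load_svmlight_file, plus n_features on load, removes the guesswork from the round trip.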

