scikit-learn - What is the effect of the zero_based parameter in the dump_svmlight_file() method in sklearn.datasets?


I'm running classification experiments using sklearn. During the experiments, I build csr_matrix objects to store the data and use a LogisticRegression classifier on these objects to get results. I dump the data using dump_svmlight_file and the model using joblib. But when I load the data using load_svmlight_file and the model, I obtain (very) different results.

I realized that if I dump the data with the zero_based parameter set to False, I retrieve my original results. What is the effect of this parameter? Is it usual to get different results when modifying its value?

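The docs are pretty explicit. For load_svmlight_file (in dump_svmlight_file the parameter is a plain boolean that defaults to True):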

zero_based : boolean or "auto", optional, default "auto"

Whether column indices in f are zero-based (True) or one-based (False). If column indices are one-based, they are transformed to zero-based to match Python/NumPy conventions. If set to "auto", a heuristic check is applied to determine this from the file contents. Both kinds of files occur "in the wild", but they are unfortunately not self-identifying. Using "auto" or True should always be safe.

Your observation seems odd, though. If you dump with zero_based=False and load with zero_based='auto', the heuristic should be able to detect the right format. Also, if the wrong format had been detected, the number of features would have changed, so the classifier should have raised an error.
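To make the point about the feature count concrete, here is a minimal sketch of a round trip through an in-memory buffer. The toy matrix and the helper `round_trip` are made up for illustration and are not from the question. The sketch passes explicit True/False values on both sides rather than relying on the "auto" heuristic.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy data (hypothetical): 3 samples, 4 features, every column non-empty.
X = csr_matrix(np.array([[0.0, 2.0, 0.0, 1.0],
                         [3.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 4.0, 5.0]]))
y = np.array([0, 1, 0])


def round_trip(X, y, dump_zero_based, load_zero_based):
    """Dump to an in-memory buffer, load it back, and return the loaded matrix."""
    buf = io.BytesIO()
    dump_svmlight_file(X, y, buf, zero_based=dump_zero_based)
    buf.seek(0)
    X_loaded, _ = load_svmlight_file(buf, zero_based=load_zero_based)
    return X_loaded


# Matching conventions on both sides reproduce the original matrix.
same_zero = round_trip(X, y, dump_zero_based=True, load_zero_based=True)
same_one = round_trip(X, y, dump_zero_based=False, load_zero_based=False)

# One-based file read as zero-based: every column moves right by one,
# so the loaded matrix gains an empty column 0 and has 5 features.
mismatch = round_trip(X, y, dump_zero_based=False, load_zero_based=True)

print(same_zero.shape, same_one.shape, mismatch.shape)
```

With a mismatch like this, the loaded matrix no longer has the number of columns the saved model was trained on, which is why a shape error from the classifier would be expected rather than silently different predictions. Passing the same explicit zero_based value when dumping and when loading (and, if needed, n_features when loading) avoids depending on the heuristic.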

