machine learning - Weka ARFF: How to handle features/attributes that can have more than one value per data item


For a recommendation engine, I'm trying to convert movie data into ARFF format. Although the ARFF format itself is clear to me, I'm unsure of the best way to solve the following problem.

My dataset is going to be in the following (or a similar) format, with the rating as the predicted classification variable:

For each user, a list of: movieID - movie title - year of release - genre(s) - actor(s) - director - writer(s) - runtime - rating
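As a rough illustration, here is a Python sketch in which the field names, types and sample values are hypothetical rather than taken from an actual dataset. One record of the list above can be modelled with list-typed fields for the multi-valued features, which makes explicit which attributes clash with ARFF's one-value-per-attribute rule:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MovieRating:
    """One entry of a user's list; list-typed fields hold multi-valued features."""
    movie_id: int
    title: str
    year: int
    genres: List[str]    # one or more
    actors: List[str]    # one or more
    director: str        # single value
    writers: List[str]   # one or more
    runtime: int         # minutes (assumed unit)
    rating: float        # class variable to predict (numeric type is an assumption)


# Hypothetical sample record
example = MovieRating(
    movie_id=1,
    title="Example Movie",
    year=1999,
    genres=["Action", "Sci-Fi"],
    actors=["Actor A", "Actor B", "Actor C"],
    director="Director X",
    writers=["Writer Y"],
    runtime=120,
    rating=4.0,
)

# Fields that cannot map directly onto a single ARFF attribute value
multi_valued = [name for name, value in vars(example).items() if isinstance(value, list)]
print(multi_valued)  # ['genres', 'actors', 'writers']
```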

My problem is that features such as genre, actors and writers can have one or multiple entries, while Weka ARFF allows only one value per attribute. The solution I thought of is:

  • Have attributes such as genre0, genre1, genre2, etc., and leave them empty if a movie has, for example, only one genre. The problem I see is that this works well for genre, but for actors, for example, would I have to include all actors in the attribute declaration?

    @attribute actor1 {all actors}
    @attribute actor2 {all actors}
    @attribute actor3 {all actors}

since they're the possible values for that specific feature. This approach doesn't make much sense to me, since there are thousands of actors, directors and writers, which would lead to rather big attribute declarations.
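The genre0/genre1/genre2 idea can be sketched as follows. This is Python, and `slot_attributes`, `slot_values` and the dict-based movie records are hypothetical helpers for illustration only; names are assumed to contain no single quotes. Each slot gets its own nominal declaration listing every value seen, and unused slots are filled with `?`, which is how ARFF marks a missing value:

```python
from typing import Dict, List

Movie = Dict[str, List[str]]  # hypothetical record: feature name -> list of values


def slot_attributes(feature: str, movies: List[Movie], n_slots: int) -> List[str]:
    """Declare feature0..feature{n_slots-1}, each nominal over every value seen in the data."""
    values = sorted({v for m in movies for v in m[feature]})
    # Quote values so names containing spaces are valid ARFF nominal values
    nominal = ",".join(f"'{v}'" for v in values)
    return [f"@attribute {feature}{i} {{{nominal}}}" for i in range(n_slots)]


def slot_values(feature: str, movie: Movie, n_slots: int) -> List[str]:
    """Fill the slots for one movie; extra values are dropped, unused slots become '?'."""
    vals = [f"'{v}'" for v in movie[feature][:n_slots]]
    return vals + ["?"] * (n_slots - len(vals))


movies: List[Movie] = [
    {"genre": ["Action", "Sci-Fi"]},
    {"genre": ["Drama"]},
]

header = slot_attributes("genre", movies, 3)
for line in header:
    print(line)  # @attribute genre0 {'Action','Drama','Sci-Fi'}, then genre1, genre2
print(",".join(slot_values("genre", movies[1], 3)))  # 'Drama',?,?
```

Because every slot repeats the full value list, the header grows roughly as (number of slots) × (number of distinct values), which is exactly the concern with thousands of actors. Values beyond `n_slots` are silently dropped in this sketch.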

Is there a better, more efficient way to do this?

I don't know of a way around it, but preprocessing may reduce the expected size of the attribute declarations, for example by mapping each name to an integer:

{'cruise' : 1, 'smith' : 2}
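A minimal sketch of that preprocessing step follows. It is written in Python, `build_index` and `encode` are hypothetical helper names, and the sample actor lists are made up. It assigns each distinct name an integer id in first-seen order, reproducing the mapping above:

```python
from typing import Dict, Iterable, List


def build_index(names: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct name an integer id, starting at 1, in first-seen order."""
    index: Dict[str, int] = {}
    for name in names:
        if name not in index:
            index[name] = len(index) + 1
    return index


def encode(names: List[str], index: Dict[str, int]) -> List[int]:
    """Replace each name with its integer id."""
    return [index[n] for n in names]


# Hypothetical actor lists, one per movie
actors_per_movie = [["cruise", "smith"], ["smith"], ["cruise", "hanks"]]
index = build_index(a for movie in actors_per_movie for a in movie)
print(index)  # {'cruise': 1, 'smith': 2, 'hanks': 3}
print([encode(m, index) for m in actors_per_movie])  # [[1, 2], [2], [1, 3]]

# Declare the ids as nominal values so they are not treated as ordered numbers
ids = ",".join(str(i) for i in index.values())
print(f"@attribute actor0 {{{ids}}}")  # @attribute actor0 {1,2,3}
```

This shortens each declared value but does not reduce the number of distinct values. Declaring the ids as nominal rather than numeric is deliberate here: a numeric attribute would imply an ordering (actor 3 "greater than" actor 1) that has no meaning.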

