machine learning - Weka ARFF: How to handle features/attributes that can have more than 1 value per data-item
For a recommendation engine I am trying to convert movie data to the ARFF format, and although the ARFF format itself is clear to me, I am unsure of the best way to solve the following problem.
My dataset is going to be in the following (or a similar) format, with rating as the predicted classification variable:
For each user, a list of: movieid-movie title-year of release-genre(s)-actor(s)-director-writer(s)-runtime-rating
My problem is that features such as genre, actor, and writers can have one or multiple entries, while Weka ARFF allows only one value for each attribute. The solution I thought of is:
Have attributes such as genre0, genre1, genre2, and leave the unused ones empty if a movie has, for example, only one genre. The problem I see is that this would work great for genre, but would it mean that for actors, for example, I'd have to include all actors in every attribute declaration?
@attribute actor1 {all actors}
@attribute actor2 {all actors}
@attribute actor3 {all actors}
After all, they're all possible values for that specific feature. This approach doesn't make sense to me, since there are thousands of actors, directors, and writers, which would make for rather big attribute declarations.
Is there a better, more efficient way to do this?
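To make the padded-slot idea from the question concrete, here is a minimal Python sketch that writes genre0, genre1, genre2 as nominal attributes and fills unused slots with `?`, the ARFF marker for a missing value. The sample movies, the `MAX_GENRES` limit, and the `build_arff` helper are all hypothetical, chosen only for illustration; values containing spaces would additionally need quoting.

```python
# Hypothetical sample data; titles, genres, and ratings are illustrative only.
movies = [
    {"title": "Movie A", "genres": ["Action"], "rating": 4},
    {"title": "Movie B", "genres": ["Comedy", "Drama", "Romance"], "rating": 3},
]

MAX_GENRES = 3  # assumed upper bound on genre slots per movie


def build_arff(movies, max_genres=MAX_GENRES):
    """Return ARFF text with genre0..genreN nominal slots, '?' for unused ones."""
    # Every slot must list all possible genre values, as the question observes.
    all_genres = sorted({g for m in movies for g in m["genres"]})
    nominal = "{" + ",".join(all_genres) + "}"

    lines = ["@relation movies", ""]
    for i in range(max_genres):
        lines.append(f"@attribute genre{i} {nominal}")

    ratings = sorted({m["rating"] for m in movies})
    lines.append("@attribute rating {" + ",".join(str(r) for r in ratings) + "}")

    lines += ["", "@data"]
    for m in movies:
        slots = m["genres"][:max_genres]             # copy, truncated to the slot count
        slots += ["?"] * (max_genres - len(slots))   # pad with ARFF missing values
        lines.append(",".join(slots + [str(m["rating"])]))
    return "\n".join(lines)


print(build_arff(movies))
```

With only a handful of genres the repeated value list stays short; the same code applied to actors would repeat a list of thousands of names in every slot, which is exactly the concern raised above.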
I don't know of a way around it, but preprocessing may reduce the expected size of the attribute declarations. For example, map each name to a short ID:
{'cruise' : 1, 'smith' : 2}
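One way such a preprocessing step might look is sketched below in Python. The answer only gives the dictionary itself, so the `build_id_map` and `declare_slots` helpers, the sample actor lists, and the choice of three slots are assumptions made for illustration. Note that each declaration still lists every ID; it is only shorter because IDs replace full names.

```python
# Hypothetical actor lists per movie; the names echo the example dictionary.
movie_actors = [
    ["cruise", "smith"],
    ["smith"],
]


def build_id_map(rows):
    """Assign each distinct name a small integer ID, in order of first appearance."""
    ids = {}
    for row in rows:
        for name in row:
            if name not in ids:
                ids[name] = len(ids) + 1
    return ids


def declare_slots(prefix, ids, n_slots):
    """Emit one nominal @attribute line per slot, listing IDs instead of names."""
    values = "{" + ",".join(str(i) for i in sorted(ids.values())) + "}"
    return [f"@attribute {prefix}{k} {values}" for k in range(1, n_slots + 1)]


actor_ids = build_id_map(movie_actors)
print(actor_ids)
print("\n".join(declare_slots("actor", actor_ids, 3)))
```

The dictionary would need to be kept alongside the ARFF file so that IDs in Weka's output can be translated back to names.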