java - How to replace a word by its most representative mention using the Stanford CoreNLP Coreferences module


I'm trying to figure out a way to rewrite sentences by "resolving" (replacing words with) their coreferences, using Stanford CoreNLP's coreference module.

The idea is to rewrite a sentence like the following:

John drove to Judy’s house. He made her dinner.

into

John drove to Judy’s house. John made Judy dinner.

Here's the code I've been fooling around with:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // dcoref-era packages; newer CoreNLP releases also ship
    // edu.stanford.nlp.coref.CorefCoreAnnotations and edu.stanford.nlp.coref.data.CorefChain
    import edu.stanford.nlp.dcoref.CorefChain;
    import edu.stanford.nlp.dcoref.CorefChain.CorefMention;
    import edu.stanford.nlp.dcoref.CorefCoreAnnotations;
    import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.util.CoreMap;

    // "pipeline" is a StanfordCoreNLP field with coreference enabled (not shown)
    private void doTest(String text) {
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);

        Map<Integer, CorefChain> corefs = doc.get(CorefChainAnnotation.class);
        List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);

        List<String> resolved = new ArrayList<String>();

        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

            for (CoreLabel token : tokens) {
                Integer corefClustId = token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
                System.out.println(token.word() + " --> corefClusterID = " + corefClustId);

                CorefChain chain = corefs.get(corefClustId);
                System.out.println("matched chain = " + chain);

                if (chain == null) {
                    resolved.add(token.word());
                } else {
                    int sentIndx = chain.getRepresentativeMention().sentNum - 1;
                    CoreMap corefSentence = sentences.get(sentIndx);
                    List<CoreLabel> corefSentenceTokens = corefSentence.get(TokensAnnotation.class);

                    String newWords = "";
                    CorefMention reprMent = chain.getRepresentativeMention();
                    System.out.println(reprMent);
                    for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                        CoreLabel matchedLabel = corefSentenceTokens.get(i - 1); //resolved.add(tokens.get(i).word());
                        resolved.add(matchedLabel.word());
                        newWords += matchedLabel.word() + " ";
                    }

                    System.out.println("converting " + token.word() + " to " + newWords);
                }

                System.out.println();
                System.out.println();
                System.out.println("-----------------------------------------------------------------");
            }
        }

        String resolvedStr = "";
        System.out.println();
        for (String str : resolved) {
            resolvedStr += str + " ";
        }
        System.out.println(resolvedStr);
    }

The best output I was able to achieve is:

John drove to Judy 's 's Judy 's house . John made Judy 's dinner .

which is not brilliant...

I'm pretty sure there is an easier way to do what I'm trying to achieve.

Ideally, I'd like to reorganize the sentence as a list of CoreLabels so that I can keep the other data they have attached to them.
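One way to keep the attached data is to build the output from the token objects themselves rather than from their strings. Below is a minimal sketch in plain Java (Java 16+ for records), assuming a hypothetical Token record as a stand-in for CoreLabel (a word plus one attached tag); replaceSpan is an illustrative helper, not a CoreNLP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class TokenListSketch {
    // Hypothetical stand-in for CoreLabel: a word plus one attached annotation.
    record Token(String word, String tag) {}

    // Returns a new list in which tokens [start, end) (0-based, end exclusive)
    // are replaced by the given replacement tokens. The token objects are
    // reused, not copied, so whatever data they carry is kept.
    static List<Token> replaceSpan(List<Token> sentence, int start, int end,
                                   List<Token> replacement) {
        List<Token> out = new ArrayList<>(sentence.subList(0, start));
        out.addAll(replacement);
        out.addAll(sentence.subList(end, sentence.size()));
        return out;
    }

    public static void main(String[] args) {
        Token john = new Token("John", "NNP");
        List<Token> sentence = List.of(
                new Token("He", "PRP"), new Token("made", "VBD"),
                new Token("her", "PRP"), new Token("dinner", "NN"), new Token(".", "."));
        // Replace "He" (position 0) with the token(s) of its representative mention.
        List<Token> resolved = replaceSpan(sentence, 0, 1, List.of(john));
        System.out.println(resolved.stream()
                .map(t -> t.word() + "/" + t.tag())
                .collect(Collectors.joining(" ")));
        // prints: John/NNP made/VBD her/PRP dinner/NN ./.
    }
}
```

Because the replacement tokens are shared rather than copied, their annotations come along for free; copy them first if you intend to modify them.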

Any help is appreciated.

The challenge is that you need to make sure the token isn't part of the representative mention itself. For example, the token "Judy" has "Judy 's" as its representative mention, so if you replace it within the phrase "Judy 's", you'll end up with a double "'s".

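You can check whether the token is part of the representative mention by comparing indices. Replace the token only if it is in a different sentence from the representative mention, or if its index is smaller than the startIndex of the representative mention or greater than or equal to its endIndex (endIndex is exclusive, as the i < reprMent.endIndex loop shows). Otherwise, keep the token.

The relevant part of the code is this:

                if (corefSentence != sentence
                        || token.index() < reprMent.startIndex
                        || token.index() >= reprMent.endIndex) {
                    // token lies outside the representative mention: substitute the mention
                    for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                        CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
                        resolved.add(matchedLabel.word());
                        newWords += matchedLabel.word() + " ";
                    }
                } else {
                    // token is part of the representative mention itself: keep it
                    resolved.add(token.word());
                }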
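The index check can be tried outside CoreNLP with a small self-contained model. In this sketch (Java 16+), Mention is a hypothetical record mirroring the sentNum, startIndex and endIndex fields of CorefMention (1-based sentence number and start, exclusive end), and the per-token cluster ids and representative mentions for the example text are hand-written assumptions, not actual CoreNLP output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class ResolveSketch {
    // Hypothetical stand-in for CorefChain.CorefMention: 1-based sentence number,
    // 1-based start token index, exclusive end token index.
    record Mention(int sentNum, int startIndex, int endIndex) {}

    // True if the token at (sentNum, tokenIndex) lies inside the mention's span.
    static boolean insideMention(int sentNum, int tokenIndex, Mention m) {
        return sentNum == m.sentNum()
                && tokenIndex >= m.startIndex()
                && tokenIndex < m.endIndex();
    }

    // clusterIds holds one entry per token (null = no chain); representatives maps
    // a cluster id to its representative mention.
    static String resolve(List<List<String>> sentences,
                          List<List<Integer>> clusterIds,
                          Map<Integer, Mention> representatives) {
        List<String> out = new ArrayList<>();
        for (int s = 0; s < sentences.size(); s++) {
            List<String> tokens = sentences.get(s);
            for (int t = 0; t < tokens.size(); t++) {
                Integer clusterId = clusterIds.get(s).get(t);
                Mention repr = clusterId == null ? null : representatives.get(clusterId);
                if (repr == null || insideMention(s + 1, t + 1, repr)) {
                    out.add(tokens.get(t)); // no chain, or part of the mention itself
                } else {
                    List<String> reprTokens = sentences.get(repr.sentNum() - 1);
                    for (int i = repr.startIndex(); i < repr.endIndex(); i++) {
                        out.add(reprTokens.get(i - 1)); // 1-based index -> 0-based list
                    }
                }
            }
        }
        return String.join(" ", out);
    }

    // Hand-written example data (an assumption, not real CoreNLP output).
    static String demo() {
        List<List<String>> sentences = List.of(
                List.of("John", "drove", "to", "Judy", "'s", "house", "."),
                List.of("He", "made", "her", "dinner", "."));
        List<List<Integer>> clusterIds = List.of(
                Arrays.asList(1, null, null, 2, 2, null, null),
                Arrays.asList(1, null, 2, null, null));
        Map<Integer, Mention> representatives = Map.of(
                1, new Mention(1, 1, 2),   // "John"
                2, new Mention(1, 4, 6));  // "Judy 's"
        return resolve(sentences, clusterIds, representatives);
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints: John drove to Judy 's house . John made Judy 's dinner .
    }
}
```

The sentence comparison matters as much as the index comparison: "He" has the same token index (1) as "John", so without it the pronoun in the second sentence would be treated as part of the representative mention and left unresolved.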

In addition, to speed up the process, you can replace the first if-condition with:

    if (chain == null || chain.getMentionsInTextualOrder().size() == 1)

After all, if the coreference chain has length 1, there is no point in looking for a representative mention.
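The singleton shortcut amounts to filtering chains by size. A minimal sketch (Java 16+), again with a hypothetical Mention record; the chains below, including the singleton third one, are hand-written assumptions standing in for the CorefChain map.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SingletonFilterSketch {
    // Hypothetical stand-in for CorefChain.CorefMention.
    record Mention(int sentNum, int startIndex, int endIndex) {}

    // Keeps only the cluster ids whose chain has more than one mention: a
    // singleton chain's representative mention is the token's own mention,
    // so there is nothing to substitute.
    static Set<Integer> clustersWorthResolving(Map<Integer, List<Mention>> chains) {
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, List<Mention>> e : chains.entrySet()) {
            if (e.getValue().size() > 1) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Hand-written chains for the example text (an assumption, not CoreNLP output).
    static Set<Integer> demo() {
        Map<Integer, List<Mention>> chains = Map.of(
                1, List.of(new Mention(1, 1, 2), new Mention(2, 1, 2)),  // John, He
                2, List.of(new Mention(1, 4, 6), new Mention(2, 3, 4)),  // Judy 's, her
                3, List.of(new Mention(1, 4, 7)));                       // Judy 's house
        return clustersWorthResolving(chains);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: [1, 2]
    }
}
```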

