Problems with groups in Java regex


I'm quite sure this has a simple solution, but I've been searching for 3 hours and haven't managed to find anything that helps me.

I'm writing a parser in Java using regex, and I'm supposed to be able to match a few decided words, the numbers 1-10000, and hex color codes. It's going great matching words, but the reader isn't reading numbers and color codes as a whole. For example, it reads the input:

down. color #000000.

As:

reading: down returning: down

reading: . returning: dot

reading: returning: whitespace

reading: color returning: color

reading: returning: whitespace

reading: # returning: nothing

reading: 0 returning: number

reading: returning: nothing

reading: f returning: nothing

reading: 2 returning: number

reading: 3 returning: number

reading: 4 returning: number

reading: . returning: dot

So it's able to read the words color and down as a whole, as I want, but it doesn't read the color code as a whole. Ideally I want those 7 lines to be:

reading: #0af234 returning: colorcode

I have:

    String stringTokens = "down|color|(\\s|\\t)+|\\n|\b[1-9][0-9]{0,3}\b|10000|^(#)([a-fA-F0-9]{6})$";
    Pattern stringPattern = Pattern.compile(stringTokens, Pattern.CASE_INSENSITIVE);
    Matcher m = stringPattern.matcher(input);

Then:

    while (m.find()) {
        if (m.start() != inputPos) {
            tokens.add(new Token(lineNo, TokenType.INVALID));
        }
        if (m.group().matches("^(#)([a-fA-F0-9]{6})$"))
            tokens.add(new Token(lineNo, TokenType.COLORCODE));
        else if (m.group().equals("."))
            tokens.add(new Token(lineNo, TokenType.DOT));
        else if (m.group().matches("down"))
            tokens.add(new Token(lineNo, TokenType.DOWN));
        else if (m.group().matches("color"))
            tokens.add(new Token(lineNo, TokenType.COLOR));
        else if (Character.isDigit(m.group().charAt(0)))
            tokens.add(new Token(lineNo, TokenType.NUMBER, Integer.parseInt(m.group())));
        else if (m.group().matches("\\n")) {
            tokens.add(new Token(lineNo, TokenType.WHITESPACE));
            lineNo++;
        }
        else if (m.group().matches("(\\s|\\t)+"))
            tokens.add(new Token(lineNo, TokenType.WHITESPACE));
        inputPos = m.end();
    }

So my question is basically:

How do I manage to read the groups for color codes and numbers together? When I print out m.group() for each reading now, it returns them as single digits. Yet looking at other code where digits are read in the same format, a regex like the one above, or [0-9]+, seems simple to me: each group should be read as a whole number.

I have tried to use something along the lines of m.group(1) and m.group(2), used word boundaries (which I don't understand completely) and the ^$ format, but nothing seems to work to read the token as a whole.

I hope I managed to keep the code I copied simple without missing anything important, and that someone can help me figure this simple (it must be?!) thing out. Thank you! :)

So you have this regexp:

    down|color|(\\s|\\t)+|\\n|\b[1-9][0-9]{0,3}\b|10000|^(#)([a-fA-F0-9]{6})$

which we can decompose as:

  • down
  • color
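  • (\\s|\\t)+: one or more \s (OK, the whitespace class) or \t (not needed, since \t is already included in \s)
  • \\n (note that it is also included in \s, so the previous alternative will match it first)
  • \b[1-9][0-9]{0,3}\b: OK, here you try to use a word boundary, but you're not taking into account that backslashes need to be escaped in a Java string, so it should be \\b. As written, "\b" in a Java string literal is a backspace character, so this alternative never matches ordinary digits. Not sure why you want to use a word boundary here, though?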
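  • 10000: this one is needed, since the previous pattern only covers 1 to 9999. If you drop the word boundaries, it has to come before that pattern, otherwise 10000 would be split into 1000 and 0.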
  • ^(#)([a-fA-F0-9]{6})$: the group (#) seems unnecessary, just use #. With ^...$ you're forcing the whole content of the input to be a color code such as #abcdef, so I'd remove the anchors.

How do you match the dot?

Since you need to match again to distinguish the different types of tokens, why don't you use multiple regexps (one for each token, or no regexp at all for literals) and check each of them against the head of the string to parse?

If one matches, you have a new token and you can consume the matched part of the string.
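Putting the points above together, a single pattern could look roughly like the sketch below. It is only an illustration under a few assumptions: the class name TokenPatternSketch and the method split are made up for this example, an alternative for the dot has been added, the anchors and word boundaries are dropped, and 10000 is listed before the 1-9999 alternative because alternatives are tried left to right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
public class TokenPatternSketch {
    // Backslashes are doubled because this is a Java string literal.
    // No ^...$ around the color code, so it can match anywhere in the input.
    // 10000 comes before [1-9][0-9]{0,3} so it is not split into 1000 and 0.
    static final Pattern TOKENS = Pattern.compile(
            "down|color|\\.|\\n|[ \\t]+|#[a-f0-9]{6}|10000|[1-9][0-9]{0,3}",
            Pattern.CASE_INSENSITIVE);

    // Returns the text of every match, in order of appearance.
    static List<String> split(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : split("down. color #0af234.")) {
            System.out.println("reading: " + token);
        }
    }
}
```

With this pattern the color code comes back as one match instead of single characters. Note that without word boundaries a longer digit run such as 12345 is still split (into 1234 and 5), so some extra check would be needed if that should count as invalid.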
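As a rough sketch of that idea, the loop below keeps one pattern per token type and tries each of them at the current position with Matcher.region and Matcher.lookingAt, which only succeeds if the match starts exactly at the region start. The class name HeadMatchLexerSketch, the token names, and the "TYPE:text" output format are all assumptions made for the example; a real version would build Token objects and track line numbers instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lexer sketch: one regexp per token type, matched at the head of the input.
public class HeadMatchLexerSketch {
    // LinkedHashMap keeps insertion order, which is the order the rules are tried in.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("DOWN", Pattern.compile("down", Pattern.CASE_INSENSITIVE));
        RULES.put("COLOR", Pattern.compile("color", Pattern.CASE_INSENSITIVE));
        RULES.put("DOT", Pattern.compile("\\."));
        RULES.put("COLORCODE", Pattern.compile("#[0-9a-fA-F]{6}"));
        RULES.put("NUMBER", Pattern.compile("10000|[1-9][0-9]{0,3}"));
        RULES.put("WHITESPACE", Pattern.compile("\\s+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                // Restrict the matcher to the unread part of the input.
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt()) {
                    tokens.add(rule.getKey() + ":" + m.group());
                    pos = m.end(); // consume the matched part
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No rule applies here: report one invalid character and move on.
                tokens.add("INVALID:" + input.charAt(pos));
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("down. color #0af234.").forEach(System.out::println);
    }
}
```

Because the rule that matched already identifies the token type, there is no need to run matches() on the text a second time.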

