performance - Fuzzy Problems in Solr Filter Query
I would be grateful if someone could help me with this problem. I have this query:
select?q=city:frankfurt am main~&fq=street:gerhart-hauptmann-str.~
This is not working for me. I want to use fuzzy search to catch user input mistakes.
Here is what I want:
- frankfurt am main should be searched in the field city with fuzzy search
- gerhart-hauptmann-str. should be converted to 3 terms with fuzzy search.
The debug output I actually get:
"debug": {
  "rawquerystring": "city:frankfurt am main~",
  "querystring": "city:frankfurt am main~",
  "parsedquery": "city:frankfurt text:am text:main~2",
  "parsedquery_toString": "city:frankfurt text:am text:main~2",
  "explain": {...},
  "QParser": "LuceneQParser",
  "filter_queries": [
    "street:gerhart-hauptmann-str.~"
  ],
  "parsed_filter_queries": [
    "street:gerhart-hauptmann-str.~2"
  ],
I (think I) want this output:
"debug": {
  "rawquerystring": "city:frankfurt am main~",
  "querystring": "city:frankfurt am main~",
  "parsedquery": "city:frankfurt~2 city:am~2 text:main~2",
  "parsedquery_toString": "city:frankfurt~2 city:am~2 text:main~2",
  "explain": {...},
  "QParser": "LuceneQParser",
  "filter_queries": [
    "street:gerhart-hauptmann-str.~"
  ],
  "parsed_filter_queries": [
    # the analyzer converts str. to strasse
    "street:gerhart~2 street:hauptmann~2 strasse~2"
  ],
The definition of the fields in schema.xml:
<field name="city" type="admin_name" indexed="true" stored="true" />
<field name="street" type="street_name" indexed="true" stored="true" multiValued="false"/>

<fieldType name="admin_name" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_de_admin.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="street_name" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- StartEndSynonymFilter replaces synonyms at the start or end of a term. The types start_synonym or end_synonym are set. -->
    <filter class="my.StartEndSynonymFilterFactory" synonyms="lang/synonyms_de_street.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
Is this somehow possible?
If you need additional information to answer, please leave a hint in a comment.
- tokenizing on hyphens
Have a look at the WordDelimiterFilterFactory: https://wiki.apache.org/solr/analyzerstokenizerstokenfilters#solr.worddelimiterfilterfactory
- applying fuzzy search to every single term
Disclaimer: I have not yet used fuzzy search in my Solr setups.
You might have to be careful with tokenizing city names and applying fuzzy search to every single token. For example, with "frankfurt am main" you would in your case apply fuzzy search to "am" as well. Please try it with parentheses: (frankfurt am main)~
and check whether that gets you the intended result.
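If the query parser does not expand the input the way you want, one option is to build the per-term fuzzy query on the client before sending it to Solr. The following is only a minimal sketch of that idea: the function name fuzzy_per_term and the min_len threshold are my own assumptions, not anything Solr provides, and it does not escape special query characters. Short words such as "am" are kept as exact terms so that they do not match nearly everything.

```python
import re


def fuzzy_per_term(field, text, min_len=4, max_edits=2):
    """Build a Lucene-syntax query with a fuzzy clause per term (hypothetical helper)."""
    # Split on whitespace and hyphens; strip dots such as the one in "str."
    terms = [t.strip(".").lower() for t in re.split(r"[\s\-]+", text)]
    terms = [t for t in terms if t]

    clauses = []
    for term in terms:
        if len(term) >= min_len:
            # Long enough: allow up to max_edits edits
            clauses.append(f"{field}:{term}~{max_edits}")
        else:
            # Short words such as "am" are searched exactly
            clauses.append(f"{field}:{term}")
    return " ".join(clauses)


print(fuzzy_per_term("city", "frankfurt am main"))
print(fuzzy_per_term("street", "gerhart-hauptmann-str."))
```

The resulting strings can be used as the values of q and fq. Whether "str" should be fuzzy or exact is a tuning question; with min_len=4 it stays exact.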
However, in the case of names (city or streets) I'm not sure whether you should be tokenizing them at all. Maybe storing them as one case-insensitive token and applying fuzzy search as "frankfurt am main"~ (with quotes in the query) is what you need.
Nevertheless, you should try it and see whether it works in the way you have described. Look at the query results. And (maybe in parallel) set up an index that stores city and street names as single tokens (KeywordTokenizer with lower casing and ASCII folding, e.g.) and apply fuzzy search to them as single terms. I guess the results will be sharper. Best: try it out and compare.
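For the single-token variant, here is a rough sketch of how the query string could be built on the client. As far as I know, in the standard Lucene syntax a tilde after a quoted phrase is read as a proximity (slop) search rather than a fuzzy search, so this sketch escapes the spaces with backslashes instead of quoting. The field name city_exact and the helper single_term_fuzzy are assumptions for illustration; lowercasing on the client is a precaution in case the fuzzy term is not passed through the full analysis chain.

```python
import re

# Characters with special meaning in the Lucene/Solr standard query syntax,
# plus whitespace, which would otherwise split the input into several terms
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/\s])')


def single_term_fuzzy(field, text, max_edits=2):
    """Build a fuzzy query that treats the whole input as one term (hypothetical helper)."""
    # Prefix every special character with a backslash
    term = _SPECIAL.sub(r"\\\1", text.strip().lower())
    return f"{field}:{term}~{max_edits}"


# "city_exact" is an assumed field using KeywordTokenizer + lower casing
print(single_term_fuzzy("city_exact", "Frankfurt am Main"))
```

Comparing the results of this against the tokenized fields should show which variant tolerates typos better for your data.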
In addition, I suggest you try out the (extended or not) DisMax handler for the input, without caring to differentiate between cities and streets on the input side: https://cwiki.apache.org/confluence/display/solr/the+extended+dismax+query+parser
With the DisMax handler processing the input, you can allow the user to enter search terms freely (like having a single search field for cities and streets, where they can be entered in random order and format).
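To try this out quickly, a request could be put together like this. It is only a sketch: the base URL and the core name addresses are placeholders I made up, and qf simply lists the two fields from the question. defType, qf and debugQuery are regular Solr request parameters.

```python
from urllib.parse import urlencode


def edismax_url(base_url, user_input, fields=("city", "street")):
    """Build a select URL that lets eDisMax search the raw input in several fields."""
    params = {
        "q": user_input,          # raw user input, cities and streets mixed
        "defType": "edismax",     # use the extended DisMax query parser
        "qf": " ".join(fields),   # fields to search in
        "debugQuery": "true",     # include the parsed query in the response
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name
print(edismax_url("http://localhost:8983/solr/addresses",
                  "gerhart-hauptmann-str. frankfurt am main"))
```

With debugQuery switched on you can compare the parsed query against the debug output in the question.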