[Django]-Django Haystack ElasticSearch: order by position of matched term

Solved it.
Part of the answer is here: stackoverflow.com/questions/27538766/scoring-by-term-position-in-elasticsearch. It explains how to rewrite scoring so that it takes term position into account, and how to build a query that sorts by that score.

It turned out that to make this work with Django-Haystack you need to override the Elasticsearch backend and the SearchQuerySet provided by Haystack. My implementation is below.

First of all, here is what Haystack needs to do:

  1. Produce the correct mapping, like this:

    "text" : {
        "type" : "string",
        "index_options" : "offsets",
        "index_analyzer" : "edgengram_analyzer",
        "search_analyzer" : "standard_search"
    }
    

    When “index_options” is set to “offsets”, term offsets are stored in the index, so the scoring script can retrieve them later.

  2. Build a query that sorts by the updated score. My query looked like this (text stands for the search string):

    {
        "query": {
            "match_phrase_prefix": {"text": text}
        },
        "sort": {
            "_script": {
                "script_file": "score_script",
                "type": "number",
                "order": "asc",
                "params": {"q": text}
            }
        }
    }
    

    The script file “score_script”, which provides the updated scores, looks like this:

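    // Look up the offsets of the query term q in the "text" field.
    termInfo = _index["text"].get(q, _OFFSETS | _CACHE);
    for (pos in termInfo) {
        // Return on the first position: score plus start offset of the match.
        return _score + pos.startOffset;
    }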
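
To make the two pieces above more concrete, here is a minimal Python sketch. `build_position_sorted_query` is a hypothetical helper (it is not part of Haystack or Elasticsearch) that returns the query body shown above as a dict. `emulate_position_sort` is a rough pure-Python stand-in for what the script sort is meant to achieve: ordering hits by where the term first appears. It assumes plain case-insensitive substring matching and ignores analyzers and `_score`, so treat it as an illustration only.

```python
def build_position_sorted_query(text, field="text", script_file="score_script"):
    """Return the search body from the answer as a plain dict.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "query": {"match_phrase_prefix": {field: text}},
        "sort": {
            "_script": {
                "script_file": script_file,
                "type": "number",
                "order": "asc",  # smaller value (earlier match) sorts first
                "params": {"q": text},
            }
        },
    }


def emulate_position_sort(docs, term):
    """Order matching strings by the offset of the first occurrence of term.

    Simplification: plain substring search. The real script reads term
    offsets from the index and also adds _score to the offset.
    """
    term = term.lower()
    hits = [(doc.lower().find(term), doc) for doc in docs]
    # find() returns -1 when the term is absent; drop those documents.
    return [doc for offset, doc in sorted(hits) if offset >= 0]
```

For example, `emulate_position_sort(["Super Mario", "Mario Kart", "Tetris"], "mar")` puts "Mario Kart" ahead of "Super Mario", because the term starts at offset 0 in the former and at offset 6 in the latter.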
    

So, first things first. To build the correct mapping we need to override the Elasticsearch backend provided by Haystack, so that we can pass custom parameters such as “index_options”. My implementation is based on elasticstack, a project that lets you specify a custom analyzer for each field, like this:

    text = CharField(document=True, use_template=True,
        analyzer='stop')

Here is my customization of the elasticstack configurable backend: gist.github.com/GrigoriyMikhalkin/f76be703bc53380986a0#file-backend-py . It adds an ‘add’ argument, which accepts a dictionary of the form {parameter: value}. Example:

    text = CharField(document=True, use_template=True,
                     analyzer="edgengram_analyzer",
                     add={"search_analyzer": "standard_search",
                          "index_options": "offsets"})

To use it, override the HAYSTACK_CONNECTIONS variable in your project's settings.py like this:

    import os

    HAYSTACK_CONNECTIONS = {
        "default": {
            "ENGINE": "base.search_backend.backend.ConfigurableElasticSearchEngine",
            "URL": os.getenv("ELASTICSEARCH_URL", "http://127.0.0.1:9200/"),
            "INDEX_NAME": "haystack",
        }
    }

For more details, see the elasticstack docs.
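
As a rough illustration of what the ‘add’ argument described above amounts to, the sketch below merges the extra parameters into a field mapping. `build_field_mapping` is a hypothetical function, not the code from the gist or from elasticstack. Writing the `analyzer` argument to `index_analyzer` is an assumption, made so that the result matches the mapping shown at the start of this answer.

```python
def build_field_mapping(analyzer=None, add=None, field_type="string"):
    """Build a field mapping dict and merge in extra parameters.

    Hypothetical sketch: the real backend may structure this differently.
    """
    mapping = {"type": field_type}
    if analyzer is not None:
        # Assumption: the per-field analyzer is used at index time.
        mapping["index_analyzer"] = analyzer
    # Extra {parameter: value} pairs are copied into the mapping as-is.
    mapping.update(add or {})
    return mapping
```

Calling it with `analyzer="edgengram_analyzer"` and `add={"search_analyzer": "standard_search", "index_options": "offsets"}` yields a dict with the same four keys as the “text” mapping above.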

The next thing is to build the correct query. This consists of two parts. First, create the script that does the rescoring (like the script above) and place it in the /config/scripts/ directory of ES.
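
If you prefer to install the script from Python rather than copying it by hand, a minimal sketch could look like this. `install_script` is a hypothetical helper, and the directory you pass in depends on where your ES installation lives. The `.groovy` extension is an assumption (that the script is Groovy and that ES infers the script language from the file extension), so check the scripting docs for your ES version before relying on it.

```python
from pathlib import Path

# The rescoring script from this answer, kept as a string.
SCORE_SCRIPT = """\
termInfo = _index["text"].get(q, _OFFSETS | _CACHE);
for (pos in termInfo) {
    return _score + pos.startOffset;
}
"""


def install_script(scripts_dir, name="score_script", extension="groovy",
                   body=SCORE_SCRIPT):
    """Write the script into scripts_dir and return the resulting path.

    Hypothetical helper; the extension default is an assumption.
    """
    target = Path(scripts_dir) / f"{name}.{extension}"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body, encoding="utf-8")
    return target
```

The file name without its extension is what the query refers to as "script_file", which is why the default name here is score_script.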

Second, override the default SearchQuerySet provided by Haystack. My implementation was inspired by this blog post:
http://www.stamkracht.com/extending-haystacks-elasticsearch-backend/

My implementation (gist.github.com/GrigoriyMikhalkin/f76be703bc53380986a0#file-query-py) adds a custom_search method to SearchQuerySet. It can be used like this:

    sqs = ConfigurableSearchQuerySet().models(Game).load_all()\
                                      .filter(content__startswith=q)\
                                      .custom_search(search_text=q)
