0👍
Answering my own question. Thanks to solarissmoke for the input, which helped me track down what was causing this.
My answer is based on Greg Baker’s answer to the question "ElasticSearch: EdgeNgrams and Numbers".
The issue appears to be related to the use of numeric values within the search text (in my case, the N133TC pattern). Note that I was using the snowball analyzer at first, before switching to pattern; neither of them worked.
I adjusted my analyzer setting in settings.py:
"edgengram_analyzer": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": ["haystack_edgengram"]
}
This changes the tokenizer value from the original lowercase to standard.
I then set the default analyzer to be used in my backend to the edgengram_analyzer (also in settings.py):
ELASTICSEARCH_DEFAULT_ANALYZER = "edgengram_analyzer"
This does the trick! It still works as an EdgeNgram field should, but allows for my numeric values to be returned properly too.
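To illustrate why the tokenizer matters here, below is a rough sketch in plain Python. It does not call Elasticsearch; the two helper functions are hypothetical stand-ins that only approximate how the lowercase tokenizer (splits at anything that is not a letter, then lowercases) and the standard tokenizer (keeps letters and digits together) treat a value like N133TC.

```python
import re


def lowercase_tokenize(text):
    """Approximation of the `lowercase` tokenizer (hypothetical helper):
    split the text at every non-letter character, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def standard_tokenize(text):
    """Approximation of the `standard` tokenizer (hypothetical helper):
    runs of letters, digits and underscores stay in a single token."""
    return re.findall(r"\w+", text)


# The digits act as separators for the lowercase tokenizer,
# so the callsign is broken into fragments before any ngrams are built.
print(lowercase_tokenize("N133TC"))

# The standard tokenizer keeps the callsign as one token.
print(standard_tokenize("N133TC"))
```

Under these assumptions, the first call yields only the letter fragments of the callsign, while the second keeps N133TC intact, which matches the behaviour I saw after switching tokenizers.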
I’ve also followed the advice in the answer by solarissmoke and removed all the underscores from my index files.
1👍
It doesn’t fully explain the behaviour you are seeing, but I think the problem is with how you are indexing your data, specifically the text field (which is what gets searched when you filter on content).
Take the example data you provided: callsign N133TC, flight name Shahrul Nizam. The text document for this data becomes:
flight___N133TC___Shahrul Nizam
You have set this field as an EdgeNgramField (min 4 chars, max 15). Here are the ngrams that are generated when this document is indexed (I’ve ignored the lowercase filter for simplicity):
flig
fligh
flight
flight_
flight__
flight___
flight___N
flight___N1
flight___N13
flight___N133
flight___N133T
flight___N133TC
Niza
Nizam
Note that the tokenizer does not split on underscores. Now, if you search for N133TC, none of the above tokens will match. (I can’t explain why Shahrul works… it shouldn’t, unless I’ve missed something, or there are spaces at the start of that field).
If you changed your text document to:
flight N133TC Shahrul Nizam
Then the indexed tokens would be:
flig
fligh
flight
N133
N133T
N133TC
Shah
Shahr
Shahru
Shahrul
Niza
Nizam
Now, a search for N133TC should match.
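The comparison between the two documents can be sketched in plain Python. This is only a simplified model, not what Elasticsearch actually runs: the hypothetical index_tokens helper splits on whitespace (standing in for a tokenizer that does not split on underscores) and builds edge ngrams with the min 4 / max 15 limits mentioned above, ignoring lowercasing.

```python
def edge_ngrams(token, min_gram=4, max_gram=15):
    """Return the prefixes of `token` whose length is between
    min_gram and max_gram (tokens shorter than min_gram yield nothing)."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_tokens(text, min_gram=4, max_gram=15):
    """Hypothetical helper: split on whitespace only, then collect
    the edge ngrams of every resulting word."""
    tokens = set()
    for word in text.split():
        tokens.update(edge_ngrams(word, min_gram, max_gram))
    return tokens


with_underscores = index_tokens("flight___N133TC___Shahrul Nizam")
with_spaces = index_tokens("flight N133TC Shahrul Nizam")

# With underscores, every ngram of the first word starts with "flig",
# so the bare callsign never appears as an indexed token.
print("N133TC" in with_underscores)

# With spaces, the callsign is its own word and gets its own ngrams.
print("N133TC" in with_spaces)
```

In this model the callsign is only found in the space-separated version, and Shahrul is likewise missing from the underscore version.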
Note also that the flight___ string in your document generates a whole load of (most likely) useless tokens. Unless this is deliberate, you may be better off without it.