[Answered] Relevant query suggestions for autocomplete with Solr

2👍

✅

After struggling for hours, I finally got something working. It is not perfect, but it is good enough.

Following this article:
http://alexbenedetti.blogspot.fr/2015/07/solr-you-complete-me.html

I used the FreeTextLookupFactory.

My search_indexes.py

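from haystack import indexes

# Assumption: Article lives in this app's models module; adjust the import to your project.
from .models import Article


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    # Main document field, rendered from a Haystack template
    text = indexes.CharField(document=True, use_template=True)
    created = indexes.DateTimeField(model_attr='created')
    rating = indexes.IntegerField(model_attr='rating')
    # The suggester below is built from this field
    title = indexes.CharField(model_attr='title', boost=1.125)

    def get_model(self):
        return Article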

My schema.xml

<field name="django_ct" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="django_id" type="string" indexed="true" stored="true" multiValued="false"/>


<field name="text" type="text_en" indexed="true" stored="true" multiValued="false"  termVectors="true" />
<field name="rating" type="long" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text_en" indexed="true" stored="true" multiValued="false"/>
<field name="created" type="date" indexed="true" stored="true" multiValued="false"/>

My solrconfig.xml

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest</str>
    <str name="lookupImpl">FreeTextLookupFactory</str> 
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="ngrams">3</str>
    <float name="threshold">0.004</float>
    <str name="highlight">false</str>
    <str name="buildOnCommit">false</str>
    <str name="separator"> </str>
    <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
  <lst name="defaults">
    <str name="suggest.dictionary">suggest</str>
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

I use Solr 6.4, which runs in managed schema mode by default, so my changes to schema.xml were ignored. I had to switch to manual editing mode by adding this to solrconfig.xml:

<schemaFactory class="ClassicIndexSchemaFactory"/>

See here: https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig#SchemaFactoryDefinitioninSolrConfig-Switchingfromschema.xmltoManagedSchema

Then restart Solr and rebuild the index with Haystack's rebuild_index command.

And of course, build the suggester with curl (the URL is quoted so the shell does not try to interpret the ?):
curl "http://127.0.0.1:8983/solr/collection1/suggest?suggest.build=true"

And finally, query it to get the results:

curl "http://127.0.0.1:8983/solr/collection1/suggest?suggest.q=new%20y"
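
If you want to call the handler from Python rather than curl, here is a minimal sketch of building the request URL and pulling the terms out of the JSON response. The helper names (suggest_url, extract_terms) are hypothetical, and the sample response is made up to show the usual shape of a SuggestComponent reply; check it against what your own Solr returns.

```python
from urllib.parse import urlencode


def suggest_url(base_url, query, dictionary="suggest"):
    """Build a suggest request URL; urlencode escapes the space in the query."""
    params = {"suggest.q": query, "suggest.dictionary": dictionary, "wt": "json"}
    return base_url + "?" + urlencode(params)


def extract_terms(response, dictionary="suggest"):
    """Collect the suggested terms from a parsed suggest response (a dict)."""
    terms = []
    per_query = response.get("suggest", {}).get(dictionary, {})
    for result in per_query.values():
        for suggestion in result.get("suggestions", []):
            terms.append(suggestion["term"])
    return terms


# Made-up response for illustration only, not real output from my index.
sample = {
    "responseHeader": {"status": 0, "QTime": 1},
    "suggest": {
        "suggest": {
            "new y": {
                "numFound": 2,
                "suggestions": [
                    {"term": "new york", "weight": 120, "payload": ""},
                    {"term": "new year", "weight": 95, "payload": ""},
                ],
            }
        }
    },
}

print(suggest_url("http://127.0.0.1:8983/solr/collection1/suggest", "new y"))
print(extract_terms(sample))
```

The first key under "suggest" is the suggester name from solrconfig.xml, and the next one is the query string itself, which is why the sketch iterates over the values instead of hard-coding the query.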

I will try to dig deeper into the FreeTextLookupFactory to see if I can make it more accurate, but it is already satisfying.
Hope this helps.

PS: always keep an eye on the logs at:
http://127.0.0.1:8983/solr/#/~logging
I strongly suggest keeping it open in a tab at all times. It saved me hours of pain.

👤kollo

0👍

For what you need, I suggest using the BlendedInfixLookupFactory set up as follows:

In schema.xml, create a field that you will use for the suggester, then copy into that field:

<field name="title" type="text_general" indexed="true" stored="true" /> 
<field name="term_suggest" type="phrase_suggest" indexed="true" stored="true" multiValued="true"/>

<copyField source="title" dest="term_suggest"/>

<fieldType name="phrase_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Then in the solrconfig.xml file:

<searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
      <str name="name">suggest</str>
      <str name="lookupImpl">BlendedInfixLookupFactory</str>
      <str name="blenderType">linear</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">term_suggest</str>
      <str name="weightField">weight</str>
      <str name="suggestAnalyzerFieldType">text_suggest</str>
      <str name="queryAnalyzerFieldType">phrase_suggest</str>
      <str name="indexPath">suggest</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
      <bool name="exactMatchFirst">true</bool>
   </lst> 
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">false</str>
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

With the BlendedInfixLookupFactory you can find “new y” wherever it occurs in the field, with matches near the beginning given greater weight. Combining the standard tokenizer for the suggestAnalyzerFieldType with the keyword tokenizer for the queryAnalyzerFieldType lets you search with spaces (the query “new y” is read as a single string, or keyword).
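
To illustrate the "greater weight at the beginning" part: as far as I understand the Solr documentation, the linear blender scores a match as weight * (1 - 0.10 * position). Here is a rough sketch of that formula in Python. The helper names and the sample titles are made up, it only handles a single-word prefix, and the real suggester works on analyzed tokens and the weightField value, so treat it as an approximation of the idea and not as Solr's implementation.

```python
def linear_blend(weight, position):
    """Position-linear blending: later matches are scaled down by 10% per position."""
    return weight * (1 - 0.10 * position)


def match_position(title, prefix):
    """Index of the first word in the title that starts with prefix, or None."""
    for i, word in enumerate(title.lower().split()):
        if word.startswith(prefix.lower()):
            return i
    return None


def rank(titles, prefix, weight=10):
    """Order matching titles by blended score, highest first (ties keep input order)."""
    scored = []
    for title in titles:
        position = match_position(title, prefix)
        if position is not None:
            scored.append((linear_blend(weight, position), title))
    return [title for score, title in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical titles, all with the same assumed weight of 10.
titles = ["Happy new year", "New york travel guide", "Brand new recipes"]
for title in rank(titles, "new"):
    print(title)
```

With equal weights, the title where the match is the first word comes out on top, which is the behaviour you want for autocomplete.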

The Confluence wiki link that you posted is good; it was last modified in September 2016.

EDIT:
I didn’t realize you didn’t want the whole titles. You can try using shingles for this, by changing the phrase_suggest fieldType in the above schema to this:

<fieldType name="phrase_suggest" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" 
            minShingleSize="2"
            maxShingleSize="4"
            outputUnigrams="true"
            outputUnigramsIfNoShingles="true"/>
    </analyzer>
</fieldType>

EDIT2:
Alternatively, you could use the phrase_suggest with a standard tokenizer with a shingle filter for the index analyzer and keyword tokenizer for the query analyzer:

<fieldType name="phrase_suggest" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.ShingleFilterFactory"
            minShingleSize="2"
            maxShingleSize="4"
            outputUnigrams="true"
            outputUnigramsIfNoShingles="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

Then for the suggest searchComponent, you just need:

<str name="suggestAnalyzerFieldType">phrase_suggest</str>

(and no queryAnalyzerFieldType). Of course, you’ll need to change the ShingleFilterFactory settings to fit your needs.
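
To get a feel for what the shingle settings do before touching the schema, here is a small Python approximation of the terms a shingle filter would emit for an already tokenized, lowercased title. The function name and defaults are mine and only mirror minShingleSize, maxShingleSize and outputUnigrams from the config above; it does not model outputUnigramsIfNoShingles or Lucene's token positions.

```python
def shingles(tokens, min_size=2, max_size=4, output_unigrams=True):
    """Return word n-grams (shingles) for each start position, shortest first."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for a shingle of this size
            out.append(" ".join(tokens[start:start + size]))
    return out


# Hypothetical title, split on whitespace to stand in for the standard tokenizer.
print(shingles("new york city".split()))
```

Each of these terms becomes a possible suggestion, so a larger maxShingleSize gives longer phrases at the cost of a bigger suggester index.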
