Create a custom Solr index in Sitecore 9

Hello there. 

Hi! So you want to create a new Solr index?

Yes, I think so?

It’s a great idea. You’ll be familiar with the big three, sitecore_core_index, sitecore_master_index and sitecore_web_index, but you don’t have to stop there! You can create individual indexes for certain content types on your site, such as Products. Smaller, more individualised indexes are easier to maintain and troubleshoot, faster to rebuild, and can be faster to query.

Are they hard to set up?

Not as hard as you’d expect! Let’s create one now.

OK. My Solr is set up and I can access the web UI on https://solr:8983/solr/#/ – what now?

Let’s create the physical Solr core.

  1. Find your Solr index folder for the sitecore_master_index. Mine was at C:\solr\solr-6.6.2\server\solr\sitecore_master_index
  2. Copy this whole folder (into the same parent folder) and call it sitecore_master_products_index
  3. Inside the sitecore_master_products_index folder, open up the core.properties file and change the name property to read sitecore_master_products_index
  4. Restart Solr (I use the solr stop and solr start commands)
  5. Now, go to https://solr:8983/solr/#/ and check out your cores – you will have a new one!
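
If you set up cores often, steps 2 and 3 can be scripted. Here’s a minimal Python sketch, assuming the core folders sit directly under your Solr home (as in the path above) and that core.properties contains a name= line. The clone_core function and its parameters are my own naming, not part of Solr, and you’d still restart Solr afterwards.

```python
import shutil
from pathlib import Path


def clone_core(solr_home: Path, source: str, target: str) -> Path:
    """Copy a core folder and set the name property in its core.properties."""
    src = solr_home / source
    dst = solr_home / target
    shutil.copytree(src, dst)  # raises FileExistsError if the target folder exists

    props = dst / "core.properties"
    lines = props.read_text(encoding="utf-8").splitlines()
    # Rewrite only the name property; every other line is kept as-is.
    updated = [
        f"name={target}" if line.split("=", 1)[0].strip() == "name" else line
        for line in lines
    ]
    props.write_text("\n".join(updated) + "\n", encoding="utf-8")
    return dst


# Example call (the path is an assumption, adjust it to your install):
# clone_core(Path(r"C:\solr\solr-6.6.2\server\solr"),
#            "sitecore_master_index", "sitecore_master_products_index")
```

Like the manual copy, this brings the existing index data along with it, so the new core will start out with the old documents.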

Awesome, it’s there. So I get that we copied the sitecore_master_index and renamed it to sitecore_master_products_index – and in Solr I can see that it contains thousands of documents already, copied from sitecore_master_index. How do I clean things up?

Well, good question. We want to delete all of the existing items in this index and start afresh. You can do this via a web browser – just call this URL:

https://solr:8983/solr/sitecore_master_products_index/update?commit=true&stream.body=<delete><query>*:*</query></delete>
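
If you’d rather not paste angle brackets into an address bar, the same URL can be built with the query string properly encoded. This is a minimal Python sketch, not an official client: delete_all_url is a hypothetical helper name, and the base URL is simply the one used in this post.

```python
from urllib.parse import urlencode


def delete_all_url(base_url: str, core: str) -> str:
    """Build the delete-everything update URL for a core, with encoded parameters."""
    params = {
        "commit": "true",
        # Same delete-by-query body as the URL above: match every document.
        "stream.body": "<delete><query>*:*</query></delete>",
    }
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


url = delete_all_url("https://solr:8983/solr", "sitecore_master_products_index")
print(url)
```

The printed URL can be opened in a browser or passed to whatever HTTP tool you normally use.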

Radical. Everything is deleted. Soo. I want to use this index to only contain certain types of content from Sitecore. How do I configure it properly?

We just need to add a single configuration file to Sitecore. It’s below. It looks mostly like the configuration file for sitecore_master_index, but we change two important things: (a) which template types we want to include in our index and (b) which field types we want to include in our index. In your real solution, this will take a bit of time to set up, but being selective is the whole point of creating a custom index, and you’ll want to keep it as trim as possible.

Here’s the whole config file, which I’ve called Sitecore.ContentSearch.Solr.Index.Master.Products.config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
  <sitecore role:require="Standalone or ContentManagement" search:require="solr">
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="sitecore_master_products_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="name">$(id)</param>
            <param desc="core">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
              <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
                <indexAllFields>false</indexAllFields>

                <!-- Included fields -->
                <include hint="list:AddIncludedField">
                  <ProductName>{E676F36E-B0E0-4BE5-998A-329A8F9055FD}</ProductName>
                  <LongDescription>{8A978A2E-0E7A-4415-9163-2F4ECF85A3AB}</LongDescription>
                </include>

                <!-- Included templates -->
                <include hint="list:AddIncludedTemplate">
                  <Product>{665DC431-673A-4D63-B9A6-00EB148E693C}</Product>
                </include>

              </documentOptions>
            </configuration>
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/syncMaster" />
            </strategies>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>master</Database>
                <Root>/sitecore</Root>
              </crawler>
            </locations>
            <enableItemLanguageFallback>false</enableItemLanguageFallback>
            <enableFieldLanguageFallback>false</enableFieldLanguageFallback>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

The two bits you’ll need to replace here are the bits commented as Included Fields and Included Templates:

<!-- Included fields -->
<include hint="list:AddIncludedField">
  <ProductName>{E676F36E-B0E0-4BE5-998A-329A8F9055FD}</ProductName>
  <LongDescription>{8A978A2E-0E7A-4415-9163-2F4ECF85A3AB}</LongDescription>
</include>

<!-- Included templates -->
<include hint="list:AddIncludedTemplate">
  <Product>{665DC431-673A-4D63-B9A6-00EB148E693C}</Product>
</include>

OK, done. I’ve added my list of templates, and fields here. So, can I reindex now and see my new content?

Absolutely. Go into Sitecore > Control Panel > Indexing Manager, find your index and rebuild it.

When you’re done, go back to the Solr UI and see your documents! If things didn’t go quite to plan, check your site’s Crawling.log, which will contain any indexing errors.

Production ready?

Well, not quite. You might want to create a sitecore_web_products_index and use the Sitecore.ContentSearch.Solr.Index.Web.config configuration file as an example of how to register it in Sitecore. Using Sitecore’s conventions for master and web keeps the surprises to a minimum.

Search on, pals!

 

Sitecore 9: ContentSearch Solr query quirks with spaces and wildcards

Sitecore provides a LINQ-powered IQueryable mechanism with which you can build powerful search queries. Your query will be translated into a native query for your underlying search engine (e.g. Solr). There are some odd quirks (bugs?) with this translation in Sitecore 9.0 and 9.0.1 when your search term includes a space. Let’s take a look.

In the examples below, context is an instance of IProviderSearchContext, which you’d typically wire up with dependency injection. In each case, we’re looking to query something from the index based on the item’s path in the Sitecore tree.

Querying on exact matches:

context.GetQueryable().Where(x => x.Path == "Hello");
 Translates to: {_fullpath:(Hello)}

Ok! This makes sense.

context.GetQueryable().Where(x => x.Path == "Hello World");
 Translates to: {_fullpath:("Hello World")}

Notice that if your query term has a space, we need to wrap the term in quotes.

context.GetQueryable().Where(x => x.Path == "\\Hello");
 Translates to: {_fullpath:(\\Hello)}

Backslash? No problem.

context.GetQueryable().Where(x => x.Path == "/Hello");
 Translates to: {_fullpath:(\/Hello)}

Forwardslash? We need to escape that with a ‘\’

context.GetQueryable().Where(x => x.Path == "\\Hello World");
 Translates to: {_fullpath:("\\Hello World")}

Backslash with space? No problem, just add the quotes.

context.GetQueryable().Where(x => x.Path == "/Hello World");
 Translates to: {_fullpath:("\/Hello World")}

As above, we’re all good, the forwardslash is just escaped.

Querying on partial matches – where things get interesting:

context.GetQueryable().Where(x => x.Path.Contains("Hello"));
 Translates to: {_fullpath:(*Hello*)}

All good. Here, we wrap our search term in a wildcard, *

context.GetQueryable().Where(x => x.Path.Contains("Hello World"));
 Translates to: {_fullpath:("\*Hello\\ World\*")}

Uh oh! Something weird has happened. The quotes and wildcard seem to have got mixed up: the wildcards are escaped and the whole term is quoted, so we’ve ended up with something which won’t return the results we want. Having read more about wildcard and space combinations, we probably want to end up with something simpler, like {_fullpath:(*Hello\ World*)}
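
To make the expected output concrete, here is a minimal Python sketch of the escaping rule I’d hope for: escape the query parser’s special characters (including the space) inside the term, and leave the surrounding wildcards unescaped and unquoted. This is not Sitecore’s implementation; escape_term and contains_clause are hypothetical helper names, and the character list is my reading of the Lucene query syntax.

```python
# Characters treated as special by the Lucene/Solr query parser, plus the space.
SOLR_SPECIAL = set('+-&|!(){}[]^"~*?:\\/ ')


def escape_term(term: str) -> str:
    """Backslash-escape every special character in a search term."""
    return "".join("\\" + ch if ch in SOLR_SPECIAL else ch for ch in term)


def contains_clause(field: str, term: str) -> str:
    """Build a 'contains' clause: escaped term wrapped in unescaped wildcards."""
    return f"{field}:(*{escape_term(term)}*)"


print(contains_clause("_fullpath", "Hello World"))   # _fullpath:(*Hello\ World*)
print(contains_clause("_fullpath", "/Hello World"))  # _fullpath:(*\/Hello\ World*)
```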

context.GetQueryable().Where(x => x.Path.Contains("\\Hello"));
 Translates to: {_fullpath:(*\\Hello*)}

No problem with this partial match, as we don’t have a space to deal with.

context.GetQueryable().Where(x => x.Path.Contains("/Hello"));
 Translates to: {_fullpath:(*\/Hello*)}

Again, fine.

context.GetQueryable().Where(x => x.Path.Contains("\\Hello World"));
 Translates to: {_fullpath:("\*\\Hello\\ World\*")}

The space completely breaks everything here

context.GetQueryable().Where(x => x.Path.Contains("/Hello World"));
 Translates to: {_fullpath:("\*\/Hello\\ World\*")}

and here..

Summary

I reported this to Sitecore and it has been registered as a bug. In the meantime, if you can get away with using StartsWith rather than Contains, you’ll find this works OK:

context.GetQueryable().Where(x => x.Path.StartsWith("Hello World"));
 Translates to: {_fullpath:(Hello\ World*)}

Which is just about perfect.

Sitecore Solr Error: Processing Field Name. Resolving Multiple Field found on Solr Field Map. No matching template field on index field name, return type 'String' and field type ''

After an upgrade to Sitecore 9, our Sitecore search logs were filled with thousands of warnings, like the below:

WARN Processing Field Name : Overview Text. Resolving Multiple Field found on Solr Field Map. No matching template field on index field name 'overview_text', return type 'String' and field type ''

What’s the fix?

You need to add field mappings for each of the fields in your Solr index. In our case, we had no mapping for ‘overview_text’, so Sitecore / Solr didn’t know how to treat the field. Add a config patch and specify a returnType for the fields you see as warnings in the log:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration>
          <fieldMap>
            <fieldNames hint="raw:AddFieldByFieldName">
              <field fieldName="overview_text" returnType="text" />
            </fieldNames>
          </fieldMap>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

** UPDATE 25/01/2018 **

While the above is suitable for adding a few fields, an index with hundreds or thousands of fields means maintaining a lot of these configuration entries. I raised a ticket with Sitecore, and was told “We registered this behavior as a bug with the reference #195567”. Sitecore’s suggested workaround is to add a log4net filter which stops the problematic entries from reaching the log. For example:

<!-- Filter out Solr log warnings -->
<log4net>
  <appender name="SearchLogFileAppender">
    <filter type="log4net.Filter.StringMatchFilter">
      <regexToMatch value="Resolving Multiple Field found on Solr Field Map. No matching solr search field configuration on index field name|Search field name in Solr with Template Resolver is returning no entry|Resolving Multiple Field found on Solr Field Map. No matching template field on index field name|Solr with Template Resolver is returning multiple entry|is being skipped. Reason: No Field Type Name" />
      <acceptOnMatch value="false" />
    </filter>
  </appender>
</log4net>

Hopefully a proper fix or configuration guidance will be released at some point.
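
Before deploying the filter, it can be worth checking which log lines the pattern would actually catch. Here is a rough Python sketch using the same pattern text as the regexToMatch value. Python’s re module and .NET’s regex engine are not identical, but for a pattern this simple I’d expect them to agree. The is_filtered helper is a hypothetical name, and the "INFO" sample line in the usage is made up for contrast.

```python
import re

# Pattern text copied from the regexToMatch value in the filter config.
FILTER_PATTERN = re.compile(
    "Resolving Multiple Field found on Solr Field Map. No matching solr search field configuration on index field name"
    "|Search field name in Solr with Template Resolver is returning no entry"
    "|Resolving Multiple Field found on Solr Field Map. No matching template field on index field name"
    "|Solr with Template Resolver is returning multiple entry"
    "|is being skipped. Reason: No Field Type Name"
)


def is_filtered(message: str) -> bool:
    """True when the message matches the pattern, i.e. the filter would drop it."""
    return FILTER_PATTERN.search(message) is not None


warning = (
    "WARN Processing Field Name : Overview Text. Resolving Multiple Field found on "
    "Solr Field Map. No matching template field on index field name 'overview_text', "
    "return type 'String' and field type ''"
)
print(is_filtered(warning))                      # True
print(is_filtered("INFO Some other log entry"))  # False
```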

Sitecore Solr setup: Document is missing mandatory uniqueKey field: id

While reconfiguring Sitecore (8.2u5) to use Solr (6.6.1) instead of Lucene, I came across the following error:

Document is missing mandatory uniqueKey field: id

In full:

Job started: Index_Update_IndexName=sitecore_master_index|#Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> SolrNet.Exceptions.SolrConnectionException: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">1</int></lst><lst name="error"><lst name="metadata"><str name="error-class">org.apache.solr.common.SolrException</str><str name="root-error-class">org.apache.solr.common.SolrException</str></lst><str name="msg">Document is missing mandatory uniqueKey field: id</str><int name="code">400</int></lst>
</response>
 ---> System.Net.WebException: The remote server returned an error: (400) Bad Request.
 at System.Net.HttpWebRequest.GetResponse()
 at HttpWebAdapters.Adapters.HttpWebRequestAdapter.GetResponse()
 at SolrNet.Impl.SolrConnection.GetResponse(IHttpWebRequest request)
 at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
 --- End of inner exception stack trace ---
 at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
 at SolrNet.Impl.SolrConnection.Post(String relativeUrl, String s)

Here’s what to check.

  • Does your Solr core index config directory have a file called managed-schema? If so, Solr is ignoring any changes you make to schema.xml and using managed-schema instead. Delete this file and reload the core, and Solr will pick up your latest version of schema.xml.
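
With several cores, it’s easy to miss one. Here is a small Python sketch that lists which cores still have a managed-schema file, assuming each core keeps its config in a conf subfolder under your Solr home. The cores_with_managed_schema function name is my own, and the example path is an assumption.

```python
from pathlib import Path
from typing import List


def cores_with_managed_schema(solr_home: Path) -> List[str]:
    """Return the names of core folders whose conf directory has a managed-schema file."""
    return sorted(
        core_dir.name
        for core_dir in solr_home.iterdir()
        if core_dir.is_dir() and (core_dir / "conf" / "managed-schema").is_file()
    )


# Example call (the path is an assumption, adjust it to your install):
# print(cores_with_managed_schema(Path(r"C:\solr\solr-6.6.1\server\solr")))
```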

 


Rebuild the index in Sitecore and the error should be gone. 
