Explaining Lucene explain

Each time you perform a search using Lucene, a score is applied to the results returned by your query.

--------------------------------------
| #   | Score | Tags                 |
--------------------------------------
| 343 | 2.319 | Movies, Action       |
| 201 | 2.011 | Music, Classical     |
| 454 | 1.424 | Movies, Kids         |
| 012 | 0.003 | Music, Kids          |
--------------------------------------

In our index, # is the unique document number, score is the closeness of each hit to our query, and tags is a text field belonging to a document.

There are many ways Lucene can calculate a score. By default, we use the DefaultSimilarity implementation of the Similarity abstract class. This class implements the commonly referenced TF-IDF scoring formula:


(more: https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/search/Similarity.html)
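
For reference, the scoring function described on that page can be written roughly as below. This is a transcription of the documented formula, not something specific to our index; the definitions of tf, idf and queryNorm are the DefaultSimilarity ones, and the simplified queryNorm assumes every boost is 1.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d)\Big)

\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}+1}, \qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\sum_{t \in q}\mathrm{idf}(t)^{2}}}
```

Here coord(q,d) rewards documents that match more of the query terms, and norm(t,d) carries the field length normalization and any index-time boosts.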

If you’re new to Lucene (or even if you’re not!) this formula can be a lot to get your head around. To get inside the formula for a given search result, Lucene provides an explanation feature, which we can call from code (C# example in Lucene.Net):

using System.Collections.Generic;
using System.Linq;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public List<Explanation> GetExplainerByRawQuery(string rawQuery, int doc = 0)
{
    using (var searcher = new IndexSearcher(_directory, false))
    {
        // Create a parser, and parse a plain-text query which searches for items tagged with 'movies' or 'kids' (or hopefully, both)
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "id", analyzer);
        var query = parser.Parse("tags:(movies OR kids)");

        // Get references to the top 25 results
        var hits = searcher.Search(query, 25).ScoreDocs;

        // For each hit, get the accompanying explanation plan, using the hit's document number
        // (not its position in the result list). We now have a List<Explanation>
        var explains = hits.Select(hit => searcher.Explain(query, hit.Doc)).ToList();

        // The using block disposes the searcher. The analyzer is declared outside this method,
        // so we leave it open for reuse
        return explains;
    }
}

Calling searcher.Explain(query, match.doc) gives us a text output explanation of how the matched document scores against the query:

query: tags:movies|kids
----------------------------------------------------
| #   | Score  | Tags                              |
----------------------------------------------------
| 127 | 2.4824 | Movies, Kids, Animation, Movies   |
----------------------------------------------------
2.4824  sum of:
  1.4570  weight(tags:movies in 127) [DefaultSimilarity], result of:
    1.4570  score(doc=127,freq=2.0 = termFreq=2.0), product of:
      0.7079  queryWeight, product of:
        2.9105  idf(docFreq=147, maxDocs=1000)
        0.2432  queryNorm
      2.0581  fieldWeight in 127, product of:
        1.4142  tf(freq=2.0), with freq of:
          2.0000  termFreq=2.0
        2.9105  idf(docFreq=147, maxDocs=1000)
        0.5000  fieldNorm(doc=127)
  1.0255  weight(tags:kids in 127) [DefaultSimilarity], result of:
    1.0255  score(doc=127,freq=1.0 = termFreq=1.0), product of:
      0.7063  queryWeight, product of:
        2.9038  idf(docFreq=148, maxDocs=1000)
        0.2432  queryNorm
      1.4519  fieldWeight in 127, product of:
        1.0000  tf(freq=1.0), with freq of:
          1.0000  termFreq=1.0
        2.9038  idf(docFreq=148, maxDocs=1000)
        0.5000  fieldNorm(doc=127)

Ok! But still, there’s a lot going on in there. Let’s try and break it down.

  • 2.4824 is the total score for this single search result. As our query contained two terms, ‘movies’ and ‘kids’, Lucene breaks the overall query down into two subqueries.
  • The scores of the two subqueries (1.4570 for ‘movies’ and 1.0255 for ‘kids’) are added to arrive at our total score.

For our first subquery, the ‘movies’ part, we arrive at the score of 1.4570 by multiplying queryWeight (0.7079) by fieldWeight (2.0581). Let’s go line by line:

  1.4570  weight(tags:movies in 127) [DefaultSimilarity], result of:
The total score for the ‘movies’ subquery is 1.4570. ‘tags:movies‘ is the raw query, 127 is the individual document number we’re examining, and DefaultSimilarity is the scoring mechanism we’re using.
1.4570 score(doc=127,freq=2.0 = termFreq=2.0), product of:
The term (‘movies‘) appears twice in the ‘tags‘ field for document 127, so we get a term frequency of 2.0
0.7079 queryWeight, product of:
queryWeight (0.7079) reflects how rare the search term is within the whole index – in our case, ‘movies‘ appears in 147 out of the 1000 documents in our index.
2.9105 idf(docFreq=147, maxDocs=1000)
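  This rarity is called inverse document frequency (idf). DefaultSimilarity computes it as 1 + ln(maxDocs / (docFreq + 1)), which here is 1 + ln(1000 / 148) = 2.9105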
0.2432 queryNorm
  .. and is itself multiplied by a normalization factor (0.2432) called queryNorm.

This normalization factor is the same for all results returned by our query and just stops the queryWeight scores from becoming too exaggerated for any single result.

2.0581 fieldWeight in 127, product of:
  fieldWeight (2.0581) is how often the search term (‘movies‘) appears in the field we searched on ‘tags’.
1.4142 tf(freq=2.0), with freq of:
  2.0000 termFreq=2.0
  We take the square root of the termFreq (2.0) = 1.4142
2.9105 idf(docFreq=147, maxDocs=1000)
  This is multiplied by the idf which we calculated above (2.9105)
0.5000 fieldNorm(doc=127)
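   and finally by a field normalization factor (0.5000), which reflects how many terms overall were in the field. With no boosts applied, DefaultSimilarity computes this as 1 / sqrt(number of terms): our field holds four terms, so 1 / sqrt(4) = 0.5.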

This ‘boost‘ value will be higher for shorter fields – meaning the more prominent your search term was in a field, the more relevant the result.
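
To check the walkthrough above, here is a minimal sketch that recomputes the explain output by hand. It is not Lucene code: ScoreSketch and its methods are hypothetical names, and it assumes the DefaultSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), fieldNorm = 1 / sqrt(number of terms), queryNorm = 1 / sqrt(sum of squared idfs)) with no boosts. Lucene itself works in single-precision floats and stores field norms in a lossy one-byte encoding, so real explain output can differ in the last digits.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Lucene.Net: re-derives the numbers in the explain output.
public static class ScoreSketch
{
    // Inverse document frequency as DefaultSimilarity defines it
    public static double Idf(int docFreq, int maxDocs) =>
        1.0 + Math.Log(maxDocs / (double)(docFreq + 1));

    // Term frequency factor: square root of the raw frequency
    public static double Tf(double freq) => Math.Sqrt(freq);

    // Field length normalization, assuming no boosts
    public static double FieldNorm(int numTerms) => 1.0 / Math.Sqrt(numTerms);

    // Score for one term: queryWeight * fieldWeight
    public static double TermScore(double freq, double idf, double queryNorm, double fieldNorm)
    {
        double queryWeight = idf * queryNorm;
        double fieldWeight = Tf(freq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void Main()
    {
        double idfMovies = Idf(147, 1000);   // 'movies' appears in 147 of 1000 docs
        double idfKids = Idf(148, 1000);     // 'kids' appears in 148 of 1000 docs

        // queryNorm is shared by every term in the query
        double queryNorm = 1.0 / Math.Sqrt(idfMovies * idfMovies + idfKids * idfKids);

        // The tags field of document 127 holds four terms
        double fieldNorm = FieldNorm(4);

        double movies = TermScore(2, idfMovies, queryNorm, fieldNorm);
        double kids = TermScore(1, idfKids, queryNorm, fieldNorm);

        var inv = CultureInfo.InvariantCulture;
        Console.WriteLine("movies: " + movies.ToString("F4", inv));           // movies: 1.4570
        Console.WriteLine("kids:   " + kids.ToString("F4", inv));             // kids:   1.0255
        Console.WriteLine("total:  " + (movies + kids).ToString("F4", inv));  // total:  2.4824
    }
}
```

Both query terms matched document 127, so the coord factor is 1 and the total is simply the sum of the two term scores.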


Happy Lucene hacking!

Returning JSON errors from Sitecore MVC controllers

ASP.NET MVC gives us IExceptionFilter, with which we can create custom, global exception handlers to apply to controller actions.

public class ExceptionLoggingFilter : FilterAttribute, IExceptionFilter
{
	public void OnException(ExceptionContext filterContext)
	{
		// filterContext now contains lots of information about our exception, controller, action, etc
		var message = filterContext.Exception.Message;
		var stackTrace = filterContext.Exception.StackTrace;
		var controllerName = filterContext.Controller.GetType().Name;
		var resultName = filterContext.Result.GetType().Name;
		var userAgent = filterContext.HttpContext.Request.UserAgent;
	}
}

 

We can apply this filter to all Action methods, by adding our filter to the list of global filters:

public class FilterConfig {
	public static void RegisterGlobalFilters(GlobalFilterCollection filters) {
		filters.Add(new ExceptionLoggingFilter());
	}
}

 

and wiring this up to our application in our Application_Start method:

FilterConfig.RegisterGlobalFilters(GlobalFilters.Filters);

 

In Sitecore

As you may expect, Sitecore exposes this functionality as pipeline processors. Sitecore defines a custom IExceptionFilter implementation (see our snippet above) which kicks off the mvc.exception pipeline, passing along the ExceptionContext object.

As client developers, it is our job to create an appropriate processor to accept the ExceptionContext and do something with it. Let’s run through an example where we want to return a JSON representation of the error, loaded with as much useful information as possible.

For more reading on Sitecore controller actions returning JSON, have a look at John West’s post here: https://community.sitecore.net/technical_blogs/b/sitecorejohn_blog/posts/use-json-and-mvc-to-retrieve-item-data-with-the-sitecore-asp-net-cms

So, first up, create an empty handler class, which inherits from ExceptionProcessor:

public class JSONExceptionHandler :
	Sitecore.Mvc.Pipelines.MvcEvents.Exception.ExceptionProcessor
{
	public override void Process(Sitecore.Mvc.Pipelines.MvcEvents.Exception.ExceptionArgs args)
	{

	}
}

 

Create a Web.config include, to add this processor to the mvc.exception pipeline:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <mvc.exception>
        <processor type="Bleep.Handlers.JSONExceptionHandler, Bleep.Handlers"/>
      </mvc.exception>
    </pipelines>
  </sitecore>
</configuration>

 

Ok! Now our JSONExceptionHandler class will be called each time an exception occurs in MVC code. So, let’s grab all the detail we can from the ExceptionContext class and return it as JSON:

public override void Process(Sitecore.Mvc.Pipelines.MvcEvents.Exception.ExceptionArgs args)
{
	var filterContext = args.ExceptionContext;

	filterContext.Result = new JsonResult
	{
		JsonRequestBehavior = JsonRequestBehavior.AllowGet,
		Data = new
		{
			Message = filterContext.Exception.Message,
			StackTrace = filterContext.Exception.StackTrace,
			Controller = filterContext.Controller.GetType().Name,
			Result = filterContext.Result.GetType().Name,
			UserAgent = filterContext.HttpContext.Request.UserAgent,
			ItemName = args.PageContext.Item.Name,
			Device = args.PageContext.Device.DeviceItem.Name,
			User = filterContext.HttpContext.User.Identity.Name
		}
	};

	filterContext.ExceptionHandled = true;

	// Log the error
	Sitecore.Diagnostics.Log.Error("MVC exception processing "
		+ Sitecore.Context.RawUrl, args.ExceptionContext.Exception, this);
}

 

This will produce a result such as:

[Image: ExceptionFilter2]

Happy hacking!

A workaround for missing ViewData in Sitecore MVC

Passing data between Sitecore renderings can get tricky.

Sending messages between sibling renderings can lead us to worry about the order in which they render, and you may end up with renderings tightly coupled to other renderings. Jeremy Davis discusses ways to switch the order of rendering execution on his blog here: https://jermdavis.wordpress.com/2016/04/04/getting-mvc-components-to-communicate/

[Image: Share_Data_1]

Pass data down, not across

My preferred approach is for renderings to be as isolated as possible and not need to talk to siblings. In a regular MVC site, we would instantiate a ViewModel, and pass it down to any child (or partial) views as needed. If a child view doesn’t change this ViewModel at all, we don’t have to worry about order of execution or changes of state.

In Sitecore, we can achieve this by wrapping child renderings in a parent Controller Rendering. This Controller Rendering creates and prepares the ViewModel, and then passes it down to one or more child renderings.

[Image: Share_Data_2]

Let’s recap on the main points here:

  1. Our parent Controller Rendering creates and prepares a ViewModel. This parent specifies a view, which contains one or more placeholders.
  2. This ViewModel is passed along to any child renderings currently attached to the placeholders.
  3. During execution, child renderings do not modify the ViewModel. We may even consider the ViewModel immutable while rendering takes place.

Sitecore has a peculiarity here which makes our job difficult. Each rendering gets a new instance of ViewData – explained by Kern Herskind Nightingale here: http://stackoverflow.com/a/35210022/638064. This puts a stop to us using ViewData to pass our ViewModel down from the parent rendering to child renderings.

The Workaround

There’s a way you can ensure that ViewData is correctly passed down from parent to child renderings. Let’s go through how this is possible.

  1. In your top level controller, create a ViewModel, which will be passed down to all child renderings.
public ActionResult ParentContainer()
{
    var viewModel = new {PageSize = 3, CurrentPage = 2, Results = Sitecore.Context.Item.Fields["Results"].Value};
    return View();
}

  2. Add it to the ViewData collection in the current ViewContext.
public ActionResult ParentContainer()
{
    var viewModel = new {PageSize = 3, CurrentPage = 2, Results = Sitecore.Context.Item.Fields["Results"].Value};
    ContextService.Get().GetCurrent<ViewContext>().ViewData.Add("_SharedModel", viewModel);
    return View();
}
  3. In each child rendering, fetch the ViewModel and add it to the local ViewData for the current rendering (which will be empty at this point). View Renderings will do this step for you, so you don’t need to do anything special there.
public ActionResult ChildRendering()
{
    // Get any ViewData previously added to this ViewContext
    var contextViewData = ContextService.Get().GetCurrent<ViewContext>().ViewData;
    contextViewData.ToList().ForEach(x => ViewData.Add(x.Key, x.Value));
    return View();
}
  4. Et voila! You now have access to the same ViewModel for each of your child renderings.
@{
    Layout = null;
    var viewModel = ViewData["_SharedModel"];
}

Making it better

MVC offers us even better tools to remove code duplication. If you have a lot of child renderings needing access to your shared ViewModel, you will end up repeating the code from step 3 in every one of them. Let’s refactor that into a filter attribute.

public class RetrieveViewDataFilter : ActionFilterAttribute
{
    // ActionFilterAttribute already implements IActionFilter, so we only override the hook we need
    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        // Merge ViewData from the current ViewContext into this controller's ViewData.
        // The indexer overwrites an existing key, where Add would throw on a duplicate
        var contextViewData = ContextService.Get().GetCurrent<ViewContext>().ViewData;
        contextViewData.ToList().ForEach(x => filterContext.Controller.ViewData[x.Key] = x.Value);
    }
}

Now, we just need to add this attribute to any Action methods that may want to access shared ViewData from higher up in the stack:

[RetrieveViewDataFilter]
public ActionResult ChildRendering()
{
    return View();
}

There we go. I’m sure Sitecore will amend their implementation at some point, but until then, we have an immutable, single direction ViewData flow.

Finding the current Action name from an MVC pipeline processor

Sitecore provides two pipeline hooks for tapping into an Action method at point of execution:

  • mvc.actionExecuting
  • mvc.actionExecuted

These follow standard MVC naming conventions – actionExecuting fires before your Action method executes, and actionExecuted fires afterwards.

A processor hooking into actionExecuting looks something like this:

public class LogActionExecuting
{
    public void Process(ActionExecutingArgs args)
    {
        //Something here
    }
}

At this point, it might be useful to get some information about the Action we’re executing, or the Controller it belongs to. Sitecore allows us to access the MVC ActionDescriptor and ControllerDescriptor objects, which contain plenty of information about our Action and Controller.

public class LogActionExecuting
{
    public void Process(ActionExecutingArgs args)
    {
        //Some interesting items from the Action
        var actionName = args.Context.ActionDescriptor.ActionName;
        var actionAttributes = args.Context.ActionDescriptor.GetCustomAttributes(false);
           
        //Some interesting items from the Controller
        var controllerName = args.Context.ActionDescriptor.ControllerDescriptor.ControllerName;
        var controllerType = args.Context.ActionDescriptor.ControllerDescriptor.ControllerType;
        var controllerActions = args.Context.ActionDescriptor.ControllerDescriptor.GetCanonicalActions();
 
        //args.Context.ActionDescriptor.Execute(..);
 
    }
}

The last line is commented out, as executing the action from within itself may cause the universe to implode. Maybe.

Happy hacking!

C#: Async functionality testing with xUnit

While we’re used to using xUnit for properly isolated single unit tests, the library makes it extremely easy to assert HTTP endpoints are running and responding properly.

We make use of the System.Net.Http.HttpClient class, available with .NET Core.

[Image: xUnit test code]

System.Net.Http.HttpClient makes an async call to a server running locally. Once we have a response, we deserialize the response to a model object, Person, and ensure that we got a single, valid result.
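
As a rough sketch of the kind of test described above: the base address (http://localhost:5000), the route (/api/person/1) and the shape of the Person model are all assumptions made up for illustration, and System.Text.Json is assumed to be available (.NET Core 3.0 or later). Swap in your own endpoint, model and JSON library.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;
using Xunit;

// Hypothetical model; the real Person type is whatever your endpoint returns.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class PersonEndpointTests
{
    // Assumed address of the locally running test server
    private static readonly HttpClient Client = new HttpClient
    {
        BaseAddress = new Uri("http://localhost:5000")
    };

    // Deserialize the response body, ignoring property-name casing
    public static Person Parse(string json) =>
        JsonSerializer.Deserialize<Person>(
            json, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

    [Fact]
    public async Task GetPerson_ReturnsASingleValidPerson()
    {
        // Async call to the (assumed) endpoint
        var response = await Client.GetAsync("/api/person/1");

        // Fails the test if the status code is not 2xx
        response.EnsureSuccessStatusCode();

        var body = await response.Content.ReadAsStringAsync();
        var person = Parse(body);

        // We expect one valid Person back
        Assert.NotNull(person);
        Assert.False(string.IsNullOrWhiteSpace(person.Name));
    }
}
```

Because the test method returns a Task, xUnit awaits it, so no blocking calls such as .Result are needed.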

Use this approach with TDD to set out endpoints with expected responses, and apply red-green-refactoring until you have all of your services online.

Lastly – how should you run the test server that the tests will query against? If you’re using a local IIS instance or debugging through Visual Studio, you’ll already have a suitable test server in place. Better practice is to create a test server as part of the suite of tests, which exits and cleans up afterwards. I’ll be covering this in future posts.