Why you should use Solr 6 with Sitecore

I’m recently set up Sitecore with Solr 6.2. Anyone who has used Sitecore and Solr already has probably been aggravated by one annoying bug/oversight in the Solr admin, something that’s finally been fixed.7dc962eabf1

How beautiful is that? We can finally see the full name of the core in the selector!

So far I haven’t found any compatibility issues with Sitecore 8.2 and Solr 6.2. Give it a try!

Publish Sitecore Media Items on Referenced Datasources

One of the great additions to Sitecore 8 is the ability to publish related items when executing a publish. Using this feature, you’ll be sure to publish out any necessary items that may be needed to render the page correctly, such as data sources, referenced taxonomy items, or images.

However, you may still have some gaps when using this feature. Consider common scenario where you have a new page, and you add a component to the page that uses an separate item as a data source. On that data source is a field for an image. When publishing the page, the newly created data source item goes out, but the media item linked to on that data source does not.

This is because of the way Sitecore processes referenced items. In essence, it only goes one level deep in the reference tree. So, items referenced by the item being published will be added to the queue, but items referenced by those referenced items will not.

Normally this is ok. If the publisher crawled references recursively, you’d probably wind up in an infinite publishing loop, or you’d at least wind up doing a large publish unintentionally. But it is common for data source items to reference new content, like media, so we need to include those in the publish too.

There’s a pipeline in Sitecore 8 we can use specifically for this purpose, the <getItemReferences> pipeline. Out of the box, it includes a step to AddItemLinkReferences. This step is the one responsible for adding our referenced data source item, so we can override this step to add logic to include media referenced by that data source.

Like all great Sitecore developers, we customize Sitecore by reflecting on their code and replacing it with our own logic. I opened up Sitecore.Publishing.Pipelines.GetItemReferences.AddItemLinkReferences, and added the following.

...
  foreach (Item obj in itemLinkArray.Select(link => link.GetTargetItem()).Where(relatedItem => relatedItem != null))
  {
    list.AddRange(PublishQueue.GetParents(obj));
    list.Add(obj);
    // This will look at the item's links looking for media items.
    list.AddRange(GetLinkedMediaItems(obj));
  }
  return list.Distinct(new ItemIdComparer());
}

Then we’ll add the GetLinkedMediaItems method,

protected virtual List<Item> GetLinkedMediaItems(Item item)
{
  List<Item> mediaList = new List<Item>();
  ItemLink[] itemLinkArray = item.Links.GetValidLinks()
    .Where(link => item.Database.Name.Equals(link.TargetDatabaseName, StringComparison.OrdinalIgnoreCase))
    .ToArray();
  foreach (ItemLink link in itemLinkArray)
  {
    try
    {
      Item target = link.GetTargetItem();       
      if (target == null || !target.Paths.IsMediaItem) 
        continue;
      // add parent media items or folders
      Item parent = target.Parent;
      while(parent != null && parent.ID != ItemIDs.MediaLibraryRoot)
      {
        mediaList.Insert(0, parent);
        parent = parent.Parent;
      }
      mediaList.Add(target);
    }
    catch (Exception ex)
    {
      Log.Error("Error publishing reference link related media items", ex, typeof(AddItemAndMediaLinkReferences));
    }
  }
  return mediaList;
}

We can include this new pipeline by replacing the old one we reflected on.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
 <sitecore>
  <pipelines>
   <getItemReferences>
    <processor type="Sitecore.SharedSource.Pipelines.Publish.AddItemAndMediaLinkReferences, Sitecore.SharedSource"
               patch:instead="processor[@type='Sitecore.Publishing.Pipelines.GetItemReferences.AddItemLinkReferences, Sitecore.Kernel']"/>
   </getItemReferences>
  </pipelines>
 </sitecore>
</configuration>

With this in place, media items referenced on any linked item will be published. You can further refine the logic to just consider data sources, perhaps by checking the path or template to ensure it’s a data source, to cut down on unintentional publishes.

Keep Sitecore Online When Solr Fails

There have been a few experimental patches made available from Sitecore to improve the support for Solr. One particularly thorny issue is that Sitecore will throw exceptions, thus bringing down your site, if Solr is misconfigured or unavailable. Recently the source for a patch was released on GitHub by Sitecore support that addresses this. It even supports switch on rebuild.

https://github.com/andrew-at-sitecore/Sitecore.Support.391039

With this patch, Sitecore will poll at a configured interval for a Solr connection. If available, it will initialize the indexes or create your IQueryable objects. Otherwise, it will log an error, and return an empty result set if applicable.

The magic is in the how the patch initializes the index and Queryable objects. Using the StatusMonitor class included with the patch, it checks to see if Solr is available before attempting to use the connection.

void ISearchIndex.Initialize()
{
  SolrStatusMonitor.CheckCoreStatus(this);
  if (this.PreviousConnectionStatus == 
      ConnectionStatus.Succeded)
  {
    base.Initialize();
  }
}

To use this patch, you’ll need to build it against your version of Sitecore. After that, drop in the patch config and follow the example in the provided configs to swap the types on your Solr indexes to the fail-tolerant Solr search indexes.

One disclaimer: This patch will keep your CD servers online if Solr fails, but the Sitecore Admin will not function. So, your authors will not be able to use the back end until the Solr problem is corrected.

Benchmarking Sitecore Publishing

Publishing has been a sore spot lately for some of our clients due to the high amount of content they have in their Sitecore environment. When you start to get into hundreds of thousands of pieces of content, a full site publish is prohibitive. Any time a change is made that requires a large publish your deployment window goes from an hour to potentially an all-day affair. If a user accidentally starts a large publish, subsequent content publishes will get queued and backed up until that large publish completes, or until someone logs into the server and restarts the application.

Still waiting

There are options available to speed up the publishing process. Starting in Sitecore 7.2, parallel publishing was introduced, along with some experimental optimization settings. In Sitecore 8.2, we have a new option, the Sitecore Publishing Service.

What benefits can we see from these options?  I decided to do some tests of large content publishes using these techniques. Each publishing option has its own caveats of course, but this post is concerning itself mainly with the publishing performance of each of the available options.

Skip to the results!

Methodology

I wanted to run these tests in as pure an environment as possible. I set up 3 Sitecore 8.2 environments using Sitecore Instance Manager on my local machine. Using the FillDB tool, I generated 100,000 content items nested in a folder under the site root. Each of these items is of the Sample Item template that ships with a clean Sitecore installation. Full Publish on the entire site was used in each example. Each time the content was being published for the first time.

For benchmarking purposes, my local machine has the following specs,

  • Intel  i7, 8 Core, 2.3 GHz CPU
  • 16 GB RAM
  • Seagate SSHD (not an SSD, but it claims to perform like an SSD!)
  • Windows 7 x64, SP1
  • SQL Server Express 2015
  • .NET 4.6 and .NET Core installed

Default Publishing

The first test was doing a full site publish after generating 100,000 content items using the out-of-the-box publishing configuration. This is probably how most of Sitecore sites are configured unless you took steps to optimize the publishing processes. The results are, as expected, not great.

21620 12:19:30 INFO  Job started: Publish
21620 13:51:18 INFO  Job ended: Publish (units processed: 106669)

That’s over 90 minutes to publish these items, and the content items themselves only had 2 fields with any data.

Parallel Publishing

Next I tested parallel publishing, introduced in Sitecore 7.2. To use this, you need to enable Sitecore.Publishing.Parallel.config. Since I have an 8 core CPU, I set the Publishing.MaxDegreeOfParallelism setting to 8.

There is also Sitecore.Publishing.Optimizations.config, which contains, as the name implies, some optimization settings for publishing. The file comments state that the settings are experimental, and that you should evaluate them before using them in production. For purposes of this test, I ignored this file.

With parallel publishing enabled we see a much shorter publish time of around 25 minutes.

12164 14:27:10 INFO  Job started: Publish to 'web'
12164 14:52:58 INFO  Job ended: Publish to 'web' (units processed: 106669)

Publishing Optimizations

I reran the previous test with the Sitecore.Publishing.Optimizations.config enabled, along with the parallel publishing. This shortened the publish to around 15 minutes.

9836 15:52:34 INFO  Job started: Publish to 'web'
9836 16:07:20 INFO  Job ended: Publish to 'web' (units processed: 106669)

Sitecore Publishing Service

New in Sitecore 8.2 is the Publishing Service, which is a separate web application written in .NET Core that replaces the existing publishing mechanism in your Sitecore site. The documentation on setting up this service is thorough, so kudos to Sitecore for that, however it can be a bit dense. I found this blog post quite helpful in clearing up my confusion. Using it in conjunction with the official documentation, I was able to set up this service in less than an hour.

I ran into a problem using this method, however. The Publishing Service uses some new logic to gather the items it needs to publish, and one of the things it keys off of is the Revision field. Using the FillDb tool doesn’t explicitly write to the Revision field, therefore the service didn’t publish any of my generated items. I wound up running a script with Sitecore Powershell to make a simple edit to these items forcing the Revision field to be written. After that, my items published as expected.

The results were amazing. The new Publish Service was able to publish the entire site, over 100,000 items, in just over 4 minutes. That’s over 20x faster than the default publish settings.

2016-10-19 16:34:17.027 -04:00 [Information] New Job queued : 980bee8e-a132-4041-82d8-155b8496b19f - Targets: "Internet"
2016-10-19 16:39:07.304 -04:00 [Information] Job Result: 95b88a85-64f4-465e-b33d-a7a901331488 - "Complete" - "OK". Duration: 00:04:05.2786436

Summary

Each of these optimizations come with caveats. Parallel Publishing can introduce concurrency issues if you’re firing events during publish. The optimization config settings need to be vetted before rolling out, as it disables or alters many features you may be using, even if you don’t realize you’re using them.

If you’re on Sitecore 8.2 I strongly recommend giving the Publishing Service a look. Like any change to your system, you’ll want to test the effects it has on your publishing events and other hooks before rolling it out.

Remote Debugging Your Sitecore Application

In this post I’ll walk you through attaching a debugger to a Sitecore application running on a remote server. If you’re in a pinch dealing with a production issue and looking for a TL;DR on the MSDN documentation hopefully this will help. I’m assuming you’re using Visual Studio 2015, but the steps are largely the same for earlier versions.

Installing and Starting the Remote Debugger service

You’ll need to download the appropriate remote debugger for the version of Visual Studio you’re using. To start the service, you’ll need administrator permissions on the remote machine. If you don’t have that, find someone who does to set up and start the service for you.

If you have an account on the remote machine, to connect you’ll need an account on the local machine with the same domain and username to connect. It’s easier to select “No Authentication” and “Allow Any User to Debug”, but be warned there are security concerns. I don’t recommend this option unless there’s an additional layer of security in your system, such as IP filtering. You should also change the default port.

rdb_host

Attaching the Debugger

Once the service is listening, open your project in Visual Studio and select Attach to Process from the Debug menu. In the dialog box, under Transport, select Default if using Windows Auth, or No Authentication if using the No Authentication option on the remote server. In the Qualifier box, put in the server:port and hit enter. If the remote debugger is listening, you’ll see a list of processes running on the remote machine. Select w3wp.exe from the list and attach. That’s it!  You’re now attached and ready to debug your remote application.

rdb_attach

Why can’t I hit my breakpoints?

Chances are, unless you built the app and deployed the DLLs from your local machine, you won’t be able to hit any breakpoints you set. In order for the remote debugger to function, it needs the exact same symbols (pdb) files from the remote machine.  You’ll need to copy these down to the machine you’re debugging from and tell Visual Studio where to find them.

Once you’ve copied the symbols to a local folder, open the Tools -> Options dialog from Visual Studio. Select Debugging -> Symbols, and add a new path by clicking the folder icon. Paste in the path to where you copied down the pdb files and make sure the box is checked next to that new entry.  Close this and reattach the remote debugger.

rdb_symb

Why can’t see my variable values?

Try this. In the Tools -> Options dialog, under Debugging, check the “Use Legacy C# and VB Expression evaluators”.

rdb_options

Happy debugging, and remember to turn off your remote debugger when you’re finished!

Test Regular Expressions in Real-Time with NCrunch

odns_regexToday I had to write some utility classes to parse query parameters from one search provider and transform them to work with another search provider.  Naturally this meant a lot of work with regular expressions.

Many developers dread working with regular expressions. The syntax is arcane, the patterns can be absurdly long, and debugging them is a chore. A single character change in a regex pattern, to address one edge case, can cause failures in many other inputs. Unit test covering an array of inputs is essential to have confidence in your regex code and faith that your changes after the fact don’t break existing use cases.

But how can we accelerate the initial implementation? There are an array of tools out there to help with developing regexes, but if you don’t need any of that if you have NCrunch installed to Visual Studio.

Chances are if you are a .NET developer and practice TDD, you’ve heard of NCrunch. If you’re not familiar with it, NCrunch is a tool that (among other things) runs your unit tests for you in the background, as you type. This gives you real-time feedback on your tests which is a significant accelerator, saving you the trouble of writing a test, writing some code to make the test pass, rebuilding, running the test, repeat until green. You know right away if your code is working as intended. If it isn’t, you know immediately what you broke when you see the red lights next to your test methods. Overall it’s a great tool and I highly recommend it.

Consider this example, testing an email validation pattern:

public class Email_Regex
{
	private const string EmailPattern = @".+";
	public bool IsValidEmail(string email)
	{
		return Regex.IsMatch(email, EmailPattern);
	}
}


[TestFixture]
public class Email_Regex_Tests
{
	private Email_Regex _regex;

	[SetUp]
	private void SetUp()
	{
		_regex = new Email_Regex();
	}

	[Test]
	public void IsValidEmail_SimpleEmail_IsValid()
	{
		string testEmail = "chris.sulham@gmail.com";
		Assert.IsTrue(_regex.IsValidEmail(testEmail));
	}
}

This test will pass, but clearly the pattern isn’t going to cut it in the real world. We can add another test with an invalid email, that should make the pattern not match.

[Test]
public void IsValidEmail_NonEmailInput_IsNotValid()
{
	string testEmail = "I'm not telling you my email address!";
	Assert.IsFalse(_regex.IsValidEmail(testEmail));
}

Here we can set up an array of tests against representative inputs that we’ll run through our regex. Using NCrunch, we can make edits to the pattern and see in real time whether or not our inputs are matching. As you add more tests and have to tweak the pattern, you’ll once again know immediately which test inputs are working with your changes. Seeing the lights go red the moment you add or change a character in the pattern will remove many, many painful cycles of change, build, run tests, wonder what broke everything, change again…

Developing regular expressions this way almost makes the whole process fun.  Almost.

Download NCrunch here