Real-time Search Updates with Experience Edge Webhooks: Part 1

In a previous post, we went over how to use GraphQL and a custom Next.js web service to crawl and index our Sitecore XM Cloud content into a search provider. That crawler runs on a schedule, so what happens when your authors update their content? They’ll need to wait for the next run of the crawler to see their content in the search index. This is a step back in capabilities from legacy Sitecore XP, which updated indexes at the end of every publish.

It’s possible to recreate this functionality using Experience Edge webhooks. Experience Edge offers quite a few webhook options (see the list here). To enable near real-time updates of our search index, we’ll use the ContentUpdated webhook, which fires after a publish to Edge from XM Cloud finishes. Let’s take a look at an example payload from that webhook:

{
  "invocation_id": "56a95d51-b2af-496d-9bf1-a4f7eea5a7cf",
  "updates": [
    {
      "identifier": "FA27B51EE4394CBB89F8F451B13FF9DC",
      "entity_definition": "Item",
      "operation": "Update",
      "entity_culture": "en"
    },
    {
      "identifier": "80471F77728345D4859E0FD004F42FEB",
      "entity_definition": "Item",
      "operation": "Update",
      "entity_culture": "en"
    },
    {
      "identifier": "80471F77728345D4859E0FD004F42FEB-layout",
      "entity_definition": "LayoutData",
      "operation": "Update",
      "entity_culture": "en"
    },
    {
      "identifier": "7C77981F5CE04246A98BF4A95279CBFB",
      "entity_definition": "Item",
      "operation": "Update",
      "entity_culture": "en"
    },
    {
      "identifier": "FFF8F4010B2646AF8804BA39EBEE8E83-layout",
      "entity_definition": "LayoutData",
      "operation": "Update",
      "entity_culture": "en"
    }
  ],
  "continues": false
}

As you can see, the payload contains both item data and layout data. The layout data is what we’re interested in: it represents our actual web pages, which are what we want to index.

The general process is as follows:

  1. Set up a receiver for this webhook. We’ll do this with a Next.js function (a minimal sketch follows this list).
  2. Loop over the webhook payload and, for each piece of LayoutData, make a GraphQL query to get the field data from Experience Edge.
  3. Finally, roll up the field data into a JSON object and push it to our search index.
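
Here’s a minimal sketch of what that receiver might look like as a Next.js API route. It’s an outline only: the route path and logging are placeholders, and the GraphQL query and index push are stubbed out until the next post.

// pages/api/edge-webhook.ts -- hypothetical route name
import type { NextApiRequest, NextApiResponse } from 'next';

type EdgeUpdate = {
  identifier: string;
  entity_definition: 'Item' | 'LayoutData';
  operation: string;
  entity_culture: string;
};

type EdgeWebhookPayload = {
  invocation_id: string;
  updates: EdgeUpdate[];
  continues: boolean;
};

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'POST') {
    return res.status(405).end();
  }

  const payload = req.body as EdgeWebhookPayload;

  // We only care about LayoutData entries -- these represent our pages.
  const layoutUpdates = payload.updates.filter((u) => u.entity_definition === 'LayoutData');

  for (const update of layoutUpdates) {
    // The identifier carries a "-layout" suffix; strip it to get the item ID.
    const itemId = update.identifier.replace(/-layout$/, '');

    // TODO (next post): query Experience Edge via GraphQL for this item's field data,
    // then roll it up into a JSON document and push it to the search index.
    console.log(`Would re-index item ${itemId} (${update.entity_culture})`);
  }

  return res.status(200).json({ received: payload.updates.length });
}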

Let’s start by setting up our webhook. You’ll need to create an Edge administration credential in the XM Cloud Deploy app. Make note of the Client ID and Client Secret. The secret will only be displayed once, so if you lose it you will need to create new credentials.

The next step is to create an auth token; you’ll need this to perform any Experience Edge administration actions. I used the Thunder Client plugin for Visual Studio Code to interact with the Sitecore APIs. To create an auth token, make a POST request to https://auth.sitecorecloud.io/oauth/token with form data containing the Client ID and Client Secret you just created in XM Cloud.
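
The form fields look roughly like this (the audience value is what Sitecore’s Experience Edge documentation specifies; confirm it for your environment):

audience: https://api.sitecorecloud.io
grant_type: client_credentials
client_id: <your Client ID>
client_secret: <your Client Secret>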

You’ll get back a JSON object containing an access token. This token must be sent with any API request to Experience Edge, passed as a Bearer token in the Authorization header. We can test it with a simple GET request that lists all the webhooks in this Edge tenant.
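
In raw HTTP terms, the test request looks something like this:

GET https://edge.sitecorecloud.io/api/admin/v1/webhooks
Authorization: Bearer <your access token>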

You should get back a JSON object containing a list of all the webhooks currently set up in your tenant (which is likely none to begin with). Auth tokens expire after a day or so. If you get a message like edge.cdn.4:JWT Verification failed in your response, there’s a problem with your token and you should generate a new one.

Next let’s create our ContentUpdated webhook. You’ll need something to receive the webhook; since we haven’t created our function in Next.js yet, we can use a testing service like Webhook.site. Create a POST request to https://edge.sitecorecloud.io/api/admin/v1/webhooks with a JSON body describing the webhook you want to create.
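
Something along these lines works; the fields mirror those echoed back in the response below. The uri should be your own webhook.site endpoint, and the label and x-acme header values are arbitrary:

{
  "label": "OnUpdate Webhook Sandbox",
  "uri": "https://webhook.site/<your-unique-id>",
  "method": "POST",
  "headers": {
    "x-acme": "ContentUpdated"
  },
  "executionMode": "OnUpdate"
}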

The important parameters here are uri and executionMode. The uri is where the webhook payload will be sent, in this case our testing endpoint at webhook.site. The execution mode OnUpdate indicates this webhook will fire when content is updated. (Note: there are separate webhooks for create and delete, which you will probably need to set up later following this same pattern.)

Send this request and you’ll get a response that looks like this:

{
  "id": "3cc79139-294a-449e-9366-46bc629ffddc",
  "tenantId": "myTenantName2157-xmcloudvani7a73-dev-2bda",
  "label": "OnUpdate Webhook Sandbox",
  "uri": "https://webhook.site/#!/view/d4ebda52-f7d8-4ae6-9ea2-968c40bc7f2f",
  "method": "POST",
  "headers": {
    "x-acme": "ContentUpdated"
  },
  "body": "",
  "createdBy": "CMS",
  "created": "2024-04-03T15:42:43.079003+00:00",
  "bodyInclude": null,
  "executionMode": "OnUpdate"
}

Try your GET request again on https://edge.sitecorecloud.io/api/admin/v1/webhooks, and you should see your webhook returned in the JSON response.

Try making some content updates and publishing from XM Cloud. Over at webhook.site, wait a few minutes and make sure you’re getting the JSON payload sent over. If so, you’ve set this up correctly.

To delete this webhook, you can send a DELETE request to https://edge.sitecorecloud.io/api/admin/v1/webhooks/<your-webhook-id>. Make sure you include your auth bearer token!
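
For example, using the id returned when we created the webhook above:

DELETE https://edge.sitecorecloud.io/api/admin/v1/webhooks/3cc79139-294a-449e-9366-46bc629ffddc
Authorization: Bearer <your access token>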

In the next post, we’ll go over handling this webhook to push content updates into our search index.

Indexing XM Cloud content with Sitecore Search

This is the first post in a series where I’ll go over a basic setup to configure Sitecore Search to crawl and query your XM Cloud content. This post will cover crawling and indexing your content using a Web Crawler and a Sitemap trigger.

To begin, it’s helpful to define some terms used in Sitecore Search. We’ll start with some structural terms.

  1. Entity: This is a document in the search index. Sitecore Search ships with two types of entities, Product and Content. We will be using the Content entity.
  2. Attribute: These are the fields on your entities. Attributes are defined on the entity and make up the schema of your content in the search index, much in the same way you’d use fields on a Sitecore template to define your content schema in the CMS.
  3. Source: Sources are what you set up to bring content into the search index; in this post, our source will crawl our XM Cloud site.
  4. Connector: Sources have connectors, which control how the source receives data and how you read it to index content. A connector must be selected when creating a source. Each type of connector serves a different purpose and dictates what options are available to extract your content.
  5. Trigger: Triggers are added to a source to control how the source data is fetched. Each type of trigger behaves differently.
  6. Extractor: You use extractors to pull data from your source and write it to your entity’s attributes. The types of extractors available depend on the connector you’re using.

Before we begin, a few callouts. These apply to the state of the product as of the time of this blog post, so it’s possible future enhancements will mitigate these.

  • You must have the Tech Admin role in Sitecore Search. This is a role above the Admin role, and you will probably need to have your organization’s primary contact request it for you from Support, as it’s not available to be assigned to users via the UI.
  • When you make a change in Sitecore Search, you must Publish it for it to take effect. It is possible to revert a publish, but only to the previous version and no further. Your version history can be viewed via the UI, but it offers very little detail on what you changed.
  • Make only one change at a time, and publish between each change. If you change more than one thing, such as defining two or more attributes on an entity, and then attempt to save and publish, the tool will error and lose your entity altogether, effectively disabling your instance. You can Revert your changes to the previously published state to resolve the error, but you will lose all your changes.

To start, select Administration from the left menu and then Domain Settings. If you don’t see this option, you need the Tech Admin role. Select Entities, and the popup will show your Content entity. Note the Attributes template. By default it is set to “Attributes Template for Content” and has several defined attributes already. You probably don’t need these, so change it to “Base Attributes Template”. Do this now before making any other changes, because once you start editing the entity you cannot change the template. Save and publish this.

Let’s define our attributes. With the base template, we get three to start: Id, Source, and Document Activeness. Of the three, Id is required. Let’s add some basic information like Title and Url.

Click Attributes in the subnav, then Add Attribute. You’ll be presented with a form. Let’s start by adding a Title field. Fill out the form as follows:

  • Entity: content
  • Display Name: Title
  • Attribute name: title (Note: use all lower case and no spaces for attribute names)
  • Placement: Standard
  • Data Type: String
  • Mark as: Check “Required” and “Return in api response”

Save and publish this. Repeat to create the Url attribute.

We’re making these fields required because we want every document in the index to have a title and a url; these are the basic pieces of data we need for every document. If a crawled document is missing required field data, or that data cannot be mapped, the document is not indexed.

We also check “Return in api response” to make the contents of this field available in the search API. We will be checking this for any attribute we want included in the documents returned by the search API.

Next let’s add a taxonomy field called Topics. In Sitecore XMC, the Topic field is a multilist that allows authors to choose multiple Topic items to tag the content with. Add another attribute and define it like this:

  • Entity: content
  • Display Name: Topic
  • Attribute name: topic
  • Placement: Standard
  • Data Type: Array of Strings
  • Mark as: Check “Return in api response”

We’re not making Topic required because not all of our documents will have a topic. We’re choosing Array of Strings as the data type because the Topic may have multiple values. Save and publish again.

In order to facet on Topic, we need to perform another step. Select Feature Configuration from the subnav. You’ll see a lot of options here. If you select API Response, you’ll see all the attributes you added so far (assuming you checked “Return in api response”). Select Facets, then click Add Attribute. Add the Topic attribute here, save, and publish.
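
At this point, each crawled page should end up as a document in the content entity shaped roughly like this (illustrative only; the values are placeholders and the exact document structure in Sitecore Search may differ):

{
  "id": "example-page-id",
  "title": "Example Page",
  "url": "https://www.example.com/example-page",
  "topic": ["Topic A", "Topic B"]
}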

Now we have defined a barebones entity with fields for Id, Title, and Url, and included a single facet field on it called Topic. The next step is to set up our Source to crawl our Sitecore XM Cloud content. We’ll be using a Web Crawler connector with a Sitemap trigger to accomplish this.

Before the crawler can do its job, you need to make your data available to it. First, a sitemap: if you’re using SXA, you can configure a sitemap to be generated automatically. Second, your pages need to expose the data via meta tags. If you’ve set up Open Graph meta tags, you’ll have Title and Url covered. You’ll need to add another meta tag for Topics in your head web app, like so:
<meta name="topic" content="Topic A">
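
In a Next.js head app, rendering those tags might look something like this. This is a sketch only: the component and prop names are made up, the og:title and og:url tags are assumed to come from your existing Open Graph setup, and it assumes the crawler collects repeated topic tags into the Array of Strings attribute.

import Head from 'next/head';

// Hypothetical metadata component; adapt to however your head app renders its <head>.
type PageMetaProps = {
  title: string;
  url: string;
  topics: string[];
};

export const PageMeta = ({ title, url, topics }: PageMetaProps): JSX.Element => (
  <Head>
    <meta property="og:title" content={title} />
    <meta property="og:url" content={url} />
    {/* One meta tag per topic value, e.g. <meta name="topic" content="Topic A"> */}
    {topics.map((topic) => (
      <meta key={topic} name="topic" content={topic} />
    ))}
  </Head>
);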

Now we need to create the source. Select Sources from the left menu, then Add Source. In the dialog, name your source Sitemap Crawler and choose “Web Crawler” from the connector dropdown. (We’ll cover the Advanced Web Crawler in another post.) Save this and open the Source Settings screen.

Scroll down to Web Crawler Settings and click Edit. Set the Trigger Type to Sitemap. Set the URL to the URL of your sitemap. You can use the Experience Edge media URL of the SXA-generated sitemap file if you haven’t configured middleware to handle /sitemap.xml in your head app (which you should!). Set Max Depth to 0, because we don’t want to drill into any links on the page; we’re relying on the sitemap to surface all the content we need crawled. Save and publish.
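
For reference, the crawler just needs a standard sitemap at that URL, something like:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/articles/example-page</loc>
  </url>
</urlset>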

Next we’re going to configure our attribute extractors. On Attribute Extraction, click Edit. On the next screen, click Add Attribute. You can select the attributes to be mapped in the popup. Configure each attribute’s Extraction Type as Meta Tag, and set the Value to the name of the meta tag on your page: “og:title” for Title, “og:url” for Url, and “topic” for Topic. Again, I recommend mapping one extractor at a time, saving and publishing between each, to avoid errors. If prompted to run the crawler while saving, don’t do it yet.

On this Attribute Extractors screen you can click Validate in the top nav. That presents a dialog that lets you enter a URL and test the extractors, which is a great feature. Try pasting in some of your web pages and make sure you’re extracting all the data correctly.

Finally, return to the Sources screen and click the Recrawl and Reindex button on the right-hand side of the source you just created. The button looks like a refresh icon: two curved arrows in a circle.

It takes Sitecore Search a bit to fire up a crawl job, but you can monitor this from the Analytics screen under Sources. If all went well you should see all your documents from your sitemap in the content entity index. If not, you’ll see that an error occurred here and you can troubleshoot from there.

From here, feel free to add more attributes to fill out your content entity schema, and to put your crawler on a schedule from the Sources -> Crawler Schedule screen.

In the next post, we’ll cover using the API to query content from the index.

.NET not mapping fields with spaces in the headless SDK

TLDR: Don’t put spaces in your field names for Sitecore headless builds.

Quick post about a bug I uncovered today working on an XM Cloud project with a .NET rendering host.

We’re seeing inconsistent behavior when mapping layout service field data to types in .NET. In short, fields with spaces in their names sometimes do not deserialize into .NET types when you map them with the SDK.

Consider a page with 2 fields: Topic and Content Type. In our code, we have a class that looks like this:

namespace MySite.Models.Foundation.Base
{
  public class PageBase
  {
    [SitecoreComponentField(Name = "Content Type")]
    public ItemLinkField<Enumeration>? ContentType { get; set; }

    [SitecoreComponentField(Name = "Topic")]
    public ContentListField<Enumeration>? Topic { get; set; }
  }
}

When I register a component as a ModelBoundView, like so:

.AddModelBoundView<PageBase>("PageHeader")

the fields map properly, and I get data in the ContentType and Topic properties of my model.

When I try to map it from a service, like so:

namespace MySite.Services.SEO
{
  public class MetadataService
  {
    private SitecoreLayoutResponse? _response;

    public PageMetadata GetPageMetadata(SitecoreLayoutResponse? response)
    {
      _response = response;
      PageBase pageBase = new PageBase();
      var sitecoreData = _response?.Content?.Sitecore;

      // Bind the route-level field data onto our PageBase model
      sitecoreData?.Route?.TryReadFields<PageBase>(out pageBase);

      // Build the PageMetadata result from pageBase (mapping elided for brevity)
      var pageMetadata = new PageMetadata();
      return pageMetadata;
    }
  }
}

I get no data in the Content Type field but I do in the Topic field. If I rename Content Type to ContentType, the field data is bound to the ContentType property as expected.

I dug into the code a little bit, and it seems that the HandleReadFields method on Sitecore.LayoutService.Client.Response.Model.FieldsReader ignores the property attribute ([SitecoreComponentField(Name = "Content Type")]) and instead just uses the property name, which of course has no spaces in it because it’s a C# identifier.

Until this bug is corrected, the workaround is to rename your fields to not have spaces in them.