The Trouble with Excerpts

Excerpts provide truncated previews of content with a “read more” link instead of displaying the full-length content. They’re staples of web design—especially for WordPress sites—and many page designs simply don’t work without them.

So when we started hearing from customers that our handling of excerpts was not satisfactory, we knew it was a meaningful issue and set aside some time to get to the bottom of things.

Here’s what we’ve learned.

The Problem

We use excerpts in many of the “views” in The Events Calendar and Events Calendar Pro. (Check out our demo site, WP Shindig, to see them in action.) The problem that folks encounter and reach out about is that, in many of these excerpt-containing views, The Events Calendar often strips HTML from the excerpts.

This means, for example, that if your event content has bold text, italic text, links, or even basic HTML paragraph tags, all of that HTML is removed from the excerpt. In something like the single event view, your carefully written HTML works as intended and your content looks rich and polished. But when you see the same event in Events Calendar Pro’s “Photo View”, or in an event category archive, you just see plain text: no bold text, no links, no HTML at all.

A Catch-22

Event excerpts are stripped of HTML in some cases because of this one seemingly-innocuous line of code in The Events Calendar’s general.php template tag file:

$excerpt = wp_trim_words( $excerpt, $excerpt_length, $excerpt_more );

That function, wp_trim_words(), is what is used to, well, trim words. This is required for excerpts to work. It’s the function that takes your post content, which could be thousands of words long, and trims it to a reasonable size for use as an excerpt. (By default, that “reasonable size” is 55 words.)

But it’s this vital function, wp_trim_words(), that strips all HTML.
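To make that behavior concrete, here is a rough, self-contained sketch of the kind of work wp_trim_words() does. It is a simplified stand-in and not the WordPress source: sketch_trim_words() is a hypothetical name, and it uses PHP's strip_tags() where the real function relies on WordPress's own helpers. The content string and URL are made up for the example.

```php
<?php
// Simplified stand-in for wp_trim_words(), for illustration only.
// sketch_trim_words() is a hypothetical name, not a WordPress function.
function sketch_trim_words( $text, $num_words = 55, $more = '&hellip;' ) {
    // Step 1: remove every HTML tag so that only plain text remains.
    $text = trim( strip_tags( $text ) );

    // Step 2: split the plain text into words on runs of whitespace.
    $words = preg_split( '/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY );

    // Step 3: if the text already fits within the limit, return it as-is.
    if ( count( $words ) <= $num_words ) {
        return implode( ' ', $words );
    }

    // Step 4: otherwise keep the first $num_words words and append the "more" text.
    return implode( ' ', array_slice( $words, 0, $num_words ) ) . $more;
}

// Made-up event content with a paragraph, bold text, and a link.
$content = '<p>Join us for a <strong>free</strong> concert in <a href="https://example.com/park">the park</a> this Saturday.</p>';

echo sketch_trim_words( $content, 6, '...' ) . "\n";
// prints: Join us for a free concert...
```

Notice that the bold text and the link are gone before any counting happens. The trimming step only ever sees plain words.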

This is not just the fault of wp_trim_words() specifically, though—to reliably trim content to a character length or word count, you must strip HTML for two main reasons:

  1. If you don’t strip HTML, the tags left in the content make it harder to decide where to trim. Should HTML tags be counted as words, or their characters counted towards the character limit? If so, excerpts with a lot of HTML would show far less visible text than excerpts with none, making excerpt lengths inconsistent.
  2. What if the word count or character limit is reached after an HTML tag has been opened but before it has been closed? The excerpt would end with a dangling, unclosed HTML tag. At best, this makes the page’s HTML invalid without much affecting how the page renders. But an unclosed tag is often enough to break the output of the whole page.
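Both reasons are easy to reproduce. The sketch below trims to a word count without stripping HTML first. The helper naive_trim_keep_html() is hypothetical and exists only to show the failure; the content string and URL are made up.

```php
<?php
// Hypothetical helper for illustration only: trims to a word count WITHOUT
// stripping HTML first. It is not part of WordPress or The Events Calendar.
function naive_trim_keep_html( $html, $num_words ) {
    // Split on whitespace, so tags and attributes end up mixed in with the words.
    $tokens = preg_split( '/\s+/', trim( $html ), -1, PREG_SPLIT_NO_EMPTY );
    return implode( ' ', array_slice( $tokens, 0, $num_words ) );
}

$content = '<p>Tickets are <a href="https://example.com/tickets">on sale now</a> for members.</p>';
$excerpt = naive_trim_keep_html( $content, 5 );

// Reason 2: the <p> and <a> elements are opened but never closed.
echo $excerpt . "\n";
// prints: <p>Tickets are <a href="https://example.com/tickets">on sale

// Reason 1: pieces of the link tag used up part of the five-"word" budget,
// so only four visible words remain.
$visible = preg_split( '/\s+/', strip_tags( $excerpt ), -1, PREG_SPLIT_NO_EMPTY );
echo count( $visible ) . "\n";
// prints: 4
```

An excerpt with no HTML would show all five words, while this one shows four and leaves two elements open for the rest of the page to inherit.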

So it’s a bit of a catch-22: trim content to make excerpts, which requires the stripping of HTML; or preserve HTML, but thus make it impossible to trim the content.

Solutions

Our developers are interested in reworking The Events Calendar’s excerpt function and its use of wp_trim_words(), but it’s a challenge to implement an alternative because of the reasons mentioned above.

There are some possible workarounds. The ideas so far center on PHP libraries that can locate HTML tags within a piece of text. With such a library, The Events Calendar’s excerpt function could theoretically patch any mid-HTML-tag breakages on the fly. But most of these libraries require PHP versions newer than the minimum that WordPress supports, and so cannot be used in our plugins at this time.
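To give a sense of what “patching” a breakage could mean, here is a deliberately small sketch that appends closing tags for any elements a trimmed excerpt leaves open. It is not the approach under consideration for the plugin and not a real library: sketch_close_open_tags() is a hypothetical name, and it assumes simple, properly nested HTML. It would not repair an excerpt that was cut off in the middle of a tag itself (for example, partway through an attribute), which is part of why a proper parsing library is attractive.

```php
<?php
// Hypothetical helper: append closing tags for any elements left open.
// Assumes simple, properly nested HTML; it is not a full HTML parser.
function sketch_close_open_tags( $html ) {
    // A few common elements that never take a closing tag.
    $void = array( 'br', 'hr', 'img', 'input' );
    $open = array();

    // Find every opening or closing tag, in document order.
    preg_match_all( '/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/', $html, $matches, PREG_SET_ORDER );

    foreach ( $matches as $match ) {
        $name = strtolower( $match[2] );
        if ( in_array( $name, $void, true ) ) {
            continue;
        }
        if ( '/' === $match[1] ) {
            // Closing tag: it should match the most recently opened element.
            if ( end( $open ) === $name ) {
                array_pop( $open );
            }
        } else {
            // Opening tag: remember it until its closing tag shows up.
            $open[] = $name;
        }
    }

    // Close whatever is still open, innermost element first.
    foreach ( array_reverse( $open ) as $name ) {
        $html .= '</' . $name . '>';
    }
    return $html;
}

echo sketch_close_open_tags( '<p>Tickets are <a href="https://example.com/tickets">on sale' ) . "\n";
// prints: <p>Tickets are <a href="https://example.com/tickets">on sale</a></p>
```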

The only solution for now is to use “manual” excerpts if you need to preserve HTML.

“Manual” excerpts can be created in the separate Excerpt metabox on post edit screens. If you use this metabox to write your own excerpt, the wp_trim_words() function is skipped and the excerpt’s HTML is preserved.
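The decision described above can be pictured roughly like this. It is a sketch under stated assumptions, not the plugin's actual code: sketch_get_excerpt() and its parameters are hypothetical, and the automatic branch is a simplified stand-in for wp_trim_words().

```php
<?php
// Hypothetical sketch of the manual-versus-automatic decision; not the plugin's code.
function sketch_get_excerpt( $manual_excerpt, $content, $num_words = 55, $more = '&hellip;' ) {
    // A manual excerpt is used exactly as written, HTML included.
    if ( '' !== trim( $manual_excerpt ) ) {
        return $manual_excerpt;
    }

    // No manual excerpt: build one automatically by stripping HTML,
    // then trimming to the word limit (simplified stand-in for wp_trim_words()).
    $words = preg_split( '/\s+/', trim( strip_tags( $content ) ), -1, PREG_SPLIT_NO_EMPTY );
    if ( count( $words ) <= $num_words ) {
        return implode( ' ', $words );
    }
    return implode( ' ', array_slice( $words, 0, $num_words ) ) . $more;
}

$content = '<p>Doors open at <strong>7pm</strong> and the show starts at 8pm sharp.</p>';

// With a manual excerpt, the HTML survives.
echo sketch_get_excerpt( 'Doors at <strong>7pm</strong>.', $content ) . "\n";
// prints: Doors at <strong>7pm</strong>.

// Without one, the content is stripped and trimmed.
echo sketch_get_excerpt( '', $content, 4, '...' ) . "\n";
// prints: Doors open at 7pm...
```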

An Example

First, let’s take a look at an excerpt that is trimmed via wp_trim_words(). As you can see, this is a pretty flavorless bit of content:

[Screenshot: An excerpt with its HTML stripped because of the wp_trim_words() function.]

If you want this excerpt to keep some HTML, such as bold text and links, all you have to do is write an excerpt in the “Excerpt” metabox. Here is a screenshot of that metabox, with an HTML-rich excerpt being composed within it:

[Screenshot: An example of using the Excerpt metabox to make an HTML-rich excerpt.]

Now the excerpt on the front end of the site will have all of that HTML intact, as shown in the following screenshot:

[Screenshot: An excerpt with its HTML saved from the wp_trim_words() function and left intact.]