code of the Ninja();

// in search of swift, efficient, and invisible code

2016-07-14

Code Formatting and Highlighting on Webpages

Tested in: Chrome, Firefox
Required software: Prism

The Problem

Writing a blog about programming means talking about code - a lot. Up to and including whole blocks of example code.

This can actually be quite the nuisance, for three reasons.

1: Formatting

Code is usually best presented in a monospace font with whitespace preserved, both for readability and because of the importance of indentation.

Fortunately, HTML can solve this problem pretty easily out of the box. Using <pre> tags can turn this congested mess

if (foo) { var x = bar(); if (x == 42) { print("found the answer"); } } else { print("Hello World"); }

into this

if (foo)
{
    var x = bar();
    if (x == 42)
    {
        print("found the answer");
    }
}
else
{
    print("Hello World");
}

without the need to litter the source code with non-breaking spaces and line break tags.

The <pre> tag only gets you so far, though.

2: Escaping

HTML is pretty bad at talking about itself.

If I want to talk about an HTML tag, as in this very post with <pre>, the reader sees it wrapped neatly in angle brackets, but I've actually been typing &lt;pre&gt;.

(And in order to make that last sentence, I had to actually type &amp;lt;pre&amp;gt;... and it's just going to keep getting worse if I continue down this rabbit hole.)

And it's not just when talking about HTML. The characters that HTML needs escaped are common in most programming languages, so any code that goes into a post has to be run through an escaping procedure.

This is tedious, as you can imagine.
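To make that escaping procedure concrete, here is a minimal sketch of it in JavaScript. The function name escapeHtml is my own invention for illustration, and it only covers the characters that matter in text content (attribute values would also need quotes escaped).

```javascript
// Hypothetical helper, not from any library: escape the characters that
// HTML treats specially so code can be pasted into a page as plain text.
function escapeHtml(source) {
  return source
    .replace(/&/g, "&amp;") // must run first, or the entities added below get escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeHtml("<pre>"));       // &lt;pre&gt;
console.log(escapeHtml("&lt;pre&gt;")); // &amp;lt;pre&amp;gt;
```

The second call is the rabbit hole in action: escaping text that is already escaped adds another layer every time.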

3: Highlighting

Finally, for optimum readability, code really should be highlighted.

When keywords, numbers, and identifiers are all in colours that make them pop, it sure beats staring at a homogeneous grey soup of letters and numbers.

Doing this in pure HTML would mean manually wrapping these words in spans with styling information, which suddenly makes escaping angle brackets and ampersands look like a fun time. No one wants to have to write <span style="color: darkblue; font-weight: bold;"> (or whatever) each time they write a keyword like "if".

The Solution

I'm embarrassed to admit that when I first started writing this blog, I did code escaping and highlighting manually (when I bothered to do highlighting at all). I didn't want to hunt for a more complex solution, even if it did mean doing a bunch of grunt work.

Now that I'm getting back into blogging here again, though, there's no way I'm going to go back to that.

So now I'm using Prism, and I'll explain how I set it up for my blog.

About Prism

First of all, what is Prism? I'll let their website do the explaining.

"Prism is a lightweight, extensible syntax highlighter, built with modern web standards in mind. It’s used in thousands of websites, including some of those you visit daily." (prismjs.com)

Prism consists of a JavaScript file and a CSS file you can include in your webpage, and works by the following method:

  • The Prism script hunts through the page, looking for <code> elements.
  • For each that it finds, it checks what language that code should be. It can't decide this by magic; you have to put a special class on the element yourself, e.g. class="language-javascript". (Alternatively, you can put that class on an ancestor of the element, so for example you could put it on the <body> element if your entire article was about the same programming language. In the latter case, individual <code> elements can still override the language if you put a class on them.)
  • Once the language is determined, the text contents of the <code> element are parsed, and any words that should be highlighted are automatically wrapped in <span> elements and given classes that correspond to their type, e.g. "keyword" or "variable".
  • And it's done; if you've included the Prism CSS file (or provided your own), the <span> elements will be styled based on their class and you'll have your highlighting.
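The steps above can be sketched in a few lines of JavaScript. This is a toy illustration only, not Prism's actual code: the real tokenizer uses full per-language grammars, and the names findLanguage, highlight, and KEYWORDS are made up for this example. As far as I know, Prism's generated spans carry a "token" class alongside the type, which is what the sketch imitates.

```javascript
// Toy illustration only; Prism's real grammars are far more thorough.
// KEYWORDS, findLanguage, and highlight are hypothetical names.
const KEYWORDS = ["if", "else", "var", "function", "return"];

// classChain holds class attribute strings, starting with the <code>
// element itself and moving outward through its ancestors.
// The first "language-xxx" class found wins, so the element can override.
function findLanguage(classChain) {
  for (const classes of classChain) {
    const match = /(?:^|\s)language-([\w-]+)/.exec(classes);
    if (match) return match[1];
  }
  return null;
}

// Escape the text, then wrap each whole-word keyword in a classed span.
function highlight(code) {
  const escaped = code.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  const pattern = new RegExp("\\b(" + KEYWORDS.join("|") + ")\\b", "g");
  return escaped.replace(pattern, '<span class="token keyword">$1</span>');
}

console.log(findLanguage(["", "post-body language-lua"])); // lua
console.log(highlight("if (x) return y;"));
```

A stylesheet rule targeting the keyword class is then all it takes to colour every keyword on the page at once.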

Downloading Prism

If you want your webpage to load lickety-split, you'll want to keep the number of requests to the server and the amount of data to download to a minimum. Prism knows this, so instead of providing a one-size-fits-all download, they've done something a little more complicated.

On their download page, they let you select just which themes, languages, and plugins you want. Dynamic previews of the JavaScript and CSS files are generated based on your selection. Once you're happy with your build, you can download the files from the buttons at the bottom of the page.

For my purposes, I only selected the languages I'd be talking about on this blog. Things like Lua, JavaScript, AutoHotkey, and so on. They don't already have support for things like GML or the assembly languages used in Sega hacking, but it's easy for a user to add these languages themselves. I'll post about that in the future on Code of the Ninja.

As for plugins, I recommend at least Normalize Whitespace, which can trim those pesky extra line feeds from around code blocks inside <pre> tags.
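To give a rough idea of what that kind of cleanup involves, here is a small sketch. It is not the plugin's code and the name trimBlock is hypothetical; it simply drops blank lines from the start of a block, strips trailing whitespace, and removes the indentation shared by every line.

```javascript
// Hypothetical sketch of whitespace cleanup, not the Normalize Whitespace plugin itself.
function trimBlock(text) {
  const lines = text
    .replace(/^(?:[ \t]*\r?\n)+/, "") // drop leading blank lines
    .replace(/\s+$/, "")              // drop trailing whitespace and line feeds
    .split(/\r?\n/);

  // Find the smallest indentation among the non-blank lines.
  const indents = lines
    .filter((line) => line.trim() !== "")
    .map((line) => /^[ \t]*/.exec(line)[0].length);
  const common = indents.length ? Math.min(...indents) : 0;

  // Remove that shared indentation from every line.
  return lines.map((line) => line.slice(common)).join("\n");
}

console.log(trimBlock("\n    if (foo)\n    {\n        bar();\n    }\n"));
```

This is the sort of thing that lets you indent a code block to match the surrounding HTML source without that indentation showing up on the page.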

Including Prism on Blogger

While it's technically possible to include Prism by putting the following code at the start of a post,

I decided to put it in my template because I'll be talking about code in every single post, and there's no reason to go to the extra trouble of pasting that into each one individually.

I went to the control panel of my blog, and then to Template → Edit HTML, and pasted the following code at the end of the <head> element:

(The differences in syntax to the version above are to keep it consistent with the Blogger template.)

So now I had Prism working across my entire blog. That's great!

But this hasn't solved all three of the problems I enumerated at the start of this post. I can use <pre> tags to format, and Prism to highlight, but escaping code is still an issue.

Writing Unescaped Code

Prism actually has an Unescaped Markup plugin right there on the download page, designed exactly for this purpose.

It's pretty clever, letting you use either HTML comments or <script> tags to write unescaped markup, since HTML won't bother to interpret the text inside those.

The plugin has some limitations, though. First, it's only for markup languages such as HTML, and doesn't bother to add this functionality for any of the other language classes. Second, it only uses the comment trick for code in <pre> tags.

I wanted to be able to write unescaped code in any language, so I needed a plugin that treated all <code> elements equally. Therefore I decided not to include the Unescaped Markup plugin in my Prism download, and instead heavily edited it for my own purposes and included it as a second script, like so.

I decided that I only cared for the HTML comment method for achieving unescaped code; using <script> tags didn't sit right with me because I felt that all code should be in the same kind of tag.

So here's my edited version of the plugin. It's just a stripped-down version of the Unescaped Markup plugin that only does the comment method, but for every <code> element it finds, rather than just the ones inside <pre> tags.

Of course, this does mean that any code I write about that contains HTML comments, or something that looks like them, will still cause trouble. But well over 99% of cases are taken care of, which is good enough for me for now.
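For anyone curious how the comment trick works in principle, here is a simplified sketch of the core idea as a plain string function. It is not the plugin itself, and unwrapCommentedCode is a hypothetical name; a real plugin would work on DOM nodes rather than strings.

```javascript
// Hypothetical sketch, not the actual plugin: given the raw inner HTML of a
// <code> element, return the text that should be displayed to the reader.
function unwrapCommentedCode(innerHtml) {
  const trimmed = innerHtml.trim();
  if (!trimmed.startsWith("<!--")) return innerHtml; // nothing to unwrap

  // A browser ends a comment at the first "-->", so if anything follows it,
  // the element is not one single comment and is left untouched.
  const end = trimmed.indexOf("-->", 4);
  if (end === -1 || end + 3 !== trimmed.length) return innerHtml;

  return trimmed.slice(4, end);
}

console.log(unwrapCommentedCode("<!--<b>if (a < b && c)</b>-->")); // <b>if (a < b && c)</b>
```

In a page, the returned string would be assigned to the element's textContent, which displays it literally, angle brackets and all. The second check also shows where the trouble comes from: code containing "-->" ends the comment early, so it can't be wrapped this way.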

Conclusion

So if you didn't know about Prism before, I hope it helps out. It sure spruced things up around here.

I'm currently making some custom language plugins so that I can write about Sega Genesis hacking here at Code of the Ninja, and you can be sure I'll give the details on that process as well.

Until next time, happy coding!
