CommonMark: A Formal Specification For Markdown

About The Author

Adebiyi Adedotun Lukman is a UI/Frontend Engineer based in Lagos, Nigeria who also happens to love UI/UX Design for the love of great software products.

Markdown is a markup language that allows editing and formatting in plain text that can then be parsed and rendered as HTML. It has a declarative syntax that is both powerful and easy to learn for technical and non-technical folks. However, because of ambiguities in its original specification, a number of distinct flavors (or custom versions) have emerged that aim to remove those ambiguities as well as extend the original syntax. This has led to a steep divergence in how the same text is parsed and rendered across implementations. CommonMark aims to provide a standardized specification of Markdown that reflects its real-world usage.

CommonMark is a rationalized version of Markdown syntax with a spec whose goal is to remove the ambiguities and inconsistencies surrounding the original Markdown specification. It offers a standardized specification that defines the common syntax of the language, along with a comprehensive test suite for validating Markdown implementations against this specification.

GitHub uses Markdown as the markup language for its user content.

“CommonMark is an ambitious project to formally specify the Markdown syntax used by many websites on the internet in a way that reflects its real-world usage [...] It allows people to continue using Markdown the same way they always have while offering developers a comprehensive specification and reference implementations to interoperate and display Markdown in a consistent way between platforms.”

— “A Formal Spec For GitHub Flavored Markdown,” The GitHub Blog

In 2012, GitHub created its own flavor of Markdown, GitHub Flavored Markdown (GFM), to combat the lack of Markdown standardization and to extend the syntax to suit its needs. GFM was built on top of Sundown, a parser built by GitHub to address some of the shortcomings of the Markdown parsers that existed at the time. Five years later, in 2017, GitHub announced the deprecation of Sundown in favor of cmark, the CommonMark parsing and rendering library, in “A Formal Spec For GitHub Flavored Markdown.”

The Common Questions section of Markdown and Visual Studio Code documents that Markdown in VS Code targets the CommonMark specification using the markdown-it library, which itself follows the CommonMark specification.

CommonMark has been widely adopted and implemented (see the List of CommonMark Implementations) in different languages such as C (e.g., cmark), C# (e.g., CommonMark.NET), and JavaScript (e.g., markdown-it). This is good news, as developers and authors are gradually moving to a new frontier of being able to use Markdown with a consistent syntax and a standardized specification.

A Short Note On Markdown Parsers

Markdown parsers are at the heart of converting Markdown text into HTML, directly or indirectly.

Parsers like cmark and commonmark.js do not convert Markdown to HTML directly. Instead, they first convert it to an Abstract Syntax Tree (AST) and then render the AST as HTML, which makes the process more granular and open to manipulation. Between parsing (to an AST) and rendering (to HTML), for example, the document could be transformed or extended.
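
The parse-then-render pipeline can be sketched in a few lines of Python. This is only a toy illustration under heavy assumptions, not how cmark or commonmark.js work internally: the function names (`parse`, `demote_headings`, `render`) and the node shape are hypothetical, and only single-line ATX headings and plain paragraphs are recognized.

```python
import html
import re


def parse(markdown):
    """Parse a tiny Markdown subset into a flat list of AST-like nodes."""
    nodes = []
    # Blocks are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", markdown.strip()):
        match = re.fullmatch(r"(#{1,6})[ \t]+(.*)", block)
        if match:
            nodes.append(
                {"type": "heading", "level": len(match.group(1)), "text": match.group(2)}
            )
        else:
            nodes.append({"type": "paragraph", "text": block})
    return nodes


def demote_headings(nodes):
    """Example transform between parsing and rendering: shift headings down one level."""
    return [
        {**node, "level": min(node["level"] + 1, 6)} if node["type"] == "heading" else node
        for node in nodes
    ]


def render(nodes):
    """Render the node list as HTML."""
    out = []
    for node in nodes:
        text = html.escape(node["text"], quote=False)
        if node["type"] == "heading":
            out.append(f"<h{node['level']}>{text}</h{node['level']}>")
        else:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


ast = parse("# Title\n\nSome text")
print(render(demote_headings(ast)))
```

Because the tree exists as plain data between the two steps, a transform such as `demote_headings` can change the output without touching either the parser or the renderer.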

CommonMark’s Markdown Syntax Support

Projects or platforms that use the CommonMark specification as the baseline of their own flavor are usually supersets of the strict core syntax that CommonMark defines. For the most part, CommonMark has mitigated a lot of ambiguities by providing a spec that is designed to be built upon. GFM is a prime example: while it supports all of CommonMark’s syntax, it also extends it to suit its own usage.

CommonMark’s syntax support can seem limited at first. For example, it has no support for table syntax. It is important to know that this is by design, as a comment in the related discussion thread reveals: the supported syntax is deliberately strict and is considered the core syntax of the language itself, the same syntax specified by its creator, John Gruber, in “Markdown: Syntax.”

At the time of writing, the supported syntax includes:

  1. Paragraphs and Line Breaks,
  2. Headings,
  3. Emphasis and Strong Emphasis,
  4. Horizontal Rules,
  5. Lists,
  6. Links,
  7. Images,
  8. Blockquotes,
  9. Code,
  10. Code Blocks.

To follow along with the examples, it is advised that you use the commonmark.js dingus editor to try out the syntax and get the rendered Preview, generated HTML, and AST.

Paragraphs And Line Breaks

In Markdown, paragraphs are runs of consecutive lines of text separated by at least one blank line.

The following rules define a paragraph:

  1. Markdown paragraphs are rendered in HTML as the Paragraph element, <p>.
  2. Different paragraphs are separated with one or more blank lines between them.
  3. For a line break within a paragraph, a line should end with two or more spaces, or with a backslash (\).
<!--Markdown-->
This is a line of text
<!--Rendered HTML-->
<p>This is a line of text</p>

<!--Markdown-->
This is a line of text
And another line of text
And another but the
same paragraph
<!--Rendered HTML-->
<p>This is a line of text
And another line of text
And another but the
same paragraph</p>

<!--Markdown-->
This is a paragraph

And another paragraph

And another
<!--Rendered HTML-->
<p>This is a paragraph</p>
<p>And another paragraph</p>
<p>And another</p>

<!--Markdown-->
Two spaces after a line of text  
Or a post-fixed backslash\
Both means a line break
<!--Rendered HTML-->
<p>Two spaces after a line of text<br />
Or a post-fixed backslash<br />
Both means a line break</p>
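
The paragraph and line-break rules above can be approximated with a short Python sketch. It is a simplification rather than a conforming implementation: the function name `render_paragraphs` is hypothetical, the input is assumed to contain only plain paragraphs, and HTML escaping and inline syntax are ignored.

```python
import re


def render_paragraphs(markdown):
    """Render blank-line-separated paragraphs, honoring hard line breaks."""
    paragraphs = []
    # One or more blank (or whitespace-only) lines separate paragraphs.
    for block in re.split(r"\n[ \t]*\n", markdown.strip("\n")):
        lines = block.split("\n")
        rendered = []
        for index, raw in enumerate(lines):
            line = raw.lstrip()
            is_last = index == len(lines) - 1
            if not is_last and line.endswith("\\"):
                # A trailing backslash forces a line break.
                rendered.append(line[:-1] + "<br />")
            elif not is_last and line.endswith("  "):
                # Two or more trailing spaces force a line break.
                rendered.append(line.rstrip() + "<br />")
            else:
                rendered.append(line.rstrip())
        paragraphs.append("<p>" + "\n".join(rendered) + "</p>")
    return "\n".join(paragraphs)


source = "Two spaces after a line of text  \nOr a post-fixed backslash\\\nBoth means a line break"
print(render_paragraphs(source))
```

Note that a break marker on the last line of a paragraph is ignored in this sketch, since there is no following line to break before.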

Headings

Headings in Markdown represent the HTML Heading elements. There are two ways to define headings:

  1. ATX heading.
  2. Setext heading.

The following rules define ATX headings:

  1. Heading level 1 (h1) through heading level 6 (h6) are supported.
  2. ATX-style headings are prefixed with the hash (#) symbol.
  3. There needs to be at least one space separating the text from the hash (#) symbols.
  4. The number of hashes corresponds to the heading level: one hash is h1, two hashes is h2, and six hashes is h6.
  5. It is also possible to append an arbitrary number of hash symbols to a heading, although this has no effect on the output (e.g. # Heading 1 #).
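<!--Markdown-->
# Heading 1
<!--Rendered HTML-->
<h1>Heading 1</h1>

<!--Markdown-->
## Heading 2
<!--Rendered HTML-->
<h2>Heading 2</h2>

<!--Markdown-->
### Heading 3
<!--Rendered HTML-->
<h3>Heading 3</h3>

<!--Markdown-->
#### Heading 4
<!--Rendered HTML-->
<h4>Heading 4</h4>

<!--Markdown-->
##### Heading 5
<!--Rendered HTML-->
<h5>Heading 5</h5>

<!--Markdown-->
###### Heading 6
<!--Rendered HTML-->
<h6>Heading 6</h6>

<!--Markdown-->
## Heading 2 ##
<!--Rendered HTML-->
<h2>Heading 2</h2>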
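
As a rough illustration of the ATX rules, the following Python sketch recognizes a single heading line. The names `ATX_OPEN`, `ATX_CLOSE`, and `parse_atx_heading` are made up for this example, and the sketch does not handle backslash-escaped hashes or inline syntax inside the heading text.

```python
import re

# One to six hashes, followed by whitespace or the end of the line.
ATX_OPEN = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+|$)(.*)$")
# An optional closing run of hashes, which must be preceded by whitespace
# (or make up the whole heading content).
ATX_CLOSE = re.compile(r"(?:^|[ \t]+)#+[ \t]*$")


def parse_atx_heading(line):
    """Return (level, text) for an ATX heading line, or None if it is not one."""
    match = ATX_OPEN.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    text = ATX_CLOSE.sub("", match.group(2).strip())
    return level, text.strip()


print(parse_atx_heading("## Heading 2 ##"))
print(parse_atx_heading("#Heading"))
```

The second call returns `None` because there is no space between the hash and the text, which is rule 3 in the list above.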

The following rules define Setext headings:

  1. Only heading level 1 (h1) and heading level 2 (h2) are supported.
  2. Setext-style headings are defined by underlining the text with equals signs (=) for h1, or dashes (-) for h2.
  3. With Setext, at least one equals sign or dash is required in the underline.
<!--Markdown-->
Heading 1
=
<!--Rendered HTML-->
<h1>Heading 1</h1>

<!--Markdown-->
Heading 2
-
<!--Rendered HTML-->
<h2>Heading 2</h2>
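
The Setext rules can be sketched in the same spirit. This is a minimal example that assumes the heading text sits on a single line directly above the underline; `SETEXT_UNDERLINE` and `parse_setext_heading` are hypothetical names.

```python
import re

# An underline is a run of only "=" or only "-" characters.
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def parse_setext_heading(text_line, underline):
    """Return (level, text) if the two lines form a Setext heading, else None."""
    match = SETEXT_UNDERLINE.match(underline)
    if match is None or not text_line.strip():
        return None
    level = 1 if match.group(1)[0] == "=" else 2
    return level, text_line.strip()


print(parse_setext_heading("Heading 1", "="))
print(parse_setext_heading("Heading 2", "-"))
```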

Emphasis And Strong Emphasis

Emphasis in Markdown can either be italics or bold (strong emphasis).

The following rules define emphasis:

  1. Ordinary and strong emphasis are rendered in HTML as the Emphasis, <em>, and Strong, <strong>, elements, respectively.
  2. Text bounded by single asterisks (*) or underscores (_) will be emphasized.
  3. Text bounded by double asterisks or underscores will be strongly emphasized.
  4. The bounding symbols (asterisks or underscores) must match.
  5. There must be no space between the symbols and the enclosed text.
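<!--Markdown-->
_Italic_
<!--Rendered HTML-->
<em>Italic</em>

<!--Markdown-->
*Italic*
<!--Rendered HTML-->
<em>Italic</em>

<!--Markdown-->
__Bold__
<!--Rendered HTML-->
<strong>Bold</strong>

<!--Markdown-->
**Bold**
<!--Rendered HTML-->
<strong>Bold</strong>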

Horizontal Rule

A horizontal rule, <hr />, is created with three or more asterisks (*), hyphens (-), or underscores (_) on a line of their own. The symbols may be separated by any number of spaces, or by none at all.

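<!--Markdown-->
***
<!--Rendered HTML-->
<hr />

<!--Markdown-->
* * *
<!--Rendered HTML-->
<hr />

<!--Markdown-->
---
<!--Rendered HTML-->
<hr />

<!--Markdown-->
- - -
<!--Rendered HTML-->
<hr />

<!--Markdown-->
___
<!--Rendered HTML-->
<hr />

<!--Markdown-->
_ _ _
<!--Rendered HTML-->
<hr />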
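
A single regular expression is enough to sketch this rule in Python. The names `THEMATIC_BREAK` and `is_thematic_break` are illustrative only, and the check looks at one line in isolation, so it does not account for context (for example, a line of dashes directly under a line of text is a Setext heading underline instead).

```python
import re

# Three or more of the same character (*, - or _), optionally separated by
# spaces or tabs, with nothing else on the line.
THEMATIC_BREAK = re.compile(r"^ {0,3}([*_-])[ \t]*(?:\1[ \t]*){2,}$")


def is_thematic_break(line):
    """Return True if the line, taken on its own, is a horizontal rule."""
    return THEMATIC_BREAK.match(line) is not None


for candidate in ["***", "- - -", "_ _ _", "--", "*-*"]:
    print(repr(candidate), is_thematic_break(candidate))
```

The backreference `\1` is what enforces that all symbols on the line are the same character.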

Lists

Lists in Markdown are either a bullet (unordered) list or an ordered list.

The following rules define a list:

  1. Bullet lists are rendered in HTML as the Unordered list element, <ul>.
  2. Ordered lists are rendered in HTML as the Ordered list element, <ol>.
  3. Bullet lists use asterisks, pluses, or hyphens as markers.
  4. Ordered lists use numbers followed by a period or a closing parenthesis.
  5. The markers must be consistent: use the marker you begin with for every item in the list, because changing the marker starts a new list.
<!--Markdown-->
* one
* two
* three
<!--Rendered HTML-->
<ul>
<li>one</li>
<li>two</li>
<li>three</li>
</ul>

<!--Markdown-->
+ one
+ two
+ three
<!--Rendered HTML-->
<ul>
<li>one</li>
<li>two</li>
<li>three</li>
</ul>

<!--Markdown-->
- one
- two
- three
<!--Rendered HTML-->
<ul>
<li>one</li>
<li>two</li>
<li>three</li>
</ul>

<!--Markdown-->
- one
- two
+ three
<!--Rendered HTML-->
<ul>
<li>one</li>
<li>two</li>
</ul>
<ul>
<li>three</li>
</ul>

<!--Markdown-->
1. one
2. two
3. three
<!--Rendered HTML-->
<ol>
<li>one</li>
<li>two</li>
<li>three</li>
</ol>

<!--Markdown-->
3. three
4. four
5. five
<!--Rendered HTML-->
<ol start="3">
<li>one</li>
<li>two</li>
<li>three</li>
</ol>
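
The effect of changing the marker partway through (as in the `- one`, `- two`, `+ three` example) can be sketched by grouping consecutive items by their marker. This is a deliberately small Python example: `BULLET` and `group_bullet_lists` are hypothetical names, every input line is assumed to be a single-line bullet item, and nesting is not handled.

```python
import re
from itertools import groupby

# A bullet marker (*, + or -) followed by whitespace and the item text.
BULLET = re.compile(r"^([*+-])[ \t]+(.*)$")


def group_bullet_lists(lines):
    """Group consecutive bullet items into lists; a new marker starts a new list."""
    # Lines that are not bullet items are skipped in this sketch.
    items = [match.groups() for match in map(BULLET.match, lines) if match]
    return [
        [text for _, text in group]
        for _, group in groupby(items, key=lambda item: item[0])
    ]


print(group_bullet_lists(["- one", "- two", "+ three"]))
```

Each inner list corresponds to one `<ul>` element in the rendered HTML.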

Links

Links are supported in both the inline and the reference formats.

The following rules define a link:

  1. Links are rendered as the HTML Anchor element, <a>.
  2. The inline format has the syntax: [value](URL "optional-title") with no space between the closing square bracket and the opening parenthesis.
  3. The reference format has the syntax: [value][id] for the reference, and [id]: href "optional-title" for the link definition, with the two separated by at least one blank line.
  4. The id is the Definition Identifier and may consist of letters, numbers, spaces, and punctuation.
  5. Definition Identifiers are not case sensitive.
  6. There is also support for Automatic Links, where the URL is bounded by the less than (<) and greater than (>) symbols, and displayed literally.
<!--Markdown-->
[Google](https://google.com "Google")
<!--Rendered HTML-->
<a href="https://google.com" title="Google">Google</a>

<!--Markdown-->
[Google](https://google.com)
<!--Rendered HTML-->
<a href="https://google.com">Google</a>

<!--Markdown-->
[Comparing Styling Methods in Next.js](/2020/09/comparing-styling-methods-next-js)
<!--Rendered HTML-->
<a href="/2020/09/comparing-styling-methods-next-js">Comparing Styling Methods in Next.js</a>

<!--Markdown-->
[Google][id]
<!--At least a line must be in-between-->

[id]: https://google.com "Google"
<!--Rendered HTML-->
<a href="https://google.com" title="Google">Google</a>

<!--Markdown-->
<https://google.com>
<!--Rendered HTML-->
<a href="https://google.com">https://google.com</a>

<!--Markdown-->
<mark@google.com>
<!--Rendered HTML-->
<a href="mailto:mark@google.com">mark@google.com</a>
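
To make the reference format and its case-insensitive identifiers more concrete, here is a small Python sketch that collects definitions and substitutes references. It is an approximation under stated assumptions: the names `DEFINITION`, `REFERENCE`, `normalize_label`, and `resolve_references` are invented for this example, only the full `[value][id]` form and double-quoted titles are handled, and the output is not wrapped in a paragraph or escaped.

```python
import re

# A definition line such as: [id]: https://example.com "Title"
DEFINITION = re.compile(r'^\[([^\]]+)\]:\s*(\S+)(?:\s+"([^"]*)")?\s*$')
# A full reference such as: [value][id]
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def normalize_label(label):
    """Make identifiers case-insensitive and collapse internal whitespace."""
    return " ".join(label.split()).casefold()


def resolve_references(markdown):
    """Replace [value][id] references with anchors built from their definitions."""
    definitions = {}
    body = []
    for line in markdown.split("\n"):
        match = DEFINITION.match(line)
        if match:
            label, href, title = match.groups()
            # If an identifier is defined more than once, keep the first one.
            definitions.setdefault(normalize_label(label), (href, title))
        else:
            body.append(line)

    def replace(match):
        text, label = match.groups()
        target = definitions.get(normalize_label(label))
        if target is None:
            return match.group(0)  # Unknown identifier: leave the text as-is.
        href, title = target
        title_attr = f' title="{title}"' if title else ""
        return f'<a href="{href}"{title_attr}>{text}</a>'

    return REFERENCE.sub(replace, "\n".join(body)).strip()


print(resolve_references('[Google][ID]\n\n[id]: https://google.com "Google"'))
```

The reference uses `ID` while the definition uses `id`, and the two still match because both are normalized before lookup.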

Images

Images in Markdown follow the inline and reference formats used for links.

The following rules define images:

  1. Images are rendered as the HTML image element, <img>.
  2. The inline format has the syntax: ![alt text](image-url "optional-title").
  3. The reference format has the syntax: ![alt text][id] for the reference, and [id]: image-url "optional-title" for the image label. Both should be separated by at least a blank line.
  4. The image title is optional, and the image-url can be relative.
<!--Markdown-->
![alt text](image-url "optional-title")
<!--Rendered HTML-->
<img src="image-url" alt="alt text" title="optional-title" />

<!--Markdown-->
![alt text][id]
<!--At least a line must be in-between-->

[id]: image-url "optional-title"
<!--Rendered HTML-->
<img src="image-url" alt="alt text" title="optional-title" />

Blockquotes

The HTML Block Quotation element, <blockquote>, can be created by prefixing a new line with the greater than symbol (>).

<!--Markdown-->
> This is a blockquote element
> You can start every new line
> with the greater than symbol.
> That gives you greater control
> over what will be rendered.

<!--Rendered HTML-->
<blockquote>
<p>This is a blockquote element
You can start every new line
with the greater than symbol.
That gives you greater control
over what will be rendered.</p>
</blockquote>

Blockquotes can be nested:

<!--Markdown-->
> Blockquote with a paragraph
>> And another paragraph
>>> And another

<!--Rendered HTML-->
<blockquote>
<p>Blockquote with a paragraph</p>
<blockquote>
<p>And another paragraph</p>
<blockquote>
<p>And another</p>
</blockquote>
</blockquote>
</blockquote>

They can also contain other Markdown elements, like headings, code, list items, and so on.

<!--Markdown-->
> Blockquote with a paragraph
> # Heading 1
> Heading 2
> -
> 1. One
> 2. Two

<!--Rendered HTML-->
<blockquote>
<p>Blockquote with a paragraph</p>
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<ol>
<li>One</li>
<li>Two</li>
</ol>
</blockquote>

Code

The HTML Inline Code element, <code>, is also supported. To create one, delimit the text with back-ticks (`), or double back-ticks if there needs to be a literal back-tick in the enclosing text.

<!--Markdown-->
`inline code snippet`
<!--Rendered HTML-->
<code>inline code snippet</code>

<!--Markdown-->
`<button type='button'>Click Me</button>`
<!--Rendered HTML-->
<code>&lt;button type='button'&gt;Click Me&lt;/button&gt;</code>

<!--Markdown-->
`` There's an inline back-tick (`). ``
<!--Rendered HTML-->
<code>There's an inline back-tick (`).</code>
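
The delimiter matching and escaping behind these examples can be sketched in Python as follows. This is a simplified illustration, not a conforming implementation: `CODE_SPAN` and `render_code_spans` are hypothetical names, code spans are assumed to sit on a single line, and text outside the code spans is left untouched.

```python
import html
import re

# An opening run of back-ticks, the content, and a closing run of the same
# length that is not part of a longer run of back-ticks.
CODE_SPAN = re.compile(r"(`+)(.+?)(?<!`)\1(?!`)")


def render_code_spans(text):
    """Replace back-tick delimited spans with <code> elements."""

    def replace(match):
        content = match.group(2)
        # One leading and one trailing space are stripped, which is what
        # allows a literal back-tick at the edge of the content.
        if len(content) >= 2 and content[0] == " " and content[-1] == " " and content.strip(" "):
            content = content[1:-1]
        # Escape &, < and > so the content is shown literally.
        return "<code>" + html.escape(content, quote=False) + "</code>"

    return CODE_SPAN.sub(replace, text)


print(render_code_spans("`<button type='button'>Click Me</button>`"))
print(render_code_spans("`` There's an inline back-tick (`). ``"))
```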

Code Blocks

The HTML Preformatted Text element, <pre>, is also supported. A code block can be created with a code fence, which is an opening and a closing line of at least three back-ticks (`) or tildes (~), where the closing fence is at least as long as the opening one. Alternatively, a code block can be created by indenting each line with at least 4 spaces.

<!--Markdown-->
```
const dedupe = (array) => [...new Set(array)];
```
<!--Rendered HTML-->
<pre><code>const dedupe = (array) =&gt; [...new Set(array)];</code></pre>

<!--Markdown-->
    const dedupe = (array) => [...new Set(array)];
<!--Rendered HTML-->
<pre><code>const dedupe = (array) =&gt; [...new Set(array)];</code></pre>

Using Inline HTML

According to John Gruber’s original spec note on inline HTML, for any markup that is not covered by Markdown’s syntax, you simply use HTML itself. The only restrictions are that block-level HTML elements (e.g. <div>, <table>, <pre>, <p>, etc.) must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces.

However, unless you are one of the people behind CommonMark itself, or close to the project, you will most likely be writing Markdown in a flavor that has already been extended to handle a good deal of syntax not currently supported by CommonMark.

Going Forward

CommonMark is a constant work in progress, with its spec last updated on April 6, 2019. A number of popular applications in the pool of Markdown tools support it. Given CommonMark’s effort toward standardization, I think it is fair to conclude that behind Markdown’s simplicity there is a lot of work going on, and that it is a good sign for the CommonMark effort that the formal specification of GitHub Flavored Markdown is based on it.

The move towards CommonMark standardization does not prevent the creation of flavors that extend its supported syntax, and as CommonMark gears up for release 1.0 with issues that must be resolved, there are some interesting resources about the ongoing effort that you can peruse.

Resources

Smashing Editorial (ks, ra, yk, il)