Discussion:
Maruku: a better Markdown interpreter for Ruby.
Andrea Censi
2006-12-27 18:11:23 UTC
Hello to all!

Maruku is a Markdown interpreter written in Ruby. It is released under the GPL.

Maruku implements the original Markdown syntax and all the
improvements in PHP Markdown Extra. Moreover, it implements some ideas
from MultiMarkdown, and adds a syntax for specifying metadata for
block elements.

Unlike BlueCloth, Maruku creates an in-memory representation of the
document tree, and this makes it very easy to export to other formats.
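
To illustrate why an in-memory tree makes extra output formats cheap, here is a minimal sketch. The `Node` struct and the `to_html` / `to_latex` functions are hypothetical names made up for this example; they are not Maruku's actual classes, and escaping of special characters is omitted.

```ruby
# Hypothetical document-tree node; not Maruku's real data structure.
Node = Struct.new(:type, :children)

# Render the tree as HTML. Strings are leaf text (escaping omitted for brevity).
def to_html(node)
  return node if node.is_a?(String)
  inner = node.children.map { |child| to_html(child) }.join
  case node.type
  when :document  then inner
  when :paragraph then "<p>#{inner}</p>\n"
  when :emphasis  then "<em>#{inner}</em>"
  else raise ArgumentError, "unknown node type: #{node.type}"
  end
end

# Render the same tree as LaTeX; only the per-node templates differ.
def to_latex(node)
  return node if node.is_a?(String)
  inner = node.children.map { |child| to_latex(child) }.join
  case node.type
  when :document  then inner
  when :paragraph then "#{inner}\n\n"
  when :emphasis  then "\\emph{#{inner}}"
  else raise ArgumentError, "unknown node type: #{node.type}"
  end
end

doc = Node.new(:document, [
  Node.new(:paragraph, ["Hello ", Node.new(:emphasis, ["world"])])
])

puts to_html(doc)   # <p>Hello <em>world</em></p>
puts to_latex(doc)  # Hello \emph{world}
```

The parsing work is done once; each new exporter is just another walk over the same tree.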

Out of the box, Maruku offers export to HTML and to LaTeX/PDF.
There is syntax highlighting for code blocks via the `syntax` library
for HTML, and via the `listings` package for LaTeX.

Here are some examples:

http://maruku.rubyforge.org/maruku.md
http://maruku.rubyforge.org/maruku.html
http://maruku.rubyforge.org/maruku.pdf

http://maruku.rubyforge.org/markdown_syntax.md
http://maruku.rubyforge.org/markdown_syntax.html
http://maruku.rubyforge.org/markdown_syntax.pdf

Download instructions at
http://maruku.rubyforge.org/maruku.html#download

In general it should suffice to do:

$ gem install maruku

This is my first time releasing a gem, so let me know of any problems
you encounter while installing.

Try it with
$ maruku file.md # converts to html
$ marutex file.md # converts to tex and invokes pdflatex

Any feedback is appreciated.

In particular, what do you think of this (proposed) syntax for adding
meta-data to span-level elements?

http://maruku.rubyforge.org/maruku.html#future


Have a nice day!
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
John Gruber
2006-12-28 18:28:37 UTC
Post by Andrea Censi
In particular, what do you think of this (proposed) syntax for adding
meta-data to span-level elements?
http://maruku.rubyforge.org/maruku.html#future
8.1. A syntax for specifying meta-data for span-level elements

Your use of braces here is very likely to conflict with future
extensions to Markdown itself. And the `@`s are very noisy,
visually.


8.2. Comments.

Why not just use HTML comments?


--J.G.
Andrea Censi
2006-12-28 18:57:52 UTC
Post by John Gruber
Post by Andrea Censi
In particular, what do you think of this (proposed) syntax for adding
meta-data to span-level elements?
http://maruku.rubyforge.org/maruku.html#future
8.1. A syntax for specifying meta-data for span-level elements
Your use of braces here is very likely to conflict with future
extensions to Markdown itself.
About this, Michel Fortin referred to an old thread (2005) about
meta-data syntax.
I resurrected it and tried to summarize:

http://maruku.rubyforge.org/markdown_extra2.html
http://maruku.rubyforge.org/markdown_extra2.md

Any comments after such a long time?
Post by John Gruber
8.2. Comments.
Why not just use HTML comments?
For one thing, if I have to write a one-line comment:

<!-- Please, Mark, have a look at the following list -->

I need no fewer than 7 characters of delimiters. Practically every
language except XML allows a
<delimiter> <comment> <newline>
syntax.

For hiding a lot of text, I concede that the <!-- --> syntax is
sufficient, but it is really overkill for short comments.
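
As an illustration of the `<delimiter> <comment> <newline>` idea, a preprocessor could drop such lines before the text ever reaches the Markdown converter. This is only a sketch: the `%%` delimiter and the `strip_line_comments` name are assumptions chosen for the example, not a proposed or existing syntax, and code blocks are not special-cased.

```ruby
# Hypothetical delimiter, chosen only for this example.
COMMENT_DELIMITER = "%%"

# Remove every line that starts with the delimiter.
# Code blocks are not special-cased in this sketch.
def strip_line_comments(text)
  text.each_line
      .reject { |line| line.start_with?(COMMENT_DELIMITER) }
      .join
end

source = <<~MD
  %% Please, Mark, have a look at the following list
  * first item
  * second item
MD

puts strip_line_comments(source)  # prints only the two list items
```
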
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
John Gruber
2006-12-28 20:35:10 UTC
Post by Andrea Censi
About this, Michel Fortin referred to an old thread (2005) about
meta-data syntax.
http://maruku.rubyforge.org/markdown_extra2.html
http://maruku.rubyforge.org/markdown_extra2.md
Any comment after so long time?
Yes. Something very similar to this is what I'm planning to add.
Post by Andrea Censi
Post by John Gruber
8.2. Comments.
Why not just use HTML comments?
<!-- Please, Mark, have a look at the following list -->
I need no less than 7 characters. Practically every language except
xml allows a
<delimiter> <comment> <newline>
syntax.
I agree that HTML's comment syntax is verbose. And don't get me
started on the rules regarding `--` within an HTML comment. I'm
not ruling out the idea of a Markdown-specific comment syntax, but
given that Markdown already supports HTML comments (in the same
way it "supports" all HTML tags), I'm very wary of using up any
additional special characters just to make it a bit more
convenient.

-J.G.
Michel Fortin
2006-12-28 21:37:13 UTC
Post by Andrea Censi
About this, Michel Fortin referred to an old thread (2005) about
meta-data syntax.
If anyone else wants to revisit it, here's the link... look for the
"Attribute reference" thread on this page:

<http://six.pairlist.net/pipermail/markdown-discuss/2005-January/thread.html>
Post by John Gruber
Post by Andrea Censi
http://maruku.rubyforge.org/markdown_extra2.html
http://maruku.rubyforge.org/markdown_extra2.md
Any comment after so long time?
Yes. Something very similar to this is what I'm planning to add.
That's also something I want to add to PHP Markdown Extra early in
2007. I'm not sure of many details yet however, like on which
elements it'll be supported.


Michel Fortin
***@michelf.com
http://www.michelf.com/
Andrea Censi
2006-12-28 21:47:58 UTC
Post by John Gruber
Post by Andrea Censi
http://maruku.rubyforge.org/markdown_extra2.html
http://maruku.rubyforge.org/markdown_extra2.md
Any comment after so long time?
Yes. Something very similar to this is what I'm planning to add.
..which is? (I can't bear the suspense :-) )
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
John Gruber
2006-12-28 21:00:23 UTC
Post by Andrea Censi
Maruku implements the original Markdown syntax and all the
improvements in PHP Markdown Extra. Moreover, it implements some ideas
from MultiMarkdown, and adds a syntax for specifying metadata for
block elements.
Maruku looks very nice overall, but I have one comment and one
question.


Comment:

If you're going to support syntax extensions like PHP Markdown
Extra's and MultiMarkdown's, you shouldn't call Maruku a Markdown
interpreter. By default, "Markdown" should mean Markdown. I know
there are a few features that I haven't yet officially added which
almost everyone (including me) wants, especially tables. And I
know it's been a long time since Markdown 1.0 shipped.

But there's a huge benefit to having "Markdown", as a syntax, be
as consistent as possible across implementations. Someone who
knows Markdown shouldn't have to worry about "which" Markdown
implementation they're using.

In Maruku's case, I would prefer to see its default mode translate
only the official Markdown syntax, and to handle its syntax
additions as options.

It's also the case that whatever the features are that will be
added to the official Markdown syntax, there will be many other
features that will not be added. Everyone is welcome to add their
own pet features to their own implementations, but these features
should not be added as though they're part of Markdown itself.

The way Michel Fortin has handled this with PHP Markdown is ideal.


Question:

Have you benchmarked Maruku against BlueCloth? If so, how does it
compare?

-J.G.
Michel Fortin
2006-12-28 21:38:25 UTC
Post by John Gruber
The way Michel Fortin has handled this with PHP Markdown is ideal.
But please everyone take note that it can also be pretty cumbersome
at times. I don't want to dissuade anyone from keeping separate
implementations of Markdown and added features -- that's certainly a
good idea, and a good learning experience too -- but implementers
must realize that the implementation of some features is really hard
to achieve without writing two completely separate parsers. For
instance, PHP Markdown Extra's special handling of
underscore-emphasis completely overrides the corresponding parser
function of PHP Markdown.

In fact, one of the reasons why custom attributes aren't already
supported in PHP Markdown Extra 1.1 is because it's hard to figure
out a way to do it without changing every regular expression of PHP
Markdown and creating two completely different code bases to
maintain. My plan is to do this by adding the necessary hooks in PHP
Markdown's regular expressions, but the point is that it'd be much
simpler to do if the added Extra features were merged into the
regular PHP Markdown parser.


Michel Fortin
***@michelf.com
http://www.michelf.com/
Andrea Censi
2006-12-28 22:46:47 UTC
Post by John Gruber
Maruku looks very nice overall, but I have one comment and one
question.
Have you benchmarked Maruku against BlueCloth? If so, how does it
compare?
Just did it.

This is the benchmark I used:

http://rubyforge.org/viewvc/trunk/lib/maruku/tests/benchmark.rb?revision=27&root=maruku

The input data is

http://daringfireball.net/projects/markdown/syntax.text

These are the results on my Powerbook G4 (ruby 1.8.5, gcc 3.3):

BlueCloth (to_html): parsing 0.00 sec + rendering 2.16 sec = 2.17 sec
Maruku (to_html): parsing 1.83 sec + rendering 0.36 sec = 2.20 sec
Maruku (to_latex): parsing 1.87 sec + rendering 0.34 sec = 2.21 sec

These are the results on a Pentium 3 1.8 GHz (ruby 1.8.5, gcc 3.4.3):

BlueCloth (to_html): parsing 0.00 sec + rendering 1.38 sec = 1.38 sec
Maruku (to_html): parsing 1.05 sec + rendering 0.25 sec = 1.29 sec
Maruku (to_latex): parsing 1.08 sec + rendering 0.20 sec = 1.28 sec
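
For readers who want to take a similar measurement, the following is a minimal sketch of timing the parsing and rendering phases separately. It is not the benchmark script linked above: `timed`, `benchmark_phases`, and the two lambdas are hypothetical stand-ins, and a real run would call the converters being compared.

```ruby
# Run a block and return [result, elapsed_seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Hypothetical stand-ins for a converter's two phases.
parse  = ->(text)   { text.split(/\n{2,}/) }
render = ->(blocks) { blocks.map { |b| "<p>#{b}</p>" }.join("\n") }

# Accumulate the time spent in each phase over several runs.
def benchmark_phases(input, parse, render, runs: 100)
  parse_time = 0.0
  render_time = 0.0
  runs.times do
    tree, elapsed = timed { parse.call(input) }
    parse_time += elapsed
    _, elapsed = timed { render.call(tree) }
    render_time += elapsed
  end
  { parsing: parse_time, rendering: render_time, total: parse_time + render_time }
end

r = benchmark_phases("one\n\ntwo\n\nthree\n\n" * 200, parse, render)
printf("parsing %.2f sec + rendering %.2f sec = %.2f sec\n",
       r[:parsing], r[:rendering], r[:total])
```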

Considering that
1) for parsing, I am doing something very elegant to write but not the
most efficient thing;
2) for LaTeX rendering, some things are very, very inefficient;
3) I'm a novice Ruby programmer, so there are probably things done very stupidly;
I guess that Maruku could be 20-30% faster than BlueCloth if one
invested the time. But at the moment, that's not my priority.

I was surprised to see it's already comparable in speed. :-D
Probably the magic is in the line-oriented parser: I'm using regexp
only for span-level elements.
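
A rough sketch of what "line-oriented blocks, regexps only for spans" can look like follows. It is an illustration under simplifying assumptions (only `# ` headers, paragraphs, `*em*` and `**strong**`), not Maruku's code; all function names are hypothetical.

```ruby
# Span-level pass: the only place this sketch uses regular-expression substitution.
def render_spans(text)
  text.gsub(/\*\*(.+?)\*\*/, '<strong>\1</strong>')
      .gsub(/\*(.+?)\*/, '<em>\1</em>')
end

# Block-level pass: walk the input one line at a time and group lines
# into blocks, without running regexes over the whole document.
def parse_blocks(source)
  blocks = []
  paragraph = []
  flush = lambda do
    blocks << [:paragraph, paragraph.join(" ")] unless paragraph.empty?
    paragraph = []
  end
  source.each_line(chomp: true) do |line|
    if line.strip.empty?
      flush.call                       # a blank line ends the paragraph
    elsif line.start_with?("# ")
      flush.call
      blocks << [:header, line[2..-1]]
    else
      paragraph << line.strip
    end
  end
  flush.call
  blocks
end

def blocks_to_html(blocks)
  blocks.map do |type, text|
    tag = type == :header ? "h1" : "p"
    "<#{tag}>#{render_spans(text)}</#{tag}>"
  end.join("\n")
end

puts blocks_to_html(parse_blocks("# Title\n\nSome *emphasized* and\n**strong** text.\n"))
# <h1>Title</h1>
# <p>Some <em>emphasized</em> and <strong>strong</strong> text.</p>
```
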
Post by John Gruber
If you're going to support syntax extensions like PHP Markdown
Extra's and MultiMarkdown's, you shouldn't call Maruku a Markdown
interpreter. By default, "Markdown" should mean Markdown.
...
Post by John Gruber
The way Michel Fortin has handled this with PHP Markdown is ideal.
This deserves time to answer properly -- and in Italy it's time for bed.
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
Andrea Censi
2006-12-30 11:24:33 UTC
Post by John Gruber
If you're going to support syntax extensions like PHP Markdown
Extra's and MultiMarkdown's, you shouldn't call Maruku a Markdown
interpreter. By default, "Markdown" should mean Markdown. I know
there are a few features that I haven't yet officially added which
almost everyone (including me) wants, especially tables. And I
know it's been a long time since Markdown 1.0 shipped.
But there's a huge benefit to having "Markdown", as a syntax, be
as consistent as possible across implementations. Someone who
knows Markdown shouldn't have to worry about "which" Markdown
implementation they're using.
In Maruku's case, I would prefer to see its default mode translate
only the official Markdown syntax, and to handle its syntax
additions as options.
I don't agree. I would agree if Maruku were called "Markdown.rb":
that way one would expect output identical to Markdown.pl's.

Nor do I want Maruku to be 100% compatible, bug by bug, with
BlueCloth (nor do I want Markdown.pl's bugs).

So I think that, as long as appropriate notices are given in the
documentation, it's OK for Maruku to act as it does. I'll clarify
all these aspects in Maruku's documentation. I'm fine with changing
the tag-line "Maruku: a Markdown interpreter" to "Maruku: a
Markdown-superset interpreter" or something similar (by any other
name, it would work as sweet).

As Michel said, it is quite problematic to have your implementation
disable some features in a configurable way, and I think that it's
not worth it anyway. A user can write her Markdown 1.0 document and
Maruku will give her just the output she expects: advanced features
are hard to trigger unintentionally.

Again, I understand your concern that all Markdown implementations
should give the same results. But at the moment, and I think this is
not only my opinion, the Markdown specification has a lot of holes. So
even if I wanted to, I could not make Maruku "Markdown 1.0 compatible".

I promise one thing: if Markdown2 is released with a decent set
of features (tables, footnotes, definition lists, metadata) and
clear documentation (if not a formal grammar, at least an
unambiguous specification), then Maruku will be Markdown2-canonical as
its primary mode of operation.
Post by John Gruber
But there's a huge benefit to having "Markdown", as a syntax, be
as consistent as possible across implementations.
I would just say:
"there's a huge benefit to having Markdown as a syntax"
:-)
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
John Gruber
2006-12-28 22:15:00 UTC
Post by Andrea Censi
Post by John Gruber
Yes. Something very similar to this is what I'm planning to add.
..which is? (I can't bear the suspense :-) )
I'm not holding back. What I mean is that I'm not certain.

I can't think of anything to complain about in the proposal from
2005, except that I don't like this:

Blah blah blah blah this is a
paragraph of text.

{#id_ref}


My gut feeling is that it'd have to be like this, with no blank
line:

Blah blah blah blah this is a
paragraph of text.
{#id_ref}


But as for the rest, I'd have to try it and live with it for a
while before declaring it finalized. Markdown's original syntax
changed a lot from my original concept after I tried it out in the
real world.

So maybe the actual syntax will be exactly like this proposal. But
I can't say for sure until after it's implemented and I see how it
works.

-J.G.
Andrea Censi
2006-12-28 22:34:06 UTC
Post by John Gruber
I can't think of anything to complain about in the proposal from
2005, except that I don't like this:
Blah blah blah blah this is a
paragraph of text.

{#id_ref}
My gut feeling is that it'd have to be like this, with no blank
line:
Blah blah blah blah this is a
paragraph of text.
{#id_ref}
Okay. I guess it's easy for Maruku to support either one, as I'm using
a line-oriented parser.
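
To make this concrete, here is a minimal sketch of a line-oriented pass that accepts both placements of the attribute line, with or without a blank line before it. It is not Maruku's implementation: `ATTRIBUTE_LINE` and `parse_paragraphs` are hypothetical names, and only the `{#id}` form is handled.

```ruby
# Matches a line consisting only of an id attribute, e.g. "{#id_ref}".
ATTRIBUTE_LINE = /\A\{#([\w-]+)\}\z/

# Returns a list of { text:, id: } hashes, one per paragraph.
# An attribute line attaches to the most recent paragraph, whether or not
# a blank line separates them.
def parse_paragraphs(source)
  paragraphs = []
  current = nil
  source.each_line(chomp: true) do |line|
    stripped = line.strip
    if stripped.empty?
      current = nil                    # a blank line closes the open paragraph
    elsif (match = ATTRIBUTE_LINE.match(stripped))
      paragraphs.last[:id] = match[1] unless paragraphs.empty?
      current = nil
    elsif current
      current[:text] << " " << stripped
    else
      current = { text: stripped.dup, id: nil }
      paragraphs << current
    end
  end
  paragraphs
end

with_blank    = "Blah blah blah blah this is a\nparagraph of text.\n\n{#id_ref}\n"
without_blank = "Blah blah blah blah this is a\nparagraph of text.\n{#id_ref}\n"

p parse_paragraphs(with_blank) == parse_paragraphs(without_blank)  # true
```

Because each line is classified on its own, the choice between the two forms comes down to whether a blank line resets the "most recent paragraph" before the attribute line is seen.
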
Post by John Gruber
But as for the rest, I'd have to try it and live with it for a
while before declaring it finalized. Markdown's original syntax
changed a lot from my original concept after I tried it out in the
real world.
So maybe the actual syntax will be exactly like this proposal. But
I can't say for sure until after it's implemented and I see how it
works.
That's good to hear. Michel also wants to implement something
very similar for his next PHP Markdown release.

I guess that the 2005 proposal is as well defined as it can be before
any implementation exists.
In the next few days, I'll try to refine that proposal and then
implement it to see how it works.

Michel told me he plans his next release for January, so
we'll hear about his experience soon, too.
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
Jan Erik Moström
2006-12-28 23:15:25 UTC
Post by Michel Fortin
In fact, one of the reasons why custom attributes aren't
already supported in PHP Markdown Extra 1.1 is because it's
hard to figure out a way to do it without changing every
regular expression of PHP Markdown and creating two completely
different code bases to maintain. My plan is to do this by
adding the necessary hooks in PHP Markdown's regular
expressions, but the point is that it'd be much simpler to do
if the added Extra features were merged into the regular PHP
Markdown parser.
A question: am I correct in guessing that all implementations
use regexps to parse the text, or is there some implementation
that does "real" parsing?

jem
Jacob Rus
2006-12-29 01:18:44 UTC
Post by Jan Erik Moström
A question: am I correct when I guess that all implementations
use regexp to parse the text
Yes
Post by Jan Erik Moström
or is there some implementation
that do "real" parsing?
No
John MacFarlane
2006-12-29 07:30:10 UTC
Post by Jan Erik Moström
A question: am I correct when I guess that all implementations
use regexp to parse the text
Yes
Post by Jan Erik Moström
or is there some implementation
that do "real" parsing?
No
Yes. Pandoc <http://sophos.berkeley.edu/macfarlane/pandoc> does "real"
parsing, using Haskell's Parsec library of parser combinators. I suppose
one might quibble about whether it counts as an "implementation"
of markdown; it comes close to full compatibility, but differs from
standard markdown on a few edge cases (described in the documentation),
and also provides a few syntax extensions. With some effort, I suppose,
a "full compatibility" mode could be added.

John
Jan Erik Moström
2006-12-29 12:36:08 UTC
Post by John Gruber
I agree that HTML's comment syntax is verbose. And don't get me
started on the rules regarding `--` within an HTML comment. I'm
not ruling out the idea of a Markdown-specific comment syntax, but
given that Markdown already supports HTML comments (in the same
way it "supports" all HTML tags), I'm very wary of using up any
additional special characters just to make it a bit more
convenient.
While I can live with the HTML style, it would be really nice to
have something else.
Bob Hutchison
2006-12-29 17:04:53 UTC
Post by Jan Erik Moström
Post by John Gruber
I agree that HTML's comment syntax is verbose. And don't get me
started on the rules regarding `--` within an HTML comment. I'm
not ruling out the idea of a Markdown-specific comment syntax, but
given that Markdown already supports HTML comments (in the same
way it "supports" all HTML tags), I'm very wary of using up any
additional special characters just to make it a bit more
convenient.
While I can live with the HTML style it would be really nice to
have something else
Especially something that didn't pass the comment through to the
generated HTML.

Cheers,
Bob

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
John Gruber
2006-12-29 22:35:38 UTC
Post by Andrea Censi
I was surprised to see it's already comparable in speed. :-D
Probably the magic is in the line-oriented parser: I'm using regexp
only for span-level elements.
Yeah, I'll bet yours makes fewer copies of the entire source
input. With Ruby in particular, that's probably fairly expensive.
Good to see that the benchmark looks good.

-J.G.
Jan Erik Moström
2006-12-30 15:01:51 UTC
Post by Andrea Censi
Again, I understand you concern that all Markdown implementations
should give the same results. But at the moment, and I think that is
not only my opinion, the Markdown specification has a lot of holes. So
even if wanted, I could not make Maruku "Markdown 1.0 compatible".
I don't think it's a matter of copying the bugs; it's a matter of
what the actual commands are. I already see a number of
variations of Markdown (supersets) and I personally think this
is "dangerous" ... if it's Markdown-encoded text it should be
rendered correctly (bugs are, as I say, a different thing) by all
Markdown implementations.
Post by Andrea Censi
I promise one thing: if Markdown2 will be released with a decent set
of features (tables, footnotes, definition lists, metadata) and a
clear documentation (if it's not a formal grammar, at least an
unambiguous specification), then Maruku will be Markdown2-canonical as
its primary mode of operation.
I think there is one very important thing to consider here:
readability of the source. One of the reasons I like Markdown is
that it's (in my opinion) more readable than what, for example,
reStructuredText <http://docutils.sourceforge.net/rst.html> uses.
I want to be able to take the "source" and paste it in an email
message and it should be perfectly readable to my dad when he
gets it ... and the same text should be able to produce an HTML
page (and LaTeX if I could wish).

If the penalty for this is that I have to forget about certain
markup abilities then OK ... I can live with that. I would of
course like to have both readable source and a "full fledged"
markup language but the first is much more important for me ...
otherwise I would have used reStructuredText.

jem
A. Pagaltzis
2006-12-30 15:30:08 UTC
Post by Jan Erik Moström
Post by Andrea Censi
Again, I understand you concern that all Markdown
implementations should give the same results. But at the
moment, and I think that is not only my opinion, the Markdown
specification has a lot of holes. So even if wanted, I could
not make Maruku "Markdown 1.0 compatible".
I don't think it's a matter of copying the bugs it's a matter
what the actual commands are. I already see a number of
variations of Markdown (supersets) and I personally think this
is "dangerous" ... if it's Markdown encoded text it should be
rendered correctly (bugs are as I say a different thing) by all
Markdown implementations.
The problem is that there is effectively no spec. There is a
format documentation, but there are many cases it does not
specify precisely (or at all). In that case, only the behaviour
of Markdown.pl can be of guidance; but it is not clear which
behaviours are correct, which are incidental, and which are bugs;
precisely because no thorough spec exists.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
Jan Erik Moström
2006-12-30 15:36:36 UTC
Post by A. Pagaltzis
The problem is that there is effectively no spec. There is a
format documentation, but there are many cases it does not
specify precisely (or at all). In that case, only the behaviour
of Markdown.pl can be of guidance; but it is not clear which
behaviours are correct, which are incidental, and which are bugs;
precisely because no thorough spec exists.
I agree about the lack of a formal definition (this was
basically the reason I asked yesterday about formatters doing
"real parsing") but in my opinion this is not the main problem,
the main problem is that there are different supersets of markdown.

jem
A. Pagaltzis
2006-12-30 16:17:53 UTC
Post by Jan Erik Moström
Post by A. Pagaltzis
The problem is that there is effectively no spec. There is a
format documentation, but there are many cases it does not
specify precisely (or at all). In that case, only the
behaviour of Markdown.pl can be of guidance; but it is not
clear which behaviours are correct, which are incidental, and
which are bugs; precisely because no thorough spec exists.
I agree about the lack of a formal definition (this was
basically the reason I asked yesterday about formatters doing
"real parsing") but in my opinion this is not the main problem,
the main problem is that there are different supersets of
markdown.
That doesn’t seem problematic to me. If you write documents that
use features that are not part of Markdown, and the formatter
documentation makes it clear that the features you are using are
not part of Markdown, then you know what you are getting yourself
into. I don’t see what’s bad about that.

But as matters stand, you can write documents that conform to the
format documentation and use no other features, and yet find that
their interpretation varies among the different implementations
of Markdown. This means your documents might be unwittingly tied
to the specific implementation of Markdown you are using, even
when they do not use any non-Markdown features. That seems
actually problematic to me.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
Jan Erik Moström
2006-12-31 00:46:41 UTC
Post by A. Pagaltzis
That doesn't seem problematic to me. If you write documents that
use features that are not part of Markdown, and the formatter
documentation makes it clear that the features you are using are
not part of Markdown, then you know what you are getting yourself
into. I don't see what's bad about that.
Yes, and as I understand it, this is what John Gruber suggested.
Create/base any kind of markup on Markdown but only call it
Markdown if it's Markdown ... otherwise people are going to
confuse different versions of Markdown and complain about
implementations that don't work.

Let's not make the mistake of having different versions of the same
language (remember, for example, early versions of LaTeX and C).

jem
John Gruber
2006-12-31 04:33:10 UTC
Post by Bob Hutchison
Post by Jan Erik Moström
While I can live with the HTML style it would be really nice to
have something else
Especially something that didn't pass the comment through to the
generated HTML.
Right. That would clearly be the biggest reason to add a
Markdown-specific comment syntax.

-J.G.
