Discussion:
Markdown generates invalid html for a list immediately followed by a quote
Matt Kraai
2007-05-23 03:48:05 UTC
Howdy,

[Please preserve the CC to 424919-***@bugs.debian.org on any
replies.]

The following bug in Markdown was reported to the Debian bug tracking
system. In short, running both the released version of Markdown and
the latest beta on

* foo
> bar
> baz

produces invalid HTML.

----- Forwarded message from Joey Hess <***@debian.org> -----

From: Joey Hess <***@debian.org>
To: Debian Bug Tracking System <***@bugs.debian.org>
Subject: Bug#424919: generates invalid html for a list element immediately followed by a quote
Date: Thu, 17 May 2007 16:07:35 -0400
X-Spam-Status: No, hits=-8.5 required=4.0 tests=BAYES_00,FROMDEVELOPER,
HAS_PACKAGE,HTML_MESSAGE,RCVD_IN_SORBS autolearn=ham
version=2.60-bugs.debian.org_2005_01_02

Package: markdown
Version: 1.0.1-6
Severity: normal

***@kodama:~>cat foo
* foo
> bar
> baz
***@kodama:~>markdown foo
<p><ul>
<li>foo</p>

<blockquote>
<p>bar
baz</li>
</ul></p>
</blockquote>

Notice that the closing tags are not in the right order.
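
The misordered closing tags can be detected mechanically. The following is only a minimal sketch, not part of Markdown itself: `NestingChecker` and `misnested_tags` are hypothetical names, and the check assumes every element is closed explicitly (stricter than HTML, which allows some end tags to be omitted). It uses Python's standard `html.parser` module and keeps a stack of open elements.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Collect every closing tag that does not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.errors = []     # (expected tag, tag actually closed)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        expected = self.open_tags[-1] if self.open_tags else None
        if tag == expected:
            self.open_tags.pop()
            return
        self.errors.append((expected, tag))
        # Recover by dropping the most recent open element with this name,
        # so that one mistake does not hide the following ones.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break


def misnested_tags(html):
    """Return the list of (expected, found) closing-tag mismatches."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors
```

Fed the output shown above, the first mismatch reported is `("li", "p")`: a `</p>` arrives while `<li>` is still the innermost open element.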

If a newline is added before the quote, it closes the list before starting
the blockquote, so that's a workaround.

(This also happens with markdown 1.0.2~b8-1)

-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.20-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages markdown depends on:
ii perl 5.8.8-7 Larry Wall's Practical Extraction

markdown recommends no packages.

-- no debconf information
--
see shy jo



----- End forwarded message -----
--
Matt
Michel Fortin
2007-05-23 14:01:31 UTC
Post by Matt Kraai
* foo
> bar
> baz

Although it should certainly be valid HTML, the output Markdown
should generate for that is a pretty tricky question in my opinion. I
see three valid interpretations according to the Markdown syntax
documentation. Here is the simplest:

<ul>
<li>foo
> bar
> baz</li>
</ul>

It also happens to be PHP Markdown's output. bar and baz are taken as
part of the list item since the following lines do not need to be
indented, and since the list item does not contain any blank line the
content gets treated as a span-level, hence no blockquote.

The next one is what most people would expect I think:

<ul>
<li>foo</li>
</ul>

<blockquote>
<p>bar
baz</p>
</blockquote>

Blockquote markers are obeyed and are on the same level as the list
since they aren't indented.

Third option:

<ul>
<li><p>foo</p>

<blockquote>
<p>bar
baz</p>
</blockquote></li>
</ul>

Blockquote markers are seen as inside the list item since adjacent
lines do not need indentation, and are obeyed making the list item
content's block-level.

I think, as a general rule, the explicit syntax should take
precedence over the lazy one. This would make the second option above
the preferred one over the others. Other tricky cases could work like
the following.

A list item containing a "foo" paragraph and a "bar baz" blockquote:

* foo
  > bar
  > baz

A list item containing a "foo" paragraph and a "bar" blockquote,
followed by a "baz" blockquote:

* foo
  > bar
> baz

A list item containing a "foo" paragraph, followed by a "bar baz"
blockquote:

* foo
> bar
> baz

A list item containing "foo" (no paragraph), followed by a blockquote
containing "bar", followed by a list item containing "baz" (no
paragraph):

* foo
> bar
* baz

Basically, I'd eliminate any "half-lazy" syntax where you can be lazy
about list item indentation while not being lazy on blockquote
markers. This just creates confusion; syntax markers shouldn't be
allowed to be lazy.

Removing half-lazy things would also fix a surprising issue with

> foo
> > bar
> baz

This would be seen as a blockquote containing a "foo" paragraph, a
nested "bar" blockquote and a "baz" paragraph, instead of the
completely counter-intuitive output produced today. To make "baz" part
Post by Matt Kraai
foo
bar
baz
foo
bar
baz

but not something in between.


Michel Fortin
***@michelf.com
http://www.michelf.com/
John Gruber
2007-07-07 15:12:03 UTC
Post by Michel Fortin
<ul>
<li>foo
> bar
> baz</li>
</ul>
It also happens to be PHP Markdown's output. bar and baz are taken
as part of the list item since the following lines do not need to be
indented, and since the list item does not contain any blank line
the content gets treated as a span-level, hence no blockquote.
That's what I'm thinking the output should be. I think there should be
an official rule that all block-level constructs must be separated by
a blank line.
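
A rough sketch of what such a rule could mean for a parser, assuming a block is simply a run of non-blank lines and is classified by its first line alone. The function names `split_blocks` and `block_kind` are made up for illustration and do not come from any Markdown implementation.

```python
import re


def split_blocks(text):
    """Split text into blocks separated by one or more blank lines."""
    parts = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    return [part for part in parts if part.strip()]


def block_kind(block):
    """Classify a block by its first line only (deliberately simplified)."""
    first = block.lstrip().splitlines()[0]
    if re.match(r"[*+-][ \t]+", first):
        return "list"
    if first.startswith(">"):
        return "blockquote"
    return "paragraph"


def kinds(text):
    """Return the kind of every block in the text, in order."""
    return [block_kind(block) for block in split_blocks(text)]
```

Under these assumptions, `kinds("* foo\n> bar\n> baz\n")` yields a single `list` block, so the `>` lines are just part of the list item's text, while inserting a blank line after `* foo` yields a `list` block followed by a `blockquote` block.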

-J.G.
Andrea Censi
2007-07-07 15:26:54 UTC
Post by John Gruber
That's what I'm thinking the output should be. I think there should be
an official rule that all block-level constructs must be separated by
a blank line.
That would make for a *much* simpler formal specification of the language.
--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
Michel Fortin
2007-07-08 22:47:26 UTC
Post by John Gruber
That's what I'm thinking the output should be. I think there should be
an official rule that all block-level constructs must be separated by
a blank line.
I'm all for it. But wouldn't that compromise a lot of currently
existing documents?


Michel Fortin
***@michelf.com
http://www.michelf.com/
Waylan Limberg
2007-07-08 23:54:19 UTC
True, but if you read between the lines, it is already insinuated that
that is the way things work. I know that's the way I initially
understood it. It was only after a few errors (in my typing) that I
realized the extra line break isn't always necessary. I just
considered it sloppy typing -- I'd say that unless there is something
somewhere that specifically says it's supposed to work without the
blank line, then we should expect the blank line to be there. Anything
else is just a sloppy/lazy author, which we can't make exceptions for.
But maybe that's just me.

For that matter, I believe python already is pretty strict about
requiring the blank line between any block-level constructs. There may
be an exception or two in a few minor edge cases, but that's it. I'd
say python is ahead of the curve on this one.
--
----
Waylan Limberg
***@gmail.com
Jacob Rus
2007-07-09 03:01:27 UTC
Post by Waylan Limberg
For that matter, I believe python already is pretty strict about
requiring the blank line between any block-level constructs. There may
be an exception or two in a few minor edge cases, but that's it. I'd
say python is ahead of the curve on this one.
Wrong. Python completely ignores blank lines.
Fletcher T. Penney
2007-07-09 02:02:09 UTC
Post by Michel Fortin
Post by John Gruber
That's what I'm thinking the output should be. I think there
should be
an official rule that all block-level constructs must be separated by
a blank line.
I'm all for it. But wouldn't that compromise a lot of currently
existing documents?
I say evaluate the decision on its own merits.... Worrying too much
about existing documents could be a bit of the "throwing good money
after bad" phenomenon...

F-
--
Fletcher T. Penney
***@fletcherpenney.net

If you had a million Shakespeares, could they write like a monkey?
- Steven Wright
John Gruber
2007-07-09 19:28:21 UTC
Post by Michel Fortin
I'm all for it. But wouldn't that compromise a lot of currently
existing documents?
Probably.

One reason I've held off on major updates for so long is the idea
that if I'm going to break compatibility with Markdown 1.0, it'd be best
to break it once, in multiple ways. Better for one Markdown 2.0 that
breaks/changes/clarifies several things all at once than a bunch
of 1.1, 1.2, 1.3 updates that break things here and there.

Plus, anything outright new, like tables, is by definition going to
break compatibility with 1.0 implementations.

-J.G.
Michel Fortin
2007-07-12 12:44:53 UTC
Post by John Gruber
One reason I've held off on major updates for so long is the idea
that if I'm going to break compatibility with Markdown 1.0, it'd
be best to break it once, in multiple ways. Better for one Markdown
2.0 that breaks/changes/clarifies several things all at once than
a bunch of 1.1, 1.2, 1.3 updates that break things here and
there.
That's a good plan.
Post by John Gruber
Plus, anything outright new, like tables, is by definition going to
break compatibility with 1.0 implementations.
Anything, from the simplest bug fix to the most complex feature, is
by definition going to break compatibility with 1.0 implementations
because the output for a given input is going to change; there's no
way to avoid that. But many changes in 1.0.1 are more problematic
from a compatibility point of view than the addition of a table
syntax like the one in PHP Markdown Extra -- it's pretty difficult to
write a table by accident because it requires a lot of pipes and
dashes following a certain pattern; on the other side, backslashes in
code blocks and spans are not a rare occurrence.

In other words, I think adding a feature and changing (or fixing) a
feature are on two different scales for compatibility problems, the
latter being generally much more risky than the former because the
syntax is already in use.


Michel Fortin
***@michelf.com
http://www.michelf.com/
Rob Shearer
2007-07-12 14:13:33 UTC
Post by Michel Fortin
Post by John Gruber
Plus, anything outright new, like tables, is by definition going
to break compatibility with 1.0 implementations.
Anything, from the simplest bug fix to the most complex feature, is
by definition going to break compatibility with 1.0 implementations
because the output for a given input is going to change; there's no
way to avoid that.
It might be nice if "version 2.0" were a little stricter in defining
what constitutes a valid markdown document. That way it's possible to
extend the language to interpret previously invalid documents,
instead of changing behavior for valid ones---so people authoring
"valid" markdown need not worry about features from future versions.

Such an approach only goes so far for a language like markdown that
tries to let everything look very natural, but most of the extensions
I've seen are fairly esoteric and don't sit well in plain-text
anyway, so hiding the markdown encoding behind a few extra characters
really doesn't hurt too much.

-rob
Jacob Rus
2007-07-09 03:05:17 UTC
Post by John Gruber
That's what I'm thinking the output should be. I think there should be
an official rule that all block-level constructs must be separated by
a blank line.
This sounds just terrible to me. I don't want to require huge numbers
of blank lines whenever I want deeply nested lists, etc. People don't
always use such blank lines when they write (non-markdown) plain text
emails or documents, and I think mandating them in general is a mistake.
The current inconsistent hodge-podge behavior, which horribly breaks
on all the edge cases, is of course also not the right solution.

-Jacob
John Fraser
2007-07-09 19:04:55 UTC
If I'm reading Gruber right, you could still do tightly-packed nested
lists; blank lines would remain optional both between list items and
around nested lists.

+1 for line-skipping.

-John
http://wmd-editor.com/
Post by Jacob Rus
Post by John Gruber
That's what I'm thinking the output should be. I think there should be
an official rule that all block-level constructs must be separated by
a blank line.
This sounds just terrible to me. I don't want to require huge numbers
of blank lines whenever I want deeply nested lists, etc. People don't
always use such blank lines when they write (non-markdown) plain text
emails or documents, and I think mandating them in general is a mistake.
The current inconsistent hodge-podge behavior, which horribly breaks
on all the edge cases, is of course also not the right solution.
-Jacob
_______________________________________________
Markdown-Discuss mailing list
http://six.pairlist.net/mailman/listinfo/markdown-discuss
Allan Odgaard
2007-07-15 22:50:24 UTC
[...]
I think there should be an official rule that all block-level
constructs must be separated by a blank line.
When revising the rules, let me point your attention to a post of
mine about problems with nesting block-level constructs and lazy-mode:
<http://six.pairlist.net/pipermail/markdown-discuss/2006-August/000151.html>

Would be nice to also address the ambiguities raised there, and in
> > I wrote something
> you replied
and now here is my reply to your reply.

I really would like to see that interpreted as it reads, rather than
a single paragraph of double-quoted text (which is presently the
case) -- seems your new proposal would make my preferred
interpretation even less likely.
John Gruber
2007-07-09 19:51:36 UTC
Post by John Gruber
That's what I'm thinking the output should be. I think there should be
an official rule that all block-level constructs must be separated by
a blank line.
Post by Jacob Rus
This sounds just terrible to me. I don't want to require huge numbers of
blank lines whenever I want deeply nested lists, etc.
Nested lists are an exception. I think conceptually, a nested
hierarchical list is, to the writer, a single thing.

My plan for lists is to simplify them as follows:

* A list is a series of list items.

* If any of the items in a list are separated by a *single*
blank line, the entire list is in paragraph mode, and the
contents of each item in the list will be wrapped in `<p>`
tags.

* Otherwise, the list is not in paragraph mode and none of the
items' contents get `<p>` tags.

* Two consecutive blank lines end the current list, no exceptions.
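
These rules can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is a run of lines that already belongs to list content, nesting is ignored (so one blank line anywhere puts the whole list in paragraph mode), and `split_lists` is a hypothetical name, not part of any Markdown implementation.

```python
def split_lists(lines):
    """Group lines into lists and flag which ones are in paragraph mode.

    Returns a list of (item_lines, paragraph_mode) tuples.
    """
    lists = []
    current = []            # non-blank lines of the list being collected
    blanks = 0              # blank lines seen since the last non-blank line
    paragraph_mode = False
    for line in lines:
        if not line.strip():
            blanks += 1
            continue
        if current and blanks >= 2:
            # Two consecutive blank lines end the current list.
            lists.append((current, paragraph_mode))
            current, paragraph_mode = [], False
        elif current and blanks == 1:
            # A single blank line puts the entire list in paragraph mode.
            paragraph_mode = True
        blanks = 0
        current.append(line)
    if current:
        lists.append((current, paragraph_mode))
    return lists
```

For instance, `["* Red", "* Green", "", "", "* Cyan"]` comes back as two lists, neither in paragraph mode, while `["* One", "", "* Two", "* Three"]` comes back as one list in paragraph mode.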

So you can still do this:

--
* One
    * sub-one
    * sub-two
* Two
* Three
--

And the output will be the same as currently.

But if you do this:

--
* One
    * sub-one with something that looks like
      a paragraph

    * sub-two with something that looks like
      a paragraph

* Two
* Three
--

or this:

--
* One
    * sub-one
    * sub-two

* Two

* Three
--

then *all* of the list items will be paragraph mode.

The "double blank line to end list" rule means you'll be able to write
this:

--
* Red
* Green
* Blue


* Cyan
* Yellow
* Magenta
* Black
--

To generate two consecutive lists.

The "must precede blocks with a blank line" rule also means that any
list item that contains block-level constructs, like blockquotes or
code blocks, will put that list into paragraph mode.
Post by Jacob Rus
People don't always use such blank lines when they write
(non-markdown) plain text emails or documents, and I think mandating
them in general is a mistake.
People do all sorts of things in non-Markdown plain text that can't be
parsed in Markdown.

-J.G.
A. Pagaltzis
2007-07-09 20:26:59 UTC
Post by John Gruber
--
* One
    * sub-one with something that looks like
      a paragraph

    * sub-two with something that looks like
      a paragraph

* Two
* Three
--
--
* One
    * sub-one
    * sub-two

* Two

* Three
--
then *all* of the list items will be paragraph mode.
Ack, please no. The latter should definitely put only the outer
list in paragraph mode. Whether the former should put the outer
list in paragraph mode or not is debatable and I think I could go
either way. But tight inner lists should never go into paragraph
mode just because the outer list has.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
John Fraser
2007-07-09 22:33:20 UTC
Post by John Gruber
* A list is a series of list items.
* If any of the items in a list are separated by a *single*
blank line, the entire list is in paragraph mode, and the
contents of each item in the list will be wrapped in `<p>`
tags.
* Otherwise, the list is not in paragraph mode and none of the
items' contents get `<p>` tags.
* Two consecutive blank lines end the current list, no exceptions.
But tight inner lists should never go into paragraph
mode just because the outer list has.
With AP's modification, I like this a lot.

While we're talking about whitespace in lists, it might be worth
looking at the other axis: indentation. Right now Markdown uses
4-character tab stops to differentiate nesting levels, which can be
pretty confusing at first. It's natural to indent a nested list two
spaces, and it seems to work when you're only one level deep:

- level 1
  - level 2

But things break down in weird ways when you try to go deeper with
2-character indentation:

- level 1
  - level 2
    - level 2
      - level 3
        - level 3
          - level 4

I think it makes sense to get rid of this invisible 4-character grain,
and just consider a list item to be the start of a nested list if it's
indented more than the previous list item. So:

- level 1
  - level 2
    - level 3
      - level 4
        - level 5
          - level 6

Another problem is that Markdown doesn't degrade gracefully when the
indentation gets wacky:

- level 1
- level 2
- level 3
- level 3
- level 3

To handle bizarre indentation, the rule should probably be that an
outdented list item belongs to the most recent list item that's
indented less. So:

- level 1
- level 2
- level 2
- level 3
- level 1

Still kind of odd, but no more odd than the input.
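
As a sketch of the two rules proposed above (deeper indentation starts a nested list, and an outdented item belongs to the most recent item that is indented less), nesting levels can be computed from indentation widths with a stack. `nesting_levels` is a hypothetical helper, and the indentation widths used with it are assumptions, since this archive does not preserve the original spacing of the examples.

```python
def nesting_levels(indents):
    """Map each list item's indentation (in columns) to a 1-based level."""
    stack = []   # indentation of each open ancestor item, outermost first
    levels = []
    for indent in indents:
        # Close every open item that is not indented less than this one;
        # whatever remains on the stack is this item's chain of ancestors.
        while stack and stack[-1] >= indent:
            stack.pop()
        stack.append(indent)
        levels.append(len(stack))
    return levels
```

With steady 2-column steps, `nesting_levels([0, 2, 4, 6, 8, 10])` gives levels 1 through 6. With an irregular sequence such as `[0, 4, 2, 6, 0]` it gives `[1, 2, 2, 3, 1]`, which matches the labels in the last example.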
Michel Fortin
2007-07-12 11:51:33 UTC
Post by John Fraser
To handle bizarre indentation, the rule should probably be that an
outdented list item belongs to the most recent list item that's
- level 1
- level 2
- level 2
- level 3
- level 1
Would that work with right-aligned numbers? I think not.

1. Test
     1. Test
     2. Test
     3. Test
     4. Test
     5. Test
     6. Test
     7. Test
     8. Test
     9. Test
    10. Test
2. Test

Surely there's a way to make that work, but it's certainly not going
to be easy to explain how it works. (It'd be magic for most people.)
I still believe the "four spaces or one tab" rule is easier to
understand. If we're going to make a lot of incompatible changes,
it'd be a good time to change Markdown so this rule works as expected.


Michel Fortin
***@michelf.com
http://www.michelf.com/
Már Örlygsson
2007-07-12 12:26:42 UTC
Post by Michel Fortin
Would that work with right-aligned numbers? I think not.
I think it would.
Post by Michel Fortin
Post by John Fraser
- level 1
- level 2
- level 2
- level 3
- level 1
9. level 1
9. level 2
10. level 2
1. level 3
10. level 1
Which looks about right. The rule...
Post by Michel Fortin
an outdented list item belongs to the most recent list item
that's indented less.
...sounds like it could work.
--
Már
Michel Fortin
2007-07-17 01:57:53 UTC
Post by Már Örlygsson
Post by Michel Fortin
Would that work with right-aligned numbers? I think not.
I think it would.
Yes, if we use John Fraser's example, it'd work as he said. It would
not work as a user would expect with my example though (the one you
skipped from my email):

1. Test
     1. Test
     2. Test
     3. Test
     4. Test
     5. Test
     6. Test
     7. Test
     8. Test
     9. Test
    10. Test
2. Test

Here, using John Fraser's suggested algorithm, subitem number 10
would be sent to the first list which is indented less, which means
between item 1 and 2 of the outer list. I think a user would expect
item 10 to come after item 9 in the nested list.


Michel Fortin
***@michelf.com
http://www.michelf.com/
John Fraser
2007-07-17 16:08:35 UTC
Post by Michel Fortin
1. Test
     1. Test
     2. Test
     3. Test
     4. Test
     5. Test
     6. Test
     7. Test
     8. Test
     9. Test
    10. Test
2. Test
Here, using John Fraser's suggested algorithm, subitem number 10
would be sent to the first list which is indented less, which means
between item 1 and 2 of the outer list.
Nope, item 10 would be correctly sent to the nested list. It "belongs
to the most recent list item that's indented less." In other words,
item 1.

By "belongs to" I mean that it's a descendant of that node -- not a
sibling. So item 10 is part of the list that's nested within item 1.
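
The "belongs to" relation can be made concrete with a small sketch. `parents` is a hypothetical helper name, and the indentation widths are assumed (5 columns for nested items 1 to 9, 4 columns for the right-aligned item 10), because the archive does not preserve the original spacing.

```python
def parents(indents):
    """For each item, return the index of the item it belongs to.

    An item belongs to the most recent earlier item that is indented
    less; top-level items have no parent and map to None.
    """
    result = []
    for i, indent in enumerate(indents):
        parent = None
        for j in range(i - 1, -1, -1):
            if indents[j] < indent:
                parent = j
                break
        result.append(parent)
    return result


# Outer "1.", nested "1." to "9.", right-aligned "10.", then outer "2."
example = [0] + [5] * 9 + [4] + [0]
```

Here `parents(example)` maps nested items 1 through 10 to index 0 (the outer item 1), so the outdented item 10 lands in the same nested list as items 1 to 9.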

John Fraser
