Improving pinned topic excerpts


(Sam Saffron) #1

Our current pinned topic excerpt algorithm leaves much to be desired.


It basically takes all the words in the post that fit in the first 220 chars, strips formatting, mushes them together and TADA.
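For illustration, here is a rough Python sketch of the behavior described above, assuming the excerpt is built from the post's cooked HTML. It is not Discourse's actual (Ruby) code; `naive_excerpt` and `_TextCollector` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects every text node of an HTML fragment, discarding the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def naive_excerpt(cooked_html: str, max_length: int = 220) -> str:
    """Hypothetical sketch of the current approach: strip tags, mush, hard-cut."""
    collector = _TextCollector()
    collector.feed(cooked_html)
    collector.close()
    # Collapse all whitespace, including paragraph breaks, into single spaces,
    # so separate paragraphs run together into one line.
    text = " ".join("".join(collector.chunks).split())
    # Hard cut at max_length characters; this can split a word in half.
    return text[:max_length]


print(naive_excerpt("<p>Hello <b>world</b></p>\n<p>Second para</p>", 14))
```

This prints `Hello world Se`: the two paragraphs are merged and the last word is cut mid-way.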

This leaves moderators little to no control over what is displayed in the excerpt and can lead to a cluttered view.

Instead I would like to simplify the algorithm to:

Take all the words in the first paragraph (P), up to a length of 220 characters. Don’t cut words halfway through.
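A minimal Python sketch of this proposed rule, assuming "first paragraph" means the first `<p>` element of the cooked HTML. `first_paragraph_excerpt` and `truncate_on_word` are hypothetical helpers, not Discourse APIs; words that do not fit are dropped whole rather than cut.

```python
from html.parser import HTMLParser


class _FirstParagraph(HTMLParser):
    """Collects the text of the first <p> element and ignores everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.inside = False
        self.finished = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.finished:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.inside = False
            self.finished = True

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def truncate_on_word(text: str, max_length: int) -> str:
    """Keep whole words while the total length stays within max_length."""
    kept, length = [], 0
    for word in text.split():
        extra = len(word) + (1 if kept else 0)  # +1 for the joining space
        if length + extra > max_length:
            break
        kept.append(word)
        length += extra
    if not kept and text.strip():
        # A single word longer than the limit: fall back to a hard cut.
        return text.split()[0][:max_length]
    return " ".join(kept)


def first_paragraph_excerpt(cooked_html: str, max_length: int = 220) -> str:
    parser = _FirstParagraph()
    parser.feed(cooked_html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    return truncate_on_word(text, max_length)
```

For a post whose first paragraph is only a salutation, `first_paragraph_excerpt("<p>Hello everyone,</p><p>Here are the features</p>")` returns just `Hello everyone,`, which is the salutation side effect discussed in this thread.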

This will give moderators way better control and assist in cleaning up the topic lists where pinned topics are involved.

Thoughts?

cc @PJH


(Sam Saffron) #2

One side-effect problem is that posts like this: http://discourse.stonehearth.net/t/stonehearth-announced-features/2361 will end up with the crappy excerpt “hello everyone,”. The fix is trivial, though: just remove the salutation.


(Jeff Atwood) #3

I actually think all the ones you posted screenshots of look fine.

Blindly stripping pin excerpts at the first para would literally break the excerpt of every single one of the pinned topics you just screenshotted… hard to see why you’re even proposing that.

Have you looked at Gmail? It also concatenates in a similar way.


Notice that Gmail does NOT stop at the first paragraph, and neither should we…

Maybe the thing to do is add some other kind of markup for people who want finer control than this. Certainly not a V1 thing.


(Sam Saffron) #4

I thought the simplest thing here that at least grants control would be to break on an HTML comment, <!-- break -->. Unfortunately, we now strip HTML comments out of rendered Markdown, so it’s non-trivial.
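Because comments disappear from the cooked HTML, the `<!-- break -->` marker suggested here would have to be located in the raw Markdown before cooking, roughly as in this Python sketch. `excerpt_source` is a hypothetical name; the returned head would still need to be cooked and stripped to become the excerpt.

```python
from typing import Optional

# Marker as suggested in this post. It must be matched in the raw Markdown,
# because HTML comments are stripped when the post is cooked.
BREAK_MARKER = "<!-- break -->"


def excerpt_source(raw_markdown: str) -> Optional[str]:
    """Return the raw Markdown that precedes the break marker, or None if absent."""
    head, sep, _tail = raw_markdown.partition(BREAK_MARKER)
    if not sep:
        return None
    return head.rstrip()
```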

Another option is an HR, but it litters the post. The last option is custom markup, but that is also confusing.

My major issue is that there is zero mod control here: you have an excerpt and cannot control how it looks or where it stops.


(PJH) #5

<algorithm/thoughts prepared, but forgot to post, 4 hrs ago deleted>

Because there’d be no point, apparently.


(Sam Saffron) #6

I dunno, I think the major pain point is the lack of control. No matter what fancy algorithm we come up with, we will mess up sometimes. There is plenty we can improve in the current system, though, like not cutting words in half (Gmail does not) and some other bits and pieces.


(PJH) #7

Not with the stuff I came up with; had 3 config parameters, an easy override for individual posts…


(Sam Saffron) #8

I just added simple support for cutting excerpts; it was a fairly trivial change:

<span class='excerpt'>My excerpt here</span>

This lets you control the excerpt as a mod. I will fix the word truncation as well; surprisingly, it is ever so slightly more complicated than this change.
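A rough Python sketch of pulling the moderator-chosen text out of the cooked HTML, assuming `excerpt` appears among the span's classes. This is not the code in the linked commit; `explicit_excerpt` is a hypothetical helper, and spans nested inside the excerpt are tracked with a depth counter so the first closing `</span>` does not end capture early.

```python
from html.parser import HTMLParser
from typing import Optional


class _ExcerptSpan(HTMLParser):
    """Captures the text inside the first <span class="excerpt"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside the excerpt span (counts nested spans)
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "span" or self.found:
            return
        if self.depth:
            self.depth += 1  # a span nested inside the excerpt
        elif "excerpt" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.found = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def explicit_excerpt(cooked_html: str) -> Optional[str]:
    parser = _ExcerptSpan()
    parser.feed(cooked_html)
    parser.close()
    if not (parser.found or parser.chunks):
        return None  # no excerpt span: caller falls back to the default excerpt
    return " ".join("".join(parser.chunks).split())


print(explicit_excerpt("<p>Intro <span class='excerpt'>My excerpt here</span> rest</p>"))
```

This prints `My excerpt here`; a post without the span returns `None`.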

https://github.com/discourse/discourse/commit/de7e6a95459f2e114c89bd1554e3a21861a3ffa7


(Mittineague) #9

Yes, it is more complex than it would first seem to be.
It gets really messy when it cuts between opening and closing tags.
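One common way to avoid that mess is to count only text characters while re-emitting tags, then close whatever is still open at the cut point. A minimal Python sketch follows; `truncate_html` is a hypothetical helper, attributes are re-serialized with double quotes, and it still cuts mid-word, which is a separate concern.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not be tracked as "open".
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class _Truncator(HTMLParser):
    """Re-emits HTML, counting only text characters against a budget."""

    def __init__(self, max_length: int) -> None:
        super().__init__()
        self.remaining = max_length
        self.out = []
        self.open_tags = []  # stack of tags opened but not yet closed
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attr_text = "".join(
            f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs
        )
        self.out.append(f"<{tag}{attr_text}>")
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only emit an end tag that matches the innermost open tag; others are dropped.
        if not self.done and self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.done:
            return
        if len(data) <= self.remaining:
            self.out.append(escape(data, quote=False))
            self.remaining -= len(data)
        else:
            # Budget exhausted inside this text node: cut it and stop emitting.
            self.out.append(escape(data[: self.remaining], quote=False) + "…")
            self.remaining = 0
            self.done = True


def truncate_html(html: str, max_length: int) -> str:
    parser = _Truncator(max_length)
    parser.feed(html)
    parser.close()
    # Close whatever is still open at the cut point, innermost first.
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(truncate_html("<p>Hello <b>bold world</b> end</p>", 10))
```

This prints `<p>Hello <b>bold…</b></p>`: the cut lands inside `<b>`, and both open tags are closed afterwards.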


(Sam Saffron) #10

I’m not following; is that related to the new feature?


(Mittineague) #11

Sorry to confuse.

I was commenting on your remark that fixing the word truncation is “ever so slightly more complicated than this change.”

A while back I wrote some code to parse HTML and create excerpts, and it surprised me how involved it was to get it to come out OK.


(Dave McClure) #12

Thanks for this. I just found that it only seems to work if the markup is at the beginning of the post. It would be nice if we could snip an excerpt from anywhere in the post.


(Sam Saffron) #13

Since this is such a power feature, I don’t mind if you change it and submit a PR.


(Dave McClure) #14

OK, sounds good. I’ll take a look at some point in the next week.


(Dave McClure) #15

Small improvement made

https://github.com/discourse/discourse/pull/2746

A notable limitation is that the <span class='excerpt'> part still needs to start early enough in the post, before the max excerpt length.


(Sam Saffron) #16

I wonder if this limitation will just end up causing confusion here.


(Dave McClure) #17

Certainly could.

I thought about removing it, but wasn’t sure that benefit was worth parsing the whole post for a power-user feature…


(Sam Saffron) #18

You can do a cheap “look for string” check first and only do the more expensive parse if the string is there.
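A sketch of that pattern in Python, with the expensive parser and the fallback excerpt passed in as functions so the shape is visible on its own; `choose_excerpt`, `slow_parse` and `fallback` are hypothetical. The substring test can give false positives (the word "excerpt" in ordinary text), which only costs a wasted parse, but it should not miss a real `class='excerpt'` span.

```python
from typing import Callable, Optional


def choose_excerpt(
    cooked_html: str,
    parse_excerpt_span: Callable[[str], Optional[str]],
    default_excerpt: Callable[[str], str],
) -> str:
    # Cheap check first: a plain substring scan over the post.
    if "excerpt" in cooked_html:
        # Expensive path: walk the whole post looking for the span.
        found = parse_excerpt_span(cooked_html)
        if found is not None:
            return found
    # No span (or a false positive): fall back to the normal excerpt.
    return default_excerpt(cooked_html)


# Usage with stand-ins; slow_parse records each call so the skip is observable.
parse_calls = []


def slow_parse(html: str) -> Optional[str]:
    parse_calls.append(html)
    return "Chosen text" if "class='excerpt'" in html else None


def fallback(html: str) -> str:
    return "default excerpt"
```

With these stand-ins, a post that never mentions "excerpt" goes straight to `fallback` without calling `slow_parse`, so ordinary posts pay only for the substring scan.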


(Dave McClure) #19

Makes sense. I’ll keep this on my list in that case.


(Dave McClure) #20

@sam, do you think the length of the content within the <span class='excerpt'> tags should obey the max excerpt length site setting, or should it override it in some manner (e.g. allow it to be up to 2x the length of the site setting)?

I know I’m overthinking it, but figured I’d ask in case there are any strong opinions either way…