anchors don't work when contains punctuation marks just like `(` or `(` #26367

@lazyky

Description

Markdown Heading ID contains Unicode “(” is inconsistent with Github.
For `#### test(1)`, Gitea generates the id "user-content-test-1", while GitHub generates "user-content-test1".

The link in the Markdown below jumps to the heading on GitHub, but not on Gitea.

#### test(1)
to [test(1)](#test1)

gitea

id = "user-content-test-1"
image

github

id = "user-content-test1"
image
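
To make the mismatch concrete, here is a minimal Python sketch of two slug rules. Both functions are hypothetical and written only to reproduce the two ids reported above; neither is taken from the Gitea or GitHub source code. The `user-content-` prefix is the one shown in the ids above.

```python
import re

def hyphenating_slug(heading: str) -> str:
    """Hypothetical rule: each run of non-word characters becomes one '-'."""
    slug = re.sub(r"\W+", "-", heading.strip().lower())
    return slug.strip("-")

def dropping_slug(heading: str) -> str:
    """Hypothetical rule: punctuation is dropped, each space becomes '-'."""
    kept = [c for c in heading.strip().lower() if c.isalnum() or c in "_- "]
    return "".join(kept).replace(" ", "-")

heading = "test(1)"
link_target = "test1"  # fragment used in [test(1)](#test1)

for name, rule in (("hyphenating", hyphenating_slug), ("dropping", dropping_slug)):
    slug = rule(heading)
    # The link only resolves when the generated slug equals the fragment.
    print(name, "user-content-" + slug, "link resolves:", slug == link_target)
```

Under these assumptions, only the rule that drops punctuation produces an id that the `#test1` link can reach.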

Gitea Version

1.20.2

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

I was able to reproduce it using try.gitea.io.

Database

None

Activity

CaiCandong

CaiCandong commented on Aug 8, 2023

@CaiCandong
Member

What's the impact of this problem?

bioinformatist

bioinformatist commented on Aug 8, 2023

@bioinformatist

@CaiCandong

Sometimes we need to use section titles like this:

image

However, the anchors pointing to them do not work, which makes the document harder to read.

changed the title [-]Markdown Heading ID contains Unicode “(” is inconsistent with Github[/-] [+]anchors don't work when contains unicode `(`[/+] on Aug 8, 2023
CaiCandong

CaiCandong commented on Aug 8, 2023

@CaiCandong
Member

Markdown Heading ID contains Unicode “(” is inconsistent with Github.

Thanks for the report. I understand the problem. Does it also occur when using ( directly?

changed the title [-]anchors don't work when contains unicode `(`[/-] [+]anchors don't work when contains punctuation marks just like `(` or `(`[/+] on Aug 8, 2023
lazyky

lazyky commented on Aug 8, 2023

@lazyky
Author

Markdown Heading ID contains Unicode “(” is inconsistent with Github.

Thanks for the report. I understand the problem. Does it also occur when using ( directly?

Yes. I also test (, !, :, *, and . They are same as . @CaiCandong

#### test(0)

#### test!1

#### test:2

#### test*3

#### test!4

#### test:5

gitea

image

github

image

CaiCandong

CaiCandong commented on Aug 8, 2023

@CaiCandong
Member

I've located the code responsible for this problem. It has to do with the user-content-* generation rules, but I'm not sure how GitHub handles these cases. Can you give me some more examples to help me refine the code?

#### test:ad # df
#### test:ad # df
#### test:ad #23 df 2*/*
lazyky

lazyky commented on Aug 8, 2023

@lazyky
Author

test:ad # df

test:ad # df

test:ad #23 df 2*/*

github

Here are the examples on GitHub:
image

CaiCandong

CaiCandong commented on Aug 8, 2023

@CaiCandong
Member
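def cheanValue(anchor_name):
    # Build an anchor the way GitHub appears to: keep letters, digits,
    # '_' and '-' (lowercased), turn each space into '-', drop the rest.
    anchor_name = anchor_name.strip()
    ret = []
    for c in anchor_name:
        if c.isalpha() or c.isdigit() or c == '_' or c == '-':
            ret.append(c.lower())
        elif c == ' ':
            # Spaces are not collapsed: two spaces yield two hyphens.
            ret.append('-')
    return ''.join(ret)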

def test():
    cases = [
        ["", ""],
        ["test(0)", "test0"],
        ["test!1", "test1"],
        ["test:2", "test2"],
        ["test*3", "test3"],
        ["test!4", "test4"],
        ["test:5", "test5"],
        ["test*6", "test6"],
        ["test:6 a", "test6-a"],
        ["test:6 !b", "test6-b"],
        ["test:ad # df", "testad--df"],
        ["test:ad #23 df 2*/*", "testad-23-df-2"],
        ["test:ad 23 df 2*/*", "testad-23-df-2"],
        ["test:ad # 23 df 2*/*", "testad--23-df-2"],
        ["Anchors in Markdown", "anchors-in-markdown"],
        ["a_b_c", "a_b_c"],
        ["a-b-c", "a-b-c"],
        ["a-b-c----", "a-b-c----"],
        ["test:6a", "test6a"],
        ["test:a6", "testa6"],
        ["tes a a   a  a", "tes-a-a---a--a"],
        ["  tes a a   a  a  ", "tes-a-a---a--a"]]
    for parm,expect in cases:
        if cheanValue(parm) != expect:
            print("error: parm: %s, expect: %s, actual: %s" % (parm, expect, cheanValue(parm)))
test()

Can you help me write some test cases from github to verify that the logic of the cheanValue function is consistent with github?
@lazyky @bioinformatist

wxiaoguang

wxiaoguang commented on Aug 8, 2023

@wxiaoguang
Contributor

This one is also related: Different behaviors when generating Markdown links for headings containing punctuations and other symbols #19745

Quote the old comment from that issue:


I would say it's more like a feature but not a bug, because Markdown is not a strict system, and there seems no unique standard.

There are various characters that would be removed or replaced during URL generation. For example, the single quote ' in your demo file, too.

https://github.com/federico-ntr/gitea-double-quotes-test#placeholder-to-force-scrolling-on-links-click
https://try.gitea.io/federico-ntr/double-quotes-test#placeholder-to-force-scrolling-on-link-s-click

Since there is no standard, there is no right or wrong, as long as it works.

Maybe the answer to the question could be: if there is a definition in CommonMark, then make upstream goldmark use the CommonMark standard.

CaiCandong

CaiCandong commented on Aug 8, 2023

@CaiCandong
Member

I would say it's more like a feature but not a bug, because Markdown is not a strict system, and there seems no unique standard.

I understand what you're saying, and it's not a bug. But should we adjust it so that it is consistent with GitHub/VS Code?

wxiaoguang

wxiaoguang commented on Aug 8, 2023

@wxiaoguang
Contributor

Just sharing the information from old issues. I am neutral on it.

bioinformatist

bioinformatist commented on Aug 8, 2023

@bioinformatist

Got it. Sure, it is not a bug, but GitHub's logic seems more straightforward and easier to use.

lazyky

lazyky commented on Aug 8, 2023

@lazyky
Author
Can you help me write some test cases from github to verify that the logic of the cheanValue function is consistent with github? @lazyky @bioinformatist

Yes. That's right, but "" will not be rendered

github

image
image

CaiCandong

CaiCandong commented on Aug 8, 2023

@CaiCandong
Member

Yes. That's right, but "" will not be rendered

These test cases are the ones I got from github, of course they are correct. What I mean is can you help me to add some more test cases?

lazyky

lazyky commented on Aug 8, 2023

@lazyky
Author

Yes. That's right, but "" will not be rendered

These test cases are the ones I got from github, of course they are correct. What I mean is can you help me to add some more test cases?

OK. Below are the examples I tested on GitHub:

[
    ["tes()", "tes"],
    ["tes…@a", "tesa"],
    ["tes¥& a", "tes-a"],
    ["tes= a", "tes-a"],
    ["tes|a", "tesa"],
    ["tes\\a", "tesa"],
    ["tes/a", "tesa"]
]
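
These extra cases can be checked against the function posted earlier in the thread. The sketch below repeats that function so it runs on its own. The `extra_cases` and `failures` names are introduced here for illustration, and the backslash case is written with an escaped backslash so that Python reads it as a literal backslash. It only checks the function against the observations reported above; it does not query GitHub.

```python
def cheanValue(anchor_name):
    # Same logic as the function posted earlier in this thread:
    # keep letters, digits, '_' and '-', map each space to '-', drop the rest.
    ret = []
    for c in anchor_name.strip():
        if c.isalpha() or c.isdigit() or c in "_-":
            ret.append(c.lower())
        elif c == " ":
            ret.append("-")
    return "".join(ret)

# Heading text and the anchor reported for it in the comment above.
extra_cases = [
    ["tes()", "tes"],
    ["tes…@a", "tesa"],
    ["tes¥& a", "tes-a"],
    ["tes= a", "tes-a"],
    ["tes|a", "tesa"],
    ["tes\\a", "tesa"],
    ["tes/a", "tesa"],
]

# Collect (input, expected, actual) for every mismatch.
failures = [(s, e, cheanValue(s)) for s, e in extra_cases if cheanValue(s) != e]
print(failures)
```

An empty list means the function agrees with every reported anchor in this batch.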
added a commit that references this issue on Aug 9, 2023
d41aee1
locked as resolved and limited conversation to collaborators on Sep 24, 2023
        anchors don't work when contains punctuation marks just like `(` or `(` · Issue #26367 · go-gitea/gitea