Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid output in batch_reader.go #29101

Closed
KN4CK3R opened this issue Feb 8, 2024 · 2 comments · Fixed by #29297
Closed

Invalid output in batch_reader.go #29101

KN4CK3R opened this issue Feb 8, 2024 · 2 comments · Fixed by #29297
Labels

Comments

@KN4CK3R
Copy link
Member

KN4CK3R commented Feb 8, 2024

strconv.ParseInt: parsing "number": invalid syntax

Steps to reproduce:

  1. Create a repository with actions enabled
  2. Create a file /.gitea/workflows with the content not a number
  3. Create a file /.github/workflows with the content not a number
  4. Open the page /<owner>/<repo>/actions
  5. If no error is shown, refresh the page multiple times until an error occurs

The reason is this code

tree, err := commit.SubTree(".gitea/workflows")
if _, ok := err.(git.ErrNotExist); ok {
tree, err = commit.SubTree(".github/workflows")
}

which fails in line 179

func ReadBatchLine(rd *bufio.Reader) (sha []byte, typ string, size int64, err error) {
typ, err = rd.ReadString('\n')
if err != nil {
return sha, typ, size, err
}
if len(typ) == 1 {
typ, err = rd.ReadString('\n')
if err != nil {
return sha, typ, size, err
}
}
idx := strings.IndexByte(typ, ' ')
if idx < 0 {
log.Debug("missing space typ: %s", typ)
return sha, typ, size, ErrNotExist{ID: string(sha)}
}
sha = []byte(typ[:idx])
typ = typ[idx+1:]
idx = strings.IndexByte(typ, ' ')
if idx < 0 {
return sha, typ, size, ErrNotExist{ID: string(sha)}
}
sizeStr := typ[idx+1 : len(typ)-1]
typ = typ[:idx]
size, err = strconv.ParseInt(sizeStr, 10, 64)
return sha, typ, size, err
}

If the error occurs the code tries to parse not a number as output of the batch reader. A normal output looks like 8b2bfd3a117fcb7ae93f93616cb2f7a9706c513a tree 104. I don't know how the content of a file can be present here. And even worse is that it does not fail every time. If I create only one workflows file it does not fail but that may just be a timing issue.

Git Version

2.43.0, non-gogit

@KN4CK3R
Copy link
Member Author

KN4CK3R commented Feb 8, 2024

Stacktrace:

code.gitea.io/gitea/modules/git.ReadBatchLine(0xc0034f5008?)
        /src/modules/git/batch_reader.go:182 +0x1c8
code.gitea.io/gitea/modules/git.(*Repository).getTree(0xc005f8c870, {0x3a90360, 0xc0034f5008})
        /src/modules/git/repo_tree_nogogit.go:19 +0xf0
code.gitea.io/gitea/modules/git.(*Tree).SubTree(0xc004ba2f70, {0x327c01b?, 0x316d8c8?})
        /src/modules/git/tree.go:42 +0x1a5
code.gitea.io/gitea/modules/actions.ListWorkflows(0xc004ba2f70)
        /src/modules/actions/workflows.go:50 +0x85
code.gitea.io/gitea/routers/web/repo/actions.List(0xc004be8c80)
        /src/routers/web/repo/actions/actions.go:74 +0x1d7

@yp05327
Copy link
Contributor

yp05327 commented Feb 19, 2024

cat-file --batch will output <sha> SP <type> SP <size> LF\n<content>
After some tests, I notice that:
normally, reader buffer will not be empty after read one line, as we have <content> behind.
but reader buffer will randomly be empty after read one line, e.g. we will get <sha> SP <type> SP <size> LF\n in reader bugger without content. And the content will be read at the next time, which caused this bug.

Some related codes:

func (repo *Repository) CatFileBatch(ctx context.Context) (WriteCloserError, *bufio.Reader, func()) {
if repo.batchCancel == nil || repo.batchReader.Buffered() > 0 {
log.Debug("Opening temporary cat file batch for: %s", repo.Path)
return CatFileBatch(ctx, repo.Path)
}
return repo.batchWriter, repo.batchReader, func() {}

if repo.batchReader.Buffered() is 0, as what mentioned above, reader buffer will randomly be empty after read one line,
CatFileBatch will return the existing reader buffer , and in for loop in func SubTree, in the next time, func getTree will read <content> as the first line.

KN4CK3R added a commit that referenced this issue Feb 21, 2024
Fixes the reason why #29101 is hard to replicate.
Related #29297

Create a repo with a file with minimum size 4097 bytes (I use 10000) and
execute the following code:
```go
gitRepo, err := gitrepo.OpenRepository(db.DefaultContext, <repo>)
assert.NoError(t, err)

commit, err := gitRepo.GetCommit(<sha>)
assert.NoError(t, err)

entry, err := commit.GetTreeEntryByPath(<file>)
assert.NoError(t, err)

b := entry.Blob()

// Create a reader
r, err := b.DataAsync()
assert.NoError(t, err)
defer r.Close()

// Create a second reader
r2, err := b.DataAsync()
assert.NoError(t, err) // Should be no error but is ErrNotExist
defer r2.Close()
```

The problem is the check in `CatFileBatch`:

https://github.com/go-gitea/gitea/blob/79217ea63c1f77de7ca79813ae45950724e63d02/modules/git/repo_base_nogogit.go#L81-L87
`Buffered() > 0` is used to check if there is a "operation" in progress
at the moment. This is a problem because we can't control the internal
buffer in the `bufio.Reader`. The code above demonstrates a sequence
which initiates an operation for which the code thinks there is no
active processing. The second call to `DataAsync()` therefore reuses the
existing instances instead of creating a new batch reader.
KN4CK3R added a commit to KN4CK3R/gitea that referenced this issue Feb 21, 2024
Fixes the reason why go-gitea#29101 is hard to replicate.
Related go-gitea#29297

Create a repo with a file with minimum size 4097 bytes (I use 10000) and
execute the following code:
```go
gitRepo, err := gitrepo.OpenRepository(db.DefaultContext, <repo>)
assert.NoError(t, err)

commit, err := gitRepo.GetCommit(<sha>)
assert.NoError(t, err)

entry, err := commit.GetTreeEntryByPath(<file>)
assert.NoError(t, err)

b := entry.Blob()

// Create a reader
r, err := b.DataAsync()
assert.NoError(t, err)
defer r.Close()

// Create a second reader
r2, err := b.DataAsync()
assert.NoError(t, err) // Should be no error but is ErrNotExist
defer r2.Close()
```

The problem is the check in `CatFileBatch`:

https://github.com/go-gitea/gitea/blob/79217ea63c1f77de7ca79813ae45950724e63d02/modules/git/repo_base_nogogit.go#L81-L87
`Buffered() > 0` is used to check if there is a "operation" in progress
at the moment. This is a problem because we can't control the internal
buffer in the `bufio.Reader`. The code above demonstrates a sequence
which initiates an operation for which the code thinks there is no
active processing. The second call to `DataAsync()` therefore reuses the
existing instances instead of creating a new batch reader.
silverwind pushed a commit that referenced this issue Feb 22, 2024
Backport #29298
Fixes the reason why #29101 is hard to replicate.
Related #29297

Create a repo with a file with minimum size 4097 bytes (I use 10000) and
execute the following code:
```go
gitRepo, err := gitrepo.OpenRepository(db.DefaultContext, <repo>)
assert.NoError(t, err)

commit, err := gitRepo.GetCommit(<sha>)
assert.NoError(t, err)

entry, err := commit.GetTreeEntryByPath(<file>)
assert.NoError(t, err)

b := entry.Blob()

// Create a reader
r, err := b.DataAsync()
assert.NoError(t, err)
defer r.Close()

// Create a second reader
r2, err := b.DataAsync()
assert.NoError(t, err) // Should be no error but is ErrNotExist
defer r2.Close()
```

The problem is the check in `CatFileBatch`:


https://github.com/go-gitea/gitea/blob/79217ea63c1f77de7ca79813ae45950724e63d02/modules/git/repo_base_nogogit.go#L81-L87
`Buffered() > 0` is used to check if there is a "operation" in progress
at the moment. This is a problem because we can't control the internal
buffer in the `bufio.Reader`. The code above demonstrates a sequence
which initiates an operation for which the code thinks there is no
active processing. The second call to `DataAsync()` therefore reuses the
existing instances instead of creating a new batch reader.
lunny pushed a commit that referenced this issue Feb 22, 2024
Fixes #29101
Related #29298

Discard all read data to prevent misinterpreting existing data. Some
discard calls were missing in error cases.

---------

Co-authored-by: yp05327 <576951401@qq.com>
GiteaBot pushed a commit to GiteaBot/gitea that referenced this issue Feb 22, 2024
Fixes go-gitea#29101
Related go-gitea#29298

Discard all read data to prevent misinterpreting existing data. Some
discard calls were missing in error cases.

---------

Co-authored-by: yp05327 <576951401@qq.com>
lunny pushed a commit that referenced this issue Feb 22, 2024
Backport #29297 by @KN4CK3R

Fixes #29101
Related #29298

Discard all read data to prevent misinterpreting existing data. Some
discard calls were missing in error cases.

Co-authored-by: KN4CK3R <admin@oldschoolhack.me>
Co-authored-by: yp05327 <576951401@qq.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
2 participants