git/parallel-checkout.h
Matheus Tavares 04155bdad8 unpack-trees: add basic support for parallel checkout
This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.
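Here is a minimal, self-contained sketch of this first stage. Entries are queued and then written one by one in the calling process, with no workers. Every name in it is hypothetical and heavily simplified: struct pc_item, pc_enqueue(), pc_run_sequential() and the write callback. It is not the code of parallel-checkout.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified queued entry: a path and the bytes to write. */
struct pc_item {
	const char *path;
	const char *data;
};

/* Hypothetical growable array of queued entries. */
struct pc_queue {
	struct pc_item *items;
	size_t nr, alloc;
};

/* Queue an entry for a later write. Returns 0 on success, -1 on OOM. */
static int pc_enqueue(struct pc_queue *q, const char *path, const char *data)
{
	if (q->nr == q->alloc) {
		size_t new_alloc = q->alloc ? 2 * q->alloc : 8;
		struct pc_item *p = realloc(q->items, new_alloc * sizeof(*p));

		if (!p)
			return -1;
		q->items = p;
		q->alloc = new_alloc;
	}
	q->items[q->nr].path = path;
	q->items[q->nr].data = data;
	q->nr++;
	return 0;
}

/* Callback that performs the actual write of one entry. */
typedef int (*pc_write_fn)(const struct pc_item *item, void *ctx);

/*
 * Write every queued entry in queue order, in the calling process
 * (no workers), then empty the queue. Returns 0 only if all writes
 * succeeded.
 */
static int pc_run_sequential(struct pc_queue *q, pc_write_fn write_item,
			     void *ctx)
{
	int errors = 0;

	for (size_t i = 0; i < q->nr; i++)
		if (write_item(&q->items[i], ctx))
			errors++;
	free(q->items);
	q->items = NULL;
	q->nr = q->alloc = 0;
	return errors ? -1 : 0;
}

/*
 * Example callback: only records how many entries were "written" and
 * which one came last. A real callback would write item->data to disk.
 */
struct write_log {
	size_t count;
	const char *last;
};

static int log_write(const struct pc_item *item, void *ctx)
{
	struct write_log *rec = ctx;

	rec->count++;
	rec->last = item->path;
	return 0;
}
```

Keeping the write step behind a callback is what would let a later stage hand the same queue to workers without changing the enqueue side.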

Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems) are detected by
the parallel checkout code and skipped, so that they can be safely
handled sequentially later. The collision detection works as
follows:

- If the collision is at the basename (e.g. 'a/b' and 'a/B'), the
  framework detects it by looking for EEXIST and EISDIR errors after an
  open(O_CREAT | O_EXCL) failure.
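Here is a minimal sketch of the basename check, assuming POSIX open(2). The file is created with O_CREAT | O_EXCL, and an EEXIST or EISDIR failure is treated as a collision rather than a hard error. The helper create_or_detect_collision() and its -2 return code are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create 'path' exclusively.
 * Returns the new file descriptor on success, -2 if the failure looks
 * like a collision with something already on disk (EEXIST or EISDIR),
 * and -1 for any other error (e.g. ENOENT, EACCES).
 */
static int create_or_detect_collision(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd >= 0)
		return fd;
	if (errno == EEXIST || errno == EISDIR)
		return -2;	/* skip now, retry sequentially later */
	return -1;
}
```

On a case-insensitive file system, creating 'a/B' after 'a/b' fails with EEXIST, so the second entry is classified as collided instead of silently overwriting the first.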

- If the collision is at the dirname (e.g. 'a/b' and 'A'), it is
  detected at the has_dirs_only_path() check, which is done for the
  leading path of each item in the parallel checkout queue.
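The dirname check can be sketched as a walk over the leading components with lstat(2), where each component must be a real directory. A regular file or symlink that appeared where 'a' should be (for example, because 'A' was written) makes the check fail. leading_dirs_are_real() is a hypothetical, simplified stand-in for has_dirs_only_path(), not its actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for has_dirs_only_path(): return 1 if every
 * leading component of 'path' (everything before the last '/') exists
 * and is a real directory, 0 otherwise. lstat() is used so that a
 * symlink in place of a directory also counts as a collision.
 */
static int leading_dirs_are_real(const char *path)
{
	char buf[4096];
	size_t len = strlen(path);
	struct stat st;

	if (len >= sizeof(buf))
		return 0;
	memcpy(buf, path, len + 1);

	/* Start at 1 so a leading '/' does not produce an empty prefix. */
	for (size_t i = 1; i < len; i++) {
		if (buf[i] != '/')
			continue;
		buf[i] = '\0';		/* temporarily cut the path here */
		if (lstat(buf, &st) || !S_ISDIR(st.st_mode))
			return 0;
		buf[i] = '/';		/* restore and keep walking */
	}
	return 1;
}
```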

Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).

After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear in the index, we are not
able to determine which of the colliding entries will survive on disk
(with the classic code, it is always the last entry).
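A toy in-memory "disk" can illustrate the two-pass handling. It compares names case-insensitively, which is an assumption standing in for a case-insensitive file system. Pass one only does exclusive creates and marks collided items. Pass two feeds the collided items back sequentially and lets them overwrite. All types and functions here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

enum item_status { ITEM_PENDING, ITEM_WRITTEN, ITEM_COLLIDED };

struct item {
	const char *path;
	enum item_status status;
};

/* Toy case-insensitive "disk": just remembers which names exist. */
struct toy_disk {
	const char *names[16];
	size_t nr;
};

/* Exclusive create: fails if a name already exists, ignoring case. */
static int toy_create_excl(struct toy_disk *d, const char *path)
{
	for (size_t i = 0; i < d->nr; i++)
		if (!strcasecmp(d->names[i], path))
			return -1;
	assert(d->nr < 16);
	d->names[d->nr++] = path;
	return 0;
}

/* Overwriting create, as done by the sequential fallback. */
static void toy_create_overwrite(struct toy_disk *d, const char *path)
{
	for (size_t i = 0; i < d->nr; i++)
		if (!strcasecmp(d->names[i], path)) {
			d->names[i] = path;	/* later entry replaces earlier */
			return;
		}
	assert(d->nr < 16);
	d->names[d->nr++] = path;
}

/* Returns how many collided items the second pass had to handle. */
static size_t checkout_two_pass(struct toy_disk *d, struct item *items,
				size_t nr)
{
	size_t collided = 0;

	/* Pass 1: exclusive creates only; collisions are just recorded. */
	for (size_t i = 0; i < nr; i++)
		items[i].status = toy_create_excl(d, items[i].path) ?
				  ITEM_COLLIDED : ITEM_WRITTEN;

	/* Pass 2: feed collided items back sequentially, overwriting. */
	for (size_t i = 0; i < nr; i++) {
		if (items[i].status != ITEM_COLLIDED)
			continue;
		toy_create_overwrite(d, items[i].path);
		items[i].status = ITEM_WRITTEN;
		collided++;
	}
	return collided;
}
```

Pass one in this toy runs in index order, so with 'a/b' followed by 'a/B' the file left on disk is 'a/B', as in the classic code. With real workers, the entry that wins pass one is not fixed, and so neither is the survivor.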

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 11:57:05 -07:00

#ifndef PARALLEL_CHECKOUT_H
#define PARALLEL_CHECKOUT_H

struct cache_entry;
struct checkout;
struct conv_attrs;

enum pc_status {
	PC_UNINITIALIZED = 0,
	PC_ACCEPTING_ENTRIES,
	PC_RUNNING,
};

enum pc_status parallel_checkout_status(void);

/*
 * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
 * only when in the PC_UNINITIALIZED state.
 */
void init_parallel_checkout(void);

/*
 * Return -1 if parallel checkout is currently not accepting entries or if the
 * entry is not eligible for parallel checkout. Otherwise, enqueue the entry
 * for later write and return 0.
 */
int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);

/* Write all the queued entries, returning 0 on success. */
int run_parallel_checkout(struct checkout *state);

#endif /* PARALLEL_CHECKOUT_H */
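A small toy model can mimic the state transitions documented in this header. It goes beyond what the header states in two ways. First, it assumes the status returns to the uninitialized state once a run finishes. Second, it replaces the real cache_entry/conv_attrs eligibility checks with a plain int flag. The toy_* names are hypothetical and only mirror the shape of the API.

```c
#include <assert.h>
#include <stddef.h>

enum toy_pc_status {
	TOY_PC_UNINITIALIZED = 0,
	TOY_PC_ACCEPTING_ENTRIES,
	TOY_PC_RUNNING,
};

static enum toy_pc_status toy_status = TOY_PC_UNINITIALIZED;
static size_t toy_queued;

/* Like init_parallel_checkout(): only valid from the uninitialized state. */
static void toy_init_parallel_checkout(void)
{
	assert(toy_status == TOY_PC_UNINITIALIZED);
	toy_status = TOY_PC_ACCEPTING_ENTRIES;
}

/*
 * Like enqueue_checkout(): -1 if not accepting or not eligible, else 0.
 * 'eligible' is a hypothetical stand-in for the real eligibility checks.
 */
static int toy_enqueue_checkout(int eligible)
{
	if (toy_status != TOY_PC_ACCEPTING_ENTRIES || !eligible)
		return -1;
	toy_queued++;
	return 0;
}

/*
 * Like run_parallel_checkout(): report how many entries would be
 * written. Resetting to uninitialized afterwards is an assumption.
 */
static int toy_run_parallel_checkout(size_t *written)
{
	assert(toy_status == TOY_PC_ACCEPTING_ENTRIES);
	toy_status = TOY_PC_RUNNING;
	*written = toy_queued;	/* a real implementation writes files here */
	toy_queued = 0;
	toy_status = TOY_PC_UNINITIALIZED;
	return 0;
}
```

Because enqueueing reports -1 instead of failing hard, a caller can fall back to writing an ineligible entry right away. This matches the commit message's split between parallel-eligible and ineligible entries.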