mirror of https://github.com/git/git.git synced 2024-04-28 08:35:23 +02:00
git/refs/refs-internal.h
Michael Haggerty 3bc581b940 refs: introduce an iterator interface
Currently, the API for iterating over references is via a family of
for_each_ref()-type functions that invoke a callback function for each
selected reference. All of these eventually call do_for_each_ref(),
which knows how to do one thing: iterate in parallel through two
ref_caches, one for loose and one for packed refs, giving loose
references precedence over packed refs. This is rather complicated code,
and is quite specialized to the files backend. It also requires callers
to encapsulate their work into a callback function, which often means
that they have to define and use a "cb_data" struct to manage their
context.

The current design is already bursting at the seams, and will become
even more awkward in the upcoming world of multiple reference storage
backends:

* Per-worktree vs. shared references are currently handled via a kludge
  in git_path() rather than iterating over each part of the reference
  namespace separately and merging the results. This kludge will cease
  to work when we have multiple reference storage backends.

* The current scheme is inflexible. What if we sometimes want to bypass
  the ref_cache, or use it only for packed or only for loose refs? What
  if we want to store symbolic refs in one type of storage backend and
  non-symbolic ones in another?

In the future, each reference backend will need to define its own way of
iterating over references. The crux of the problem with the current
design is that it is impossible to compose for_each_ref()-style
iterations, because the flow of control is owned by the for_each_ref()
function. There is nothing that a caller can do but iterate through all
references in a single burst, so there is no way for it to interleave
references from multiple backends and present the result to the rest of
the world as a single compound backend.

This commit introduces a new iteration primitive for references: a
ref_iterator. A ref_iterator is a polymorphic object that a reference
storage backend can be asked to instantiate. There are three functions
that can be applied to a ref_iterator:

* ref_iterator_advance(): move to the next reference in the iteration
* ref_iterator_abort(): end the iteration before it is exhausted
* ref_iterator_peel(): peel the reference currently being looked at

Iterating using a ref_iterator leaves the flow of control in the hands
of the caller, which means that ref_iterators from multiple
sources (e.g., loose and packed refs) can be composed and presented to
the world as a single compound ref_iterator.
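
To make the caller-owns-the-loop idea concrete, here is a minimal sketch of a polymorphic iterator in C. Every name in it (toy_iter, array_iter, the TOY_ITER_* codes, count_until) is hypothetical and chosen for illustration; it is not the ref_iterator implementation from this patch. It also omits peel and abort, and it does not free itself on exhaustion the way a ref_iterator is documented to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch only. */
enum { TOY_ITER_OK = 0, TOY_ITER_DONE = -1 };

struct toy_iter;

/* Per-implementation "virtual" functions. */
struct toy_iter_vtable {
	int (*advance)(struct toy_iter *iter);
};

/* Base "class": every concrete iterator embeds this as its first member. */
struct toy_iter {
	const struct toy_iter_vtable *vtable;
	const char *refname;	/* valid only after advance returned TOY_ITER_OK */
};

/* A concrete iterator over a fixed array of names. */
struct array_iter {
	struct toy_iter base;
	const char **names;
	size_t nr, pos;
};

static int array_iter_advance(struct toy_iter *iter)
{
	/* Safe because base is the first member of struct array_iter. */
	struct array_iter *ai = (struct array_iter *)iter;

	if (ai->pos >= ai->nr)
		return TOY_ITER_DONE;
	iter->refname = ai->names[ai->pos++];
	return TOY_ITER_OK;
}

static const struct toy_iter_vtable array_iter_vtable = {
	array_iter_advance
};

static void array_iter_init(struct array_iter *ai, const char **names, size_t nr)
{
	assert(names || !nr);
	ai->base.vtable = &array_iter_vtable;
	ai->base.refname = NULL;
	ai->names = names;
	ai->nr = nr;
	ai->pos = 0;
}

/* Generic entry point: dispatch through the vtable. */
static int toy_iter_advance(struct toy_iter *iter)
{
	return iter->vtable->advance(iter);
}

/*
 * The caller owns the loop, so it can stop early. Count the names
 * seen before `stop` shows up (or before the iteration ends).
 */
static size_t count_until(struct toy_iter *iter, const char *stop)
{
	size_t n = 0;

	while (toy_iter_advance(iter) == TOY_ITER_OK) {
		if (!strcmp(iter->refname, stop))
			break;
		n++;
	}
	return n;
}
```

A caller would set up an array_iter with array_iter_init() and pass &ai.base to count_until(). Because count_until() only sees the base struct, it would work unchanged with any other iterator that fills in the same vtable.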

It also means that the backend code for implementing reference iteration
will sometimes be more complicated. For example, the
cache_ref_iterator (which iterates over a ref_cache) can't use the C
stack to recurse; instead, it must manage its own stack internally as
explicit data structures. There is also a lot of boilerplate connected
with object-oriented programming in C.
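
The explicit-stack point can be illustrated with a small sketch. It walks a toy directory tree depth-first and hands back one leaf per call, keeping its position in an array of frames instead of on the C stack. The types and names (toy_node, toy_level, toy_tree_iter, TOY_MAX_DEPTH) are assumptions made for this example and do not reflect how cache_ref_iterator is actually laid out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 16	/* arbitrary limit for this sketch */

/* A node with children is a directory; this sketch treats any node
 * without children as a leaf (a "ref"). */
struct toy_node {
	const char *name;
	const struct toy_node *children;
	size_t nr_children;
};

/* One stack frame: which directory we are in and which child is next. */
struct toy_level {
	const struct toy_node *dir;
	size_t next;
};

struct toy_tree_iter {
	struct toy_level levels[TOY_MAX_DEPTH];
	size_t depth;	/* number of frames in use */
};

static void toy_tree_iter_init(struct toy_tree_iter *it,
			       const struct toy_node *root)
{
	assert(root);
	it->levels[0].dir = root;
	it->levels[0].next = 0;
	it->depth = 1;
}

/* Return the next leaf in depth-first order, or NULL when exhausted. */
static const struct toy_node *toy_tree_iter_next(struct toy_tree_iter *it)
{
	while (it->depth > 0) {
		struct toy_level *top = &it->levels[it->depth - 1];
		const struct toy_node *child;

		if (top->next >= top->dir->nr_children) {
			it->depth--;	/* "return" from this directory */
			continue;
		}
		child = &top->dir->children[top->next++];
		if (child->nr_children == 0)
			return child;	/* leaf: give control back to the caller */
		if (it->depth == TOY_MAX_DEPTH) {
			it->depth = 0;	/* too deep for this sketch: give up */
			return NULL;
		}
		/* "recurse" by pushing a new frame */
		it->levels[it->depth].dir = child;
		it->levels[it->depth].next = 0;
		it->depth++;
	}
	return NULL;
}
```

A recursive walk would be shorter, but it could only report leaves through a callback. Keeping the frames in the iterator lets each call resume exactly where the previous one stopped.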

Eventually, end-user callers will be able to be written in a more
natural way—managing their own flow of control rather than having to
work via callbacks. Since there will only be a few reference backends
but there are many consumers of this API, this is a good tradeoff.

More importantly, we gain composability, and especially the possibility
of writing interchangeable parts that can work with any ref_iterator.

For example, merge_ref_iterator implements a generic way of merging the
contents of any two ref_iterators. It is used to merge loose + packed
refs as part of the implementation of the files_ref_iterator. But it
will also be possible to use it to merge other pairs of reference
sources (e.g., per-worktree vs. shared refs).
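
As a rough illustration of merging two sorted sources with one taking precedence, the sketch below steps two array cursors in strcmp() order and lets the "front" entry shadow a "back" entry of the same name. The toy_ref and toy_cursor types and the toy_overlay_next() function are invented for this example; the real merge_ref_iterator works on ref_iterators and a select callback rather than on arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A name plus a marker saying where the entry came from. */
struct toy_ref {
	const char *name;
	const char *value;
};

/* A cursor over an array sorted by name; stands in for a sub-iterator. */
struct toy_cursor {
	const struct toy_ref *refs;
	size_t nr, pos;
};

static const struct toy_ref *toy_cursor_peek(const struct toy_cursor *c)
{
	return c->pos < c->nr ? &c->refs[c->pos] : NULL;
}

/*
 * Return the next entry from the union of front and back, or NULL
 * when both are exhausted. If both sides hold the same name, return
 * the entry from front and skip the one from back.
 */
static const struct toy_ref *toy_overlay_next(struct toy_cursor *front,
					      struct toy_cursor *back)
{
	const struct toy_ref *f, *b;
	int cmp;

	assert(front && back);
	f = toy_cursor_peek(front);
	b = toy_cursor_peek(back);

	if (!f && !b)
		return NULL;
	if (!f) {
		back->pos++;
		return b;
	}
	if (!b) {
		front->pos++;
		return f;
	}

	cmp = strcmp(f->name, b->name);
	if (cmp < 0) {
		front->pos++;
		return f;
	}
	if (cmp > 0) {
		back->pos++;
		return b;
	}
	/* Same name on both sides: front shadows back. */
	front->pos++;
	back->pos++;
	return f;
}
```

With loose refs as front and packed refs as back, a name present in both comes out once, carrying the loose value. Both inputs must already be sorted by name for this to produce a sorted, duplicate-free result.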

Another example is prefix_ref_iterator, which can be used to trim a
prefix off the front of reference names before presenting them to the
caller (e.g., "refs/heads/master" -> "master").
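
The per-reference step of such a prefix filter might look like the following sketch. The helper toy_prefix_trim() is hypothetical and only shows the string handling: a wrapping iterator would call something like it on each name coming from the underlying iterator and skip the names for which it returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If refname starts with prefix, return refname with its first `trim`
 * characters removed; otherwise return NULL to signal "skip this one".
 * The result points into refname; nothing is copied.
 */
static const char *toy_prefix_trim(const char *refname, const char *prefix,
				   size_t trim)
{
	size_t prefix_len = strlen(prefix);

	if (strncmp(refname, prefix, prefix_len))
		return NULL;
	assert(trim <= strlen(refname));
	return refname + trim;
}
```

For instance, toy_prefix_trim("refs/heads/master", "refs/heads/", 11) yields a pointer to "master", while a name under "refs/tags/" is rejected. Passing a trim of 0 filters by prefix but leaves the names intact.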

In this patch, we introduce the iterator abstraction and many utilities,
and implement a reference iterator for the files ref storage backend.
(I've written several other obvious utilities, for example a generic way
to filter references being iterated over. These will probably be useful
in the future. But they are not needed for this patch series, so I am
not including them at this time.)

In a moment we will rewrite do_for_each_ref() to work via reference
iterators (allowing some special-purpose code to be discarded), and do
something similar for reflogs. In future patch series, we will expose
the ref_iterator abstraction in the public refs API so that callers can
use it directly.

Implementation note: I tried abstracting this a layer further to allow
generic iterators (over arbitrary types of objects) and generic
utilities like a generic merge_iterator. But the implementation in C was
very cumbersome, involving (in my opinion) too much boilerplate and too
much unsafe casting, some of which would have had to be done on the
caller side. However, I did put a few iterator-related constants in a
top-level header file, iterator.h, as they will be useful in a moment to
implement iteration over directory trees and possibly other types of
iterators in the future.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-20 11:38:20 -07:00

501 lines
17 KiB
C

#ifndef REFS_REFS_INTERNAL_H
#define REFS_REFS_INTERNAL_H
/*
* Data structures and functions for the internal use of the refs
* module. Code outside of the refs module should use only the public
* functions defined in "refs.h", and should *not* include this file.
*/
/*
* Flag passed to lock_ref_sha1_basic() telling it to tolerate broken
* refs (i.e., because the reference is about to be deleted anyway).
*/
#define REF_DELETING 0x02
/*
* Used as a flag in ref_update::flags when a loose ref is being
* pruned. This flag must only be used when REF_NODEREF is set.
*/
#define REF_ISPRUNING 0x04
/*
* Used as a flag in ref_update::flags when the reference should be
* updated to new_sha1.
*/
#define REF_HAVE_NEW 0x08
/*
* Used as a flag in ref_update::flags when old_sha1 should be
* checked.
*/
#define REF_HAVE_OLD 0x10
/*
* Used as a flag in ref_update::flags when the lockfile needs to be
* committed.
*/
#define REF_NEEDS_COMMIT 0x20
/*
* 0x40 is REF_FORCE_CREATE_REFLOG, so skip it if you're adding a
* value to ref_update::flags
*/
/*
* Used as a flag in ref_update::flags when we want to log a ref
* update but not actually perform it. This is used when a symbolic
* ref update is split up.
*/
#define REF_LOG_ONLY 0x80
/*
* Internal flag, meaning that the containing ref_update was via an
* update to HEAD.
*/
#define REF_UPDATE_VIA_HEAD 0x100
/*
* Return true iff refname is minimally safe. "Safe" here means that
* deleting a loose reference by this name will not do any damage, for
* example by causing a file that is not a reference to be deleted.
* This function does not check that the reference name is legal; for
* that, use check_refname_format().
*
* We consider a refname that starts with "refs/" to be safe as long
* as any ".." components that it might contain do not escape "refs/".
* Names that do not start with "refs/" are considered safe iff they
* consist entirely of upper case characters and '_' (like "HEAD" and
* "MERGE_HEAD" but not "config" or "FOO/BAR").
*/
int refname_is_safe(const char *refname);
enum peel_status {
/* object was peeled successfully: */
PEEL_PEELED = 0,
/*
* object cannot be peeled because the named object (or an
* object referred to by a tag in the peel chain) does not
* exist.
*/
PEEL_INVALID = -1,
/* object cannot be peeled because it is not a tag: */
PEEL_NON_TAG = -2,
/* ref_entry contains no peeled value because it is a symref: */
PEEL_IS_SYMREF = -3,
/*
* ref_entry cannot be peeled because it is broken (i.e., the
* symbolic reference cannot even be resolved to an object
* name):
*/
PEEL_BROKEN = -4
};
/*
* Peel the named object; i.e., if the object is a tag, resolve the
* tag recursively until a non-tag is found. If successful, store the
* result to sha1 and return PEEL_PEELED. If the object is not a tag
* or is not valid, return PEEL_NON_TAG or PEEL_INVALID, respectively,
* and leave sha1 unchanged.
*/
enum peel_status peel_object(const unsigned char *name, unsigned char *sha1);
/*
* Return 0 if a reference named refname could be created without
* conflicting with the name of an existing reference. Otherwise,
* return a negative value and write an explanation to err. If extras
* is non-NULL, it is a list of additional refnames with which refname
* is not allowed to conflict. If skip is non-NULL, ignore potential
* conflicts with refs in skip (e.g., because they are scheduled for
* deletion in the same operation). Behavior is undefined if the same
* name is listed in both extras and skip.
*
* Two reference names conflict if one of them exactly matches the
* leading components of the other; e.g., "foo/bar" conflicts with
* both "foo" and with "foo/bar/baz" but not with "foo/bar" or
* "foo/barbados".
*
* extras and skip must be sorted.
*/
int verify_refname_available(const char *newname,
const struct string_list *extras,
const struct string_list *skip,
struct strbuf *err);
/*
* Copy the reflog message msg to buf, which must already be large
* enough to hold it, cleaning up whitespace along the way. In
* particular, convert LF to space, because a reflog file holds one
* line per entry.
*/
int copy_reflog_msg(char *buf, const char *msg);
int should_autocreate_reflog(const char *refname);
/**
* Information needed for a single ref update. Set new_sha1 to the new
* value or to null_sha1 to delete the ref. To check the old value
* while the ref is locked, set (flags & REF_HAVE_OLD) and set
* old_sha1 to the old value, or to null_sha1 to ensure the ref does
* not exist before update.
*/
struct ref_update {
/*
* If (flags & REF_HAVE_NEW), set the reference to this value:
*/
unsigned char new_sha1[20];
/*
* If (flags & REF_HAVE_OLD), check that the reference
* previously had this value:
*/
unsigned char old_sha1[20];
/*
* One or more of REF_HAVE_NEW, REF_HAVE_OLD, REF_NODEREF,
* REF_DELETING, REF_ISPRUNING, REF_LOG_ONLY, and
* REF_UPDATE_VIA_HEAD:
*/
unsigned int flags;
struct ref_lock *lock;
unsigned int type;
char *msg;
/*
* If this ref_update was split off of a symref update via
* split_symref_update(), then this member points at that
* update. This is used for two purposes:
* 1. When reporting errors, we report the refname under which
* the update was originally requested.
* 2. When we read the old value of this reference, we
* propagate it back to its parent update for recording in
* the latter's reflog.
*/
struct ref_update *parent_update;
const char refname[FLEX_ARRAY];
};
/*
* Add a ref_update with the specified properties to transaction, and
* return a pointer to the new object. This function does not verify
* that refname is well-formed. new_sha1 and old_sha1 are only
* dereferenced if the REF_HAVE_NEW and REF_HAVE_OLD bits,
* respectively, are set in flags.
*/
struct ref_update *ref_transaction_add_update(
struct ref_transaction *transaction,
const char *refname, unsigned int flags,
const unsigned char *new_sha1,
const unsigned char *old_sha1,
const char *msg);
/*
* Transaction states.
* OPEN: The transaction is in a valid state and can accept new updates.
* An OPEN transaction can be committed.
* CLOSED: A closed transaction is no longer active and no other operations
* than free can be used on it in this state.
* A transaction can either become closed by successfully committing
* an active transaction or if there is a failure while building
* the transaction thus rendering it failed/inactive.
*/
enum ref_transaction_state {
	REF_TRANSACTION_OPEN = 0,
	REF_TRANSACTION_CLOSED = 1
};
/*
* Data structure for holding a reference transaction, which can
* consist of checks and updates to multiple references, carried out
* as atomically as possible. This structure is opaque to callers.
*/
struct ref_transaction {
	struct ref_update **updates;
	size_t alloc;
	size_t nr;
	enum ref_transaction_state state;
};
int files_log_ref_write(const char *refname, const unsigned char *old_sha1,
const unsigned char *new_sha1, const char *msg,
int flags, struct strbuf *err);
/*
* Check for entries in extras that are within the specified
* directory, where dirname is a reference directory name including
* the trailing slash (e.g., "refs/heads/foo/"). Ignore any
* conflicting references that are found in skip. If there is a
* conflicting reference, return its name.
*
* extras and skip must be sorted lists of reference names. Either one
* can be NULL, signifying the empty list.
*/
const char *find_descendant_ref(const char *dirname,
const struct string_list *extras,
const struct string_list *skip);
int rename_ref_available(const char *oldname, const char *newname);
/* We allow "recursive" symbolic refs. Only within reason, though */
#define SYMREF_MAXDEPTH 5
/* Include broken references in a do_for_each_ref*() iteration: */
#define DO_FOR_EACH_INCLUDE_BROKEN 0x01
/*
* Reference iterators
*
* A reference iterator encapsulates the state of an in-progress
* iteration over references. Create an instance of `struct
* ref_iterator` via one of the functions in this module.
*
* A freshly-created ref_iterator doesn't yet point at a reference. To
* advance the iterator, call ref_iterator_advance(). If successful,
* this sets the iterator's refname, oid, and flags fields to describe
* the next reference and returns ITER_OK. The data pointed at by
* refname and oid belong to the iterator; if you want to retain them
* after calling ref_iterator_advance() again or calling
* ref_iterator_abort(), you must make a copy. When the iteration has
* been exhausted, ref_iterator_advance() releases any resources
* associated with the iteration, frees the ref_iterator object, and
* returns ITER_DONE. If you want to abort the iteration early, call
* ref_iterator_abort(), which also frees the ref_iterator object and
* any associated resources. If there was an internal error advancing
* to the next entry, ref_iterator_advance() aborts the iteration,
* frees the ref_iterator, and returns ITER_ERROR.
*
* The reference currently being looked at can be peeled by calling
* ref_iterator_peel(). This function is often faster than peel_ref(),
* so it should be preferred when iterating over references.
*
* Putting it all together, a typical iteration looks like this:
*
*     int ok;
*     struct object_id peeled;
*     struct ref_iterator *iter = ...;
*
*     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
*             if (want_to_stop_iteration()) {
*                     ok = ref_iterator_abort(iter);
*                     break;
*             }
*
*             // Access information about the current reference
*             // (iter->oid is already a pointer):
*             if (!(iter->flags & REF_ISSYMREF))
*                     printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid));
*
*             // If you need to peel the reference:
*             ref_iterator_peel(iter, &peeled);
*     }
*
*     if (ok != ITER_DONE)
*             handle_error();
*/
struct ref_iterator {
	struct ref_iterator_vtable *vtable;
	const char *refname;
	const struct object_id *oid;
	unsigned int flags;
};
/*
* Advance the iterator to the first or next item and return ITER_OK.
* If the iteration is exhausted, free the resources associated with
* the ref_iterator and return ITER_DONE. On errors, free the iterator
* resources and return ITER_ERROR. It is a bug to use ref_iterator or
* call this function again after it has returned ITER_DONE or
* ITER_ERROR.
*/
int ref_iterator_advance(struct ref_iterator *ref_iterator);
/*
* If possible, peel the reference currently being viewed by the
* iterator. Return 0 on success.
*/
int ref_iterator_peel(struct ref_iterator *ref_iterator,
struct object_id *peeled);
/*
* End the iteration before it has been exhausted, freeing the
* reference iterator and any associated resources and returning
* ITER_DONE. If the abort itself failed, return ITER_ERROR.
*/
int ref_iterator_abort(struct ref_iterator *ref_iterator);
/*
* An iterator over nothing (its first ref_iterator_advance() call
* returns ITER_DONE).
*/
struct ref_iterator *empty_ref_iterator_begin(void);
/*
* Return true iff ref_iterator is an empty_ref_iterator.
*/
int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
/*
* A callback function used to instruct merge_ref_iterator how to
* interleave the entries from iter0 and iter1. The function should
* return one of the constants defined in enum iterator_selection. It
* must not advance either of the iterators itself.
*
* The function must be prepared to handle the case that iter0 and/or
* iter1 is NULL, which indicates that the corresponding sub-iterator
* has been exhausted. Its return value must be consistent with the
* current states of the iterators; e.g., it must not return
* ITER_SKIP_1 if iter1 has already been exhausted.
*/
typedef enum iterator_selection ref_iterator_select_fn(
struct ref_iterator *iter0, struct ref_iterator *iter1,
void *cb_data);
/*
* Iterate over the entries from iter0 and iter1, with the values
* interleaved as directed by the select function. The iterator takes
* ownership of iter0 and iter1 and frees them when the iteration is
* over.
*/
struct ref_iterator *merge_ref_iterator_begin(
struct ref_iterator *iter0, struct ref_iterator *iter1,
ref_iterator_select_fn *select, void *cb_data);
/*
* An iterator consisting of the union of the entries from front and
* back. If there are entries common to the two sub-iterators, use the
* one from front. Each iterator must iterate over its entries in
* strcmp() order by refname for this to work.
*
* The new iterator takes ownership of its arguments and frees them
* when the iteration is over. As a convenience to callers, if front
* or back is an empty_ref_iterator, then abort that one immediately
* and return the other iterator directly, without wrapping it.
*/
struct ref_iterator *overlay_ref_iterator_begin(
struct ref_iterator *front, struct ref_iterator *back);
/*
* Wrap iter0, only letting through the references whose names start
* with prefix. If trim is set, set iter->refname to the name of the
* reference with that many characters trimmed off the front;
* otherwise set it to the full refname. The new iterator takes over
* ownership of iter0 and frees it when iteration is over. It makes
* its own copy of prefix.
*
* As a convenience to callers, if prefix is the empty string and
* trim is zero, this function returns iter0 directly, without
* wrapping it.
*/
struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
const char *prefix,
int trim);
/*
* Iterate over the packed and loose references in the specified
* submodule that are within find_containing_dir(prefix). If prefix is
* NULL or the empty string, iterate over all references in the
* submodule.
*/
struct ref_iterator *files_ref_iterator_begin(const char *submodule,
const char *prefix,
unsigned int flags);
/* Internal implementation of reference iteration: */
/*
* Base class constructor for ref_iterators. Initialize the
* ref_iterator part of iter, setting its vtable pointer as specified.
* This is meant to be called only by the initializers of derived
* classes.
*/
void base_ref_iterator_init(struct ref_iterator *iter,
struct ref_iterator_vtable *vtable);
/*
* Base class destructor for ref_iterators. Destroy the ref_iterator
* part of iter and shallow-free the object. This is meant to be
* called only by the destructors of derived classes.
*/
void base_ref_iterator_free(struct ref_iterator *iter);
/* Virtual function declarations for ref_iterators: */
typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
struct object_id *peeled);
/*
* Implementations of this function should free any resources specific
* to the derived class, then call base_ref_iterator_free() to clean
* up and free the ref_iterator object.
*/
typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
struct ref_iterator_vtable {
	ref_iterator_advance_fn *advance;
	ref_iterator_peel_fn *peel;
	ref_iterator_abort_fn *abort;
};
/*
* Call fn for each reference in the specified submodule for which the
* refname begins with prefix. If trim is non-zero, then trim that
* many characters off the beginning of each refname before passing
* the refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to
* include broken references in the iteration. If fn ever returns a
* non-zero value, stop the iteration and return that value;
* otherwise, return 0.
*
* This is the common backend for the for_each_*ref* functions.
*/
int do_for_each_ref(const char *submodule, const char *prefix,
each_ref_fn fn, int trim, int flags, void *cb_data);
/*
* Read the specified reference from the filesystem or packed refs
* file, non-recursively. Set type to describe the reference, and:
*
* - If refname is the name of a normal reference, fill in sha1
* (leaving referent unchanged).
*
* - If refname is the name of a symbolic reference, write the full
* name of the reference to which it refers (e.g.
* "refs/heads/master") to referent and set the REF_ISSYMREF bit in
* type (leaving sha1 unchanged). The caller is responsible for
* validating that referent is a valid reference name.
*
* WARNING: refname might be used as part of a filename, so it is
* important from a security standpoint that it be safe in the sense
* of refname_is_safe(). Moreover, for symrefs this function sets
* referent to whatever the repository says, which might not be a
* properly-formatted or even safe reference name. NEITHER INPUT NOR
* OUTPUT REFERENCE NAMES ARE VALIDATED WITHIN THIS FUNCTION.
*
* Return 0 on success. If the ref doesn't exist, set errno to ENOENT
* and return -1. If the ref exists but is neither a symbolic ref nor
* a sha1, it is broken; set REF_ISBROKEN in type, set errno to
* EINVAL, and return -1. If there is another error reading the ref,
* set errno appropriately and return -1.
*
* Backend-specific flags might be set in type as well, regardless of
* outcome.
*
* It is OK for refname to point into referent. If so:
*
* - if the function succeeds with REF_ISSYMREF, referent will be
* overwritten and the memory formerly pointed to by it might be
* changed or even freed.
*
* - in all other cases, referent will be untouched, and therefore
* refname will still be valid and unchanged.
*/
int read_raw_ref(const char *refname, unsigned char *sha1,
struct strbuf *referent, unsigned int *type);
#endif /* REFS_REFS_INTERNAL_H */
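
The conflict rule documented for verify_refname_available() (one name exactly matching the leading components of the other) can be sketched as a pair of small string helpers. These functions are hypothetical and cover only the pairwise name comparison; they say nothing about how the real function consults extras, skip, or the existing references.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `shorter` names a leading directory of `longer`. */
static int toy_is_leading_dir(const char *shorter, const char *longer)
{
	size_t len = strlen(shorter);

	/* The match must end exactly at a '/' in the longer name. */
	return !strncmp(shorter, longer, len) && longer[len] == '/';
}

/* Return 1 if the two names conflict under the rule described above. */
static int toy_refnames_conflict(const char *a, const char *b)
{
	assert(a && b);
	return toy_is_leading_dir(a, b) || toy_is_leading_dir(b, a);
}
```

Under this sketch "foo/bar" conflicts with "foo" and with "foo/bar/baz", but not with "foo/barbados", because the shared text does not end on a component boundary. Two identical names are not reported as conflicting, in line with the wording of the comment.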