9.0 KiB
title | weight | toc |
---|---|---|
Value concepts | 5 | true |
Overview
Starlark has eleven core data types. An application that embeds the Starlark intepreter may define additional types that behave like Starlark values. All values, whether core or application-defined, implement a few basic behaviors:
str(x) -- return a string representation of x
type(x) -- return a string describing the type of x
bool(x) -- convert x to a Boolean truth value
Identity and mutation
Starlark is an imperative language: programs consist of sequences of
statements executed for their side effects.
For example, an assignment statement updates the value held by a
variable, and calls to some built-in functions such as print
change
the state of the application that embeds the interpreter.
Values of some data types, such as NoneType
, bool
, int
, float
, and
string
, are immutable; they can never change.
Immutable values have no notion of identity: it is impossible for a
Starlark program to tell whether two integers, for instance, are
represented by the same object; it can tell only whether they are
equal.
Values of other data types, such as list
, dict
, and set
, are
mutable: they may be modified by a statement such as a[i] = 0
or
items.clear()
. Although tuple
and function
values are not
directly mutable, they may refer to mutable values indirectly, so for
this reason we consider them mutable too. Starlark values of these
types are actually references to variables.
Copying a reference to a variable, using an assignment statement for instance, creates an alias for the variable, and the effects of operations applied to the variable through one alias are visible through all others.
x = [] # x refers to a new empty list variable
y = x # y becomes an alias for x
x.append(1) # changes the variable referred to by x
print(y) # "[1]"; y observes the mutation
Starlark uses call-by-value parameter passing: in a function call, argument values are assigned to function parameters as if by assignment statements. If the values are references, the caller and callee may refer to the same variables, so if the called function changes the variable referred to by a parameter, the effect may also be observed by the caller:
def f(y):
y.append(1) # changes the variable referred to by x
x = [] # x refers to a new empty list variable
f(x) # f's parameter y becomes an alias for x
print(x) # "[1]"; x observes the mutation
As in all imperative languages, understanding aliasing, the relationship between reference values and the variables to which they refer, is crucial to writing correct programs.
Freezing a value
Starlark has a feature unusual among imperative programming languages: a mutable value may be frozen so that all subsequent attempts to mutate it fail with a dynamic error; the value, and all other values reachable from it, become immutable.
Immediately after execution of a Starlark module, all values in its top-level environment are frozen. Because all the global variables of an initialized Starlark module are immutable, the module may be published to and used by other threads in a parallel program without the need for locks. For example, the Bazel build system loads and executes BUILD and .bzl files in parallel, and two modules being executed concurrently may freely access variables or call functions from a third without the possibility of a race condition.
Hashing
The dict
and set
data types are implemented using hash tables, so
only hashable values are suitable as keys of a dict
or elements of
a set
. Attempting to use a non-hashable value as the key in a hash
table results in a dynamic error.
The hash of a value is an unspecified integer chosen so that two equal
values have the same hash, in other words, x == y => hash(x) == hash(y)
.
A hashable value has the same hash throughout its lifetime.
Values of the types NoneType
, bool
, int
, float
, and string
,
which are all immutable, are hashable.
Values of mutable types such as list
, dict
, and set
are not
hashable. These values remain unhashable even if they have become
immutable due to freezing.
A tuple
value is hashable only if all its elements are hashable.
Thus ("localhost", 80)
is hashable but ([127, 0, 0, 1], 80)
is not.
Values of the types function
and builtin_function_or_method
are also hashable.
Although functions are not necessarily immutable, as they may be
closures that refer to mutable variables, instances of these types
are compared by reference identity (see Comparisons),
so their hash values are derived from their identity.
Sequence types
Many Starlark data types represent a sequence of values: lists, tuples, and sets are sequences of arbitrary values, and in many contexts dictionaries act like a sequence of their keys.
We can classify different kinds of sequence types based on the operations they support. Each is listed below using the name of its corresponding interface in the interpreter's Go API.
Iterable
: an iterable value lets us process each of its elements in a fixed order. Examples:dict
,set
,list
,tuple
, but notstring
.Sequence
: a sequence of known length lets us know how many elements it contains without processing them. Examples:dict
,set
,list
,tuple
, but notstring
.Indexable
: an indexed type has a fixed length and provides efficient random access to its elements, which are identified by integer indices. Examples:string
,tuple
, andlist
.SetIndexable
: a settable indexed type additionally allows us to modify the element at a given integer index. Example:list
.Mapping
: a mapping is an association of keys to values. Example:dict
.
Although all of Starlark's core data types for sequences implement at
least the Sequence
contract, it's possible for an application
that embeds the Starlark interpreter to define additional data types
representing sequences of unknown length that implement only the Iterable
contract.
Strings are not iterable, though they do support the len(s)
and
s[i]
operations. Starlark deviates from Python here to avoid a common
pitfall in which a string is used by mistake where a list containing a
single string was intended, resulting in its interpretation as a sequence
of bytes.
Most Starlark operators and built-in functions that need a sequence of values will accept any iterable.
It is a dynamic error to mutate a sequence such as a list, set, or dictionary while iterating over it.
def increment_values(dict):
for k in dict:
dict[k] += 1 # error: cannot insert into hash table during iteration
dict = {"one": 1, "two": 2}
increment_values(dict)
Indexing
Many Starlark operators and functions require an index operand i
,
such as a[i]
or list.insert(i, x)
. Others require two indices i
and j
that indicate the start and end of a sub-sequence, such as
a[i:j]
, list.index(x, i, j)
, or string.find(x, i, j)
.
All such operations follow similar conventions, described here.
Indexing in Starlark is zero-based. The first element of a string
or list has index 0, the next 1, and so on. The last element of a
sequence of length n
has index n-1
.
"hello"[0] # "h"
"hello"[4] # "o"
"hello"[5] # error: index out of range
For sub-sequence operations that require two indices, the first is
inclusive and the second exclusive. Thus a[i:j]
indicates the
sequence starting with element i
up to but not including element
j
. The length of this sub-sequence is j-i
. This convention is known
as half-open indexing.
"hello"[1:4] # "ell"
Either or both of the index operands may be omitted. If omitted, the first is treated equivalent to 0 and the second is equivalent to the length of the sequence:
"hello"[1:] # "ello"
"hello"[:4] # "hell"
It is permissible to supply a negative integer to an indexing operation. The effective index is computed from the supplied value by the following two-step procedure. First, if the value is negative, the length of the sequence is added to it. This provides a convenient way to address the final elements of the sequence:
"hello"[-1] # "o", like "hello"[4]
"hello"[-3:-1] # "ll", like "hello"[2:4]
Second, for sub-sequence operations, if the value is still negative, it
is replaced by zero, or if it is greater than the length n
of the
sequence, it is replaced by n
. In effect, the index is "truncated" to
the nearest value in the range [0:n]
.
"hello"[-1000:+1000] # "hello"
This truncation step does not apply to indices of individual elements:
"hello"[-6] # error: index out of range
"hello"[-5] # "h"
"hello"[4] # "o"
"hello"[5] # error: index out of range