1
1
mirror of https://github.com/mcuadros/ascode synced 2024-11-22 17:02:03 +01:00
ascode/_documentation/starlark/expressions.md

27 KiB

title weight toc
Expressions 6 true

Overview

An expression specifies the computation of a value.

The Starlark grammar defines several categories of expression. An operand is an expression consisting of a single token (such as an identifier or a literal), or a bracketed expression. Operands are self-delimiting. An operand may be followed by any number of dot, call, or slice suffixes, to form a primary expression. In some places in the Starlark grammar where an expression is expected, it is legal to provide a comma-separated list of expressions denoting a tuple. The grammar uses Expression where a multiple-component expression is allowed, and Test where it accepts an expression of only a single component.

Expression = Test {',' Test} .

Test = LambdaExpr | IfExpr | PrimaryExpr | UnaryExpr | BinaryExpr .

PrimaryExpr = Operand
            | PrimaryExpr DotSuffix
            | PrimaryExpr CallSuffix
            | PrimaryExpr SliceSuffix
            .

Operand = identifier
        | int | float | string
        | ListExpr | ListComp
        | DictExpr | DictComp
        | '(' [Expression] [,] ')'
        | ('-' | '+') PrimaryExpr
        .

DotSuffix   = '.' identifier .
CallSuffix  = '(' [Arguments [',']] ')' .
SliceSuffix = '[' [Expression] [':' Test [':' Test]] ']' .

TODO: resolve position of +x, -x, and 'not x' in grammar: Operand or UnaryExpr?

Identifiers

Primary = identifier

An identifier is a name that identifies a value.

Lookup of locals and globals may fail if not yet defined.

Literals

Starlark supports literals of three different kinds:

Primary = int | float | string

Evaluation of a literal yields a value of the given type (string, int, or float) with the given value. See Literals for details.

Parenthesized expressions

Primary = '(' [Expression] ')'

A single expression enclosed in parentheses yields the result of that expression. Explicit parentheses may be used for clarity, or to override the default association of subexpressions.

1 + 2 * 3 + 4                   # 11
(1 + 2) * (3 + 4)               # 21

If the parentheses are empty, or contain a single expression followed by a comma, or contain two or more expressions, the expression yields a tuple.

()                              # (), the empty tuple
(1,)                            # (1,), a tuple of length 1
(1, 2)                          # (1, 2), a 2-tuple or pair
(1, 2, 3)                       # (1, 2, 3), a 3-tuple or triple

In some contexts, such as a return or assignment statement or the operand of a for statement, a tuple may be expressed without parentheses.

x, y = 1, 2

return 1, 2

for x in 1, 2:
   print(x)

Starlark (like Python 3) does not accept an unparenthesized tuple expression as the operand of a list comprehension:

[2*x for x in 1, 2, 3]	       	# parse error: unexpected ','

Dictionary expressions

A dictionary expression is a comma-separated list of colon-separated key/value expression pairs, enclosed in curly brackets, and it yields a new dictionary object. An optional comma may follow the final pair.

DictExpr = '{' [Entries [',']] '}' .
Entries  = Entry {',' Entry} .
Entry    = Test ':' Test .

Examples:

{}
{"one": 1}
{"one": 1, "two": 2,}

The key and value expressions are evaluated in left-to-right order. Evaluation fails if the same key is used multiple times.

Only hashable values may be used as the keys of a dictionary. This includes all built-in types except dictionaries, sets, and lists; a tuple is hashable only if its elements are hashable.

List expressions

A list expression is a comma-separated list of element expressions, enclosed in square brackets, and it yields a new list object. An optional comma may follow the last element expression.

ListExpr = '[' [Expression [',']] ']' .

Element expressions are evaluated in left-to-right order.

Examples:

[]                      # [], empty list
[1]                     # [1], a 1-element list
[1, 2, 3,]              # [1, 2, 3], a 3-element list

Unary operators

There are three unary operators, all appearing before their operand: +, -, ~, and not.

UnaryExpr = '+' PrimaryExpr
          | '-' PrimaryExpr
          | '~' PrimaryExpr
          | 'not' Test
          .
+ number        unary positive          (int, float)
- number        unary negation          (int, float)
~ number        unary bitwise inversion (int)
not x           logical negation        (any type)

The + and - operators may be applied to any number (int or float) and return the number unchanged. Unary + is never necessary in a correct program, but may serve as an assertion that its operand is a number, or as documentation.

if x > 0:
	return +1
else if x < 0:
	return -1
else:
	return 0

The not operator returns the negation of the truth value of its operand.

not True                        # False
not False                       # True
not [1, 2, 3]                   # False
not ""                          # True
not 0                           # True

The ~ operator yields the bitwise inversion of its integer argument. The bitwise inversion of x is defined as -(x+1).

~1                              # -2
~-1                             # 0
~0                              # -1

Binary operators

Starlark has the following binary operators, arranged in order of increasing precedence:

or
and
==   !=   <    >   <=   >=   in   not in
|
^
&
<<   >>
-    +
*    /    //   %

Comparison operators, in, and not in are non-associative, so the parser will not accept 0 <= i < n. All other binary operators of equal precedence associate to the left.

BinaryExpr = Test {Binop Test} .

Binop = 'or'
      | 'and'
      | '==' | '!=' | '<' | '>' | '<=' | '>=' | 'in' | 'not' 'in'
      | '|'
      | '^'
      | '&'
      | '-' | '+'
      | '*' | '%' | '/' | '//'
      | '<<' | '>>'
      .

or and and

The or and and operators yield, respectively, the logical disjunction and conjunction of their arguments, which need not be Booleans. The expression x or y yields the value of x if its truth value is True, or the value of y otherwise.

False or False		# False
False or True		# True
True  or False		# True
True  or True		# True

0 or "hello"		# "hello"
1 or "hello"		# 1

Similarly, x and y yields the value of x if its truth value is False, or the value of y otherwise.

False and False		# False
False and True		# False
True  and False		# False
True  and True		# True

0 and "hello"		# 0
1 and "hello"		# "hello"

These operators use "short circuit" evaluation, so the second expression is not evaluated if the value of the first expression has already determined the result, allowing constructions like these:

len(x) > 0 and x[0] == 1		# x[0] is not evaluated if x is empty
x and x[0] == 1
len(x) == 0 or x[0] == ""
not x or not x[0]

Comparisons

The == operator reports whether its operands are equal; the != operator is its negation.

The operators <, >, <=, and >= perform an ordered comparison of their operands. It is an error to apply these operators to operands of unequal type, unless one of the operands is an int and the other is a float. Of the built-in types, only the following support ordered comparison, using the ordering relation shown:

NoneType        # None <= None
bool            # False < True
int             # mathematical
float           # as defined by IEEE 754
string          # lexicographical
tuple           # lexicographical
list            # lexicographical

Comparison of floating point values follows the IEEE 754 standard, which breaks several mathematical identities. For example, if x is a NaN value, the comparisons x < y, x == y, and x > y all yield false for all values of y.

Applications may define additional types that support ordered comparison.

The remaining built-in types support only equality comparisons. Values of type dict or set compare equal if their elements compare equal, and values of type function or builtin_function_or_method are equal only to themselves.

dict                            # equal contents
set                             # equal contents
function                        # identity
builtin_function_or_method      # identity

Arithmetic operations

The following table summarizes the binary arithmetic operations available for built-in types:

Arithmetic (int or float; result has type float unless both operands have type int)
   number + number              # addition
   number - number              # subtraction
   number * number              # multiplication
   number / number              # real division  (result is always a float)
   number // number             # floored division
   number % number              # remainder of floored division
   number ^ number              # bitwise XOR
   number << number             # bitwise left shift
   number >> number             # bitwise right shift

Concatenation
   string + string
     list + list
    tuple + tuple

Repetition (string/list/tuple)
      int * sequence
 sequence * int

String interpolation
   string % any                 # see String Interpolation

Sets
      int | int                 # bitwise union (OR)
      set | set                 # set union
      int & int                 # bitwise intersection (AND)
      set & set                 # set intersection
      set ^ set                 # set symmetric difference

The operands of the arithmetic operators +, -, *, //, and % must both be numbers (int or float) but need not have the same type. The type of the result has type int only if both operands have that type. The result of real division / always has type float.

The + operator may be applied to non-numeric operands of the same type, such as two lists, two tuples, or two strings, in which case it computes the concatenation of the two operands and yields a new value of the same type.

"Hello, " + "world"		# "Hello, world"
(1, 2) + (3, 4)			# (1, 2, 3, 4)
[1, 2] + [3, 4]			# [1, 2, 3, 4]

The * operator may be applied to an integer n and a value of type string, list, or tuple, in which case it yields a new value of the same sequence type consisting of n repetitions of the original sequence. The order of the operands is immaterial. Negative values of n behave like zero.

'mur' * 2               # 'murmur'
3 * range(3)            # [0, 1, 2, 0, 1, 2, 0, 1, 2]

Applications may define additional types that support any subset of these operators.

The & operator requires two operands of the same type, either int or set. For integers, it yields the bitwise intersection (AND) of its operands. For sets, it yields a new set containing the intersection of the elements of the operand sets, preserving the element order of the left operand.

The | operator likewise computes bitwise or set unions. The result of set | set is a new set whose elements are the union of the operands, preserving the order of the elements of the operands, left before right.

The ^ operator accepts operands of either int or set type. For integers, it yields the bitwise XOR (exclusive OR) of its operands. For sets, it yields a new set containing elements of either first or second operand but not both (symmetric difference).

The << and >> operators require operands of int type both. They shift the first operand to the left or right by the number of bits given by the second operand. It is a dynamic error if the second operand is negative. Implementations may impose a limit on the second operand of a left shift.

0x12345678 & 0xFF               # 0x00000078
0x12345678 | 0xFF               # 0x123456FF
0b01011101 ^ 0b110101101        # 0b111110000
0b01011101 >> 2                 # 0b010111
0b01011101 << 2                 # 0b0101110100

set([1, 2]) & set([2, 3])       # set([2])
set([1, 2]) | set([2, 3])       # set([1, 2, 3])
set([1, 2]) ^ set([2, 3])       # set([1, 3])

Implementation note: The Go implementation of Starlark requires the -set flag to enable support for sets. The Java implementation does not support sets.

Membership tests

      any in     sequence		(list, tuple, dict, set, string)
      any not in sequence

The in operator reports whether its first operand is a member of its second operand, which must be a list, tuple, dict, set, or string. The not in operator is its negation. Both return a Boolean.

The meaning of membership varies by the type of the second operand: the members of a list, tuple, or set are its elements; the members of a dict are its keys; the members of a string are all its substrings.

1 in [1, 2, 3]                  # True
4 in (1, 2, 3)                  # False
4 not in set([1, 2, 3])         # True

d = {"one": 1, "two": 2}
"one" in d                      # True
"three" in d                    # False
1 in d                          # False
[] in d				# False

"nasty" in "dynasty"            # True
"a" in "banana"                 # True
"f" not in "way"                # True

String interpolation

The expression format % args performs string interpolation, a simple form of template expansion. The format string is interpreted as a sequence of literal portions and conversions. Each conversion, which starts with a % character, is replaced by its corresponding value from args. The characters following % in each conversion determine which argument it uses and how to convert it to a string.

Each % character marks the start of a conversion specifier, unless it is immediately followed by another %, in which case both characters together denote a literal percent sign.

If the "%" is immediately followed by "(key)", the parenthesized substring specifies the key of the args dictionary whose corresponding value is the operand to convert. Otherwise, the conversion's operand is the next element of args, which must be a tuple with exactly one component per conversion, unless the format string contains only a single conversion, in which case args itself is its operand.

Starlark does not support the flag, width, and padding specifiers supported by Python's % and other variants of C's printf.

After the optional (key) comes a single letter indicating what operand types are valid and how to convert the operand x to a string:

%       none            literal percent sign
s       any             as if by str(x)
r       any             as if by repr(x)
d       number          signed integer decimal
i       number          signed integer decimal
o       number          signed octal
x       number          signed hexadecimal, lowercase
X       number          signed hexadecimal, uppercase
e       number          float exponential format, lowercase
E       number          float exponential format, uppercase
f       number          float decimal format, lowercase
F       number          float decimal format, uppercase
g       number          like %e for large exponents, %f otherwise
G       number          like %E for large exponents, %F otherwise
c       string          x (string must encode a single Unicode code point)
        int             as if by chr(x)

It is an error if the argument does not have the type required by the conversion specifier. A Boolean argument is not considered a number.

Examples:

"Hello %s, your score is %d" % ("Bob", 75)      # "Hello Bob, your score is 75"

"%d %o %x %c" % (65, 65, 65, 65)                # "65 101 41 A" (decimal, octal, hexadecimal, Unicode)

"%(greeting)s, %(audience)s" % dict(            # "Hello, world"
  greeting="Hello",
  audience="world",
)

"rate = %g%% APR" % 3.5                         # "rate = 3.5% APR"

One subtlety: to use a tuple as the operand of a conversion in format string containing only a single conversion, you must wrap the tuple in a singleton tuple:

"coordinates=%s" % (40.741491, -74.003680)	# error: too many arguments for format string
"coordinates=%s" % ((40.741491, -74.003680),)	# "coordinates=(40.741491, -74.003680)"

TODO: specify %e and %f more precisely.

Conditional expressions

A conditional expression has the form a if cond else b. It first evaluates the condition cond. If it's true, it evaluates a and yields its value; otherwise it yields the value of b.

IfExpr = Test 'if' Test 'else' Test .

Example:

"yes" if enabled else "no"

Comprehensions

A comprehension constructs new list or dictionary value by looping over one or more iterables and evaluating a body expression that produces successive elements of the result.

A list comprehension consists of a single expression followed by one or more clauses, the first of which must be a for clause. Each for clause resembles a for statement, and specifies an iterable operand and a set of variables to be assigned by successive values of the iterable. An if cause resembles an if statement, and specifies a condition that must be met for the body expression to be evaluated. A sequence of for and if clauses acts like a nested sequence of for and if statements.

ListComp = '[' Test {CompClause} ']'.
DictComp = '{' Entry {CompClause} '}' .

CompClause = 'for' LoopVariables 'in' Test
           | 'if' Test .

LoopVariables = PrimaryExpr {',' PrimaryExpr} .

Examples:

[x*x for x in range(5)]                 # [0, 1, 4, 9, 16]
[x*x for x in range(5) if x%2 == 0]     # [0, 4, 16]
[(x, y) for x in range(5)
        if x%2 == 0
        for y in range(5)
        if y > x]                       # [(0, 1), (0, 2), (0, 3), (0, 4), (2, 3), (2, 4)]

A dict comprehension resembles a list comprehension, but its body is a pair of expressions, key: value, separated by a colon, and its result is a dictionary containing the key/value pairs for which the body expression was evaluated. Evaluation fails if the value of any key is unhashable.

As with a for loop, the loop variables may exploit compound assignment:

[x*y+z for (x, y), z in [((2, 3), 5), (("o", 2), "!")]]         # [11, 'oo!']

Starlark, following Python 3, does not accept an unparenthesized tuple or lambda expression as the operand of a for clause:

[x*x for x in 1, 2, 3]		# parse error: unexpected comma
[x*x for x in lambda: 0]	# parse error: unexpected lambda

Comprehensions in Starlark, again following Python 3, define a new lexical block, so assignments to loop variables have no effect on variables of the same name in an enclosing block:

x = 1
_ = [x for x in [2]]            # new variable x is local to the comprehension
print(x)                        # 1

The operand of a comprehension's first clause (always a for) is resolved in the lexical block enclosing the comprehension. In the examples below, identifiers referring to the outer variable named x have been distinguished by subscript.

x₀ = (1, 2, 3)
[x*x for x in x₀]               # [1, 4, 9]
[x*x for x in x₀ if x%2 == 0]   # [4]

All subsequent for and if expressions are resolved within the comprehension's lexical block, as in this rather obscure example:

x₀ = ([1, 2], [3, 4], [5, 6])
[x*x for x in x₀ for x in x if x%2 == 0]     # [4, 16, 36]

which would be more clearly rewritten as:

x = ([1, 2], [3, 4], [5, 6])
[z*z for y in x for z in y if z%2 == 0]     # [4, 16, 36]

Function and method calls

CallSuffix = '(' [Arguments [',']] ')' .

Arguments = Argument {',' Argument} .
Argument  = Test | identifier '=' Test | '*' Test | '**' Test .

A value f of type function or builtin_function_or_method may be called using the expression f(...). Applications may define additional types whose values may be called in the same way.

A method call such as filename.endswith(".star") is the composition of two operations, m = filename.endswith and m(".star"). The first, a dot operation, yields a bound method, a function value that pairs a receiver value (the filename string) with a choice of method (string·endswith).

Only built-in or application-defined types may have methods.

See Functions for an explanation of function parameter passing.

Dot expressions

A dot expression x.f selects the attribute f (a field or method) of the value x.

Fields are possessed by none of the main Starlark data types, but some application-defined types have them. Methods belong to the built-in types string, list, dict, and set, and to many application-defined types.

DotSuffix = '.' identifier .

A dot expression fails if the value does not have an attribute of the specified name.

Use the built-in function hasattr(x, "f") to ascertain whether a value has a specific attribute, or dir(x) to enumerate all its attributes. The getattr(x, "f") function can be used to select an attribute when the name "f" is not known statically.

A dot expression that selects a method typically appears within a call expression, as in these examples:

["able", "baker", "charlie"].index("baker")     # 1
"banana".count("a")                             # 3
"banana".reverse()                              # error: string has no .reverse field or method

But when not called immediately, the dot expression evaluates to a bound method, that is, a method coupled to a specific receiver value. A bound method can be called like an ordinary function, without a receiver argument:

f = "banana".count
f                                               # <built-in method count of string value>
f("a")                                          # 3
f("n")                                          # 2

Index expressions

An index expression a[i] yields the ith element of an indexable type such as a string, tuple, or list. The index i must be an int value in the range -ni < n, where n is len(a); any other index results in an error.

SliceSuffix = '[' [Expression] [':' Test [':' Test]] ']' .

A valid negative index i behaves like the non-negative index n+i, allowing for convenient indexing relative to the end of the sequence.

"abc"[0]                        # "a"
"abc"[1]                        # "b"
"abc"[-1]                       # "c"

("zero", "one", "two")[0]       # "zero"
("zero", "one", "two")[1]       # "one"
("zero", "one", "two")[-1]      # "two"

An index expression d[key] may also be applied to a dictionary d, to obtain the value associated with the specified key. It is an error if the dictionary contains no such key.

An index expression appearing on the left side of an assignment causes the specified list or dictionary element to be updated:

a = range(3)            # a == [0, 1, 2]
a[2] = 7                # a == [0, 1, 7]

coins["suzie b"] = 100

It is a dynamic error to attempt to update an element of an immutable type, such as a tuple or string, or a frozen value of a mutable type.

Slice expressions

A slice expression a[start:stop:stride] yields a new value containing a sub-sequence of a, which must be a string, tuple, or list.

SliceSuffix = '[' [Expression] [':' Test [':' Test]] ']' .

Each of the start, stop, and stride operands is optional; if present, and not None, each must be an integer. The stride value defaults to 1. If the stride is not specified, the colon preceding it may be omitted too. It is an error to specify a stride of zero.

Conceptually, these operands specify a sequence of values i starting at start and successively adding stride until i reaches or passes stop. The result consists of the concatenation of values of a[i] for which i is valid.`

The effective start and stop indices are computed from the three operands as follows. Let n be the length of the sequence.

If the stride is positive: If the start operand was omitted, it defaults to -infinity. If the end operand was omitted, it defaults to +infinity. For either operand, if a negative value was supplied, n is added to it. The start and end values are then "clamped" to the nearest value in the range 0 to n, inclusive.

If the stride is negative: If the start operand was omitted, it defaults to +infinity. If the end operand was omitted, it defaults to -infinity. For either operand, if a negative value was supplied, n is added to it. The start and end values are then "clamped" to the nearest value in the range -1 to n-1, inclusive.

"abc"[1:]               # "bc"  (remove first element)
"abc"[:-1]              # "ab"  (remove last element)
"abc"[1:-1]             # "b"   (remove first and last element)
"banana"[1::2]          # "aaa" (select alternate elements starting at index 1)
"banana"[4::-2]         # "nnb" (select alternate elements in reverse, starting at index 4)

Unlike Python, Starlark does not allow a slice expression on the left side of an assignment.

Slicing a tuple or string may be more efficient than slicing a list because tuples and strings are immutable, so the result of the operation can share the underlying representation of the original operand (when the stride is 1). By contrast, slicing a list requires the creation of a new list and copying of the necessary elements.

Lambda expressions

A lambda expression yields a new function value.

LambdaExpr = 'lambda' [Parameters] ':' Test .

Parameters = Parameter {',' Parameter} .
Parameter  = identifier
           | identifier '=' Test
           | '*'
           | '*' identifier
           | '**' identifier
           .

Syntactically, a lambda expression consists of the keyword lambda, followed by a parameter list like that of a def statement but unparenthesized, then a colon :, and a single expression, the function body.

Example:

def map(f, list):
    return [f(x) for x in list]

map(lambda x: 2*x, range(3))    # [2, 4, 6]

As with functions created by a def statement, a lambda function captures the syntax of its body, the default values of any optional parameters, the value of each free variable appearing in its body, and the global dictionary of the current module.

The name of a function created by a lambda expression is "lambda".

The two statements below are essentially equivalent, but the function created by the def statement is named twice and the function created by the lambda expression is named lambda.

def twice(x):
   return x * 2

twice = lambda x: x * 2

Implementation note: The Go implementation of Starlark requires the -lambda flag to enable support for lambda expressions. The Java implementation does not support them. See Google Issue b/36358844.