OSH Word Evaluation Algorithm
This page documents one part of the OSH implementation: word evaluation. In this respect, OSH differs significantly from other shells.

- Other shells tend to use a homogeneous tree with various flags (e.g. `nosplit`, `assignment`, etc.).
- OSH uses a typed, heterogeneous tree (now statically checked with MyPy).
For example, `word_part = LiteralPart(...) | BracedVarSub(...) | CommandSub(...) | ...`
https://github.com/oilshell/oil/blob/master/frontend/syntax.asdl#L107
(Specifying ML-like data structures with ASDL was an implementation style borrowed from CPython itself: see posts tagged #ASDL)
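To make the contrast concrete, here is a minimal Python sketch of a typed, heterogeneous word tree and an evaluator that dispatches on node type rather than on flags. The class names echo the `word_part` union above, but their fields, the `${name:-default}`-only `BracedVarSub`, and the `eval_part`/`eval_word` helpers are simplified assumptions for illustration. They are not OSH's ASDL definitions or its `word_eval.py` code.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class LiteralPart:
    text: str


@dataclass
class BracedVarSub:
    name: str
    default: str = ""  # hypothetical field: models only ${name:-default}


@dataclass
class CommandSub:
    command: str  # hypothetical: a real tree would hold a parsed command node


WordPart = Union[LiteralPart, BracedVarSub, CommandSub]


def eval_part(part: WordPart, env: Dict[str, str]) -> str:
    # Dispatch on the node's type; each variant carries only the fields it
    # needs, so there are no flags to inspect.
    if isinstance(part, LiteralPart):
        return part.text
    if isinstance(part, BracedVarSub):
        value = env.get(part.name, "")
        return value if value else part.default  # ${name:-default}: unset or empty
    if isinstance(part, CommandSub):
        raise NotImplementedError("running commands is outside this sketch")
    raise TypeError(f"unknown word part: {part!r}")


def eval_word(parts: List[WordPart], env: Dict[str, str]) -> str:
    return "".join(eval_part(p, env) for p in parts)


word: List[WordPart] = [LiteralPart("--prefix="), BracedVarSub("PREFIX", default="/usr/local")]
print(eval_word(word, {}))                  # --prefix=/usr/local
print(eval_word(word, {"PREFIX": "/opt"}))  # --prefix=/opt
```

In a homogeneous tree, every node would instead carry the union of all these fields plus flags, and each consumer would have to check which ones apply.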
- As much parsing as possible is done in a single pass, with lexer modes.
- There are some subsequent tweaks for detecting assignments, tildes, etc.
- There is a "metaprogramming" pass for brace expansion: `i=0; {$((i++)),x,y}`
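Brace expansion works on the syntax tree, not on evaluated strings, which is why `i` is incremented only once in the example above: after expansion, only the first of the three words contains the arithmetic part. A rough Python sketch under stated assumptions: `Literal`, `ArithSub`, and `BracedTuple` are hypothetical node types, the toy arithmetic supports only `NAME++`, and the evaluator is not OSH's.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Union


@dataclass
class Literal:
    text: str


@dataclass
class ArithSub:
    expr: str  # toy: only "NAME++" is supported


@dataclass
class BracedTuple:
    alternatives: List[List["Part"]]  # each alternative is itself a list of parts


Part = Union[Literal, ArithSub, BracedTuple]


def brace_expand(word: List[Part]) -> List[List[Part]]:
    """Expand brace groups on the syntax tree, before anything is evaluated."""
    choices: List[List[List[Part]]] = []
    for part in word:
        if isinstance(part, BracedTuple):
            # Alternatives may contain nested braces, so expand each recursively.
            alts: List[List[Part]] = []
            for alt in part.alternatives:
                alts.extend(brace_expand(alt))
            choices.append(alts)
        else:
            choices.append([[part]])
    # Cartesian product of the choices, flattening each combination into a word.
    return [[p for seq in combo for p in seq] for combo in product(*choices)]


def eval_word(word: List[Part], env: Dict[str, int]) -> str:
    out: List[str] = []
    for part in word:
        if isinstance(part, Literal):
            out.append(part.text)
        elif isinstance(part, ArithSub) and part.expr.endswith("++"):
            name = part.expr[:-2]
            out.append(str(env[name]))  # post-increment yields the old value
            env[name] += 1
        else:
            raise ValueError(f"unsupported part: {part!r}")
    return "".join(out)


# i=0; echo {$((i++)),x,y}
env = {"i": 0}
tree: List[Part] = [BracedTuple([[ArithSub("i++")], [Literal("x")], [Literal("y")]])]
print([eval_word(w, env) for w in brace_expand(tree)], env["i"])  # ['0', 'x', 'y'] 1
```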
There are three stages (not four as in POSIX):
- Evaluation of the typed tree (using `osh/word_eval.py`).
  - There is a restricted variant of word evaluation for completion, e.g. so that arbitrary processes aren't run when you hit TAB.
- Splitting with IFS. This is specified with a state machine in `osh/split.py`. (I think OSH is unique in this regard too.)
  - Splitting involves the concept of "frames", to handle things like `x='a b'; y='c d'; echo $x"${@}"$y`. The last part of `$x` has to be joined with `argv[0]`, and `argv[n-1]` has to be joined with `$y`.
- Globbing.
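The "frames" idea from the splitting stage can be sketched in Python as follows. This is a simplified model under several assumptions: each evaluated part is either a scalar piece `(text, splittable)` or a quoted array such as `"${@}"` (a list of strings); only IFS whitespace is handled, not the non-whitespace IFS rules a full splitter must cover; and `make_frames`, `split_frame`, and `expand` are hypothetical names, not functions from OSH.

```python
from typing import List, Tuple, Union

Piece = Tuple[str, bool]             # (text, splittable); quoted text is never split
PartValue = Union[Piece, List[str]]  # a scalar piece, or a quoted array like "${@}"
Frame = List[Piece]


def make_frames(part_vals: List[PartValue]) -> List[Frame]:
    frames: List[Frame] = [[]]
    for pv in part_vals:
        if isinstance(pv, list):
            for i, s in enumerate(pv):
                if i > 0:
                    frames.append([])  # argv[1:] each start a new frame
                frames[-1].append((s, False))  # elements of "${@}" are quoted
        else:
            frames[-1].append(pv)
    return frames


def split_frame(frame: Frame, ifs: str = " \t\n") -> List[str]:
    words: List[str] = []
    cur: List[str] = []
    have_word = False  # a quoted piece, even "", forces a word to exist
    for text, splittable in frame:
        if not splittable:
            cur.append(text)
            have_word = True
            continue
        for ch in text:
            if ch in ifs:
                if have_word:
                    words.append("".join(cur))
                    cur, have_word = [], False
            else:
                cur.append(ch)
                have_word = True
    if have_word:
        words.append("".join(cur))
    return words


def expand(part_vals: List[PartValue]) -> List[str]:
    return [w for f in make_frames(part_vals) for w in split_frame(f)]


# x='a b'; y='c d'; set -- p q; echo $x"${@}"$y
print(expand([("a b", True), ["p", "q"], ("c d", True)]))  # ['a', 'bp', 'qc', 'd']
```

Because each frame is split on its own, the last field of `$x` stays glued to `argv[0]`, and `argv[n-1]` stays glued to the first field of `$y`. An empty `"${@}"` contributes no pieces, so on its own it yields zero words.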
There is no such thing as "quote removal" in OSH, any more than a Python or JavaScript interpreter has "quote removal". It's just evaluation.
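One way to see why no quote-removal pass is needed: in a typed tree, quotes are node types, and the parser has already consumed the quote characters. This hedged Python sketch uses hypothetical `Literal`, `SingleQuoted`, `VarSub`, and `DoubleQuoted` nodes (not OSH's) and evaluates each part to a `(text, splittable)` pair, where quoting only affects whether the result may later be split.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Literal:
    text: str


@dataclass
class SingleQuoted:
    text: str  # the contents only; the quote characters were consumed by the parser


@dataclass
class VarSub:
    name: str


@dataclass
class DoubleQuoted:
    parts: List[Union[Literal, VarSub]]


Part = Union[Literal, SingleQuoted, VarSub, DoubleQuoted]
Piece = Tuple[str, bool]  # (text, splittable)


def eval_part(part: Part, env: Dict[str, str]) -> Piece:
    if isinstance(part, (Literal, SingleQuoted)):
        return (part.text, False)  # source text is not subject to IFS splitting
    if isinstance(part, VarSub):
        return (env.get(part.name, ""), True)  # unquoted substitution: splittable
    if isinstance(part, DoubleQuoted):
        # Evaluate the inner parts, then mark the combined result as quoted.
        return ("".join(eval_part(p, env)[0] for p in part.parts), False)
    raise TypeError(f"unknown part: {part!r}")


env = {"x": "a b"}
print(eval_part(SingleQuoted("it's"), env))         # ("it's", False)
print(eval_part(DoubleQuoted([VarSub("x")]), env))  # ('a b', False)
print(eval_part(VarSub("x"), env))                  # ('a b', True)
```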
Bug: Internally, splitting and globbing both use `\` to inhibit expansion. That is, `\*` is an escaped glob character, and `\ ` is an escaped space (an IFS character).

This causes problems when `IFS='\'`. I think I could choose a different character for OSH, maybe even the `NUL` byte.
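The ambiguity can be reproduced with a toy model: quoted text is protected by backslash escapes before splitting, and the splitter treats a backslash as an escape before checking whether it is an IFS character. Both functions are hypothetical illustrations, not OSH's internals, and OSH's exact escaping of unquoted values may differ.

```python
from typing import List


def escape(s: str, special: str) -> str:
    """Protect quoted text: prefix backslashes and `special` chars with a backslash."""
    return "".join("\\" + c if c == "\\" or c in special else c for c in s)


def split_escaped(s: str, ifs: str) -> List[str]:
    """Split on unescaped IFS chars; a backslash always escapes the next char."""
    words: List[str] = []
    cur: List[str] = []
    have_word = False
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):
            # Checked before IFS membership, so '\' can never act as a delimiter.
            cur.append(s[i + 1])
            have_word = True
            i += 2
            continue
        if c in ifs:
            if have_word:
                words.append("".join(cur))
                cur, have_word = [], False
        else:
            cur.append(c)
            have_word = True
        i += 1
    if have_word:
        words.append("".join(cur))
    return words


# Quoted "a b": the escaped space survives splitting.
print(split_escaped(escape("a b", " "), " "))  # ['a b']

# IFS='\' and an unquoted value a\b: POSIX splitting gives ['a', 'b'],
# but here the backslash is read as an escape instead of a delimiter.
print(split_escaped("a\\b", "\\"))  # ['ab']
```

Whichever check comes first, one meaning of `\` is lost. An escape character that can never appear in IFS, as proposed above, removes the ambiguity.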
OSH wants to treat all sublanguages uniformly. (Command, Word, Arith, and the non-POSIX bool language (`[[`) are the main sublanguages.) For some "dynamic" sublanguages like flag syntax, it fell a bit short.
This matters for interactive completion, which wants to understand the code statically.
For example, note that you can have variable references in several sublanguages:
Static:
- `x=1` -- assignments are in the command language
- `[[ $x -`
- `echo ${x:-${x:-default}}` -- word
- `echo $(( x + 1 ))` -- arithmetic
Dynamic:
- `code='x=1'; readonly $code` -- the dynamic builtin language
- Other builtins that manage variables:
  - `getopts`
  - `read`
  - `unset`
  - `printf -v` in bash
- Variable references in `${!x}` in bash/ksh
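The static/dynamic distinction matters because a completion engine can read `x=1` straight from the tree, but `readonly $code` names a variable only at runtime. A small hypothetical Python sketch of that limit for `readonly` arguments: the node types, `static_prefix`, and `names_declared_by_readonly` are illustrative assumptions, and option flags such as `-p` are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Literal:
    text: str


@dataclass
class VarSub:
    name: str  # e.g. $code


Word = List[Union[Literal, VarSub]]


def static_prefix(word: Word) -> Tuple[str, bool]:
    """Concatenate leading Literal parts; also report whether the whole word was literal."""
    text = ""
    for p in word:
        if not isinstance(p, Literal):
            return text, False
        text += p.text
    return text, True


def names_declared_by_readonly(args: List[Word]) -> Optional[List[str]]:
    """Names that `readonly ARGS...` would declare, or None if only known at runtime."""
    names: List[str] = []
    for word in args:
        prefix, fully_static = static_prefix(word)
        if fully_static or "=" in prefix:
            # x=1 or x=$v: the name before '=' is fixed by the source text.
            names.append(prefix.split("=", 1)[0])
        else:
            return None  # e.g. $code: the name itself comes from a runtime value
    return names


print(names_declared_by_readonly([[Literal("x=1")], [Literal("y")]]))  # ['x', 'y']
print(names_declared_by_readonly([[VarSub("code")]]))                  # None
```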