
OSH Word Evaluation Algorithm

andychu edited this page Jul 8, 2019 · 30 revisions

This page documents how OSH evaluates words. OSH's approach differs significantly from that of other shells:

  • Other shells tend to use a homogeneous tree with various flags (e.g. nosplit, assignment).
  • OSH uses a typed, heterogeneous tree (now statically checked with MyPy).

For example, word_part = LiteralPart(...) | BracedVarSub(...) | CommandSub(...) | ...

https://github.com/oilshell/oil/blob/master/frontend/syntax.asdl#L107

(Specifying ML-like data structures with ASDL is an implementation technique borrowed from CPython itself; see the posts tagged #ASDL.)
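To illustrate the difference from a flag-based tree, here is a minimal Python sketch of a heterogeneous word_part type and an evaluator that dispatches on the variant. The class names only loosely mirror the ASDL definitions linked above; their fields, the (text, quoted) piece representation, the for_completion flag, and CompletionRefused are hypothetical and are not OSH's actual code. The for_completion flag mimics the restricted completion mode described in the evaluation section by refusing to run command substitutions.

```python
import subprocess
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical variants of a word_part sum type. Each variant is its own
# class with only the fields it needs, instead of one node type with flags.
@dataclass
class LiteralPart:
    text: str

@dataclass
class SingleQuotedPart:
    text: str  # the parser has already dropped the quote characters

@dataclass
class BracedVarSub:
    name: str
    default: Optional[str] = None  # ${name:-default}, simplified to a plain string

@dataclass
class CommandSub:
    command: str  # a real shell would store a parsed command tree here

WordPart = Union[LiteralPart, SingleQuotedPart, BracedVarSub, CommandSub]

# Evaluation yields (text, quoted) pieces. Later stages consult the quoted
# flag, so quoting is data and there is no separate "quote removal" step.
Piece = Tuple[str, bool]

class CompletionRefused(Exception):
    """Raised when completion-mode evaluation would have to run a process."""

def eval_part(part: WordPart, env: Dict[str, str], for_completion: bool = False) -> Piece:
    if isinstance(part, LiteralPart):
        return (part.text, False)
    if isinstance(part, SingleQuotedPart):
        return (part.text, True)
    if isinstance(part, BracedVarSub):
        value = env.get(part.name, "")
        if value == "" and part.default is not None:  # ':-' treats empty like unset
            value = part.default
        return (value, False)
    if isinstance(part, CommandSub):
        if for_completion:
            # Restricted mode: never start a process just because the user hit TAB.
            raise CompletionRefused(part.command)
        result = subprocess.run(["sh", "-c", part.command], capture_output=True, text=True)
        return (result.stdout.rstrip("\n"), False)  # trailing newlines are stripped
    raise TypeError(f"unknown word part: {part!r}")

def eval_word(parts: List[WordPart], env: Dict[str, str],
              for_completion: bool = False) -> List[Piece]:
    return [eval_part(p, env, for_completion) for p in parts]
```

Because each variant is a separate class, each branch only touches the fields that variant actually has, and a type checker like MyPy can verify those accesses; a single node type with flags cannot express that distinction.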

Preliminaries

  1. As much parsing as possible is done in a single pass, with lexer modes.
  2. There are some subsequent tweaks for detecting assignments, tildes, etc.
  3. There is a "metaprogramming" pass for brace expansion: i=0; {$((i++)),x,y}
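Brace expansion can run before evaluation because it only rearranges syntax. A minimal sketch, assuming hypothetical part classes Lit, ArithSub, and BracedAlternatives (these are not OSH's names): one word containing {a,b} alternatives becomes several words whose parts are still unevaluated, so the $((i++)) in the example is carried along as syntax and evaluated later like any other part.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union

# Hypothetical, unevaluated word parts (names are illustrative only).
@dataclass
class Lit:
    text: str

@dataclass
class ArithSub:
    expr: str  # e.g. "i++"; kept as syntax, not evaluated here

@dataclass
class BracedAlternatives:
    # {a,b,c}: each alternative is itself a list of parts and may nest braces.
    alternatives: List[List["Part"]]

Part = Union[Lit, ArithSub, BracedAlternatives]

def brace_expand(word: List[Part]) -> List[List[Part]]:
    """Turn one word into one or more words without evaluating anything."""
    choices: List[List[List[Part]]] = []
    for part in word:
        if isinstance(part, BracedAlternatives):
            # Expand each alternative recursively, then offer all of the results.
            options = [w for alt in part.alternatives for w in brace_expand(alt)]
        else:
            options = [[part]]
        choices.append(options)
    # Cartesian product of the choices: a{b,c}d -> abd, acd
    return [[p for segment in combo for p in segment] for combo in product(*choices)]
```

For the example above, brace_expand([BracedAlternatives([[ArithSub("i++")], [Lit("x")], [Lit("y")]])]) yields three single-part words, and the first one still holds the unevaluated ArithSub.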

Word Evaluation Algorithm

There are three stages (not four as in POSIX):

  1. Evaluation of the typed tree, in osh/word_eval.py.
    • There is a restricted variant of word evaluation for completion, e.g. so that arbitrary processes aren't run when you hit TAB.
  2. Splitting with IFS. This is specified with a state machine in osh/split.py. (I think OSH is unique in this regard too.)
    • Splitting involves the concept of "frames", to handle cases like x='a b'; y='c d'; echo $x"${@}"$y. The last field of $x has to be joined with argv[0], and argv[n-1] has to be joined with the first field of $y.
  3. Globbing.
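The frames idea can be sketched as follows. This is a simplified illustration, not OSH's state machine: it assumes hypothetical StrValue and QuotedArray values as the output of stage 1, handles only quoted "$@"-style arrays, and treats every IFS character like whitespace, so it never produces the empty fields that non-whitespace IFS characters can create.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical evaluated parts (the output of stage 1).
@dataclass
class StrValue:
    text: str
    quoted: bool

@dataclass
class QuotedArray:
    items: List[str]  # value of "$@": each item starts or ends a separate word

# A frame is a run of (text, quoted) pieces; frame boundaries are word boundaries.
Frame = List[Tuple[str, bool]]

def make_frames(parts: List[Union[StrValue, QuotedArray]]) -> List[Frame]:
    frames: List[Frame] = [[]]
    for part in parts:
        if isinstance(part, StrValue):
            frames[-1].append((part.text, part.quoted))
        else:
            for i, item in enumerate(part.items):
                if i > 0:
                    frames.append([])  # argv[i] for i > 0 begins a new frame
                frames[-1].append((item, True))
    return frames

def split_frame(frame: Frame, ifs: str = " \t\n") -> List[str]:
    fields: List[str] = []
    cur: List[str] = []
    have_field = False
    for text, quoted in frame:
        if quoted:
            cur.append(text)
            have_field = True  # even an empty quoted string yields a field
            continue
        for ch in text:
            if ch in ifs:
                if have_field:  # a separator ends the current field, if any
                    fields.append("".join(cur))
                    cur, have_field = [], False
            else:
                cur.append(ch)
                have_field = True
    if have_field:
        fields.append("".join(cur))
    return fields

def split_word(parts: List[Union[StrValue, QuotedArray]], ifs: str = " \t\n") -> List[str]:
    return [f for frame in make_frames(parts) for f in split_frame(frame, ifs)]
```

For x='a b', y='c d' and positional arguments 1 2 3, split_word([StrValue("a b", False), QuotedArray(["1", "2", "3"]), StrValue("c d", False)]) returns ["a", "b1", "2", "3c", "d"], the same words bash produces for $x"$@"$y. An empty "$@" contributes no frame content, so it produces no word at all.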

There is no such thing as "quote removal" in OSH (any more than a Python or JavaScript interpreter has "quote removal"). It's just evaluation.

Bug: Internally, splitting and globbing both use \ to inhibit expansion. That is, \* is an escaped glob character, and \ followed by a space is an escaped space (an IFS character).

This causes problems when IFS='\'. I think I could choose a different escape character for OSH, maybe even the NUL byte.
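A minimal sketch of the conflict, assuming a hypothetical encoding in which every character that must not be split is prefixed with an escape character, and a literal escape character in unquoted data is escaped too so it is not mistaken for a marker. It covers splitting only (not globbing) and treats every IFS character like whitespace. Making the escape character a parameter shows how a NUL byte would avoid the clash, assuming shell strings never contain NUL.

```python
from typing import List

def encode(text: str, quoted: bool, esc: str) -> str:
    """Prefix characters that must stay literal with the escape character."""
    out = []
    for ch in text:
        # Quoted text: escape everything. Unquoted: escape only the escape char.
        if quoted or ch == esc:
            out.append(esc)
        out.append(ch)
    return "".join(out)

def split_encoded(s: str, ifs: str, esc: str) -> List[str]:
    fields: List[str] = []
    cur: List[str] = []
    have_field = False
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == esc and i + 1 < len(s):
            # The escape check runs first, so if esc is also in IFS it can never split.
            cur.append(s[i + 1])
            have_field = True
            i += 2
        elif ch in ifs:
            if have_field:
                fields.append("".join(cur))
                cur, have_field = [], False
            i += 1
        else:
            cur.append(ch)
            have_field = True
            i += 1
    if have_field:
        fields.append("".join(cur))
    return fields
```

With esc="\\" and ifs="\\", the unquoted value a\b stays one field, although IFS='\' should split it into a and b; with esc="\x00" the same input splits correctly.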

Motivation

OSH aims to treat all sublanguages uniformly. The main sublanguages are Command, Word, Arith, and the non-POSIX boolean language of [[. For some "dynamic" sublanguages, like flag syntax, it falls a bit short.

This matters for interactive completion, which needs to understand the code statically.

For example, note that you can have variable references in several sublanguages:

Static:

  1. x=1 -- assignments are in the command language
  2. echo ${x:-${x:-default}} -- word
  3. echo $(( x + 1 )) -- arithmetic
  4. [[ $x ... -- the boolean language

Dynamic:

  1. code='x=1'; readonly $code -- the dynamic builtin language
  2. Other builtins that manage variables:
    • getopts
    • read
    • unset
    • printf -v in bash
  3. Variable references in ${!x} in bash/ksh