[Main website]

dok/example/attribute-grammars

TODO this is the most important part of the PL, so refine it studying other approaches and integrating idea TODO merge with content ofdok/spec/attribute-grammar

Examples from UUAG

In UUAG:

data Tree
| Node left :: Tree
       right :: Tree
| Tip value :: Int

attr Tree
  syn sum :: Int

sem Tree
  | Node lhs.sum = @left.sum + @right.sum
  | Tip lhs.sum = @value

In Dok

data Tree [
  variant ../Node [
    slot left::Self
    slot right::Self
  ]

  variant ../Tip [
    slot value::Int
  ]

  attr sum::Int

  visit self -when ../Node {
    set self.sum self.left.~ + self.right.~
  } -when ../Tip {
    set self.sum self.value
  }
]

In Dok:

  • catamorphism like set self.sum self.left.sum + self.right.sum can be simplified by sugar-syntax “.~”
  • attr is by default a bottom-up attribute
  • in this example it is not showed, but visit self -init { ... } -when { ... } can be used for initializing variables before entering in the visit loop

In UUAG a Tree can be replaced by a tree with the same structure but a single value.

data Tree
  | Node left :: Tree
         right :: Tree
  | Tip value :: Int
deriving Tree : Show

attr Tree
  inh replacement :: Int
  syn transformed :: Tree

sem Tree
  | Node lhs.transformed = Node @left.transformed @right.transformed
         left.replacement = @lhs.replacement
         right.replacement = @lhs.replacement
  | Tip lhs.transformed = Tip @lhs.replacement

{
main :: IO ()
main = print (show result)

testTree :: Tree
testTree = Node (Tip 1) (Node (Tip 2) (Tip 3))

test :: T_Tree
test = sem_Tree testTree

result :: Tree
result = transformed_Syn_Tree (wrap_Tree test Inh_Tree{replacement_Inh_Tree=37} )
}
-- output of the program will be "Node (Tip 37) (Node (Tip 37) (Tip 37))"

In Dok

data Tree::Tree [
  cntx $replacement::Int

  attr transformed::Self

  visit self -when ../Node {
    set self.tranformed self.(with left self.left.~, with right self.right.~)
  } -when ../Tip {
    set self.transformed self.(set value self.$replacement)
  }
]

var test-tree::Tree Node(
  with left Tip(with value 1),
  with right Node(
    slot left Tip(with value 2),
    slot right Tip(with value 3)))

var result::Tree with $Tree/replacement 37 { return test-tree.transformed }

visit set the attribute value to each part of the Self.

This is another example using the sum attribute, and extending the initial Tree.

data Tree::Tree [

  fun transformed-with-global-sum::Self {
    return with $replacement self.sum self.transformed
  }
]

In Dok a context (i.e. a top-down/inherited attribute) follows automatically copy semantic.

In UUAG attributes can be both synthetized (bottom-up) and inherited (top-down). They are called chained. TODO

In this UUAG example there is a loc.min when loc.min is a local variables (due to loc.) in scope of all production rules. Instead, in Dok it is used init { ... } -visit self { ... }.

attr Tree
  inh gmin :: Int
  syn min :: Int
  syn result :: Tree

sem Tree
  | Bin left.gmin = { @lhs.gmin }
  -- "left.gmin" refers to the inherited attribute "gmin"
  -- of the child "left"
  | Bin right.gmin = { @lhs.gmin }
  -- "@lhs.gmin" refers to the inherited attribute "gmin"
  -- of nonterminal "Tree"
  | Bin loc.min = { min @left.min @right.min }
  -- "min" is a new local variable of the constructor "Bin"

sem Tree
  | Bin lhs.result = { Bin @left.result @right.result }
  -- "@left.result" refers to the synthesized attribute "result"
  -- of child "left"
  | Bin lhs.min = { @min }
  -- "@min" refers to the local variable "min"
  | Leaf lhs.result = { Leaf @lhs.gmin }
  -- "@lhs.gmin" refers to the inherited attribute "gmin"
  -- of nonterminal "Tree"
  | Leaf lhs.min = { @int }
  -- "@int" refers to the value of field "int" of "Leaf"
data Tree::Tree [

  cntx $gmin::Int
  attr min::Int
  attr result::Self

  visit self -init {
    var loc-min::Int
  } -when ../Bin {
    set self.left.$gmin self.$gmin
    set self.right.$gmin self.$gmin
    # in reality this is the copy-semantic and it can be simplified with
    # set self.$gmin self.$gmin
    # or nothing in this case, because the new value is similar to the old value

    set loc-min [self.left.min,, self.right.min]*.minimum

    set self.min loc-min
    set self.result self.(slot left self.left.~, slot right self.right.~)

  } -when ../Leaf {
    set self.min self.value
    set self.result self.(slot value self.$gmin)
  }
]

An inherited attribute can be seen in two way:

  • as a slot of the object
  • in subj.f(), inside f code, the attribute is a cntx variable, which value is the value of the attribute of subj

Parametric attributes

In future Dok will support attributes with parameters. They can be implemented in an efficient way, because sometime they are the base attribute with some final computation according the parameter, while in other cases, if the parameter affect intermediate phases of the computation, a more agressive incremental update of intermediate results must be applied. In worst case scenario, the attribute is recomputed from scratch.

Higher Order Attribute Grammars

Vogt, H H, S D Swierstra, and M F Kuiper. “Higher Order Attribute Grammars,” n.d.

The value of an attribute can be another tree. This is similar to FP language, where a function can return another function.

They can be used for deriving an AST from the parser generated tree.

TODO check if reference AG are better for this type of computation

JastAdd

Hedin, Görel. “An Introductory Tutorial on JastAdd Attribute Grammars.” In Generative and Transformational Techniques in Software Engineering III, edited by João M. Fernandes, Ralf Lämmel, Joost Visser, and João Saraiva, 6491:166–200. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. https://doi.org/10.1007/978-3-642-18023-1_4.

“The attribute grammars used in JastAdd go much beyond the classical attribute grammars defined by Knuth [Knu68]. In this tutorial, we particularly cover reference attributes [Hed00], parameterized attributes [Hed00,Ekm06], circular attributes [Far86,MH07] and collection attributes [Boy96,MEH09].”

The usage of references allows to annotate the AST with information like call flow and name binding that are usually described in distinct graphs and data structures.

In any case, AG are still declarative: the order of evaluation does not affect the semantic, and it is decided by the AG compiler.

Reference Attributes

In a Reference AG (RAG), an attribute can reference another attribute. For example during name-analysis a variable usage can be linked to variable declaration.

Some analysis requiring graphs, can be done on AST with reference attributes.

MAYBE in Dok I can use Zippers to other AST parts, instead of references

MAYBE in Dok, only the function visit can create $some-ref to another node, during visit.

Parametrized Attributes

JustAdd support attributes having parameters.

Rewrites

“… allow sub ASTs to be replaced conditionally, depending on attribute values. This is useful when the AST constructed by the parser is not specific enough, or in order to normalize language constructs to make further compilation easier. Nonterminal attributes [VSK89] allow the AST to be extended dynamically, defining new AST nodes using equations. This is useful for macro expansion and transformation problems.

Nonterminal attributes

In JastAdd eq attributes corresponds to fun methods added to an Object in Dok. They are simply executed, because they derive information from other slots and attributes.

Inter-AST references [ ̊AEH10]

“allow nodes in a new AST to be connected to nodes in an existing AST. This is useful when creating transformed ASTs: nodes in the transformed AST can have references back to suitable locations in the source AST, giving access to information there.”

Interfaces.

“Attributes and equations can be defined in interfaces, rather than in AST classes, allowing reuse of language independent computations, and supporting connection to language independent tools.””

Refine.

“Equations in existing aspects can be refined in new aspects. This is similar to object-oriented overriding, but without having to declare new subclasses. Refines are useful for adjusting the behavior of an aspect when reusing it for a new language or tool.”

Circular attributes

They are annotated with “circular”. They start with an initial value, and they are updated until a fixed-point is reached.

MAYBE in Dok I store only direct facts, and then the other facts can be derived by deductive rules. So these facts are not materialized, but managed by some deductive DBMS.

Collection attributes

Collection attributes are typically containers, but they can be also scalars, that are updated during the visit of the Tree. A node X will update the collection attribute of a node Y that is stored as reference in one of his attributes. So instead of having normal AG equations, we have a more direct semantic.

Collection attributes can be replaced by normal AG attributes, with more complex searches in the AST node, so they are used only for convenience.

MAYBE check if collection can be used only in case of references, otherwise it is difficult to reach a node

AST Rewriting

“The AST produced by a parser is seldom ideal for all analysis and compilation. Rewrite rules allow conditional changes to the AST to be declared in order to obtain an improved AST, better suited for other computations.”

Rewrites are performed automatically and on demand, until a fixed point is reached.

With rewriting rules:

  • one can transform a parsed tree to an AST with a more precise semantic, derived by analysis
  • syntax sugars can be removed
  • the AST can be normalized

MAYBE in Dok there are instead compilation pass

TODO JastAdd example

In this example there are references and collection attributes.

StateMachine ::= Declaration*;
abstract Declaration;
State : Declaration ::= <Label:String>;
Transition : Declaration ::=
  <Label:String> <SourceLabel:String> <TargetLabel:String>;

aspect Graph {
  coll Set<Transition> State.transitions()   // "transitions" is a collection attribute,
                                             // of "State" AST objects,
                                             // of type "Set<Transition>"
       [new HashSet<Transition>()]           // its initial value for each "State" node
       with add;                             // call "Set.add" every time an element is added to the collection


  Transition contributes this                // add the Transition object "this"
  when source() != null                    // condition
  to State.transitions()                   // the collection attribute to use
  for source();                            // the node where updating the attribute

  syn Set<State> State.successors() {
    Set<State> result = new HashSet<State>();
    for (Transition t : transitions()) {
      if (t.target() != null) result.add(t.target());
    }
    return result;
  }
}

aspect NameAnalysis {
  syn State Transition.source() = lookup(getSourceLabel()); // R1
  syn State Transition.target() = lookup(getTargetLabel()); // R2
  inh State Declaration.lookup(String label); // R3

  eq StateMachine.getDeclaration(int i).lookup(String label) { // R4
    for (Declaration d : getDeclarationList()) {
      State match = d.localLookup(label);
      if (match != null) return match;
    }
    return null;
  }

  syn State Declaration.localLookup(String label) = null; // R5

  eq State.localLookup(String label) = // R6
      (label.equals(getLabel())) ? this : null;
}

TODO convert to Dok using: references or zipper.

Ideas from other papers

Higher Order AG (HAGs)

Vogt, Harald Heinz. “Higher Order Attribute Grammars Door,” 1993.

Attributes containing trees can be used for rewriting parsing trees in AST elegant forms.

MAYBE they can be used as “references”, pointing to a value.

FACT the “referenced” tree is the same tree, but without the attributes. Instead in reference AG, there is a direct link.

FACT they can be used in multi-pass compilers, because every pass create a better Tree. Like intech/nanopass.

MAYBE the multiple pass can be fused toghether by AG algorithms.

FACT the paper suggests to use AG as values of attributes for storing compound info like environments. So AG is used whenever possible, instead of ad-hoc data structures. The advantages are that incremental updates can be used in every part of the editing and compilation pass.

MAYBE in Dok I use value based semantic, so if I have computed attributes expressed in a FP style, it is a lot easier deriving incremental updates version of them. For example if ENVIRONMENT is stored in a HashMap, the code updating it can be compiled to a delta update version.

This paper descrive an evaluation algo for HAGs.

Zipper and AG

Martins, Pedro, João Paulo Fernandes, and João Saraiva. “Zipper-Based Attribute Grammars and Their Extensions.” In Programming Languages, edited by André Rauber Du Bois and Phil Trinder, 8129:135–49. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. https://doi.org/10.1007/978-3-642-40922-6_10.

In this paper Haskell zippers are used for modelling references in RAGs. HAGs are supported.

The references are modelled in a good way, but circular attributes (i.e. collections and similar attributes) are rather complex, because they involve fix-points. The problem is that AG are directly converted to Haskell syntax/semantic, hence there is no ad-hoc analysis and compilation.

TODO check if it is better using this in Dok value semantic, or supporting Dok references to AG parts, using some $self, $self.left and similar references.

TODO check the DEV notes about integration between zippers and AG

TODO try to represent the Desk example in Dok

Higher Order AG (HOAG)

Saraiva, João, and Doaitse Swierstra. “Generating Spreadsheet-Like Tools from Strong Attribute Grammars.” In Generative Programming and Component Engineering, edited by Frank Pfenning and Yannis Smaragdakis, 2830:307–23. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. https://doi.org/10.1007/978-3-540-39815-8_19.

The benefit of AG respect FP:

  • they terminate
  • they can be calculated in an efficient way, using strict order computation, deforestation, reduced navigation of the same AST
  • they support delta update of results

HOAG allows to store compound data structures using other HOAG in attributes, and then process alla data using the same HOAG formalism. They have a value based semantic. The paper call “Strong AG” this paradigm where all computations are done using AG.

Their system is implemented using Lrc.

MAYBE: there are inverse tecnique where FP code can be abstract-interpreted by an AG. So maybe we can use Dok FP parts with AG without too much problems.

NOTE: a common design-pattern of AG algo is that some value is computed from a bottom-up attribute, and then it is passed to a top-down attribute. This create some order of dependency and computation.

Rewritable RAG (ReRAG)

Ekman, Torbjörn, and Görel Hedin. “Rewritable Reference Attributed Grammars.” In ECOOP 2004 – Object-Oriented Programming, edited by Martin Odersky, 3086:147–71. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. https://doi.org/10.1007/978-3-540-24851-4_7.

NOTE: in Nanopass there are compilation pass that can modify the AST or compile to slightly different AST.

TODO try to use ReRAG on Nanopass examples

In JastAdd they are like:

rewrite N {
  when (cond)
  to R result;
}

The returned AST must be still valid, hence the type R must be consistent with the original type N. All equations remain valid and the AST attributes can be incremental updated.

The AST rewriting must reach a canonical form, where no cond is anymore true and all rewrite will stop.

MAYBE in Dok I specify rewrite pass as Crocopat or similar rules and I apply them explicitely. I can convert also to AST of different formats, like in Nanopass. There are also IncA and other condition –> rewrite systems.

TODO study Supercompilation for studying how compilation passage can be combined toghether.

Desk example

Using normal attributes

Desk is a short but complete example of AG. It is used also in the paper Martins, Pedro, João Paulo Fernandes, and João Saraiva. “Zipper-Based Attribute Grammars and Their Extensions.” In Programming Languages, edited by André Rauber Du Bois and Phil Trinder, 8129:135–49. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. https://doi.org/10.1007/978-3-642-40922-6_10.

An example of Desk code is

PRINT x + y + 1 WHERE x = 2, y = 3

TODO write a Peg Parser or a LR parser using the Dok formalism and DSL languages

The AST of Desk can be, in Haskell notation

data Root = Root Prog
data Prog = PRINT Exp Cons
data Exp = Add Exp Fact | Fact Fact
data Fact = Name Name | Number String
data Name = Id Constant
data Cons = EmptyCons | WHERE DefList
data DefList = Comma DefList Def | Def Def
data Def = Equal Name Value
type Constant = String
type Value = Int
type SymbolTable = [(String,String)]

The representation of PRINT x + y + 1 WEHER x = 2, y = 3 is

exp = Add (Add (Fact (Name (Id "x")))
               (Name (Id "y")))
          (Number 1)

deflst = WHERE (Comma (Def (Equal (Id "y") 5))
                           (Equal (Id "x") 3))

program = PRINT exp deflst

The attributes are

  • code: synthesized target code
  • name: synthesized name of a constant
  • value: synthesized value of a constant or a number
  • ok: synthesized attribute that indicates correcteness
  • envs: synthesized environment (symbol table)
  • envi: inherited environment (symbol table)

The rules using the Zipper library and some combinators, is

name :: Zipper Root -> String
name ag = case (constructor ag) of
            "Id" -> lexeme ag
            "Equal" -> name (ag.$1)

envi :: Zipper Root -> SymbolTable
envi ag = case (constructor ag) of
            "PRINT" -> envs (ag.$2)
            otherwise -> envi (parent ag)

envs :: Zipper Root -> SymbolTable
envs ag = case (constructor ag) of
  "EmptyCons" -> []
  "WHERE" -> envs (ag.$1)
  "Comma" -> (name (ag.$2), value (ag.$2)) : envs (ag.$1)
  "Def" -> [(name (ag.$1), value (ag.$1))]

value :: Zipper Root -> String
value ag = case (constructor ag) of
  "Name" -> getValue (name (ag.$1)) (envi ag)
  "Number" -> lexeme ag "Equal" -> lexeme ag

ok :: Zipper Root -> Bool ok
ag = case (constructor ag) of
  "Name" -> isIn (name (ag.$1)) (envi ag)
  "Number" -> True
  "EmptyCons" -> True
  "WHERE" -> ok (ag.$1)
  "Comma" -> ok (ag.$1) && not isIn (name (ag.$2)) (envs (ag.$1))
  "Def" -> True

code :: Zipper Root -> String
code ag = case (constructor ag) of
  "Root" -> code (ag.$1)
  "PRINT" -> if ok (ag.$2)
             then code (ag.$1) ++ "PRINT, 0" ++ "HALT, 0"
             else "HALT, 0"
  "Add" -> if (ok (ag.$2))
           then code (ag.$1) ++ "ADD, " ++ value (ag.$2)
           else "HALT, 0"
  "Fact" -> if (ok (ag.$1))
            then "LOAD, " ++ value (ag.$1)
            else "HALT, 0"

In Dok

data Prog [
  data Value::Int

  data Def [ slot ~::Name, slot ~::Value ]

  data Name::String

  data Fact [
    variant ../Name::Name
    variant ../Number::Value
  ]

  data Exp [
    variant ../Add [
      slot arg1::Exp
      slot arg2::Fact
    ]

    variant ../Fact::Fact
  ]

  slot ~::Exp
  slot ~::Def*
]

The differences are:

  • you can use only already defined data
  • Prog.Fact is a nested data type of Prog, so this reduce the pollution of name space
  • an explicit list is used for Prog.Def instead of a consed data type

The representation of PRINT x + y + 1 WHERE x = 2, y = 3 is

var prog Prog(
  slot exp Exp/Add(
    slot arg1 Exp/Add(
      slot arg1 Exp/Fact/Name("x")
      slot arg2 Fact/Name("y"))
    slot arg2 Fact/Number(1))

  slot def* [
    Def(slot name "x", slot value 3),
    Def(slot name "y", slot value 5))
  ]*
)

In these AG rules, the cntx $env attribute is still used, also if not necessary, for simulating more complex cases where a top-down/inherited attribute is needed.

We begin with common declarations.

data Prog-with-attrs::~ [

  data Env::Map(with From Prog.Name, with To Prog.Value)

  attr name::Prog.Name
  attr value::Prog.Value
  attr ok::Bool
  attr code::String
  cntx $env::Env
]

In this first example, env is calculated using normal functions. Dok is a FP language, and the result of the function can be cached. So it is efficient doing this. Then only the Prog.exp part of the tree is visited.

data Prog1::Prog-with-attrs [

  fun env::Env {
    var r::Env
    repeat {
      var d -in self.def*
      do r!add(d.name, d.value)
    }
    return r
  }

  visit self.exp -init {
    set self.$env self.env
  } -when ../Exp/Add {
    case {
      when self.arg2.ok
      do self.code.!append("${self.arg1.code} (ADD, ${arg2.value})")
    } else {
      do self.code.!append("(HALT, 0)")
    }
  } -when ../Exp/Fact {
    case {
      when self.ok
      do self.code.!append("(LOAD, ${self.value})
    } else {
      do self.code.!append("(HALT, 0)")
    }
  } -when ../Fact/Name {
    var value self.$env.get(self)
    case {
      when value
      set self.ok True
      set self.value value
    } else  {
      set self.ok False
    }
  } -when ../Fact/Number {
    set self.ok True
    set self.value self
  }
]

In Dok visit can check the slot of the AST. So a different semantic meaning can be given to different parts of the AST, having the same types. Using this feature, and forcing a full AG code, and using also collection attributes, the previous example can be rewritten in this way:

data Prog2::Prog-with-attrs [

  attr env::Env

  visit self -when /Prog.def*::Def {
    do self.env.!add(self.name, self.value)
  } -when /Prog.exp {
    set self.exp.$env self.def*.env
  } -when /Prog.exp::Exp/Add {
    case {
      when self.arg2.ok
      do self.code.!append("${self.arg1.code} (ADD, ${arg2.value})")
    } else {
      do self.code.!append("(HALT, 0)")
    }
  } -when /Prog.exp::Exp/Fact {
    case {
      when self.ok
      do self.code.!append("(LOAD, ${self.value})
    } else {
      do self.code.!append("(HALT, 0)")
    }
  } -when /Prog.exp::Fact/Name {
    var value self.$env.get(self)
    case {
      when value
      set self.ok True
      set self.value value
    } else  {
      set self.ok False
    }
  } -when /Prog.exp::Fact/Number {
    set self.ok True
    set self.value self
  }
]

This code can be translated to normal AG code, using an inherited attribute for checking which part of the tree the visitor is exploring. So it is only syntax sugar.

Using references

The symbol table env can be extended for containing references to the point where a variable is defined. In the case of Desk this is not useful, but for real-world language this can make sense. RAG allows to use references to another part of the AST.

Dok is mainly a value based language, but during visit one can use a reference to another point of the AST.

TODO decide the semantic in case of rewrite of the Tree TODO without “rewrite” one can copy the Tree in the attribute, like in HOAG, and the FP nature will compact the data

data Prog3::Prog [
  data Env::Map(with From Prog.Name, with To $Prog.def*::Def)

  attr name::Prog.Name
  attr value::Prog.Value
  attr ok::Bool
  attr code::String
  cntx $env::Env

  attr env::Env

  visit self -when /Prog.def*::Def {
    do self.env.!add(self.name, $self)
  } -when /Prog.exp {
    set self.exp.$env self.def*.env
  } -when /Prog.exp::Exp/Add {
    case {
      when self.arg2.ok
      do self.code.!append("${self.arg1.code} (ADD, ${arg2.value})")
    } else {
      do self.code.!append("(HALT, 0)")
    }
  } -when /Prog.exp::Exp/Fact {
    case {
      when self.ok
      do self.code.!append("(LOAD, ${self.value})
    } else {
      do self.code.!append("(HALT, 0)")
    }
  } -when /Prog.exp::Fact/Name {
    var value self.$env.get(self).$value
    case {
      when value
      set self.ok True
      set self.value value
    } else  {
      set self.ok False
    }
  } -when /Prog.exp::Fact/Number {
    set self.ok True
    set self.value self
  }
]

Note: in the line data Env::Map(with From Prog.Name, with To $Prog.def*::Def), $Prog.def*::Def can be replaced by a reference $Def, but in this way it is more precise because it identifies also the position inside the AST and the type of parent one can expect.

Note: in the line do self.env.!add(self.name, $self) we are updating a collection attribute. The computation must be repeated only once for each element of the Tree and the result must be independent respect the order of application. Ideally the compiler should check this.

TODO continue studying the examples

MAYBE: lesson learned: whenever we can express information inside an HOAG, then express in this way the majority of computations. Then they can be calculated and updated in a very efficient way.

MAYBE: in case of simple FP-like formula, if they are more natural expressed in this way, then they can be converte to AG form from the tools.

Links to this note