I took advantage of the multi-paradigm nature of Racket and Lisp, for parsing Dok code.
I used Brag for parsing Dok. Then I converted the parsed tree to an Abstract Syntax Tree (AST) in the RACR format.
This is an example of Dok code to parse:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
|
data A [
fun f::Int {
return 0
}
variant ../B [
fun f::Int -override {
return 1
}
]
variant ../C [
fun f::Int -override {
return 2
}
]
]
var b::A/B
assert b.f == 1
var a::A
assert a.f == 0
|
The grammar in Brag is rather easy to write
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
|
code: stmts NEWLINE*
stmts:
NEWLINE+ (stmt NEWLINE+)+
| stmt
stmt:
"return" expr
| "assert" expr
| type-decl
| var-decl
var-decl:
"var" ID ("::" type-ref)? CMD-REST? expr?
type-decl:
"data" TYPE type-params? CMD-REST? ("[" data-content "]")?
data-content:
data-part ("," data-part)*
| NEWLINE+ (data-part NEWLINE+)*
data-part:
slot-decl
| variant-decl
| fun-decl
| type-decl
; TO CONTINUE ...
|
The corresponding AST written in RACR formalism has a very strange syntax, so I take advantage of the flexibility of S-expressions in Lisp, and I wrote it in a Dok-like formalism
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
|
(define-ast dok-spec
'(Code
~ :: Stmt*)
'(Stmt
(Return ~ :: Expr)
(Assert ~ :: Expr)
(Data-decl
type :: $String
~ :: Type-param*
cmd-rest :: $String
~ :: Data-part*)
(Var-decl
id :: $String
~ :: Type-ref
cmd-rest :: $String
~ :: Expr)
)
'(Data-part
(Slot-decl
id :: $String
~ :: Type-ref
~ :: Expr)
(Variant-decl
type-name :: $String
cmd-rest :: $String
~ :: Data-part*)
(Fun-decl
id :: $String
~ :: Fun-param-decl*
Result :: Type-ref
cmd-rest :: $String
~ :: Stmt*)
(Nested-data-decl
~ :: Stmt/Data-decl))
'(Fun-param-decl
id :: $String
~ :: Type-ref)
'(Type-ref
paths :: Type-path*
variants :: $ListOfString)
'(Type-path
type :: $String
~ :: Type-param*)
; TO CONTINUE ...
|
Then I wrote a simple walk function that converts this to the specific RACR syntax.
Now I had to convert the Syntax Tree to the AST. They are not much different, so I specified a list of S-expressions rewriting rules. This is an example of the rewriting rules:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
|
(define (dok-syntax-tree->AST stx)
; ...
(define (remove-variant-tokens xs)
(filter (lambda (x) (not (and (string? x) (string=? x "/")))) xs))
(define (remove-path-tokens xs)
(filter (lambda (x) (not (and (string? x) (string=? x ".")))) xs))
; ...
(define simplify
(replace-all-repeated
(list 'code xs ...) --> (list* 'Code (remove-extra-tokens xs))
(list 'stmts stmts ...) --> (remove-extra-tokens stmts)
(list 'data-part (list 'slot-decl "slot" x "::" tr)) --> (list 'Data-part/Slot-decl x tr)
(list 'data-part (list 'variant-decl "variant" "../" variant "[" content "]")) --> (list 'Data-part/Variant-decl variant "" content)
(list 'data-part (list 'fun-decl "fun" f "::" tr cmd-rest "{" stmts "}")) --> (list 'Data-part/Fun-decl f '() tr cmd-rest stmts)
(list 'data-part (list 'fun-decl "fun" f "::" tr "{" stmts "}")) --> (list 'Data-part/Fun-decl f '() tr "" stmts)
(list 'data-content cnt ...) --> (remove-extra-tokens cnt)
(list 'stmt (list 'var-decl "var" x "::" tr)) --> (list 'Stmt/Var-decl x tr "" (list 'Expr/Default-value))
(list 'stmt (list 'var-decl "var" x "::" tr cmd-rest)) --> (list 'Stmt/Var-decl x tr cmd-rest (list 'Expr/Default-value))
(list 'stmt "assert" expr) --> (list 'Stmt/Assert expr)
(list 'stmt "return" expr) --> (list 'Stmt/Return expr)
(list 'stmt (list 'type-decl "data" type-name type-params "[" (list 'data-content xs ...) "]")) --> (list 'Stmt/Data-decl type-name type-params "" (remove-extra-tokens xs))
(list 'stmt (list 'type-decl "data" type-name "[" (list 'data-content xs ...) "]")) --> (list 'Stmt/Data-decl type-name '() "" (remove-extra-tokens xs))
; ...
))
|
In Lisp, every time there is a task to solve, one can use the best approach/paradigm, and it is rather easy writing data/code transformations utilities for gluing all together, because it is all in the same S-expressions format.