Pharo VM Transpiler: My First Six Weeks

Progress made while contributing to Google Summer of Code

In this article I'll share the progress I made during the first six weeks of the GSoC coding period, in which I've been contributing to the Pharo-to-C VM Transpiler project.

My goals for this period

My proposal mentioned several inlining issues that were meant to help me "catch up" with the project. However, during the community bonding period my mentors and I decided to steer toward issues centered on preventing translation errors, as they were a much better fit for my proposal's theme.

Some context about Slang's ASTs

As part of its transpilation pipeline, Slang uses three different ASTs, and understanding their responsibilities is key.

Slang transpilation pipeline

RbAST (Refactoring Browser AST) is the exact representation of a Pharo project and, as its name indicates, it is used by the IDE for tasks such as refactoring and suggestions.

TAST is an intermediate representation between Pharo and C. Its nodes hold information that serves the C translation, such as type system information. This AST also goes through several transformations, such as renamings and inlinings.

CAST is the C code representation. Its sole responsibility is to write C code.

Development

Locals type declarations

Pull Requests: #603

Slang has several pragmas (or annotations) for providing information needed for translations. When declaring a variable or an argument (a.k.a. a local) in Slang you must use the var:type: pragma so the generated C code can type the corresponding variable.

In this example, we use var:type: to declare the type of the foo temporary variable:

AClass >> aMethod
   | foo |
   <var: 'foo' type: 'char *'>

which would be translated to:

void aMethod() {
    char *foo;
}

A validation was implemented so that declaring a variable or argument without its corresponding type declaration throws an error during translation, instead of surfacing later as a problem when compiling the generated C code.
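The idea behind this check can be sketched in a few lines of C. This is only an illustration of the rule, not Slang's implementation (which lives in Pharo): the TypeDeclaration struct and the first_untyped_local function are hypothetical names invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one var:type: pragma: a local's name and its C type. */
typedef struct {
    const char *name;
    const char *c_type;
} TypeDeclaration;

/* Returns the first local that has no matching type declaration,
   or NULL when every local is declared. */
const char *first_untyped_local(const char *const locals[], size_t local_count,
                                const TypeDeclaration declarations[],
                                size_t declaration_count)
{
    for (size_t i = 0; i < local_count; i++) {
        int declared = 0;
        for (size_t j = 0; j < declaration_count; j++) {
            if (strcmp(locals[i], declarations[j].name) == 0) {
                declared = 1;
                break;
            }
        }
        if (!declared) {
            return locals[i];
        }
    }
    return NULL;
}
```

A translator following this rule would stop and report the returned name instead of emitting C for the method.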

Instance variables type declarations

Pull Requests: #607

Instance variables are an object's attributes. In C, however, their translation isn't so obvious, because classes can be translated in two very different ways:

  1. Most classes are translated as a set of functions, and their instance variables are declared as global variables.

  2. In some special cases classes are marked as structs (this happens when they inherit from the SlangStructType class). Here, instance variables are translated as members of the struct.
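As a rough picture of the two styles, the generated C could look something like the sketch below. Every name here (counter, incrementCounter, Point, pointSum) is made up for illustration and does not come from the Pharo VM sources.

```c
/* Style 1: a regular class becomes free functions plus global variables.
   "counter" stands in for an instance variable of a hypothetical class. */
static int counter = 0;

int incrementCounter(void)
{
    counter += 1;
    return counter;
}

/* Style 2: a class marked as a struct becomes a C struct whose members
   are the instance variables. Each member needs a known C type. */
typedef struct {
    int x;
    int y;
} Point;

int pointSum(const Point *aPoint)
{
    return aPoint->x + aPoint->y;
}
```

In the second style every member must have a type for the struct definition to be valid C, which is why a missing type declaration is worth catching at translation time.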

This work focused on structs. Types are assigned to instance variables by implementing a class-side method called instVarTypeDeclarationsDo that matches instance variable names with their types.

A validation was implemented in SlangStructType so that emitting the struct code fails if any instance variable has no corresponding type. As with the previous issue, this prevents problems down the line.

C Reserved Words

Pull Requests: #613

Because Smalltalk and C have different keywords, some C keywords may be used as identifiers in Slang code, which can cause the C compilation to fail (this issue had already been encountered, see issue #429).

For example, the following method would break the C compilation because register is a reserved word in C:

AClass >> aMethod
   | register |

This pull request implemented several validations, run when creating the TAST nodes, that throw an error when an identifier (selector, argument, temporary variable, etc.) would conflict with a C keyword.
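A check of this kind boils down to looking the identifier up in a list of reserved words. Here is a minimal sketch in C, assuming the C99 keyword set; the list Slang actually validates against may differ, and is_c_keyword is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* The C99 keywords. Later standards add more (for example _Static_assert). */
static const char *const C_KEYWORDS[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "inline", "int", "long", "register", "restrict", "return", "short",
    "signed", "sizeof", "static", "struct", "switch", "typedef", "union",
    "unsigned", "void", "volatile", "while", "_Bool", "_Complex", "_Imaginary"
};

/* Returns 1 when the identifier is a C99 keyword, 0 otherwise.
   The comparison is case-sensitive, as C keywords are. */
int is_c_keyword(const char *identifier)
{
    size_t count = sizeof C_KEYWORDS / sizeof C_KEYWORDS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(identifier, C_KEYWORDS[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With this helper, is_c_keyword("register") is true, so a temporary named register would be flagged before any C is emitted.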

C Conflicts Renamings

Pull Requests: #624, #646

As a continuation of the previous work, a possible improvement was to automatically rename the identifier during translation instead of throwing an error, giving the developer complete freedom when naming.

To implement this behavior I modified the method CCodeGenerator>>emitCCodeOn:doInlining:doAssertions:, which orchestrates a huge part of the translation: it collects all the necessary TMethods (the TAST node that represents a method), does the inlinings, and then emits the corresponding CAST and C code for the whole program. This worked great because I could perform the renames right before any code was emitted.

The renamings themselves were pretty straightforward, as renaming a variable or a selector was already supported. The only logic I had to add was renaming for instance variables.

After finishing with the C keyword conflicts I did the same for conflicting selectors, which conflict when they have the same name but a different number of arguments. Take this example, where we define both aMethod and aMethod: as selectors:

AClass >> aMethod
   ^true

AClass >> aMethod: anArgument
   ^true

When translating them to functions, both function names would be aMethod, because the colon is dropped from the selector. This would cause a conflict and fail when compiling. The same applies to a local that shares its name with a selector.

The final result for the renamings ended up like this:

  • Identifier conflicting with a keyword: append "_1" to it.

  • Selector conflicting with other selectors: append the number of arguments to each conflicting selector. In the previous example the functions would end up as aMethod0 and aMethod1.

  • Local variable conflicting with a selector: prefix it with "l_".
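The three rules above can be written down as small string transformations. The C sketch below only illustrates the naming scheme; the function names are hypothetical, the real logic is implemented in Pharo inside Slang, and the sketch assumes keyword selectors whose argument count equals their number of colons.

```c
#include <stdio.h>
#include <string.h>

/* Copies the selector into out without its colons and returns the number
   of colons found, which is the argument count of a keyword selector. */
static int strip_colons(const char *selector, char *out, size_t size)
{
    int argument_count = 0;
    size_t length = 0;
    for (const char *c = selector; *c != '\0'; c++) {
        if (*c == ':') {
            argument_count++;
        } else if (length + 1 < size) {
            out[length++] = *c;
        }
    }
    if (size > 0) {
        out[length] = '\0';
    }
    return argument_count;
}

/* Rule 1: an identifier that collides with a C keyword gets "_1" appended. */
void rename_keyword_conflict(const char *identifier, char *out, size_t size)
{
    snprintf(out, size, "%s_1", identifier);
}

/* Rule 2: a selector that collides with another selector gets its
   argument count appended. */
void rename_selector_conflict(const char *selector, char *out, size_t size)
{
    char base[128];
    int argument_count = strip_colons(selector, base, sizeof base);
    snprintf(out, size, "%s%d", base, argument_count);
}

/* Rule 3: a local that collides with a selector gets an "l_" prefix. */
void rename_local_conflict(const char *local, char *out, size_t size)
{
    snprintf(out, size, "l_%s", local);
}
```

Called on the earlier examples, rename_selector_conflict turns aMethod into aMethod0 and aMethod: into aMethod1, and rename_keyword_conflict turns register into register_1.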

Expectations for the remaining weeks

Although I got pretty sidetracked with renamings, I'm really happy with the result, and it has helped me understand Slang much better. The plan is to continue with type-related issues, such as type validations or type-guided translations.