Pharo VM Transpiler: Wrapping up

Progress made while contributing to Google Summer of Code

Another 6 weeks have passed! This is all the work done in the second half of my GSoC project.

Development

Wrapping up validations

Pull Requests: #662

In the previous article I went pretty in-depth on validations. However, one small feature I had been eager to implement since starting the project was displaying these validations in the IDE, so that the developer saves even more time than when an error is thrown during transpilation.

The most natural way of displaying this information was as a linter rule. In this case, a new linter rule was added to check for redundant type declarations (for more details on the validation, check my previous article).

Implementation

Linter rules in Pharo are implemented as subclasses of ReAbstractRule. This gives you a simple interface: a method receives an RB AST node and answers whether that node violates the rule. Here the rule flags a type-declaration pragma whose variable is not defined in the method:

ReSlangRedundantTypeDeclarationRule >> basicCheck: aNode [

    ^ aNode isPragma and: [
          aNode isTypeDefinition and: [
              (aNode methodNode allDefinedVariables includes:
                  (aNode argumentAt: #var:) value) not ] ]
]

And that's it, this is how the linter rule shows up!

Type-guided translations

Pull Requests: #683 (pending as of writing this article)

The issue

The last feature I worked on was based on a preexisting issue that had to do with the '&' operator in Pharo. To understand it, let's use this code example:

AClass >> aMethod
    | result1 result2 |
    result1 := self anOperation.
    result2 := self anotherOperation.
    ^result1 & result2

How should the last line be translated? It actually depends on the types. When result1 and result2 are booleans, & is a logical and, which is translated as && in C. If they are numbers, & is a bitwise and, which is written as & in C.
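To see why picking the right operator matters, here is a small C sketch. The function names are made up for illustration and are not part of the transpiler. With the operands 2 and 1, both values are non-zero, so && answers 1, but they share no set bits, so & answers 0.

```c
#include <assert.h>
#include <stdbool.h>

/* What `a & b` should become when both operands are booleans. */
bool logical_and(bool a, bool b) {
    return a && b;
}

/* What `a & b` should become when both operands are integers. */
int bit_and(int a, int b) {
    return a & b;
}

/* Answers 1 when && and & disagree on the truth of the result. */
int operators_disagree(int a, int b) {
    return (a && b) != ((a & b) != 0);
}

/* Usage example: the two translations differ for 2 and 1. */
void example_operators(void) {
    assert(logical_and(2, 1) == true);  /* both operands are truthy */
    assert(bit_and(2, 1) == 0);         /* binary 10 and 01 share no bits */
    assert(operators_disagree(2, 1));
    assert(!operators_disagree(1, 1));
}
```

Emitting & for boolean operands, or && for integer operands, would therefore silently change the result of the generated program.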

Tapping into the translation pipeline

To get this behavior I had to modify the CAST generation. Many operations are "intercepted" by a translation dictionary, which maps a selector to the generator method that should handle it, so logic can be added before the CAST is produced. We can use it to call a new method, generateCASTInferredAnd:, which handles the type-guided translation.

CCodeGenerator >> initializeCASTTranslationDictionary [
    | pairs |

    castTranslationDict := Dictionary new: 200.
    pairs := #(
        #&       #generateCASTInferredAnd:
        #|       #forbiddenSelector:
        #abs     #generateCASTAbs:
        #and:    #generateCASTSequentialAnd:
        (...)
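The idea of the dictionary can be sketched in C as a plain lookup table. This is only an illustration: the struct, the function name, and the NULL fallback are assumptions of this sketch, and only the four pairs visible in the listing above are included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One pair of the dictionary: Pharo selector -> generator method name. */
struct translation_entry {
    const char *selector;
    const char *generator;
};

/* Only the pairs shown in the article; the real dictionary has many more. */
static const struct translation_entry translation_table[] = {
    { "&",    "generateCASTInferredAnd:"   },
    { "|",    "forbiddenSelector:"         },
    { "abs",  "generateCASTAbs:"           },
    { "and:", "generateCASTSequentialAnd:" },
};

/* Answers the generator registered for a selector, or NULL when the
   selector is not intercepted (assumption: the caller then falls back
   to its default translation). */
const char *lookup_generator(const char *selector) {
    size_t count = sizeof translation_table / sizeof translation_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(translation_table[i].selector, selector) == 0)
            return translation_table[i].generator;
    }
    return NULL;
}

/* Usage example. */
void example_lookup(void) {
    assert(strcmp(lookup_generator("&"), "generateCASTInferredAnd:") == 0);
    assert(lookup_generator("someOtherMessage") == NULL);
}
```

Registering the new generator under #& is then a one-line change in the table, and the rest of the pipeline stays untouched.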

Dealing with string-based types

The biggest issue for this implementation was dealing with types, because they are handled as strings. For example, take a look at the way a constant's type is inferred:

TConstantNode >> typeOrNilFrom: aCodeGenerator in: aTMethod [
    | hb |
    value isInteger
        ifTrue:
            [value positive
                ifTrue:
                    [hb := value highBit.
                    hb < 32 ifTrue: [^#int].
                    hb = 32 ifTrue: [^#'unsigned int'].
                    hb = 64 ifTrue: [^#'unsigned long long'].
                    ^#'long long']
                ifFalse:
                    [hb := value bitInvert highBit.
                    hb < 32 ifTrue: [^#int].
                    ^#'long long']].
    value isFloat ifTrue: [^#double].
    (#(nil true false) includes: value) ifTrue: [^#int].
    (value isString and: [value isSymbol not]) ifTrue: [^#'char *'].
    ^nil
]

Not only are they all strings, but this example also shows the biggest issue: the boolean type is the same as the number type, int. The solution we came up with was to use objects that wrap the string type and can also answer whether they are boolean or not. This worked great, but it required changing a good deal of logic around the type system.

An important note is that this PR only transforms some strings into objects. The coexistence of the two representations is painful, and transitioning completely to objects is a must.
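The wrapping idea can be illustrated with a minimal C sketch. The struct and its fields are hypothetical and do not mirror the classes in the PR. The point is only that a bare type name cannot distinguish a boolean from an integer, while a small object can.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical wrapper: the C type name plus the information that the
   bare string could not carry. */
struct slang_type {
    const char *c_name;  /* how the type is spelled in the generated C */
    bool is_boolean;     /* true for values that came from true/false */
};

static const struct slang_type INT_TYPE     = { "int", false };
static const struct slang_type BOOLEAN_TYPE = { "int", true  };

/* Comparison as it works with plain strings: only the name counts. */
bool same_c_name(struct slang_type a, struct slang_type b) {
    return strcmp(a.c_name, b.c_name) == 0;
}

/* Comparison on the wrapped type: the boolean flag counts too. */
bool same_type(struct slang_type a, struct slang_type b) {
    return same_c_name(a, b) && a.is_boolean == b.is_boolean;
}

/* Usage example. */
void example_types(void) {
    assert(same_c_name(INT_TYPE, BOOLEAN_TYPE));  /* both are spelled "int" */
    assert(!same_type(INT_TYPE, BOOLEAN_TYPE));   /* the wrapper tells them apart */
}
```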

Generating the CAST based on type

Once we can deduce whether the type is boolean or not, the actual type-guided translation has a simple implementation:

CCodeGenerator >> generateCASTInferredAnd: aTSendNode [
    | receiverType argumentType |

    "fetch types"
    receiverType := self
                        tTypeFor: aTSendNode receiver
                        in: self currentMethod.
    argumentType := self
                        tTypeFor: aTSendNode arguments first
                        in: self currentMethod.

    "if types are different then throw an error"
    receiverType ~= argumentType ifTrue: [ TypeError signal: 'Cannot infer & type'].

    "translate depending on type"
    ^ receiverType isBoolean
          ifTrue: [ self generateCASTAnd: aTSendNode ]
          ifFalse: [ self generateCASTBitAnd: aTSendNode ]
]

Now that the types are reified, this could be implemented for any operation!
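As a rough sketch of how the same decision could extend to other operators, here is the selection logic reduced to a C function. Everything in it is an assumption made for illustration: the enum, the function name, and in particular the | case, which the transpiler does not translate this way today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum type_kind { KIND_BOOLEAN, KIND_INTEGER };

/* Answers the C operator to emit for `receiver <op> argument`, or NULL
   when no safe choice exists. The Pharo method signals a TypeError in
   the mismatch case; NULL plays that role in this sketch. */
const char *c_operator_for(char pharo_operator,
                           enum type_kind receiver,
                           enum type_kind argument) {
    if (receiver != argument)
        return NULL;  /* mixed operands: refuse to guess */

    bool is_boolean = (receiver == KIND_BOOLEAN);
    switch (pharo_operator) {
    case '&': return is_boolean ? "&&" : "&";
    case '|': return is_boolean ? "||" : "|";  /* hypothetical extension */
    default:  return NULL;                     /* not covered by this sketch */
    }
}

/* Usage example. */
void example_select(void) {
    assert(strcmp(c_operator_for('&', KIND_BOOLEAN, KIND_BOOLEAN), "&&") == 0);
    assert(strcmp(c_operator_for('&', KIND_INTEGER, KIND_INTEGER), "&") == 0);
    assert(c_operator_for('&', KIND_BOOLEAN, KIND_INTEGER) == NULL);
}
```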

Conclusions

Wrapping up my GSoC and looking back at my proposal: although it wasn't a rigid path, the main goal of improving the development experience was achieved in several ways!

Lastly, I'd like to thank my mentors, who helped me every step of the way.