AST Instrumentation (examples by language)
Posted 29th January 2015
How to perform AST instrumentation in numerous programming languages.
Abstract Syntax Trees (ASTs) are an important tool used internally by compilers. ASTs are a structured, tree-based representation of parsed source code, typically used for semantic analysis and code generation. ASTs are a rich source of information about parsed code, encapsulating the code’s structure and semantic meaning whilst also often retaining information about the tokens in the original source file. The standard tree structure of ASTs makes them extremely simple to traverse and manipulate.
ASTs are used heavily in static analysis tools and program transformation tools. These tools typically analyse an AST and generate either other structured information or, in the case of source-to-source transformation tools, source code, either in the original input language or another language entirely.
In addition to static manipulation, ASTs can be modified and then converted to bytecode and executed. Modifying the AST in order to collect information when the code is executed is known as instrumenting the AST. The collected information can be used for a number of purposes, including profiling the runtime behaviour of code, performing code coverage tests, and generating visualisations that incorporate source code.
The sections below provide examples for a number of different programming languages, demonstrating how to generate ASTs, instrument those ASTs, and then convert the ASTs to bytecode and execute them.
C++ and Objective-C++
Clang is the LLVM compiler frontend for the C family of programming languages: C, C++, Objective-C, and Objective-C++. Both Clang and the LLVM backend are implemented as a set of modular C++ libraries, making all parts of the compiler’s internal functionality available for use in third-party tools.
Although it is possible to programmatically modify the Clang AST and then generate LLVM bytecode from it, Clang’s real strength is in performing source-to-source transformation. The powerful
Rewriter class facilitates easy modification of the original source code. The rewritten source code can then be compiled using Clang or any other C/C++/Objective-C compiler.
Creating the boilerplate code for performing source-to-source transformation with Clang is outside the scope of this article, but this article by Eli Bendersky provides all of the required details. (Note that the example code in the article is for an old version of Clang, see this source file for the up-to-date code.)
Assume that we have the following pair of template functions (note that the use of C++ templates is what prevents this method of instrumentation from working with pure C or Objective-C code):
We can use the
Rewriter in our AST visitor to wrap the source code for any AST node in a call to the probe function:
If a single source file is being instrumented, the instrumentation probe functions can be prepended to the source file. Otherwise, place the probe declarations in a header file, and prepend an
#include preprocessor directive at the start of each instrumented source file. The transformed source code can then be compiled and run.
- Clang homepage
- Article on basic source-to-source transformation with Clang by Eli Bendersky
- Eli Bendersky’s Clang & LLVM code samples GitHub repository provides a number of useful examples for working with the Clang & LLVM libraries
Assume we have the following probe function:
We can wrap AST nodes with a call to the probe function using the
replace() method of estraverse:
- Esprima homepage
- Documentation for the Mozilla SpiderMonkey Parser API on MDN
Prior to PHP 7, the PHP compiler did not use an AST internally. The PhpParser project provides a parser and AST API for PHP versions 5.2 through to 5.6. PhpParser provides all the functionality required to generate and instrument ASTs, and to turn ASTs back into PHP code that can be evaluated.
Assume we have the following probe function:
We can wrap AST nodes with calls to the probe function by implementing an AST visitor for use with PhpParser’s AST traversal functionality:
The instrumented AST can then be turned back into PHP code, and evaluated:
- The RFC detailing the AST implementation for PHP 7, written by the author of PhpParser, which states the limitations of the old (non-AST) parser
- PhpParser repository on GitHub
- PhpParser documentation on AST traversal functionality
Support for generating ASTs is built-in in Python, and can be accessed through the
To instrument any given node, we can generate a function call to wrap it, like so:
The instrumented AST can then be compiled to bytecode and executed, like so: