I want to programmatically edit python source code. Basically I want to read a
.py file, generate the AST, and then write back the modified python source code (i.e. another
There are ways to parse/compile python source code using standard python modules, such as
compiler. However, I don’t think any of them support ways to modify the source code (e.g. delete this function declaration) and then write back the modified python source code.
UPDATE: The reason I want to do this is I’d like to write a Mutation testing library for python, mostly by deleting statements / expressions, rerunning tests and seeing what breaks.
Both of these tools use the lib2to3 library, which is an implementation of the python parser/compiler machinery that can preserve comments in source when it’s round-tripped from source -> AST -> source.
The rope project may meet your needs if you want to do more refactoring-like transforms.
The ast module is your other option, and there’s an older example of how to “unparse” syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then transformed into a code object.
The redbaron project also may be a good fit (ht Xavier Combelle)
The builtin ast module doesn’t seem to have a method to convert back to source. However, the codegen module here provides a pretty printer for the ast that would enable you to do so.
    import ast
    import codegen

    expr = """
    def foo():
        print("hello world")
    """
    p = ast.parse(expr)

    # Replace the function body with "return 42".
    # p.body[0] is the FunctionDef; ast.parse(...).body[0] is the Return node.
    p.body[0].body = [ast.parse("return 42").body[0]]

    print(codegen.to_source(p))
This will print:
    def foo():
        return 42
Note that you may lose the exact formatting and comments, as these are not preserved.
However, you may not need to. If all you require is to execute the replaced AST, you can do so simply by calling compile() on the ast, and execing the resulting code object.
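A minimal sketch of that route, assuming Python 3 and only the standard library (the foo function and the replacement body are just illustrations):

```python
import ast

source = '''
def foo():
    print("hello world")
'''

tree = ast.parse(source)

# Replace the body of the first top-level function with "return 42".
func = tree.body[0]
func.body = ast.parse("return 42").body

# Fill in any location attributes that are missing on the edited tree.
ast.fix_missing_locations(tree)

# Compile the AST directly and run it; no source text is regenerated.
namespace = {}
exec(compile(tree, filename="<ast>", mode="exec"), namespace)
result = namespace["foo"]()
print(result)
```

Nothing is written back to disk here, so formatting and comments never come into play.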
You might not need to re-generate source code. That’s a bit dangerous for me to say, of course, since you have not actually explained why you think you need to generate a .py file full of code; but:
If you want to generate a .py file that people will actually use, maybe so that they can fill out a form and get a useful .py file to insert into their project, then you don’t want to change it into an AST and back, because you’ll lose all formatting (think of the blank lines that make Python so readable by grouping related sets of lines together; ast nodes do have col_offset attributes, though) and all comments. Instead, you’ll probably want to use a templating engine (the Django template language, for example, is designed to make templating even text files easy) to customize the .py file, or else use Rick Copeland’s MetaPython extension.
If you are trying to make a change during compilation of a module, note that you don’t have to go all the way back to text; you can just compile the AST directly instead of turning it back into a .py file.
But in almost any and every case, you are probably trying to do something dynamic that a language like Python actually makes very easy, without writing new .py files! If you expand your question to let us know what you actually want to accomplish, new .py files will probably not be involved in the answer at all; I have seen hundreds of Python projects doing hundreds of real-world things, and not a single one of them ever needed to write a .py file. So, I must admit, I’m a bit of a skeptic that you’ve found the first good use-case. 🙂
Update: now that you’ve explained what you’re trying to do, I’d be tempted to just operate on the AST anyway. You will want to mutate by removing, not lines of a file (which could result in half-statements that simply die with a SyntaxError), but whole statements — and what better place to do that than in the AST?
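As a rough sketch of what statement-level mutation on the AST could look like (Python 3, standard library only; statement_slots, make_mutants and passes_test are hypothetical helper names, and total stands in for the code under test):

```python
import ast
import copy

SOURCE = '''
def total(items):
    result = 0
    for item in items:
        result += item
    return result
'''

def statement_slots(tree):
    """Yield (body_list, index) for each statement stored in a `body` list.

    Simplification: `orelse`/`finalbody` lists and def/class statements are skipped.
    """
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if isinstance(body, list):
            for index, child in enumerate(body):
                if isinstance(child, ast.stmt) and not isinstance(
                        child, (ast.FunctionDef, ast.ClassDef)):
                    yield body, index

def make_mutants(source):
    """Yield one mutated copy of the tree per statement slot."""
    original = ast.parse(source)
    count = sum(1 for _ in statement_slots(original))
    for n in range(count):
        mutant = copy.deepcopy(original)
        body, index = list(statement_slots(mutant))[n]
        # Substitute `pass` instead of deleting, so no block is left empty.
        body[index] = ast.copy_location(ast.Pass(), body[index])
        yield mutant

def passes_test(tree):
    """Hypothetical test suite: total([1, 2, 3]) must equal 6."""
    namespace = {}
    try:
        exec(compile(tree, "<mutant>", "exec"), namespace)
        return namespace["total"]([1, 2, 3]) == 6
    except Exception:
        return False

mutants = list(make_mutants(SOURCE))
survivors = [m for m in mutants if passes_test(m)]
print(f"{len(mutants)} mutants, {len(survivors)} survived")
```

Each mutant is compiled straight from the tree, so no .py file is written; a surviving mutant would point at a statement the tests do not exercise.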
In a different answer I suggested using the astor package, but I have since found a more up-to-date AST un-parsing package called astunparse:
    >>> import ast
    >>> import astunparse
    >>> print(astunparse.unparse(ast.parse('def foo(x): return 2 * x')))

    def foo(x):
        return (2 * x)
I have tested this on Python 3.5.
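On Python 3.9 and later, the standard library’s ast module also has ast.unparse(), so a third-party package may not be needed for simple cases. A small sketch; note that the comment in the input does not survive the round trip:

```python
import ast

source = "def foo(x): return 2 * x  # double it"
tree = ast.parse(source)

# ast.unparse() (Python 3.9+) regenerates source text from the tree.
# Comments and the original layout are not stored in the AST, so they are lost.
regenerated = ast.unparse(tree)
print(regenerated)

# The regenerated text parses back to an equivalent tree.
same_tree = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```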
I recently created a fairly stable (the core is really well tested) and extensible piece of code which generates code from an ast tree: https://github.com/paluh/code-formatter .
I’m using my project as a base for a small vim plugin (which I’m using every day), so my goal is to generate really nice and readable python code.
I tried to extend codegen, but its architecture is based on the ast.NodeVisitor interface, so formatters (visitor_ methods) are just functions. I found this structure quite limiting and hard to optimize (for long, nested expressions it’s easier to keep a tree of objects and cache some partial results; otherwise you can hit exponential complexity when searching for the best layout). BUT codegen, like every piece of mitsuhiko’s work that I’ve read, is very well written and concise.
We had a similar need, which wasn’t solved by other answers here. So we created a library for this, ASTTokens, which takes an AST tree produced with the ast or astroid modules, and marks it with the ranges of text in the original source code.
It doesn’t do modifications of code directly, but that’s not hard to add on top, since it does tell you the range of text you need to modify.
For example, this wraps a function call in WRAP(...), preserving comments and everything else:
    example = """
    def foo(): # Test
        '''My func'''
        log("hello world") # Print
    """

    import ast, asttokens
    atok = asttokens.ASTTokens(example, parse=True)

    call = next(n for n in ast.walk(atok.tree) if isinstance(n, ast.Call))
    start, end = atok.get_text_range(call)
    print(atok.text[:start] + ('WRAP(%s)' % atok.text[start:end]) + atok.text[end:])

    def foo(): # Test
        '''My func'''
        WRAP(log("hello world")) # Print
Hope this helps!
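If adding a dependency is not an option, a rough approximation of the same edit is possible with the standard library alone on Python 3.8+, where nodes carry end_lineno and end_col_offset. This is only a sketch and not how asttokens works internally; text_range is a hypothetical helper, and it assumes plain "\n" line endings:

```python
import ast

example = '''
def foo():  # Test
    """My func"""
    log("hello world")  # Print
'''

def text_range(source, node):
    """Return (start, end) character offsets of `node` within `source`."""
    lines = source.splitlines(keepends=True)

    def offset(lineno, col):
        # `col` is a UTF-8 byte offset within the line; convert it to characters.
        line = lines[lineno - 1]
        chars = len(line.encode("utf-8")[:col].decode("utf-8"))
        return sum(len(prev) for prev in lines[:lineno - 1]) + chars

    return (offset(node.lineno, node.col_offset),
            offset(node.end_lineno, node.end_col_offset))

tree = ast.parse(example)
call = next(n for n in ast.walk(tree) if isinstance(n, ast.Call))
start, end = text_range(example, call)

# Splice the replacement into the original text; everything else is untouched.
wrapped = example[:start] + "WRAP(%s)" % example[start:end] + example[end:]
print(wrapped)
```

Because only the text inside the node’s range is replaced, comments and spacing elsewhere are kept as they were.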
Parsing and modifying the code structure is certainly possible with the help of the ast module, and I will show it in an example in a moment. However, writing back the modified source code is not possible with the ast module alone. There are other modules available for this job, such as the one here.
NOTE: The example below can be treated as an introductory tutorial on the usage of the ast module, but a more comprehensive guide is available in the Green Tree Snakes tutorial and the official documentation on the ast module.
    >>> import ast
    >>> tree = ast.parse("print 'Hello Python!!'")
    >>> exec(compile(tree, filename="<ast>", mode="exec"))
    Hello Python!!
You can parse python code (represented as a string) by simply calling ast.parse(). This returns a handle to the Abstract Syntax Tree (AST) structure. Interestingly, you can compile this structure back and execute it as shown above.
Another very useful API is ast.dump(), which dumps the whole AST in string form. It can be used to inspect the tree structure and is very helpful in debugging. For example,
On Python 2.7:
    >>> import ast
    >>> tree = ast.parse("print 'Hello Python!!'")
    >>> ast.dump(tree)
    "Module(body=[Print(dest=None, values=[Str(s='Hello Python!!')], nl=True)])"
On Python 3.5:
    >>> import ast
    >>> tree = ast.parse("print ('Hello Python!!')")
    >>> ast.dump(tree)
    "Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='Hello Python!!')], keywords=[]))])"
Notice the difference in syntax for print statement in Python 2.7 vs. Python 3.5 and the difference in type of AST node in respective trees.
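On newer interpreters the tree differs again: since Python 3.8, literals parse to ast.Constant rather than Str, and since 3.9 ast.dump() accepts an indent argument. A quick way to check what your own interpreter produces (the exact dump text varies between versions, so no output is shown here):

```python
import ast

tree = ast.parse("print('Hello Python!!')")

stmt = tree.body[0]   # the expression statement
call = stmt.value     # the print(...) call
arg = call.args[0]    # the string literal argument

# Show the node types for each level of the tree.
print(type(stmt).__name__, type(call).__name__, type(arg).__name__)

# indent= requires Python 3.9+; it pretty-prints the dump over several lines.
print(ast.dump(tree, indent=2))
```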
How to modify code using ast:
Now, let’s have a look at an example of modifying python code with the ast module. The main tool for modifying the AST structure is the ast.NodeTransformer class. Whenever you need to modify the AST, you subclass it and write the node transformation(s) accordingly.
For our example, let’s try to write a simple utility which transforms Python 2 print statements into Python 3 function calls.
Print statement to function call converter utility: print2to3.py:
    #!/usr/bin/env python
    '''
    This utility converts Python 2.7 print statements to Python 3 style
    function calls before running the code.

    USAGE:
        python print2to3.py <filename>
    '''
    import ast
    import sys


    class P2to3(ast.NodeTransformer):
        def visit_Print(self, node):
            # Rebuild the Python 2 Print statement as a call to print(...)
            new_node = ast.Expr(value=ast.Call(
                func=ast.Name(id='print', ctx=ast.Load()),
                args=node.values,
                keywords=[], starargs=None, kwargs=None))
            ast.copy_location(new_node, node)
            return new_node


    def main(filename=None):
        if not filename:
            return

        with open(filename, 'r') as fp:
            data = fp.read()
        tree = ast.parse(data)

        print "Converting python 2 print statements to Python 3 function calls"
        print "-" * 35
        P2to3().visit(tree)
        ast.fix_missing_locations(tree)
        # print ast.dump(tree)

        exec(compile(tree, filename="p23", mode="exec"))


    if __name__ == '__main__':
        if len(sys.argv) <= 1:
            print ("\nUSAGE:\n\t print2to3.py <filename>")
            sys.exit(1)
        else:
            # Pass the file name, not the whole argument list
            main(sys.argv[1])
This utility can be tried on small example file, such as one below, and it should work fine.
Test Input file : py2.py
    class A(object):
        def __init__(self):
            pass

    def good():
        print "I am good"

    main = good

    if __name__ == '__main__':
        print "I am in main"
        main()
Please note that the above transformation is only for ast tutorial purposes; in a real-world case you would have to handle all the different scenarios, such as print " x is %s" % ("Hello Python").
A Program Transformation System is a tool that parses source text, builds ASTs, allows you to modify them using source-to-source transformations (“if you see this pattern, replace it by that pattern”). Such tools are ideal for doing mutation of existing source codes, which are just “if you see this pattern, replace by a pattern variant”.
Of course, you need a program transformation engine that can parse the language of interest to you, and still do the pattern-directed transformations. Our DMS Software Reengineering Toolkit is a system that can do that, and handles Python, and a variety of other languages.
See this SO answer for an example of a DMS-parsed AST for Python capturing comments accurately. DMS can make changes to the AST, and regenerate valid text, including the comments. You can ask it to prettyprint the AST, using its own formatting conventions (you can change these), or do “fidelity printing”, which uses the original line and column information to maximally preserve the original layout (some change in layout where new code is inserted is unavoidable).
To implement a “mutation” rule for Python with DMS, you could write the following:
    rule mutate_addition(s:sum, p:product):sum->sum =
      " \s + \p " -> " \s - \p"
     if mutate_this_place(s);
This rule replaces “+” with “-” in a syntactically correct way; it operates on the AST and thus won’t touch strings or comments that happen to look right. The extra condition on “mutate_this_place” is to let you control how often this occurs; you don’t want to mutate every place in the program.
You’d obviously want a bunch more rules like this that detect various code structures, and replace them by the mutated versions. DMS is happy to apply a set of rules. The mutated AST is then prettyprinted.
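For comparison, the same “replace + by -” idea can be sketched with Python’s standard ast.NodeTransformer. This is not DMS and has none of its pattern language or fidelity printing; AddToSub and the should_mutate callback (standing in for mutate_this_place) are hypothetical names:

```python
import ast

class AddToSub(ast.NodeTransformer):
    """Rewrite `a + b` as `a - b` wherever `should_mutate(node)` is true."""

    def __init__(self, should_mutate):
        self.should_mutate = should_mutate

    def visit_BinOp(self, node):
        self.generic_visit(node)  # handle nested expressions first
        if isinstance(node.op, ast.Add) and self.should_mutate(node):
            node.op = ast.Sub()
        return node

source = 'label = "a + b"\nvalue = 1 + 2  # sum'
tree = AddToSub(lambda node: True).visit(ast.parse(source))

# Only the real addition is changed; the "+" inside the string is untouched.
namespace = {}
exec(compile(tree, "<mutant>", "exec"), namespace)
print(namespace["label"], namespace["value"])
```

Because the match is made on BinOp nodes rather than on text, string contents and comments are never candidates for mutation.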
One of the other answers recommends codegen, which seems to have been superseded by astor. The version of astor on PyPI (version 0.5 as of this writing) seems to be a little outdated as well, so you can install the development version of astor as follows.
pip install git+https://github.com/berkerpeksag/astor.git#egg=astor
Then you can use astor.to_source to convert a Python AST to human-readable Python source code:
    >>> import ast
    >>> import astor
    >>> print(astor.to_source(ast.parse('def foo(x): return 2 * x')))
    def foo(x):
        return 2 * x
I have tested this on Python 3.5.