Week 22 / 2023
Python: VM
- a conceptual overview of how a python program is executed.
- I like to think of the execution of a python program as split into two or three main phases depending on how the interpreter is invoked.
- (1) Initialization: involves the set up of the various data structures needed by the python process.
- (2) Compilation: involves activities such as parsing source code to build syntax trees, creation of abstract syntax trees, building of symbol tables and generation of code objects.
- (3) Interpreting: involves the actual execution of the generated code objects within some context.
- The process of generating parse trees and abstract syntax trees from source code is language agnostic, so the same methods that apply to other languages also apply to Python.
- Both parse trees and abstract syntax trees (ASTs) represent the structure of a program or a piece of code, but they differ in their level of detail and purpose. Parse trees capture the complete syntax of the code, while ASTs provide a simplified, more abstract representation of the code's structure that leaves out unnecessary details. ASTs are commonly used in compilers, interpreters, and static analysis tools to understand and manipulate code at a higher level of abstraction.
- Parse Tree: A parse tree, also known as a concrete syntax tree, represents the hierarchical structure of a program based on its grammatical rules. It is generated during the parsing phase of the compilation process. The parse tree includes all the syntactic elements of the code, such as keywords, operators, parentheses, and identifiers, as well as the order in which they appear. It reflects the exact syntax of the code and captures all the details specified by the grammar rules. Parse trees can be quite large and verbose.
- Abstract Syntax Tree (AST): An abstract syntax tree, also called a syntax tree or AST, represents the structure of a program in a more abstract and simplified manner. It abstracts away some of the details of the concrete syntax and focuses on the essential elements and their relationships. ASTs are typically generated as an intermediate representation during the compilation or interpretation process.
- topics: the process of building symbol tables and generating code objects, Python objects, frame objects, function objects, Python opcodes, the interpreter loop, generators and user-defined classes.
- how the CPython virtual machine functions.
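The compilation and interpreting phases described above can be poked at from Python itself. This is only a rough sketch using the standard library's ast and symtable modules plus the built-in compile and exec; the source string and the "<example>" filename are made up for illustration, and the real interpreter does all of this in C rather than through these modules.

```python
import ast
import symtable

# Tiny made-up program used only for this illustration.
source = "x = 1\ny = x + 2\n"

# Compilation, step 1: parse the source text into an abstract syntax tree.
tree = ast.parse(source, filename="<example>", mode="exec")
print(ast.dump(tree))

# Compilation, step 2: build the symbol table for the same source.
table = symtable.symtable(source, "<example>", "exec")
names = sorted(sym.get_name() for sym in table.get_symbols())
print(names)

# Compilation, step 3: generate a code object from the tree.
code = compile(tree, filename="<example>", mode="exec")
print(type(code).__name__, code.co_names)

# Interpreting: execute the code object within some context,
# here a plain dict acting as the global namespace.
namespace = {}
exec(code, namespace)
print(namespace["y"])
```

The dict passed to exec stands in for the "context" mentioned in phase (3); after the run it holds the variables the code object created.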
The view from 30,000 feet
- a high-level exposé on how the interpreter goes about executing a Python program.
- The python executable is a C program like any other, such as the Linux kernel or a simple hello world program, so pretty much the same process happens when the python executable is invoked.
- ...the python executable's main function, located at ./Programs/python.c, is run just like that of any other C program.
- The main function then calls the Py_Main function located in ./Modules/main.c, which handles the interpreter initialization process: parsing command-line arguments and setting program flags, reading environment variables, running hooks, carrying out hash randomization, etc.
- As part of the initialization process, Py_Initialize from pylifecycle.c is called; this handles the initialization of the interpreter and thread state data structures, two very important data structures.
- The initialization process also sets up the import mechanisms as well as rudimentary stdio.
- The ./Include/opcode.h header file contains a full listing of all the instructions (opcodes) for the Python virtual machine. The opcodes are pretty straightforward conceptually.
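The opcode listing from the C header is also visible from Python through the standard dis and opcode modules, which gives a quick way to see what the instructions look like. A minimal sketch; the add function is a made-up example, and the exact instruction names and numbers vary between CPython versions, so no output is shown.

```python
import dis
import opcode

# Hypothetical function used only to have some bytecode to look at.
def add(a, b):
    return a + b

# Names of the instructions the compiler generated for add().
op_names = [ins.opname for ins in dis.get_instructions(add)]
print(op_names)

# opcode.opmap maps each instruction name to its numeric opcode,
# and opcode.opname maps the number back to the name.
number = opcode.opmap["RETURN_VALUE"]
print(number, opcode.opname[number])
print(len(opcode.opmap), "named opcodes in this interpreter")
```

Running dis.dis(add) prints the same instructions in a more readable, annotated layout.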