Title: Python from Scratch
Date: 2025-02-01
Summary: Building up the Python Data Model from scratch.
* * *
# Learning to Read
---
**First:** A Python program is made up of _tokens_; you can think of these as
"words". Some examples of tokens:
- `"hello world"`
- `6`
- `(`
- `while`
- `print`
Generally there are four types of token in Python, although in practice the
lines between them get blurred a little bit.
- _Literals_ literally represent some value. `"hello world"` and `6` and `4.2`
are examples of such literals; the first represents some text and the others
represent numbers. This is _literal_ as opposed to some indirect
representation like `4 + 2` or `"hello" + " " + "world"`.
- _Operators_ include things like math operators `+`, `-`, `*`, but also things
like the function call operator `( )`, boolean operators like `and`, and myriad
other operators. [There's a comprehensive list here][expressions] but beware -
there are a lot and some of them are pretty technical. The main point is that
`( )` and `+` are the same _kind of thing_ as far as the Python interpreter is
concerned.
- _Keywords_ are special directives that tell Python how to behave. This
includes things like `if` and `def` and `while`. Technically, some operators
are also keywords (for example `and` is a keyword) but that's not super
relevant here.
- ___Names___ are the last - and most important - kind of token. `print` is a
name. Variable names are names. Function names are names. Class names are
names. Module names are names. In all cases, a name represents some _thing_,
and Python can fetch that thing if given its name.
[expressions]: https://docs.python.org/3/reference/expressions.html
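If you want to check whether a given word is a keyword or just a name, the standard library's `keyword` module can tell you. This is only a quick sketch; the variable names are mine and purely illustrative.

```python
import keyword

# keyword.iskeyword reports whether a string is reserved by Python.
and_is_keyword = keyword.iskeyword("and")      # an operator that is also a keyword
while_is_keyword = keyword.iskeyword("while")  # a plain keyword
print_is_keyword = keyword.iskeyword("print")  # not reserved: just a name

print(and_is_keyword, while_is_keyword, print_is_keyword)
```

Note that `print` comes back as not-a-keyword: it is an ordinary name that happens to be bound to a function.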
So if I give Python this code:
```py
x = "world"
print("hello " + x)
```
You should first identify the tokens:
- _Name_ `x`
- _Operator_ `=`
- _Literal_ `"world"`
- _Name_ `print`
- _Operator_ `( )`
- _Literal_ `"hello "`
- _Operator_ `+`
- _Name_ `x`
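You don't have to do this by hand. As a rough sketch, the standard library's `tokenize` module will split source code into tokens for you. The `classify` helper below is something I made up to map the tokenizer's categories onto the four informal ones used here; the real tokenizer is a bit coarser (it reports keywords as names) and it reports `(` and `)` as two separate tokens.

```python
import io
import keyword
import tokenize

source = 'x = "world"\nprint("hello " + x)\n'

def classify(tok):
    """Hypothetical helper: map a tokenizer token onto the four informal kinds."""
    if tok.type == tokenize.NAME:
        # The tokenizer lumps keywords in with names, so split them apart.
        return "Keyword" if keyword.iskeyword(tok.string) else "Name"
    if tok.type in (tokenize.STRING, tokenize.NUMBER):
        return "Literal"
    if tok.type == tokenize.OP:
        return "Operator"
    return None  # newlines, end markers, and other layout tokens

tokens = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    kind = classify(tok)
    if kind is not None:
        tokens.append((kind, tok.string))

for kind, text in tokens:
    print(kind, text)
```

The printed sequence should match the list above, apart from the call parentheses showing up as separate opening and closing tokens.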
The first line of code binds `"world"` to the name `x`.
The expression `"hello " + x` looks up the value named by `x` and concatenates
it with the literal value `"hello "`. This produces the string `"hello world"`.
The expression `print( ... )` looks up the value - the function - named by
`print` and uses the `( )` operator to call it with the string `"hello world"`.
To be crystal clear: `x` and `print` _are the same kind of token_; it's just
that their named values have different types. One is a string, the other a
function. The string can be _operated on_ with the `+` operator, and the
function can be _operated on_ with the `( )` operator.
It is valid to write `print(print)`; here we are looking up the name `print`,
and passing that value to the function named by `print`. This should be no more
or less surprising than being able to write `x + x` or `5 * 4`.
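Here is a small sketch of that idea. The name `shout` is made up for illustration; the point is only that a function can be bound to a new name, or passed as an argument, exactly like a string can.

```python
x = "world"
shout = print          # bind a second name to the same function

shout("hello " + x)    # call the function through its new name
print(print)           # pass the function to itself as an argument

same_object = shout is print  # both names refer to one and the same value
```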
# Namespaces
**First-and-a-half:** A _namespace_ is a collection of names.
You might also hear this called a "scope". This is the reason I say "maybe three
or four, depending how you count"; this is really part of that fundamental idea
of a _name_, but I'll list it separately to be extra clear.
There are some special structures in Python that introduce new namespaces. Each
_module_ has a "global" namespace; these are names that can be referenced
anywhere in a given file or script. Each _function_ has a "local" namespace;
these are names that can only be accessed within the function.
For example:
```py
x = "eggs"
def spam():
    y = "ham"
    # I can print(x) here.
# But I cannot print(y) here.
```
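To see those rules actually being enforced, here is a runnable variation, sketched under the assumption that it runs as a fresh script (so nothing else has already defined `y`). The names `result` and `outcome` are mine.

```python
x = "eggs"  # lives in the module's global namespace

def spam():
    y = "ham"  # lives in spam's local namespace
    return x + " and " + y  # both names are visible in here

result = spam()

# Outside the function, the local name is simply not there to be found.
try:
    y
    outcome = "found y"
except NameError as err:
    outcome = "NameError: " + str(err)

print(result)
print(outcome)
```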
Objects also have namespaces. Names on objects are called "attributes", and they
may be simple values or functions, just as regular names might be simple
values (`x`, `y`) or functions (`print`, `spam`). You access attributes with the
`.` operator.
```py
obj = range(10)
# Find the value named by `obj`, then find the value named by `stop` on it.
print(obj.stop)  # 10
```
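If you are curious which names an object carries, the built-ins `dir` and `getattr` let you poke around. This is just an exploratory sketch; the variable names are illustrative.

```python
obj = range(10)

# dir() lists the attribute names available on an object.
attribute_names = dir(obj)

# getattr() looks an attribute up by name, the same lookup `obj.stop` performs.
same_value = getattr(obj, "stop")

print(same_value)
print("start" in attribute_names, "stop" in attribute_names)
```

Some of those attributes are simple values (`start`, `stop`, `step`) and some are functions (`count`, `index`).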
Finally, there is the built-in namespace. These are names that are accessible
always, from anywhere, by default. Names like `print` and `range` are defined
here. [Here's a comprehensive list of built-in names](https://docs.python.org/3/library/functions.html).
# Strings
**Second:** you asked about characters and letters, so you may appreciate some
background on strings.
A _string_ is a sequence of characters. A _character_ is simply a number to
which we, by convention, assign some meaning. For example, by convention, we've
all agreed that the number `74` means `J`. This convention is called an
_encoding_. The default encoding is called UTF-8; it is built on Unicode, a
standard maintained by a committee called the _Unicode Consortium_. Unicode
includes characters from many current and ancient languages, various symbols and
typographical marks, emojis, flags, etc. The important thing to remember is that
each one of these things, really, is just an integer. And all our devices just
agree that when they see a given integer they will look up the appropriate
symbol in an appropriate font.
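You can ask Python for the number behind a character, and vice versa, with the built-ins `ord` and `chr`. A minimal sketch (variable names are mine):

```python
code = ord("J")      # character -> integer
letter = chr(74)     # integer -> character
codes = [ord(c) for c in "Fizz"]

print(code, letter, codes)
```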
You can switch between the string representation and the numerical
representation with the `encode` method on strings and the `decode` method on
bytes. Really, these are the same; you're just telling Python to tell your
console to draw them differently.
```py
>>> list('Fizz'.encode())
[70, 105, 122, 122]
>>> bytes([66, 117, 122, 122]).decode()
'Buzz'
```
For continuity: `list`, `encode`, `decode`, and `bytes` are all names. `( )`,
`[ ]`, `,`, and `.` are all operators.† The numbers and `'Fizz'` are literals.
† Technically, `[66, 117, 122, 122]` in its entirety is a single list "display"
(what most people call a list literal) - `,` is a delimiter, not an operator -
but that's neither here nor there for these purposes.
‡ The symbol `†` is number 8224 and the symbol `‡` is number 8225.
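One wrinkle worth seeing for yourself: for symbols like these, the single number that identifies the character and the bytes that UTF-8 uses to write it down are not the same list of numbers. A quick sketch, assuming your source file is saved as UTF-8 (the Python 3 default):

```python
dagger = "†"

code_point = ord(dagger)             # the single number from the footnote
utf8_bytes = list(dagger.encode())   # how UTF-8 spells that number as bytes
round_trip = bytes(utf8_bytes).decode()

print(code_point, utf8_bytes, round_trip)
```

For plain letters like those in `'Fizz'` the two happen to coincide, which is why the earlier example looks so tidy.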
# Names
**Second-and-a-half:** names are strings.
Names are just strings, and namespaces are just `dict`s. You can access them with
`locals()` and `globals()`, although in practice you almost never need to do
this directly. It's better to just use the name itself.
```py
import pprint
x = range(10)
function = print
pprint.pprint(globals())
```
This outputs:
```
{'__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__cached__': None,
 '__doc__': None,
 '__file__': '<stdin>',
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 'function': <built-in function print>,
 'pprint': <module 'pprint' from 'python3.12/pprint.py'>,
 'x': range(0, 10)}
```
For continuity: `import pprint` binds the name `pprint` to the module
`pprint.py` from the standard library. The line `pprint.pprint( ... )` fetches
the function `pprint` from that module, and calls it.
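To convince yourself that a namespace really is a `dict` keyed by strings, you can read and even write names through `globals()`. This is a sketch for illustration only, assuming it runs at the top level of a script; as noted above, in real code you should just use the name itself. The names `namespace` and `greeting` are made up.

```python
x = range(10)

namespace = globals()            # the module's global namespace, as a dict

# Looking up the string "x" in the dict finds the same value as the name x.
same_value = namespace["x"] is x

# Adding a key to the dict binds a brand-new global name.
namespace["greeting"] = "hello"
print(greeting)                  # the name now exists

name_kinds = {type(key) for key in namespace}  # every key is a str
```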