Title: Python from Scratch
Date: 2025-02-01
Summary: Building up the Python Data Model from scratch.

* * *

# Learning to Read

---

**First:** A Python program is made up of _tokens_; you can think of these as
"words". Some examples of tokens:

- `"hello world"`
- `6`
- `(`
- `while`
- `print`

Generally there are four types of token in Python, although in practice the
lines between them get blurred a little bit.

- _Literals_ literally represent some value. `"hello world"` and `6` and `4.2`
  are examples of such literals; the first represents some text and the others
  represent numbers. This is _literal_ as opposed to some indirect
  representation like `4 + 2` or `"hello" + " " + "world"`.

- _Operators_ include things like the math operators `+`, `-`, `*`, but also
  things like the function call operator `( )`, boolean operators such as `and`,
  and myriad other operators. [There's a comprehensive list here][expressions]
  but beware - there are a lot, and some of them are pretty technical. The main
  point is that `( )` and `+` are the same _kind of thing_ as far as the Python
  interpreter is concerned.

- _Keywords_ are special directives that tell Python how to behave. This
  includes things like `if` and `def` and `while`. Technically, some operators
  are also keywords (for example `and`, `or`, `not`, `in`, and `is`), but that's
  not super relevant here.

- ___Names___ are the last - and most important - kind of token. `print` is a
  name. Variable names are names. Function names are names. Class names are
  names. Module names are names. In all cases, a name represents some _thing_,
  and Python can fetch that thing if given its name.
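One way to see the difference between keywords and names is a small sketch using the standard library's `keyword` module and the built-in `compile`: a name like `print` can be rebound like any other name, while a keyword like `while` cannot. The helper `can_assign` is hypothetical, made up only for this illustration.

```py
import keyword

# keyword.iskeyword() reports whether a string is a reserved keyword.
for word in ["if", "def", "while", "and", "print", "x"]:
    kind = "keyword" if keyword.iskeyword(word) else "name"
    print(f"{word!r:8} {kind}")


def can_assign(word):
    """Hypothetical helper: is `<word> = 1` valid Python?"""
    try:
        compile(f"{word} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(can_assign("print"))  # True: print is just a name
print(can_assign("while"))  # False: while is a keyword
```

Rebinding `print` is legal but rarely wise; the point is only that Python treats it as an ordinary name.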

[expressions]: https://docs.python.org/3/reference/expressions.html

So if I give Python this code:

```py
x = "world"
print("hello " + x)
```

You should first identify the tokens:

- _Name_ `x`
- _Operator_ `=`
- _Literal_ `"world"`
- _Name_ `print`
- _Operator_ `( )`
- _Literal_ `"hello "`
- _Operator_ `+`
- _Name_ `x`
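Python's own tokenizer is exposed by the standard library's `tokenize` module, so you can check this list directly. Its categories differ a little from the four above: names and keywords are both reported as `NAME`, and `(` and `)` come out as two separate `OP` tokens rather than one `( )`. This sketch filters out purely structural tokens such as `NEWLINE` and `ENDMARKER`.

```py
import io
import tokenize

source = 'x = "world"\nprint("hello " + x)\n'

# Keep only the "word-like" tokens; skip structural ones such as NEWLINE.
wanted = {tokenize.NAME, tokenize.OP, tokenize.STRING, tokenize.NUMBER}
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in wanted
]

for kind, text in tokens:
    print(f"{kind:6} {text}")
```

This prints one line per token, starting with `NAME   x` and ending with `OP     )`.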

The first line of code binds `"world"` to the name `x`.

The expression `"hello " + x` looks up the value named by `x` and appends it to
the literal value `"hello "`. This produces the string `"hello world"`.

The expression `print( ... )` looks up the value - the function - named by
`print` and uses the `( )` operator to call it with the string `"hello world"`.

To be crystal clear: `x` and `print` _are the same kind of token_; it's just
that their named values have different types. One is a string, the other a
function. The string can be _operated on_ with the `+` operator, and the
function can be _operated on_ with the `( )` operator.
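Python's data model makes this symmetry concrete: each operator is, roughly, translated into a call to a special "dunder" method on the value it operates on, `__add__` for `+` and `__call__` for `( )`. The sketch below calls those methods by hand. The real rules are more involved (for example, `+` also tries the right-hand operand's `__radd__`, and the method is looked up on the value's type), so treat this as an illustration rather than the full story.

```py
x = "world"

# `"hello " + x` is (roughly) "hello ".__add__(x).
assert "hello " + x == "hello ".__add__(x)

# `print(...)` is (roughly) print.__call__(...).
print.__call__("hello " + x)  # prints: hello world

# A string supports `+` but not `( )`; a function supports `( )` but not `+`.
print(hasattr(x, "__add__"), callable(x))          # True False
print(hasattr(print, "__add__"), callable(print))  # False True
```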

It is valid to write `print(print)`; here we are looking up the name `print`
and passing that value to the function named by `print`. This should be no more
or less surprising than being able to write `x + x` or `5 * 4`.
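A quick way to convince yourself, assuming a standard CPython interpreter (other implementations may print the representations differently). The extra name `say` is just for illustration.

```py
x = "world"

# Same kind of token, different types of value.
print(type(x))      # <class 'str'>
print(type(print))  # <class 'builtin_function_or_method'>

# Look up `print`, then pass the function it names to itself.
print(print)        # <built-in function print>

# Because `print` is just a name, its value can be bound to another name too.
say = print
say("hello " + x)   # hello world
```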

# Namespaces

**First-and-a-half:** A _namespace_ is a collection of names.

You might also hear this called a "scope". This is the reason I say "maybe three
or four, depending how you count"; this is really part of that fundamental idea
of a _name_, but I'll list it separately to be extra clear.

There are some special structures in Python that introduce new namespaces. Each
_module_ has a "global" namespace; these are names that can be referenced
anywhere in a given file or script. Each _function_ has a "local" namespace;
these are names that can only be accessed within the function.

For example:

```py
x = "eggs"


def spam():
    y = "ham"

    # I can print(x) here.

# But I cannot print(y) here.
```
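A runnable variant of the example above, with the comments turned into real code. Calling `spam()` shows that both names are visible inside the function, while asking for `y` at module level raises a `NameError`, because `y` only ever existed in `spam`'s local namespace.

```py
x = "eggs"


def spam():
    y = "ham"
    print(x, y)  # x is found in the global namespace, y in the local one


spam()  # prints: eggs ham

try:
    print(y)
except NameError as err:
    print(f"NameError: {err}")  # NameError: name 'y' is not defined
```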

Objects also have namespaces. Names on objects are called "attributes", and they
may be simple values or functions, just as regular names might be simple values
(`x`, `y`) or functions (`print`, `spam`). You access attributes with the `.`
operator.

```py
obj = range(10)
print(obj.stop)  # find the value named by `obj`, then find the value named by `stop`: 10
```
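A slightly longer sketch of the same idea. The attributes `start` and `stop` are simple values, while `index` is a function (a _method_) that you fetch with `.` and then call with `( )`. The built-in `getattr` performs the same lookup as `.`, but takes the attribute's name as a string; the name `find` is only illustrative.

```py
obj = range(10)

# A simple-value attribute.
print(obj.stop)   # 10

# A function attribute: `.` fetches it, `( )` calls it.
find = obj.index  # just the lookup; nothing is called yet
print(find(3))    # 3

# The same attribute lookup, spelled with a built-in function and a string.
print(getattr(obj, "start"))  # 0
```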

Finally, there is the built-in namespace. These are names that are always
accessible, from anywhere, by default. Names like `print` and `range` are
defined here. [Here's a comprehensive list of the built-in functions](https://docs.python.org/3/library/functions.html).
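The built-in namespace is itself available as the standard `builtins` module. When Python looks up a name it checks the local namespace (and any enclosing functions), then the global one, then the built-in one, so a global name can hide a built-in name; deleting the global makes the built-in visible again. Shadowing `len` like this is a deliberately bad idea, used here only to show the lookup order.

```py
import builtins

# The built-in namespace is available as a module.
print(builtins.print is print)  # True


# A global name shadows a built-in name...
def len(obj):
    return "shadowed!"


print(len("abc"))  # shadowed!

# ...until the global is removed and lookup falls through to builtins again.
del len
print(len("abc"))  # 3
```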

# Strings

**Second:** you asked about characters and letters, so you may appreciate some
background on strings.

A _string_ is a sequence of characters. A _character_ is simply a number to
which we, by convention, assign some meaning. For example, by convention, we've
all agreed that the number `74` means `J`. The standard that records these
agreements is called _Unicode_, and it is maintained by a committee called the
_Unicode Consortium_. Unicode includes characters from many current and ancient
languages, various symbols and typographical marks, emojis, flags, etc. The
important thing to remember is that each one of these things, really, is just an
integer (or, for some flags and emojis, a short sequence of integers). And all
our devices just agree that when they see a given integer they will look up the
appropriate symbol in an appropriate font. Storing those integers as bytes is
the job of an _encoding_; Python's default encoding is called UTF-8, and it is
also defined by the Unicode Consortium.
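Python exposes these numbers directly through the built-in functions `ord` (character to number) and `chr` (number to character). A short sketch:

```py
# ord() gives the number behind a single character; chr() goes the other way.
print(ord("J"))  # 74
print(chr(74))   # J

# A string is a sequence of these numbers.
numbers = [ord(c) for c in "Fizz"]
print(numbers)   # [70, 105, 122, 122]
print("".join(chr(n) for n in numbers))  # Fizz
```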

You can switch between the string representation and the numerical
representation with the `encode` method on strings and the `decode` method on
`bytes`. Really, these are the same; you're just telling Python to tell your
console to draw them differently. (Strictly speaking, `encode` produces the
string's UTF-8 bytes. For plain ASCII text like the examples below, each byte is
exactly the character's number; other characters take more than one byte.)

```py
>>> list('Fizz'.encode())
[70, 105, 122, 122]
>>> bytes([66, 117, 122, 122]).decode()
'Buzz'
```

For continuity: `list`, `encode`, `decode`, and `bytes` are all names. `( )`,
`[ ]`, `,`, and `.` are all operators.† The numbers and `'Fizz'` are literals.

† Technically, `[66, 117, 122, 122]` in its entirety is a _list display_, and `,`
is a _delimiter_ rather than an operator, but that's neither here nor there for
these purposes.

‡ The symbol `†` is number 8224 and the symbol `‡` is number 8225.
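The footnote symbols also show where the numbers from `ord` and the bytes from `encode` part ways. `†` is number 8224, which doesn't fit in a single byte, so UTF-8 spreads it across three bytes; `decode` reverses the process. The name `dagger` is only illustrative.

```py
dagger = "†"

print(ord(dagger))            # 8224: the character's number
print(list(dagger.encode()))  # [226, 128, 160]: its three UTF-8 bytes
print(bytes([226, 128, 160]).decode())  # †
```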

# Names

**Second-and-a-half:** names are strings.

Names are just strings, and namespaces are, for most purposes, just `dict`s. You
can access them with `locals()` and `globals()`, although in practice you almost
never need to do this directly. It's better to just use the name itself.

```py
import pprint

x = range(10)
function = print
pprint.pprint(globals())
```

This outputs something like the following (the exact entries vary with the
Python version and how the script is run):

```
{'__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__cached__': None,
 '__doc__': None,
 '__file__': '<stdin>',
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 'function': <built-in function print>,
 'pprint': <module 'pprint' from 'python3.12/pprint.py'>,
 'x': range(0, 10)}
```

For continuity: `import pprint` binds the name `pprint` to the module
`pprint.py` from the standard library. The line `pprint.pprint( ... )` fetches
the function named `pprint` from that module and calls it.
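To see that namespaces really are dictionaries keyed by strings, here is a sketch that reads and writes them by hand. `globals()` behaves like a normal `dict`, and writing to it creates a real global name, though, as above, you rarely have a good reason to do this. Inside a function, `locals()` returns a snapshot of that function's names. The names `greeting`, `spam`, `Spam`, and `s` are illustrative.

```py
x = range(10)

# Looking a name up by its string gives the same object as using the name.
print(globals()["x"] is x)  # True

# Adding a key to the module's namespace creates a new global name.
globals()["greeting"] = "hello world"
print(greeting)  # hello world (linters may flag this name, since it was bound indirectly)


def spam():
    y = "ham"
    return locals()  # a dict snapshot of the function's local namespace


print(spam())  # {'y': 'ham'}


class Spam:
    pass


s = Spam()
s.eggs = "ham"
print(vars(s))  # {'eggs': 'ham'}: the object's attribute namespace
```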