Title: Python from Scratch
Date: 2025-02-01
Summary: Building up the Python Data Model from scratch.

* ** * ** * ** * ** * ** * ** * ** * ** * ** * **
# Learning to Read

---

**First:** A Python program is made up of _tokens_; you can think of these as
"words". Some examples of tokens:

- `"hello world"`
- `6`
- `(`
- `while`
- `print`

Generally there are four types of token in Python, although in practice the
lines between them get blurred a little bit.

- _Literals_ literally represent some value. `"hello world"` and `6` and `4.2`
  are examples of such literals; the first represents some text and the others
  represent numbers. This is _literal_ as opposed to some indirect
  representation like `4 + 2` or `"hello" + " " + "world"`.

- _Operators_ include things like math operators `+`, `-`, `*`, but also things
  like the function call operator `( )`, boolean operators like `and`, and myriad
  other operators. [There's a comprehensive list here][expressions] but beware -
  there are a lot of them and some are pretty technical. The main point is that
  `( )` and `+` are the same _kind of thing_ as far as the Python interpreter is
  concerned.

- _Keywords_ are special directives that tell Python how to behave. This
  includes things like `if` and `def` and `while`. Technically, some operators
  are also keywords (for example `and` is a keyword) but that's not super
  relevant here.

- ___Names___ are the last - and most important - kind of token. `print` is a
  name. Variable names are names. Function names are names. Class names are
  names. Module names are names. In all cases, a name represents some _thing_,
  and Python can fetch that thing if given its name.

[expressions]: https://docs.python.org/3/reference/expressions.html

So if I give Python this code:

```py
x = "world"
print("hello " + x)
```
You should first identify the tokens:

- _Name_ `x`
- _Operator_ `=`
- _Literal_ `"world"`
- _Name_ `print`
- _Operator_ `( )`
- _Literal_ `"hello "`
- _Operator_ `+`
- _Name_ `x`

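
The breakdown above can be checked against Python's own tokenizer. This is a minimal sketch using the standard-library `tokenize` module; the variable names `source`, `LAYOUT`, and `tokens` are chosen just for this illustration. Expect the tokenizer's categories to be coarser than the four described here: it reports keywords as `NAME`, reports `(` and `)` as two separate `OP` tokens, and splits literals into `STRING` and `NUMBER`.

```python
import io
import tokenize

# The two-line program from the text, as a string.
source = 'x = "world"\nprint("hello " + x)\n'

# Token types that only describe layout (line ends, end of file), not "words".
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}

# Pair each remaining token's category with its text.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in LAYOUT
]

for kind, text in tokens:
    print(f"{kind:<8}{text}")
```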
The first line of code binds the name `x` to the value `"world"`.

The expression `"hello " + x` looks up the value named by `x` and concatenates
it with the literal value `"hello "`. This produces the string `"hello world"`.

The expression `print( ... )` looks up the value - the function - named by
`print` and uses the `( )` operator to call it with the string `"hello world"`.

To be crystal clear: `x` and `print` _are the same kind of token_; it's just
that their named values have different types. One is a string, the other a
function. The string can be _operated on_ with the `+` operator, and the
function can be _operated on_ with the `( )` operator.

It is valid to write `print(print)`; here we are looking up the name `print`,
and passing that value to the function named by `print`. This should be no more
or less surprising than being able to write `x + x` or `5 * 4`.
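
To make the "same kind of token" point concrete, here is a small sketch. The names `kinds` and `shout` are hypothetical, picked only for this example; everything else is standard Python.

```python
x = "world"

# Both names resolve to ordinary values; only the types of those values differ.
kinds = (type(x).__name__, type(print).__name__)
print(kinds)

# A function is a value like any other, so it can be bound to another name...
shout = print  # `shout` is a hypothetical second name for the same function
shout("hello " + x)

# ...or passed as an argument, including to itself.
print(print)
```

Nothing special happens in the last line: Python looks up `print` twice and calls the first result with the second.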
# Namespaces

**First-and-a-half:** A _namespace_ is a collection of names.

You might also hear this called a "scope". This is the reason I say "maybe three
or four, depending how you count"; this is really part of that fundamental idea
of a _name_, but I'll list it separately to be extra clear.

There are some special structures in Python that introduce new namespaces. Each
_module_ has a "global" namespace; these are names that can be referenced
anywhere in a given file or script. Each _function_ has a "local" namespace;
these are names that can only be accessed within the function.

For example:

```py
x = "eggs"


def spam():
    y = "ham"

# I can print(x) here.

# But I cannot print(y) here.
```
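
A runnable variant of the example above, as a sketch: the function body and the names `inside` and `outside` are additions made for this illustration, not part of the original snippet.

```python
x = "eggs"  # lives in the module's global namespace


def spam():
    y = "ham"  # lives in spam's local namespace, created anew on each call
    # Inside the function, both the local and the global name can be looked up.
    return x + " and " + y


inside = spam()
print(inside)

# Outside the function, the local name `y` does not exist.
try:
    y
except NameError as error:
    outside = str(error)
    print(outside)
```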
Objects also have namespaces. Names on objects are called "attributes", and they
may be simple values or functions, just as regular names might be simple
values (`x`, `y`) or functions (`print`, `spam`). You access attributes with the
`.` operator.

```py
obj = range(10)
# Find the value named by `obj`, then find the value named by `stop` on it: 10.
print(obj.stop)
```
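
Since attributes are names too, the lookup can also be spelled out with the built-ins `getattr` and `dir`. A short sketch, with `stop` and `count` as illustrative variable names:

```python
obj = range(10)

# Attribute access with `.` is a lookup by name, so the name can also be
# supplied as a string via the built-in `getattr`.
stop = getattr(obj, "stop")
print(stop)

# Attributes can be functions too; looking one up and calling it are separate steps.
count = obj.count  # look up the attribute named `count`
print(count(3))  # then call it: how many times does 3 appear in the range?

# `dir` lists the names available on an object.
print("stop" in dir(obj), "count" in dir(obj))
```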
Finally, there is the built-in namespace. These are names that are always
accessible, from anywhere, by default. Names like `print` and `range` are defined
here. [Here's a comprehensive list of built-in functions](https://docs.python.org/3/library/functions.html).
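
The built-in namespace can be inspected like any other through the standard `builtins` module. The sketch below also shows that a name defined closer to the code hides a built-in of the same name; `shadow_demo`, `same_function`, and `has_range` are hypothetical names used only here.

```python
import builtins

# The built-in namespace is itself a module; its attributes are the built-in names.
same_function = builtins.print is print
has_range = "range" in dir(builtins)
print(same_function, has_range)


def shadow_demo():
    # A local name hides the built-in of the same name inside this function only.
    len = lambda value: "shadowed"
    return len("abc")


print(shadow_demo())  # uses the local `len`
print(len("abc"))     # outside the function, the built-in is found again
```

Shadowing a built-in like this is legal but rarely a good idea in real code; it is done here only to show the lookup order.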
# Strings

**Second:** you asked about characters and letters, so you may appreciate some
background on strings.

A _string_ is a sequence of characters. A _character_ is simply a number to
which we, by convention, assign some meaning. For example, by convention, we've
all agreed that the number `74` means `J`. This convention is called an
_encoding_. The convention Python uses is called Unicode, and it is specified by
a committee called the _Unicode Consortium_; UTF-8, the default encoding, is the
usual way of storing those numbers as bytes. Unicode includes characters from
many current and ancient languages, various symbols and typographical marks,
emojis, flags, etc. The important thing to remember is each one of these things,
really, is just an integer. And all our devices just agree that when they see a
given integer they will look up the appropriate symbol in an appropriate font.

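
The number behind a character can be inspected directly with the built-in functions `ord` and `chr`. A brief sketch; the variable names are illustrative:

```python
# `ord` gives the number behind a character; `chr` goes the other way.
number = ord("J")
letter = chr(74)
print(number, letter)

# A string is a sequence of such numbers.
numbers = [ord(character) for character in "Fizz"]
print(numbers)

# Turning the numbers back into characters rebuilds the string.
rebuilt = "".join(chr(n) for n in numbers)
print(rebuilt)
```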
You can switch between the string representation and the numerical
representation with the `encode` method on strings and the `decode` method on
bytes. Really, these are the same; you're just telling Python to tell your
console to draw them differently.

```py
>>> list('Fizz'.encode())
[70, 105, 122, 122]
>>> bytes([66, 117, 122, 122]).decode()
'Buzz'
```

For continuity: `list`, `encode`, `decode`, and `bytes` are all names. `( )`,
`[ ]`, `,`, and `.` are all operators. The numbers and `'Fizz'` are literals.

† Technically, `[66, 117, 122, 122]` in its entirety is a literal - `,` is a
delimiter, not an operator - but that's neither here nor there for these purposes.

‡ The symbol `†` is number 8224 and the symbol `‡` is number 8225.
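
One caveat worth seeing in code: for characters outside the basic ASCII range, the character's number and the bytes produced by `encode()` are no longer the same list of integers, because UTF-8 spreads larger numbers across several bytes. A sketch using the dagger symbol; the variable names are illustrative:

```python
dagger = "\u2020"  # the character †

code_point = ord(dagger)            # the character's number
utf8_bytes = list(dagger.encode())  # the bytes UTF-8 uses to store that number
print(code_point, utf8_bytes)

# ASCII characters are the special case where the two coincide.
print(ord("F"), list("F".encode()))

# Decoding the bytes recovers the original character.
round_trip = bytes(utf8_bytes).decode()
print(round_trip == dagger)
```

So `list('Fizz'.encode())` matches the character numbers only because every character in `'Fizz'` is ASCII.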
# Names

**Second-and-a-half:** names are strings.

Names are just strings, and namespaces are just `dict`s. You can access them with
`locals()` and `globals()`, although in practice you almost never need to do
this directly. It's better to just use the name itself.

```py
import pprint

x = range(10)
function = print
pprint.pprint(globals())
```

This outputs:

```
{'__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__cached__': None,
 '__doc__': None,
 '__file__': '<stdin>',
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 'function': <built-in function print>,
 'pprint': <module 'pprint' from 'python3.12/pprint.py'>,
 'x': range(0, 10)}
```

For continuity: `import pprint` binds the name `pprint` to the module
`pprint.py` from the standard library. The line `pprint.pprint( ... )` fetches
the function `pprint` from that module, and calls it.
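
As a final sketch of "namespaces are just dicts", the lookups Python performs can be done by hand. The example uses `locals()` inside a function and the built-in `vars` on the standard `math` module; the names `local_names` and `module_names` are hypothetical, and the function `spam` is reused from earlier purely for illustration.

```python
import math


def spam():
    y = "ham"
    # `locals()` returns the function's local namespace as a dict.
    return locals()


local_names = spam()
print(local_names)  # {'y': 'ham'}

# A module's global namespace is a dict as well; `vars` exposes it.
module_names = vars(math)
print(module_names["pi"] is math.pi)

# Looking an attribute up by its string name finds the same value.
print(getattr(math, "pi") == module_names["pi"])
```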