Title: Python from Scratch
Date: 2025-02-01
Summary: Building up the Python Data Model from scratch.

# Learning to Read

---

**First:** A Python program is made up of _tokens_; you can think of these as "words". Some examples of tokens:

- `"hello world"`
- `6`
- `(`
- `while`
- `print`

Generally there are four types of token in Python, although in practice the lines between them get blurred a little bit.

- _Literals_ literally represent some value. `"hello world"`, `6`, and `4.2` are examples of literals; the first represents some text and the others represent numbers. This is _literal_ as opposed to some indirect representation like `4 + 2` or `"hello" + " " + "world"`.
- _Operators_ include math operators like `+`, `-`, and `*`, but also things like the function call operator `( )`, boolean operators like `and`, and myriad others. [There's a comprehensive list here][expressions] but beware - there are a lot of them, and some are pretty technical. The main point is that `( )` and `+` are the same _kind of thing_ as far as the Python interpreter is concerned.
- _Keywords_ are special directives that tell Python how to behave. This includes things like `if` and `def` and `while`. Technically, some operators are also keywords (for example, `and` is a keyword), but that's not super relevant here.
- ___Names___ are the last - and most important - kind of token. `print` is a name. Variable names are names. Function names are names. Class names are names. Module names are names. In all cases, a name represents some _thing_, and Python can fetch that thing if given its name.

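
If you want to see tokens for yourself, the standard library's `tokenize` module will split source text for you. The sketch below is only an illustration: `tokenize` uses its own categories (`NAME`, `OP`, `STRING`, `NUMBER`), which don't line up exactly with the four kinds listed above. It reports keywords as `NAME`, so the code relabels them using the `keyword` module, and it reports `(` and `)` as two separate `OP` tokens. The helper name `show_tokens` is made up for this example.

```python
import io
import keyword
import tokenize

# Bookkeeping tokens that tokenize emits but that aren't "words" of the program.
SKIPPED = {"NEWLINE", "NL", "ENDMARKER", "INDENT", "DEDENT", "COMMENT"}

def show_tokens(source):
    """Return (category, text) pairs for each meaningful token in `source`."""
    pairs = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        kind = tokenize.tok_name[tok.type]
        if kind in SKIPPED:
            continue
        # tokenize lumps keywords in with names; separate them out.
        if kind == "NAME" and keyword.iskeyword(tok.string):
            kind = "KEYWORD"
        pairs.append((kind, tok.string))
    return pairs

for kind, text in show_tokens('while n < 6: print("hello world")\n'):
    print(kind, text)
```

Here `while` comes out as a keyword, `n` and `print` as names, `6` and `"hello world"` as literals (`NUMBER` and `STRING`), and everything else as `OP`.
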

[expressions]: https://docs.python.org/3/reference/expressions.html

So if I give Python this code:

```py
x = "world"
print("hello " + x)
```

You should first identify the tokens:

- _Name_ `x`
- _Operator_ `=`
- _Literal_ `"world"`
- _Name_ `print`
- _Operator_ `( )`
- _Literal_ `"hello "`
- _Operator_ `+`
- _Name_ `x`

The first line of code binds the value `"world"` to the name `x`.

The expression `"hello " + x` looks up the value named by `x` and concatenates it with the literal value `"hello "`. This produces the string `"hello world"`.

The expression `print( ... )` looks up the value - the function - named by `print` and uses the `( )` operator to call it with the string `"hello world"`.

To be crystal clear: `x` and `print` _are the same kind of token_; it's just that their named values have different types. One is a string, the other a function. The string can be _operated on_ with the `+` operator, and the function can be _operated on_ with the `( )` operator.

It is valid to write `print(print)`; here we are looking up the name `print`, and passing that value to the function named by `print`. This should be no more or less surprising than being able to write `x + x` or `5 * 4`.

# Namespaces

**First-and-a-half:** A _namespace_ is a collection of names. You might also hear this called a "scope". This is the reason I say "maybe three or four, depending how you count"; this is really part of that fundamental idea of a _name_, but I'll list it separately to be extra clear.

There are some special structures in Python that introduce new namespaces. Each _module_ has a "global" namespace; these are names that can be referenced anywhere in a given file or script. Each _function_ has a "local" namespace; these are names that can only be accessed within the function. For example:

```py
x = "eggs"  # global name: visible everywhere in this file

def spam():
    y = "ham"  # local name: visible only inside spam()
    # I can print(x) here.

# But I cannot print(y) here.
```

Objects also have namespaces.
Names on objects are called "attributes", and they may be simple values or functions, just as regular names might be simple values (`x`, `y`) or functions (`print`, `spam`). You access attributes with the `.` operator.

```py
obj = range(10)
print(obj.stop)  # find the value named by `obj`, then find the value named by `stop`: 10
```

Finally, there is the built-in namespace. These are names that are always accessible, from anywhere, by default. Names like `print` and `range` are defined here. [Here's a comprehensive list of built-in names](https://docs.python.org/3/library/functions.html).

# Strings

**Second:** you asked about characters and letters, so you may appreciate some background on strings.

A _string_ is a sequence of characters. A _character_ is simply a number to which we, by convention, assign some meaning. For example, by convention, we've all agreed that the number `74` means `J`. This convention is called an _encoding_. The default encoding is called UTF-8 and is specified by a committee called the _Unicode Consortium_. This encoding includes characters from many current and ancient languages, various symbols and typographical marks, emojis, flags, etc. The important thing to remember is that each one of these things, really, is just an integer. And all our devices just agree that when they see a given integer they will look up the appropriate symbol in an appropriate font.

You can switch between the string representation and the numerical representation with the `encode` method on strings and the `decode` method on `bytes`. Really, these are the same thing; you're just telling Python to tell your console to draw them differently.

```py
>>> list('Fizz'.encode())
[70, 105, 122, 122]
>>> bytes([66, 117, 122, 122]).decode()
'Buzz'
```

For continuity: `list`, `encode`, `decode`, and `bytes` are all names. `( )`, `[ ]`, `,`, and `.` are all operators.† The numbers and `'Fizz'` are literals.
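
To poke at the "characters are numbers" idea one character at a time, here is a small sketch using the built-in functions `ord` and `chr`, which aren't mentioned above. It also shows one wrinkle: for plain English text the bytes from `encode` match the character numbers exactly, but a character outside that basic range is stored as several bytes.

```python
# A character and its number are two views of the same thing.
assert ord("J") == 74
assert chr(74) == "J"

# For plain ASCII text, the UTF-8 bytes are the character numbers.
word = "Fizz"
code_points = [ord(c) for c in word]
utf8_bytes = list(word.encode("utf-8"))
print(code_points)  # [70, 105, 122, 122]
print(utf8_bytes)   # [70, 105, 122, 122]

# Outside ASCII, a single character is spread over several bytes.
dagger = "\u2020"  # the dagger symbol
print(ord(dagger))                   # 8224
print(list(dagger.encode("utf-8")))  # [226, 128, 160]
print(bytes([226, 128, 160]).decode("utf-8") == dagger)  # True
```
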

† Technically, `[66, 117, 122, 122]` in its entirety is a single list expression (the language reference calls it a _display_), and `,` is a delimiter, not an operator - but that's neither here nor there for these purposes.‡

‡ The symbol `†` is number 8224 and the symbol `‡` is number 8225.

# Names

**Second-and-a-half:** names are strings.

Names are just strings, and namespaces are just `dict`s. You can access them with `locals()` and `globals()`, although in practice you almost never need to do this directly. It's better to just use the name itself.

```py
import pprint

x = range(10)
function = print
pprint.pprint(globals())
```

This outputs:

```
{'__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__cached__': None,
 '__doc__': None,
 '__file__': '',
 '__loader__': <...>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 'function': <built-in function print>,
 'pprint': <module 'pprint' from '...'>,
 'x': range(0, 10)}
```

For continuity: `import pprint` binds the name `pprint` to the module `pprint.py` from the standard library. The line `pprint.pprint( ... )` fetches the function `pprint` from that module, and calls it.
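
To make "names are strings, namespaces are dicts" a bit more concrete, here is a minimal sketch that looks names up by their string keys. It assumes the code runs at the top level of a script, so that `globals()` is that script's own namespace. The class `Box` and the names `y` and `label` are made up for the example, and writing into `globals()` like this is for demonstration only.

```python
import builtins

x = range(10)

# The global namespace is a dict keyed by name strings.
namespace = globals()
assert namespace["x"] is x

# Adding a key to that dict creates a new global name.
namespace["y"] = "ham"
print(y)  # ham

# Objects keep their attributes in a similar dict, exposed by vars().
class Box:
    pass

box = Box()
box.label = "eggs"
print(vars(box))  # {'label': 'eggs'}

# Built-in names live in the namespace of the builtins module.
assert vars(builtins)["print"] is print
```
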