New type-system features in Python 3.11

PEP 646 – Variadic Generics

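Before introducing this PEP we need some background, so let's work up to it step by step, starting with generics.

Generics are a feature that lets you define a function or class without fixing the concrete types in advance, and supply the types when it is used.

In a dynamic language like Python, everything is an object reference, so you can simply inspect the type at the point of use: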

In : def say_type(obj):
...:     match obj:
...:         case int():
...:             print('int')
...:         case str():
...:             print('str')
...:         case float():
...:             print('float')
...:         case _:
...:             print('other type')
...:

In : say_type(1)
int

In : say_type('ss')
str

In : say_type(1.1)
float

In : say_type([])
other type

Static type checking is where generics come in, because they let you catch problems before the code runs. For example:

U = int | str


def max_1(a: U, b: U) -> U:
    return max(a, b)


max_1("foo", 1)
max_1(1, "foo")
max_1("foo", "bar")
max_1(1, 2)

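In this example each argument can be an int or a str. It is easy to see that max_1("foo", 1) and max_1(1, "foo") raise a TypeError at runtime, because a str and an int cannot be compared with each other. mypy, however, reports nothing, because each argument on its own satisfies U.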
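To make the runtime failure concrete, here is a minimal sketch that wraps the four calls in a try/except. The helper name try_max_1 is hypothetical, and typing.Union stands in for int | str only so that the sketch also runs on interpreters older than Python 3.10.

```python
from typing import Union

U = Union[int, str]  # equivalent to `int | str`


def max_1(a: U, b: U) -> U:
    return max(a, b)


def try_max_1(a: U, b: U) -> str:
    """Hypothetical helper: report the result or the runtime error as text."""
    try:
        return f"ok: {max_1(a, b)!r}"
    except TypeError as exc:
        # A str and an int cannot be ordered against each other.
        return f"TypeError: {exc}"


for pair in [("foo", 1), (1, "foo"), ("foo", "bar"), (1, 2)]:
    print(pair, "->", try_max_1(*pair))
```

The two mixed calls fail only when they are executed, which is exactly the kind of error a static check is supposed to surface earlier.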

In Python's type system, a generic type variable is declared with TypeVar, and using one exposes the problem:

from typing import TypeVar

T = TypeVar("T", int, str)  # define a type variable


def max_2(a: T, b: T) -> T:  # generic function
    return max(a, b)


max_2("foo", 1)
max_2(1, "foo")
max_2("foo", "bar")
max_2(1, 2)

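U in the first example is a type alias, while T in the second is a type variable. Both can be reused (and make complex structures easier to express), but they mean different things: U lets every annotated position independently be int or str, whereas every occurrence of T in one call must resolve to the same type, which is why mypy now rejects the two mixed calls. A TypeVar can be given an upper bound with the bound argument, or, as I did above, constrained to a fixed set of types such as int and str. I still recommend taking a quick look at the TypeVar section of the official documentation, because it really is central to generics.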
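As a rough illustration of the two TypeVar styles just mentioned, the sketch below puts a constrained variable next to a bounded one. The names S and longest are made up for this example and do not come from the original code.

```python
from typing import Sized, TypeVar

# Constrained: T must be exactly int or exactly str.
T = TypeVar("T", int, str)

# Bounded: S may be any type that is a subtype of Sized (anything with __len__).
S = TypeVar("S", bound=Sized)


def longest(a: S, b: S) -> S:
    """Return whichever argument has the greater length (a wins ties)."""
    return a if len(a) >= len(b) else b


print(longest([1, 2], [3]))
print(longest("ab", "abc"))
print(T.__constraints__)  # the types listed when T was created
print(S.__bound__)        # the upper bound given to S
```

A bound is usually the better fit when any subtype should be accepted, while constraints suit a small closed set of alternatives.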

A type variable can also stand for the element type of a container. For example:

def max_3(items: list[T]) -> T:
    return max(items)


max_3([1, 2, 3])  # OK
max_3([1, 2, '3'])  # Rejected

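In ordinary cases, combining types with a Union is enough to tell mypy that a parameter or return value can have more than one type. In the example above, though, items is a list whose elements are all int or all str, and T ties the return type to that element type. Python's built-in collection types (see collections.abc.Collection) can hold elements of any type precisely because they are generic classes. In real-world development we inevitably define our own classes, and sometimes we need those to be generic as well.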
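For a feel of how the generic collection types combine with a type variable, here is a small sketch. The function first and the variable E are hypothetical names chosen for illustration.

```python
from collections.abc import Sequence
from typing import TypeVar

E = TypeVar("E")  # unconstrained: any element type


def first(items: Sequence[E]) -> E:
    """Hypothetical helper: return the first element of a non-empty sequence."""
    return items[0]


print(first([1, 2, 3]))   # a checker infers int
print(first(("a", "b")))  # a checker infers str
```

Because Sequence is itself generic, one annotation covers lists, tuples and other sequences while still tracking the element type.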

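Recall the definition of generics from earlier: the concrete types are not fixed in advance but supplied at the point of use. We can define such a class with typing.Generic: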

from typing import Generic


K = TypeVar("K", int, str)
V = TypeVar("V")


class Item(Generic[K, V]):  # Item is a generic class with two type parameters
    key: K
    value: V
    def __init__(self, k: K, v: V):
        self.key = k
        self.value = v


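i = Item(1, 'a')  # OK: Item is generic, so any arguments of acceptable types work
i2 = Item[int, str](1, 'a')  # OK: K and V are given explicitly
i3 = Item[int, int](1, 2)  # OK: explicitly parameterized with different types
i4 = Item[int, int](1, 'a')  # Rejected: the argument does not match the type given for V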
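Note that "Rejected" above is the type checker's verdict, not a runtime error. The following sketch, which repeats the Item class so that it is self-contained, shows that the interpreter creates the object regardless.

```python
from typing import Generic, TypeVar

K = TypeVar("K", int, str)
V = TypeVar("V")


class Item(Generic[K, V]):
    def __init__(self, k: K, v: V) -> None:
        self.key = k
        self.value = v


# A type checker rejects this call, but the interpreter does not validate
# the type arguments, so the object is created anyway.
i4 = Item[int, int](1, "a")
print(type(i4.value).__name__)
```

This is why generics only pay off when a checker such as mypy or pyright is actually run over the code.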

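Good. With that groundwork in place, on to the main topic.

As the PEP title says, this is about variadic generics, that is, generics with a variable number of type parameters. The TypeVar introduced above stands for a single type, whereas this PEP adds TypeVarTuple, which stands for an arbitrary number of types.

An example makes it clear: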

from typing import TypeVarTuple


K2 = TypeVar("K2", int, str)
V2 = TypeVarTuple("V2")


class Item2(Generic[K2, *V2]):
    def __init__(self, k: K2, *v: *V2):
        self.key = k
        self.values = v


d = Item2(1, 2, '3', {'d': 4})
d = Item2(1, 2, 3, 4)
d = Item2(1, {}, set(), [])
d = Item2('1', {}, set(), [])
d = Item2('1', {})

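In this example, both the key and values attributes of Item2 are generic. key can be either an int or a str, while values has no fixed length, and because the TypeVarTuple here carries no type restrictions, values of any type are accepted. Introducing TypeVarTuple makes type checking considerably more flexible.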

PS: At the time of writing, mypy does not support this feature yet, and running it reports:

"TypeVarTuple" is not supported by mypy yet

PEP 673 – Self Type

As the name suggests, Self lets a class refer to its own type. Here are two examples of how this was commonly written in the past:

class Result:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return f'{self.__class__.__name__}(value={self.value})'

    def add_value(self, value: int) -> 'Result':
        self.value += value
        return self

    @classmethod
    def get(cls, value) -> 'Result':
        return cls(value)


class NewResult(Result):
    ...


r = NewResult(10)
print(r.add_value(5))
print(NewResult.get(20))


class Node:
    def __init__(self, data):
        self.data = data
        self.next: 'Node'|None = None
        self.previous: 'Node'|None = None


node = Node(10)
node.next = Node(20)
node.previous = Node(5)

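In these examples, three places use a string annotation to refer to the enclosing class:

The instance method add_value is declared to return a Result instance.
The class method get is declared to return a Result instance.
In Node's initializer, self.next and self.previous are declared to be either None or a Node instance.

The bare class name cannot be used in the method signatures, because the class does not exist yet when they are evaluated. mypy understands it, but at runtime you get an error like NameError: name 'XXX' is not defined.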

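Besides string annotations there are three other approaches. I will only mention them briefly rather than go through each one, because Python 3.11 solves the problem more cleanly:

Use ForwardRef on Python 3.8 and later.
Add from __future__ import annotations (this was planned to become the default in Python 3.10, but that change was postponed).
Bind the class to a TypeVar, as in TResult = TypeVar("TResult", bound="Result").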
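For reference, the TypeVar approach from the last item could look roughly like this. It is a sketch of the usual pre-3.11 pattern rather than code from the original project; the self and cls parameters have to be annotated explicitly for the checker to pick up the subclass.

```python
from typing import Type, TypeVar

TResult = TypeVar("TResult", bound="Result")


class Result:
    def __init__(self, value: int) -> None:
        self.value = value

    # Annotating self with the TypeVar makes the return type follow the subclass.
    def add_value(self: TResult, value: int) -> TResult:
        self.value += value
        return self

    @classmethod
    def get(cls: Type[TResult], value: int) -> TResult:
        return cls(value)


class NewResult(Result):
    pass


r = NewResult.get(20).add_value(5)
print(type(r).__name__, r.value)
```

It works, but every method needs the extra TypeVar plumbing.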

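Most of these approaches share a rather mechanical problem, which is how subclasses are handled and represented. Take the string annotations above: NewResult inherits the method annotations along with Result, so the declared return type of something like NewResult.get is still Result. By isinstance logic that is technically fine, but it does not really express "my own type". The TypeVar approach does express it, at the cost of extra boilerplate on every method. PEP 673 provides Self, which is the nicest solution: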

from typing import Self

class Result:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return f'{self.__class__.__name__}(value={self.value})'

    def add_value(self, value: int) -> Self:
        self.value += value
        return self

    @classmethod
    def get(cls, value) -> Self:
        return cls(value)


class NewResult(Result):
    ...


class Node:
    def __init__(self, data):
        self.data = data
        self.next: Self|None = None
        self.previous: Self|None = None

PS: At the time of writing, mypy does not support this feature yet, and running it reports:

error: Variable "typing.Self" is not valid as a type
note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases

PEP 675 – Arbitrary Literal String Type

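Before getting to LiteralString, let me first cover Literal, which was introduced in Python 3.8, because it makes the rest easier to follow. A literal is a value written directly in the source code. Strings, numbers, lists, dicts and booleans can all be written as literals in Python, but typing.Literal only accepts a limited set: ints, strs, bytes, bools, enum members and None. typing.Literal means that only the listed literal values are accepted: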

from typing import Literal

def accepts_only_four(x: Literal[4]) -> None:
    pass

accepts_only_four(4)   # OK
accepts_only_four(19)  # Rejected
accepts_only_four(2 + 2)  # Rejected

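Literal[4] means the argument must be the value 4, so the second call with 19 fails. The third call, 2 + 2, also evaluates to 4, yet it fails too: a literal has to be given to the checker directly and explicitly rather than arrived at by computation. The checker only sees that 2 + 2 is not literally 4 and rejects it. This point is important, so take a moment to absorb it.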
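Literal is more commonly used with a handful of string values than with a single number. A small sketch, with the alias Mode and the helper describe_mode made up for this example:

```python
from typing import Literal, get_args

Mode = Literal["r", "w", "a"]


def describe_mode(mode: Mode) -> str:
    """Hypothetical helper: map a file mode to a readable name."""
    return {"r": "read", "w": "write", "a": "append"}[mode]


print(describe_mode("r"))  # accepted by a type checker
print(get_args(Mode))      # the allowed literal values, available at runtime
```

A call such as describe_mode("x") would be flagged by a checker, since "x" is not one of the listed values.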

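Back to the main topic. The motivation for this PEP is to provide a more intuitive and general way to guard against SQL injection. Here is the example from the PEP: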

def query_user(conn: Connection, user_id: str) -> User:
    query = f"SELECT * FROM data WHERE user_id = {user_id}"
    conn.execute(query)

query_user(conn, "user123")  # OK.

# Delete the table.
query_user(conn, "user123; DROP TABLE data;")

# Fetch all users (since 1 = 1 is always true).
query_user(conn, "user123 OR 1 = 1")

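Normally user_id is a well-formed string, but it may come from an external parameter whose source cannot be trusted, which creates the risk that someone assembles the SQL statement to do something unintended. For example: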

query_user(conn, input())

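Here I use the input function to stand for user_id arriving from an external channel. With the old str annotation nothing looks wrong, but with the new LiteralString the call no longer passes type checking: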

from typing import LiteralString

def query_user(conn: Connection, user_id: LiteralString) -> User:
    query = f"SELECT * FROM data WHERE user_id = {user_id}"
    conn.execute(query)

query_user(conn, input())  # Rejected

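That is because user_id is not passed as a string literal but computed by input. As the PEP title says, LiteralString can represent an arbitrary string literal, unlike typing.Literal above, which can only name a few specific values and is far too inflexible.

Also, when a string is built by concatenation, every part has to be a literal: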

def execute_sql(query: LiteralString):
    ...


user_input = input()
execute_sql("SELECT * FROM " + user_input)  # Rejected
execute_sql(f"SELECT * FROM {user_input}")  # Rejected

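Both calls above are rejected as well, because user_input, the second part of each string, is not a literal.

I think this feature is mainly aimed at f-strings, since calling input like this is rare in practice.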

PS: At the time of writing, mypy does not support this feature yet, so the problematic calls are not reported.

PEP 681 – Data Class Transforms

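Type checkers currently have good support for the standard library, dataclasses included. This PEP provides a way for ordinary classes that behave like standard-library dataclasses to get the same treatment from type checkers automatically. The behaviors covered are:

An __init__ method synthesized from the declared data fields.
Optionally synthesized __eq__, __ne__, __lt__, __le__, __gt__ and __ge__ methods.
Support for the frozen parameter, so that static type checking enforces field immutability.
Support for field specifiers, which tell the static type checker about the properties of each field, such as whether a default value is provided for it.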

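For background on dataclasses and attrs, see my earlier blog post: attrs and Python 3.7's dataclasses. I won't go into the details here.

Before this PEP, when a project used libraries such as attrs, pydantic or the various ORMs (SQLAlchemy, Django and so on), those libraries had to provide matching type annotations for static type checking, otherwise you had to write them yourself or find a way to silence the relevant checks. This PEP aims to lower that cost: dataclass_transform can be applied at three levels (a decorator function, a class or a metaclass) to get the corresponding type checking without writing extra annotations.

Personally I think this PEP is mainly meant to help library authors. Unless you have built your own wheel that behaves like dataclasses in your project, it has little impact on ordinary developers.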

An example may make this easier to follow. I personally like attrs, and in my projects I define a Model like this (heavily simplified for the example):

import attr


@attr.define()
class Model:
    id: int
    title: str



Model(1, 2)  # Rejected

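I did not define an __init__ method, but attrs generates a set of methods for me automatically. Model(1, 2) should fail type checking, because title should be a str and I passed an int.

Next, install an attrs version that does not support this feature yet and run pyright (another static type checker; mypy does not support this PEP yet):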

➜ pip install attrs==20.3.0
➜ pip install pyright
➜ pyright pep681.py
...
pyright 1.1.276
/home/ubuntu/mp/2022-10-23/pep681.py
  /home/ubuntu/mp/2022-10-23/pep681.py:11:1 - error: Expected no arguments to "Model" constructor (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations

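pyright is rather clueless here: it thinks the class has no __init__ constructor defined. At this point attrs does not yet carry the corresponding annotations. If you are interested, see the relevant PR: Implement pyright support via dataclass_transforms

With a later version, pyright understands the attrs usage above: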

➜ pip install attrs==22.1.0
➜ pyright pep681.py
...
pyright 1.1.276
/home/ubuntu/mp/2022-10-23/pep681.py
  /home/ubuntu/mp/2022-10-23/pep681.py:11:10 - error: Argument of type "Literal[2]" cannot be assigned to parameter "title" of type "str" in function "__init__"
    "Literal[2]" is incompatible with "str" (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations

As you can see, the error reported this time is the right one.

PEP 655 – Marking individual TypedDict items as required or potentially-missing

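TypedDict, added in Python 3.8, is a very useful type, so let's get it straight first. In day-to-day development we often define a dict with a complex structure. If you want mypy to check the key and value types of such a dict, you would write something like this: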

def get_summary() -> dict[str, int|str|list[str]]:
    return {
        'total': 100,
        'title': 'test',
        'items': ['1', '2']
    }

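In the example above I try to be as specific as possible about the value types, but because the values have several different types, all I can do is string them together in a Union. That is not precise enough for mypy. Take this piece of logic: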

summary = get_summary()
total = summary['total']
items = summary['items']
print(total / len(items))

mypy reports:

pep655.py:12: error: Unsupported operand types for / ("str" and "int")
pep655.py:12: error: Unsupported operand types for / ("List[str]" and "int")
pep655.py:12: note: Left operand is of type "Union[int, str, List[str]]"
pep655.py:12: error: Argument 1 to "len" has incompatible type "Union[int, str, List[str]]"; expected "Sized"

So in many cases the type of this return value cannot be pinned down precisely. TypedDict solves exactly this problem:

from typing import TypedDict


class Summary(TypedDict):
    total: int
    title: str
    items: list[str]



def get_summary() -> Summary:
    return {
        'total': 100,
        'title': 'test',
        'items': ['1', '2']
    }


summary = get_summary()
total = summary['total']
items = summary['items']
print(total / len(items))

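TypedDict spells out the type of each key's value in a form that resembles a dataclass. This helps mypy understand the structure of the returned value, so it can catch more type problems in the surrounding logic: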

x = summary['x']  # TypedDict "Summary" has no key "x"
summary['total'] = 'total'  # Value of "total" has incompatible type "str"; expected "int"

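Before Python 3.11, however, its handling of the declared keys was all-or-nothing: either every key had to be present, or the checker did not care which keys were missing: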

class Summary(TypedDict):
    total: int
    title: str
    items: list[str]

s: Summary = {'total': 10}  # Missing keys ("title", "items") for TypedDict "Summary"


class Summary2(TypedDict, total=False):  # total=False makes the checker ignore missing keys
    total: int
    title: str
    items: list[str]


s2: Summary2 = {'total': 10}  # OK
s3: Summary2 = {}  # OK

In the past, the only way to make some keys required and others optional was inheritance:

class Summary3(TypedDict):
    total: int
    title: str


class Summary4(Summary3, total=False):
    items: list[str]


s4: Summary4 = {}  # Missing keys ("total", "title") for TypedDict "Summary4"
s5: Summary4 = {'total': 10, 'title': 'Title'}  # OK: missing items is fine
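The required/optional split is also recorded on the class at runtime, which is handy for checking what an inheritance chain actually produced. A small sketch, assuming Python 3.9 or later:

```python
from typing import TypedDict


class Summary3(TypedDict):
    total: int
    title: str


class Summary4(Summary3, total=False):
    items: list[str]


print(sorted(Summary4.__required_keys__))
print(sorted(Summary4.__optional_keys__))

# At runtime a TypedDict value is a plain dict; nothing is validated.
s: Summary4 = {"total": 10, "title": "Title"}
print(type(s).__name__)
```

The annotations only guide the type checker, so a malformed dict still has to be caught by running mypy or pyright.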

PEP 655 introduces Required[] and NotRequired[], which let each key in a TypedDict state explicitly whether it is required:

from typing import Required, NotRequired


class Summary5(TypedDict):
    total: Required[int]  # total is explicitly required
    title: str  # required by default
    items: NotRequired[list[str]]  # items is explicitly optional


s6: Summary5 = {}  # Missing keys ("total", "title") for TypedDict "Summary5"
s7: Summary5 = {'total': 10, 'title': 'Title'}  # OK: same effect as above

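Afterword

At the time of writing, mypy does not support any of these apart from PEP 655. See link 6 in the further reading for progress. This is... 💊.
I am planning to write an article that covers the full history of Python's type system.

Code

The code for this article can be found in the mp project.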