⌘+k ctrl+k
1.4 (LTS)
搜索快捷键 cmd + k | ctrl + k
WITH 子句

WITH 子句允许您指定公用表表达式(CTE)。常规(非递归)公用表表达式本质上是作用域仅限于特定查询的视图。CTE 可以相互引用,也可以嵌套使用。递归 CTE 可以引用自身。

基本 CTE 示例

创建一个名为 cte 的 CTE 并在主查询中使用它

WITH cte AS (SELECT 42 AS x)
SELECT * FROM cte;
x
42

创建两个 CTE cte1cte2,其中第二个 CTE 引用第一个 CTE

WITH
    cte1 AS (SELECT 42 AS i),
    cte2 AS (SELECT i * 100 AS x FROM cte1)
SELECT * FROM cte2;
x
4200

您可以为 CTE 指定列名

WITH cte(j) AS (SELECT 42 AS i)
FROM cte;

CTE 物化 (Materialization)

DuckDB 默认将 CTE 处理为已物化(materialized)状态,这意味着 CTE 只会被评估一次,结果存储在一个临时表中。然而,在某些条件下,DuckDB 可以将 CTE 内联(inline)到主查询中,这意味着 CTE 不会被物化,其定义会在每次引用时被复制。内联通过以下启发式规则完成:

  • 该 CTE 没有被多次引用。
  • 该 CTE 不包含 VOLATILE 函数。
  • 该 CTE 使用了 AS NOT MATERIALIZED 且没有使用 AS MATERIALIZED
  • 该 CTE 不执行分组聚合。

可以通过定义 CTE 时使用 AS MATERIALIZED 显式激活物化,或使用 AS NOT MATERIALIZED 禁用它。请注意,即使满足启发式规则,内联也并不总是可能的。例如,如果 CTE 包含 read_csv 函数,它就无法被内联。

以以下查询为例,它调用了同一个 CTE 三次

WITH t(x) AS (complex_query)
SELECT *
FROM
    t AS t1,
    t AS t2,
    t AS t3;

内联会为每次引用复制 t 的定义,从而产生以下查询

SELECT *
FROM
    (complex_query) AS t1(x),
    (complex_query) AS t2(x),
    (complex_query) AS t3(x);

如果 complex_query 代价高昂,使用 MATERIALIZED 关键字将其物化可以提高性能。在这种情况下,complex_query 只会被评估一次。

WITH t(x) AS MATERIALIZED (complex_query)
SELECT *
FROM
    t AS t1,
    t AS t2,
    t AS t3;

如果想要禁用物化,请使用 NOT MATERIALIZED

WITH t(x) AS NOT MATERIALIZED (complex_query)
SELECT *
FROM
    t AS t1,
    t AS t2,
    t AS t3;

通常不建议使用显式的物化提示,因为 DuckDB 的查询优化器能够根据查询结构和上述启发式规则决定何时物化或内联 CTE。但在某些情况下,使用 MATERIALIZEDNOT MATERIALIZED 来显式控制行为可能是有益的。

递归 CTE

WITH RECURSIVE 允许定义可以引用自身的 CTE。请注意,查询必须以确保能够终止的方式编写,否则可能会陷入无限循环。

示例:斐波那契数列

WITH RECURSIVE 可用于进行递归计算。例如,这是如何使用 WITH RECURSIVE 计算前十个斐波那契数的方法

WITH RECURSIVE FibonacciNumbers (
    RecursionDepth, FibonacciNumber, NextNumber
) AS (
        -- Base case
        SELECT
            0 AS RecursionDepth,
            0 AS FibonacciNumber,
            1 AS NextNumber
        UNION ALL
        -- Recursive step
        SELECT
            fib.RecursionDepth + 1 AS RecursionDepth,
            fib.NextNumber AS FibonacciNumber,
            fib.FibonacciNumber + fib.NextNumber AS NextNumber
        FROM
            FibonacciNumbers fib
        WHERE
            fib.RecursionDepth + 1 < 10
    )
SELECT
    fn.RecursionDepth AS FibonacciNumberIndex,
    fn.FibonacciNumber
FROM
    FibonacciNumbers fn;
FibonacciNumberIndex FibonacciNumber
0 0
1 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34

示例:树遍历

WITH RECURSIVE 可用于遍历树。例如,以标签层级为例

Example graph Example graph

CREATE TABLE tag (id INTEGER, name VARCHAR, subclassof INTEGER);
INSERT INTO tag VALUES
    (1, 'U2',     5),
    (2, 'Blur',   5),
    (3, 'Oasis',  5),
    (4, '2Pac',   6),
    (5, 'Rock',   7),
    (6, 'Rap',    7),
    (7, 'Music',  9),
    (8, 'Movies', 9),
    (9, 'Art', NULL);

以下查询返回从节点 Oasis 到树根(Art)的路径。

WITH RECURSIVE tag_hierarchy(id, source, path) AS (
        SELECT id, name, [name] AS path
        FROM tag
        WHERE subclassof IS NULL
    UNION ALL
        SELECT tag.id, tag.name, list_prepend(tag.name, tag_hierarchy.path)
        FROM tag, tag_hierarchy
        WHERE tag.subclassof = tag_hierarchy.id
    )
SELECT path
FROM tag_hierarchy
WHERE source = 'Oasis';
path
[Oasis, Rock, Music, Art]

图遍历

WITH RECURSIVE 子句可用于表示任意图的遍历。然而,如果图包含环,查询必须执行环检测以防止无限循环。实现这一点的一种方法是将遍历路径存储在 列表 中,并在扩展路径之前检查其终点是否已访问过(参见后文示例)。

LDBC Graphalytics 基准测试 中的以下有向图为例

Example graph Example graph

CREATE TABLE edge (node1id INTEGER, node2id INTEGER);
INSERT INTO edge VALUES
    (1, 3), (1, 5), (2, 4), (2, 5), (2, 10), (3, 1),
    (3, 5), (3, 8), (3, 10), (5, 3), (5, 4), (5, 8),
    (6, 3), (6, 4), (7, 4), (8, 1), (9, 4);

请注意,该图包含有向环,例如在节点 1、5 和 8 之间。

枚举节点的所有路径

以下查询返回从节点 1 开始的所有路径

WITH RECURSIVE paths(startNode, endNode, path) AS (
        SELECT -- Define the path as the first edge of the traversal
            node1id AS startNode,
            node2id AS endNode,
            [node1id, node2id] AS path
        FROM edge
        WHERE startNode = 1
        UNION ALL
        SELECT -- Concatenate new edge to the path
            paths.startNode AS startNode,
            node2id AS endNode,
            array_append(path, node2id) AS path
        FROM paths
        JOIN edge ON paths.endNode = node1id
        -- Prevent adding a repeated node to the path.
        -- This ensures that no cycles occur.
        WHERE list_position(paths.path, node2id) IS NULL
    )
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
startNode endNode path
1 3 [1, 3]
1 5 [1, 5]
1 5 [1, 3, 5]
1 8 [1, 3, 8]
1 10 [1, 3, 10]
1 3 [1, 5, 3]
1 4 [1, 5, 4]
1 8 [1, 5, 8]
1 4 [1, 3, 5, 4]
1 8 [1, 3, 5, 8]
1 8 [1, 5, 3, 8]
1 10 [1, 5, 3, 10]

请注意,此查询的结果并不限于最短路径,例如对于节点 5,结果包含路径 [1, 5][1, 3, 5]

枚举节点间的无权最短路径

在大多数情况下,枚举所有路径既不切实际也不可行。通常我们只关心 (无权) 最短路径。要找到这些路径,需要调整 WITH RECURSIVE 查询的后半部分,使其仅在节点尚未被访问时才包含它。这是通过使用一个子查询来检查是否有任何先前路径包含该节点来实现的

WITH RECURSIVE paths(startNode, endNode, path) AS (
        SELECT -- Define the path as the first edge of the traversal
            node1id AS startNode,
            node2id AS endNode,
            [node1id, node2id] AS path
        FROM edge
        WHERE startNode = 1
        UNION ALL
        SELECT -- Concatenate new edge to the path
            paths.startNode AS startNode,
            node2id AS endNode,
            array_append(path, node2id) AS path
        FROM paths
        JOIN edge ON paths.endNode = node1id
        -- Prevent adding a node that was visited previously by any path.
        -- This ensures that (1) no cycles occur and (2) only nodes that
        -- were not visited by previous (shorter) paths are added to a path.
        WHERE NOT EXISTS (
                FROM paths previous_paths
                WHERE list_contains(previous_paths.path, node2id)
              )
    )
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
startNode endNode path
1 3 [1, 3]
1 5 [1, 5]
1 8 [1, 3, 8]
1 10 [1, 3, 10]
1 4 [1, 5, 4]
1 8 [1, 5, 8]

枚举两个节点间的无权最短路径

WITH RECURSIVE 也可以用于查找 两个节点之间的所有 (无权) 最短路径。为了确保递归查询在到达终点节点时立即停止,我们使用一个 窗口函数 来检查终点节点是否在刚添加的节点中。

以下查询返回节点 1(起始节点)和节点 8(结束节点)之间的所有无权最短路径

WITH RECURSIVE paths(startNode, endNode, path, endReached) AS (
   SELECT -- Define the path as the first edge of the traversal
        node1id AS startNode,
        node2id AS endNode,
        [node1id, node2id] AS path,
        (node2id = 8) AS endReached
     FROM edge
     WHERE startNode = 1
   UNION ALL
   SELECT -- Concatenate new edge to the path
        paths.startNode AS startNode,
        node2id AS endNode,
        array_append(path, node2id) AS path,
        max(CASE WHEN node2id = 8 THEN 1 ELSE 0 END)
            OVER (ROWS BETWEEN UNBOUNDED PRECEDING
                           AND UNBOUNDED FOLLOWING) AS endReached
     FROM paths
     JOIN edge ON paths.endNode = node1id
    WHERE NOT EXISTS (
            FROM paths previous_paths
            WHERE list_contains(previous_paths.path, node2id)
          )
      AND paths.endReached = 0
)
SELECT startNode, endNode, path
FROM paths
WHERE endNode = 8
ORDER BY length(path), path;
startNode endNode path
1 8 [1, 3, 8]
1 8 [1, 5, 8]

带有 USING KEY 的递归 CTE

USING KEY 会改变常规递归 CTE 的行为。

在每次迭代中,常规递归 CTE 将结果行追加到联合表中,这最终定义了 CTE 的总体结果。相比之下,带有 USING KEY 的 CTE 能够更新早期迭代中已放入联合表的行:如果当前迭代产生了一个键为 k 的行,它将替换联合表中具有相同键 k 的行(类似于字典)。如果联合表中尚不存在这样的行,则像往常一样将新行追加到联合表中。

这允许 CTE 对联合表的内容进行细粒度控制。避免只追加的行为可以显著减小联合表的大小。这有助于提高查询运行时间、减少内存消耗,并使得在迭代进行中访问联合表成为可能(对于常规递归 CTE 来说,这是不可能的):在 CTE WITH RECURSIVE T(...) USING KEY ... 中,表 T 表示最后一次迭代添加的行(递归 CTE 的惯例),而表 recurring.T 表示迄今为止构建的联合表。对 recurring.T 的引用允许将相当复杂的算法优雅且地道地转换为易读的 SQL 代码。

示例:USING KEY

这是一个递归 CTE,其中 USING KEY 具有一个键列 (a) 和一个负载列 (b)。负载列对应于将被覆盖的列。在第一次迭代中,我们有两个不同的键,12。这两个键将生成两个新行 (1, 3)(2, 4)。在下一次迭代中,我们产生一个新的键 3,它生成一个新行。我们还生成了行 (2, 3),其中 2 是前一次迭代中已经存在的键。这将用新的负载 3 覆盖旧的负载 4

WITH RECURSIVE tbl(a, b) USING KEY (a) AS (
    SELECT a, b
    FROM (VALUES (1, 3), (2, 4)) t(a, b)
        UNION
    SELECT a + 1, b
    FROM tbl
    WHERE a < 3
)
SELECT *
FROM tbl;
a b
1 3
2 3
3 3

使用 VALUES

您可以为 CTE 的初始(锚点)部分使用 VALUES 子句

WITH RECURSIVE tbl(a, b) USING KEY (a) AS (
    VALUES (1, 3), (2, 4)
        UNION
    SELECT a + 1, b
    FROM tbl
    WHERE a < 3
)
SELECT *
FROM tbl;

示例:USING KEY 引用联合表

除了将联合表用作字典外,我们现在还可以在查询中引用它。这使您不仅可以使用上次迭代的结果,还可以使用更早期的结果。这一新特性使得某些算法的实现变得更容易。

连通分量算法就是一个例子。对于每个节点,该算法确定其连接到的 ID 最小的节点。为了实现这一点,我们使用联合表中的条目来跟踪为节点找到的最小 ID。如果新的传入行包含更小的 ID,我们就更新该值。

Example graph Example graph

CREATE TABLE nodes (id INTEGER);
INSERT INTO nodes VALUES (1), (2), (3), (4), (5), (6), (7), (8);
CREATE TABLE edges (node1id INTEGER, node2id INTEGER);
INSERT INTO edges VALUES
    (1, 3), (2, 3), (3, 7), (7, 8), (5, 4), (6, 4);
WITH RECURSIVE connected_components(id, comp) USING KEY (id) AS (
    SELECT n.id, n.id AS comp
    FROM nodes AS n
        UNION (
    SELECT DISTINCT ON (previous_iter.id) previous_iter.id, initial_iter.comp
    FROM 
        recurring.connected_components AS previous_iter,
        connected_components AS initial_iter,
        edges AS e
    WHERE ((e.node1id, e.node2id) = (previous_iter.id, initial_iter.id)
       OR (e.node2id, e.node1id) = (previous_iter.id, initial_iter.id))
      AND initial_iter.comp < previous_iter.comp
    ORDER BY initial_iter.id ASC, previous_iter.comp ASC)
)
TABLE connected_components
ORDER BY id;
id comp
1 1
2 1
3 1
4 4
5 4
6 4
7 1
8 1

限制

DuckDB 不支持互递归 CTE。请参阅 DuckDB 仓库中的相关问题和讨论

语法

© 2025 DuckDB 基金会,阿姆斯特丹,荷兰
行为准则 商标使用指南