在PostgreSQL中使用ltree处理层次结构数据的方法

什么是ltree？为什么选择ltree？初始数据概述

在本文中，我们将学习如何使用 PostgreSQL 的 ltree 模块，该模块允许以分层的树状结构存储数据。

什么是ltree？

Ltree是PostgreSQL模块。它实现了一种数据类型ltree，用于表示存储在分层树状结构中的数据的标签。提供了用于搜索标签树的广泛工具。

为什么选择ltree？

ltree实现了一个物化路径，对于INSERT / UPDATE / DELETE来说非常快，而对于SELECT操作则较快通常，它比使用经常需要重新计算分支的递归CTE或递归函数要快如内置的查询语法和专门用于查询和导航树的运算符索引！！！

初始数据

首先，您应该在数据库中启用扩展。您可以通过以下命令执行此操作：

1	CREATE EXTENSION ltree;

让我们创建表并向其中添加一些数据：

								
									 CREATE   TABLE   comments (user_id   integer  , description text, path ltree); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 1, md5(random()::text),   '0001'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 2, md5(random()::text),   '0001.0001.0001'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 2, md5(random()::text),   '0001.0001.0001.0001'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 1, md5(random()::text),   '0001.0001.0001.0002'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 5, md5(random()::text),   '0001.0001.0001.0003'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 6, md5(random()::text),   '0001.0002'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 6, md5(random()::text),   '0001.0002.0001'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 6, md5(random()::text),   '0001.0003'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 8, md5(random()::text),   '0001.0003.0001'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 9, md5(random()::text),   '0001.0003.0002'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 11, md5(random()::text),   '0001.0003.0002.0001'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 2, md5(random()::text),   '0001.0003.0002.0002'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 5, md5(random()::text),   '0001.0003.0002.0003'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 7, md5(random()::text),   '0001.0003.0002.0002.0001'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 20, md5(random()::text),   '0001.0003.0002.0002.0002'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 31, md5(random()::text),   '0001.0003.0002.0002.0003'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 22, md5(random()::text),   '0001.0003.0002.0002.0004'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 34, md5(random()::text),   '0001.0003.0002.0002.0005'  ); 

									 INSERT   INTO   comments (user_id, description, path)   VALUES   ( 22, md5(random()::text),   '0001.0003.0002.0002.0006'  );

另外，我们应该添加一些索引：

1 2	CREATE INDEX path_gist_comments_idx ON comments USING GIST(path); CREATE INDEX path_comments_idx ON comments USING btree(path);

正如您看到的那样，我建立comments表时带有path字段，该字段包含该表的tree全部路径。如您所见，对于树分隔符，我使用4个数字和点。

让我们在commenets表中找到path以‘0001.0003'的记录：

								
									 $   SELECT   user_id, path   FROM   comments   WHERE   path <@   '0001.0003'  ; 

									    user_id |   path 

									 ---------+-------------------------- 

									     6 | 0001.0003 

									     8 | 0001.0003.0001 

									     9 | 0001.0003.0002 

									     11 | 0001.0003.0002.0001 

									     2 | 0001.0003.0002.0002 

									     5 | 0001.0003.0002.0003 

									     7 | 0001.0003.0002.0002.0001 

									     20 | 0001.0003.0002.0002.0002 

									     31 | 0001.0003.0002.0002.0003 

									     22 | 0001.0003.0002.0002.0004 

									     34 | 0001.0003.0002.0002.0005 

									     22 | 0001.0003.0002.0002.0006 

									 (12   rows  )

让我们通过EXPLAIN命令检查这个SQL：

$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003' ;

QUERY PLAN

----------------------------------------------------------------------------------------------------

Seq Scan on comments (cost=0.00..1.24 rows =2 width=38) (actual time =0.013..0.017 rows =12 loops=1)

Filter: (path <@ '0001.0003' ::ltree)

Rows Removed by Filter: 7

Total runtime: 0.038 ms

(4 rows )

让我们禁用seq scan进行测试：

								
									 $   SET   enable_seqscan=  false  ; 

									 SET 

									 $ EXPLAIN ANALYZE   SELECT   user_id, path   FROM   comments   WHERE   path <@   '0001.0003'  ; 

									                  QUERY PLAN 

									 ----------------------------------------------------------------------------------------------------------------------------------- 

									    Index   Scan using path_gist_comments_idx   on   comments (cost=0.00..8.29   rows  =2 width=38) (actual   time  =0.023..0.034   rows  =12 loops=1) 

									    Index   Cond: (path <@   '0001.0003'  ::ltree) 

									    Total runtime: 0.076 ms 

									 (3   rows  )

现在SQL慢了，但是能看到SQL是怎么使用index的。
第一个SQL语句使用了sequence scan，因为在表中没有太多的数据。

我们可以将select [path <@ ‘0001.0003'] 换种实现方法：

								
									 $   SELECT   user_id, path   FROM   comments   WHERE   path ~   '0001.0003.*'  ; 

									 user_id |   path 

									 ---------+-------------------------- 

									     6 | 0001.0003 

									     8 | 0001.0003.0001 

									     9 | 0001.0003.0002 

									     11 | 0001.0003.0002.0001 

									     2 | 0001.0003.0002.0002 

									     5 | 0001.0003.0002.0003 

									     7 | 0001.0003.0002.0002.0001 

									     20 | 0001.0003.0002.0002.0002 

									     31 | 0001.0003.0002.0002.0003 

									     22 | 0001.0003.0002.0002.0004 

									     34 | 0001.0003.0002.0002.0005 

									     22 | 0001.0003.0002.0002.0006 

									 (12   rows  )

你不应该忘记数据的顺序，如下的例子：

								
									 $   INSERT   INTO   comments (user_id, description, path)   VALUES   ( 9, md5(random()::text),   '0001.0003.0001.0001'  ); 

									 $   INSERT   INTO   comments (user_id, description, path)   VALUES   ( 9, md5(random()::text),   '0001.0003.0001.0002'  ); 

									 $   INSERT   INTO   comments (user_id, description, path)   VALUES   ( 9, md5(random()::text),   '0001.0003.0001.0003'  ); 

									 $   SELECT   user_id, path   FROM   comments   WHERE   path ~   '0001.0003.*'  ; 

									 user_id |   path 

									 ---------+-------------------------- 

									     6 | 0001.0003 

									     8 | 0001.0003.0001 

									     9 | 0001.0003.0002 

									     11 | 0001.0003.0002.0001 

									     2 | 0001.0003.0002.0002 

									     5 | 0001.0003.0002.0003 

									     7 | 0001.0003.0002.0002.0001 

									     20 | 0001.0003.0002.0002.0002 

									     31 | 0001.0003.0002.0002.0003 

									     22 | 0001.0003.0002.0002.0004 

									     34 | 0001.0003.0002.0002.0005 

									     22 | 0001.0003.0002.0002.0006 

									     9 | 0001.0003.0001.0001 

									     9 | 0001.0003.0001.0002 

									     9 | 0001.0003.0001.0003 

									 (15   rows  )

现在进行排序：

								
									 $   SELECT   user_id, path   FROM   comments   WHERE   path ~   '0001.0003.*'   ORDER   by   path; 

									    user_id |   path 

									 ---------+-------------------------- 

									     6 | 0001.0003 

									     8 | 0001.0003.0001 

									     9 | 0001.0003.0001.0001 

									     9 | 0001.0003.0001.0002 

									     9 | 0001.0003.0001.0003 

									     9 | 0001.0003.0002 

									     11 | 0001.0003.0002.0001 

									     2 | 0001.0003.0002.0002 

									     7 | 0001.0003.0002.0002.0001 

									     20 | 0001.0003.0002.0002.0002 

									     31 | 0001.0003.0002.0002.0003 

									     22 | 0001.0003.0002.0002.0004 

									     34 | 0001.0003.0002.0002.0005 

									     22 | 0001.0003.0002.0002.0006 

									     5 | 0001.0003.0002.0003 

									 (15   rows  )

可以在lquery的非星号标签的末尾添加几个修饰符，以使其比完全匹配更匹配：
[ @]-不区分大小写匹配，例如a @匹配A
[ *]-匹配任何带有该前缀的标签，例如foo *匹配foobar
[％]-匹配以下划线开头的单词

								
									 $   SELECT   user_id, path   FROM   comments   WHERE   path ~   '0001.*{1,2}.0001|0002.*'   ORDER   by   path; 

									    user_id |   path 

									 ---------+-------------------------- 

									     2 | 0001.0001.0001 

									     2 | 0001.0001.0001.0001 

									     1 | 0001.0001.0001.0002 

									     5 | 0001.0001.0001.0003 

									     6 | 0001.0002.0001 

									     8 | 0001.0003.0001 

									     9 | 0001.0003.0001.0001 

									     9 | 0001.0003.0001.0002 

									     9 | 0001.0003.0001.0003 

									     9 | 0001.0003.0002 

									     11 | 0001.0003.0002.0001 

									     2 | 0001.0003.0002.0002 

									     7 | 0001.0003.0002.0002.0001 

									     20 | 0001.0003.0002.0002.0002 

									     31 | 0001.0003.0002.0002.0003 

									     22 | 0001.0003.0002.0002.0004 

									     34 | 0001.0003.0002.0002.0005 

									     22 | 0001.0003.0002.0002.0006 

									     5 | 0001.0003.0002.0003 

									 (19   rows  )

我们来为parent ‘0001.0003'找到所有直接的childrens，见下：

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*{1}' ORDER by path;

user_id | path

---------+----------------

8 | 0001.0003.0001

9 | 0001.0003.0002

(2 rows )

为parent ‘0001.0003'找到所有的childrens，见下：

								
									 $   SELECT   user_id, path   FROM   comments   WHERE   path ~   '0001.0003.*'   ORDER   by   path; 

									    user_id |   path 

									 ---------+-------------------------- 

									     6 | 0001.0003 

									     8 | 0001.0003.0001 

									     9 | 0001.0003.0001.0001 

									     9 | 0001.0003.0001.0002 

									     9 | 0001.0003.0001.0003 

									     9 | 0001.0003.0002 

									     11 | 0001.0003.0002.0001 

									     2 | 0001.0003.0002.0002 

									     7 | 0001.0003.0002.0002.0001 

									     20 | 0001.0003.0002.0002.0002 

									     31 | 0001.0003.0002.0002.0003 

									     22 | 0001.0003.0002.0002.0004 

									     34 | 0001.0003.0002.0002.0005 

									     22 | 0001.0003.0002.0002.0006 

									     5 | 0001.0003.0002.0003 

									 (15   rows  )

为children ‘0001.0003.0002.0002.0005'找到parent:

$ SELECT user_id, path FROM comments WHERE path = subpath( '0001.0003.0002.0002.0005' , 0, -1) ORDER by path;

user_id | path

---------+---------------------

2 | 0001.0003.0002.0002

(1 row)

如果你的路径不是唯一的，你会得到多条记录。

概述

可以看出，使用ltree的物化路径非常简单。在本文中，我没有列出ltree的所有可能用法。它不被视为全文搜索问题ltxtquery。但是您可以在PostgreSQL官方文档( http://www.postgresql.org/docs/current/static/ltree.html )中找到它。

了解更多PostgreSQL热点资讯、新闻动态、精彩活动，请访问中国PostgreSQL官方网站： www.postgresqlchina.com

解决更多PostgreSQL相关知识、技术、工作问题，请访问中国PostgreSQL官方问答社区： www.pgfans.cn

下载更多PostgreSQL相关资料、工具、插件问题，请访问中国PostgreSQL官方下载网站： www.postgreshub.cn

解决更多PostgreSQL相关知识、技术、工作问题，请访问PostgreSQL教程网： www.tuohang.net/db/postgresql/

到此这篇关于在PostgreSQL中使用ltree处理层次结构数据的文章就介绍到这了,更多相关PostgreSQL层次结构数据内容请搜索服务器之家以前的文章或继续浏览下面的相关文章希望大家以后多多支持服务器之家！

原文链接：https://blog.csdn.net/weixin_46199817/article/details/114819040

声明：本文来自网络，不代表【好得很程序员自学网】立场，转载请注明出处：http://www.haodehen.cn/did229401

更新时间：2023-05-14 阅读：55次