
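XLOG Generation and Cleanup Logic in PostgreSQL

0 Preface

Sections 1 and 2 analyze how XLOG (WAL) files are generated and cleaned up. If you are dealing with runaway XLOG growth, go straight to Section 3.

1 WAL Archiving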

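# Maximum number of log file segments between automatic WAL checkpoints
# (PostgreSQL 9.4 and earlier; replaced by max_wal_size in 9.5)
checkpoint_segments =
# Maximum time between automatic WAL checkpoints
checkpoint_timeout =
# Spreads checkpoint writes over the interval to ease I/O pressure
checkpoint_completion_target =
# Minimum number of past log file segments to keep, so that standbys have more segments available
wal_keep_segments =
# When enabled, completed WAL segments are sent to archive storage via archive_command
archive_mode =
# Forces a switch to a new WAL segment file after this timeout
archive_timeout =
# PostgreSQL 9.5 and later: soft limit on WAL size between automatic checkpoints
max_wal_size =
# PostgreSQL 9.5 and later: below this size, old WAL files are recycled rather than removed
min_wal_size =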

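1.1 With archiving disabled

The number of segment files is governed by the parameters above. Normally it is no more than the larger of

(2 + checkpoint_completion_target) * checkpoint_segments + 1

or

checkpoint_segments + wal_keep_segments + 1

files.

When an old segment file is no longer needed, it is renamed and reused (recycled). If a short-term peak in log output pushes the count above 3 * checkpoint_segments + 1 files, the unneeded files are deleted outright instead of being recycled.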
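The two bounds above are easy to evaluate for a given configuration. The following is a minimal sketch, assuming the "or" between the two formulas means "whichever is larger" and assuming the default 16 MB segment size; the function names and the sample settings are made up for illustration and are not part of PostgreSQL.

```python
SEGMENT_MB = 16  # default segment size, assumed here


def normal_segment_limit(checkpoint_segments, checkpoint_completion_target,
                         wal_keep_segments=0):
    """Usual upper bound on the number of segment files in pg_xlog."""
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    # Assumption: the larger of the two formulas is the effective bound.
    return max(by_checkpoints, by_keep)


def recycle_limit(checkpoint_segments):
    """Above this many files, unneeded segments are deleted, not recycled."""
    return 3 * checkpoint_segments + 1


# Hypothetical settings, chosen only for illustration.
limit = normal_segment_limit(checkpoint_segments=10,
                             checkpoint_completion_target=0.5,
                             wal_keep_segments=0)
print(limit, "files, roughly", limit * SEGMENT_MB, "MB")
print(recycle_limit(10), "files before deletion replaces recycling")
```

Raising wal_keep_segments above roughly (1 + checkpoint_completion_target) * checkpoint_segments makes the second formula the dominant one.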

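1.2 With archiving enabled

File count: a segment file is removed only after it has been archived successfully.

Viewed abstractly, a running PostgreSQL instance produces an indefinitely long sequence of WAL records. The sequence is split into segment files of 16 MB each (the default), and each file is named with a number that reflects its position in the WAL sequence. When WAL archiving is not used, the system normally creates only a few segment files and then cycles through them, renaming segments that are no longer needed to higher segment numbers.

The archive command must return zero if and only if it succeeded. On a zero result, PostgreSQL assumes the WAL segment file has been archived successfully and will remove (or recycle) it later. A nonzero result tells PostgreSQL that the file was not archived, and it retries periodically until the command succeeds.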
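The exit-status contract can be illustrated with a small archive helper. This is only a sketch under stated assumptions: the function name archive_segment, the directory layout and the fake segment are all hypothetical, and a production script would also need to fsync the copied file, which is omitted here.

```python
import os
import shutil
import tempfile


def archive_segment(src_path, archive_dir):
    """Copy one WAL segment into archive_dir; return 0 only on success."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        return 1               # refuse to overwrite an existing archive file
    try:
        tmp = dest + ".tmp"
        shutil.copyfile(src_path, tmp)
        os.rename(tmp, dest)   # publish only a complete copy
    except OSError:
        return 1               # any I/O failure means "not archived"
    return 0


# Tiny demonstration with throwaway directories and a fake segment.
with tempfile.TemporaryDirectory() as work:
    xlog_dir = os.path.join(work, "pg_xlog")
    arch_dir = os.path.join(work, "archive")
    os.makedirs(xlog_dir)
    os.makedirs(arch_dir)
    seg = os.path.join(xlog_dir, "000000010000000000000001")
    with open(seg, "wb") as f:
        f.write(b"\0" * 1024)
    first = archive_segment(seg, arch_dir)          # fresh copy
    second = archive_segment(seg, arch_dir)         # already archived
    missing = archive_segment(seg + "X", arch_dir)  # source does not exist
    print(first, second, missing)
```

A wrapper script would pass the return value to sys.exit so that PostgreSQL sees it as the command's exit status. Returning nonzero for an already-archived file follows the usual advice that an archive command should never overwrite an existing file.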

2 PostgreSQL Source Code Analysis

2.1 Removal logic

What triggers removal

RemoveOldXlogFiles
  > called from CreateCheckPoint
  > called from CreateRestartPoint

The wal_keep_segments check (KeepLogSeg is called to adjust _logSegNo, which is then passed to RemoveOldXlogFiles)

static void
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
    XLogSegNo   segno;
    XLogRecPtr  keep;

    XLByteToSeg(recptr, segno);
    keep = XLogGetReplicationSlotMinimumLSN();

    /* compute limit for wal_keep_segments first */
    if (wal_keep_segments > 0)
    {
        /* avoid underflow, don't go below 1 */
        if (segno <= wal_keep_segments)
            segno = 1;
        else
            segno = segno - wal_keep_segments;
    }

    /* then check whether slots limit removal further */
    if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
    {
        XLogSegNo   slotSegNo;

        XLByteToSeg(keep, slotSegNo);

        if (slotSegNo <= 0)
            segno = 1;
        else if (slotSegNo < segno)
            segno = slotSegNo;
    }

    /* don't delete WAL segments newer than the calculated segment */
    if (segno < *logSegNo)
        *logSegNo = segno;
}
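To see how the cutoff moves, the function above can be modeled in a few lines of Python. This is a sketch, not PostgreSQL code: LSNs are plain integers, the 16 MB segment size is assumed, 0 stands in for InvalidXLogRecPtr, and the result is returned instead of being written through a pointer.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024   # assumed default 16 MB segment
INVALID_LSN = 0                    # stands in for InvalidXLogRecPtr


def keep_log_seg(recptr, log_seg_no, wal_keep_segments=0,
                 max_replication_slots=0, slot_min_lsn=INVALID_LSN):
    """Return the (possibly lowered) removal cutoff segment number."""
    segno = recptr // XLOG_SEG_SIZE          # what XLByteToSeg computes

    # Limit imposed by wal_keep_segments, never going below segment 1.
    if wal_keep_segments > 0:
        if segno <= wal_keep_segments:
            segno = 1
        else:
            segno = segno - wal_keep_segments

    # A replication slot may hold the cutoff back even further.
    if max_replication_slots > 0 and slot_min_lsn != INVALID_LSN:
        slot_segno = slot_min_lsn // XLOG_SEG_SIZE
        if slot_segno <= 0:
            segno = 1
        elif slot_segno < segno:
            segno = slot_segno

    # Only ever lower the caller's cutoff, never raise it.
    return min(segno, log_seg_no)


current = 100 * XLOG_SEG_SIZE      # hypothetical current position
print(keep_log_seg(current, 99))
print(keep_log_seg(current, 99, wal_keep_segments=10))
print(keep_log_seg(current, 99, wal_keep_segments=10,
                   max_replication_slots=1,
                   slot_min_lsn=50 * XLOG_SEG_SIZE))
```

The third call shows why a stalled slot is so effective at pinning WAL: its restart position wins over wal_keep_segments whenever it is older.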

Removal logic

static void
RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
{
    ...
    ...
    while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
    {
        /* Ignore files that are not XLOG segments */
        if (strlen(xlde->d_name) != 24 ||
            strspn(xlde->d_name, "0123456789ABCDEF") != 24)
            continue;

        /*
         * We ignore the timeline part of the XLOG segment identifiers in
         * deciding whether a segment is still needed. This ensures that we
         * won't prematurely remove a segment from a parent timeline. We could
         * probably be a little more proactive about removing segments of
         * non-parent timelines, but that would be a whole lot more
         * complicated.
         *
         * We use the alphanumeric sorting property of the filenames to decide
         * which ones are earlier than the lastoff segment.
         */
        if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
        {
            /*
             * Notes on XLogArchiveCheckDone:
             *   archiving is off                     -> returns true
             *   a .done file exists                  -> returns true
             *   a .ready file exists                 -> returns false
             *   on recheck, a .done file exists      -> returns true
             *   otherwise the .ready file is rebuilt -> returns false
             */
            if (XLogArchiveCheckDone(xlde->d_name))
            {
                /* Update the last removed location in shared memory first */
                UpdateLastRemovedPtr(xlde->d_name);

                /* Recycle or delete the segment, and clean up its .done and .ready files */
                RemoveXlogFile(xlde->d_name, endptr);
            }
        }
    }
    ...
    ...
}
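The three decisions in this loop (is it a segment file, is it old enough, has it been archived) can be sketched as follows. This is an illustrative model only: the function names are invented, the archive_status directory is replaced by an in-memory set of file names, and the race-closing recheck of the .done file is reduced to a comment.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def is_segment_name(name):
    """True for 24-character upper-case hex names, like the strspn check."""
    return len(name) == 24 and all(c in HEX_DIGITS for c in name)


def is_removal_candidate(name, lastoff):
    """Compare everything after the 8-character timeline prefix."""
    return is_segment_name(name) and name[8:] <= lastoff[8:]


def archive_check_done(name, status_files, archiving_active=True):
    """Model of the decision table noted for XLogArchiveCheckDone.

    status_files is a set of names such as '<segment>.done'; it is
    modified in place when a missing .ready marker has to be recreated.
    """
    if not archiving_active:
        return True
    if name + ".done" in status_files:
        return True
    if name + ".ready" in status_files:
        return False
    # The real code rechecks .done here to close a race with the archiver;
    # with an in-memory set nothing can change in between, so go straight
    # to recreating the .ready marker.
    status_files.add(name + ".ready")
    return False


status = {"000000010000000000000005.done", "000000010000000000000006.ready"}
lastoff = "000000020000000000000007"   # hypothetical cutoff file name
for seg in ("000000010000000000000005", "000000010000000000000006"):
    print(seg, is_removal_candidate(seg, lastoff), archive_check_done(seg, status))
```

Note that the first segment is a candidate even though it belongs to timeline 1 and the cutoff is on timeline 2, because the timeline prefix is skipped in the comparison.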

2.2 Archiving logic

static void
pgarch_ArchiverCopyLoop(void)
{
    char        xlog[MAX_XFN_CHARS + 1];

    /* Fetch the name of the oldest xlog file that has not been archived yet */
    while (pgarch_readyXlog(xlog))
    {
        int         failures = 0;

        for (;;)
        {
            /*
             * Do not initiate any more archive commands after receiving
             * SIGTERM, nor after the postmaster has died unexpectedly. The
             * first condition is to try to keep from having init SIGKILL the
             * command, and the second is to avoid conflicts with another
             * archiver spawned by a newer postmaster.
             */
            if (got_SIGTERM || !PostmasterIsAlive())
                return;

            /*
             * Check for config update. This is so that we'll adopt a new
             * setting for archive_command as soon as possible, even if there
             * is a backlog of files to be archived.
             */
            if (got_SIGHUP)
            {
                got_SIGHUP = false;
                ProcessConfigFile(PGC_SIGHUP);
            }

            /*
             * If archive_command is not set, nothing more is executed.
             * Our archive_command is left empty, so this is the branch we take.
             */
            if (!XLogArchiveCommandSet())
            {
                /*
                 * Changed WARNING to DEBUG1, since we leave archive_command
                 * empty to let external tools manage archiving
                 */
                ereport(DEBUG1,
                        (errmsg("archive_mode enabled, yet archive_command is not set")));
                return;
            }

            /* Run the archive command */
            if (pgarch_archiveXlog(xlog))
            {
                /* Success: rename .ready to .done */
                pgarch_archiveDone(xlog);

                /*
                 * Tell the collector about the WAL file that we successfully
                 * archived
                 */
                pgstat_send_archiver(xlog, false);

                break;          /* out of inner retry loop */
            }
            else
            {
                /*
                 * Tell the collector about the WAL file that we failed to
                 * archive
                 */
                pgstat_send_archiver(xlog, true);

                if (++failures >= NUM_ARCHIVE_RETRIES)
                {
                    ereport(WARNING,
                            (errmsg("archiving transaction log file \"%s\" failed too many times, will try again later",
                                    xlog)));
                    return;     /* give up archiving for now */
                }
                pg_usleep(1000000L);    /* wait a bit before retrying */
            }
        }
    }
}
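The retry behavior of this loop is easier to follow in a stripped-down model. The sketch below keeps only the oldest-first ordering, the per-segment failure counter and the early return; signal handling and the empty-command branch are left out. The value 3 for NUM_ARCHIVE_RETRIES is an assumption for illustration, and flaky_command is a made-up stand-in for archive_command.

```python
NUM_ARCHIVE_RETRIES = 3   # assumed value, for illustration


def archiver_copy_loop(ready, run_command, sleep=lambda seconds: None):
    """Process segments oldest-first; return the list that reached .done.

    ready       -- list of segment names that have a .ready marker (mutated)
    run_command -- callable(name) -> bool, stands in for archive_command
    sleep       -- injected so the sketch does not really wait
    """
    done = []
    while ready:
        name = min(ready)              # oldest segment first
        failures = 0
        while True:
            if run_command(name):
                ready.remove(name)     # models renaming .ready to .done
                done.append(name)
                break                  # out of the inner retry loop
            failures += 1
            if failures >= NUM_ARCHIVE_RETRIES:
                return done            # give up for now, retry next cycle
            sleep(1)                   # wait a bit before retrying


    return done


attempts = {}


def flaky_command(name):
    """Hypothetical command: segment ...01 succeeds on its 3rd try, others never."""
    attempts[name] = attempts.get(name, 0) + 1
    return name.endswith("01") and attempts[name] >= 3


pending = ["000000010000000000000002", "000000010000000000000001"]
finished = archiver_copy_loop(pending, flaky_command)
print(finished, pending, attempts)
```

Because the loop returns as soon as one segment exhausts its retries, a single segment that can never be archived blocks every newer segment behind it, which is one way pg_xlog keeps growing.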

2.3 How .ready files are generated

static void
XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
{
    ...
            if (finishing_seg)
            {
                issue_xlog_fsync(openLogFile, openLogSegNo);

                /* signal that we need to wakeup walsenders later */
                WalSndWakeupRequest();

                LogwrtResult.Flush = LogwrtResult.Write;    /* end of page */

                /* True when archiving is on and wal_level >= archive */
                if (XLogArchivingActive())
                    XLogArchiveNotifySeg(openLogSegNo);     /* create the .ready file */

                XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
    ...
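The segment number passed to XLogArchiveNotifySeg ends up as a file name under archive_status. The sketch below shows the naming arithmetic, assuming the default 16 MB segment size (so 256 segments per 4 GB "xlog id") and the pre-10 pg_xlog directory name; the helper names are invented for this example.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024                      # assumed default
SEGMENTS_PER_XLOG_ID = 0x100000000 // XLOG_SEG_SIZE   # 256 with 16 MB segments


def xlog_file_name(timeline, segno):
    """Build the 24-character segment file name from timeline and segment number."""
    return "%08X%08X%08X" % (timeline,
                             segno // SEGMENTS_PER_XLOG_ID,
                             segno % SEGMENTS_PER_XLOG_ID)


def ready_marker_path(timeline, segno, xlog_dir="pg_xlog"):
    """Path of the .ready marker created for a finished segment."""
    return "%s/archive_status/%s.ready" % (xlog_dir,
                                           xlog_file_name(timeline, segno))


print(xlog_file_name(1, 5))
print(xlog_file_name(1, 256))
print(ready_marker_path(1, 5))
```

The first 8 hex digits are the timeline, which is why the removal code in 2.1 skips 8 characters before comparing names.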

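2.4 Summary

A .ready file is always created as long as archive_mode = on and wal_level >= archive (it is created from XLogWrite).

Because archive_command is left empty, consuming the .ready files is entirely up to an external program.

.done files are handled by PostgreSQL itself. Two events trigger that handling: checkpoints and restartpoints.

How many .done segments are cleaned up is limited by wal_keep_segments and replication slots (the KeepLogSeg function).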

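3 Why WAL Segments Accumulate (TL;DR)

Note: whatever the situation, do not delete xlog files by hand.

Note: the log written by a checkpoint does not get a .ready file right away. The .ready file is only created once that segment is finished and writing moves on to the next xlog segment.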

3.1 Replication slots

A streaming replication slot is in use.

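-- Streaming replication slots
-- If restart_lsn is a very large number of bytes behind the current XLOG position, check whether
-- the slot's subscriber is receiving XLOG properly and whether the subscriber itself is healthy.
-- If a slot's data goes unconsumed for a long time, the pg_xlog directory may fill the disk.
-- (In PostgreSQL 10 and later these functions are named pg_wal_lsn_diff and pg_current_wal_lsn.)
select pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn), *
from pg_replication_slots;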
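The byte difference returned by the query can be turned into a rough count of retained segments. This is a small sketch of the arithmetic behind an LSN difference, assuming LSNs in their usual 'X/Y' hexadecimal text form and 16 MB segments; the function names and the sample LSNs are hypothetical.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024   # assumed default 16 MB segment


def lsn_to_int(lsn):
    """Convert an LSN in 'X/Y' hexadecimal text form to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag(current_lsn, restart_lsn):
    """Return (bytes, whole segments) that a slot is holding back."""
    diff = lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)
    return diff, diff // XLOG_SEG_SIZE


# Hypothetical values as they might appear in pg_replication_slots.
lag_bytes, lag_segments = slot_lag("2/0", "1/0")
print(lag_bytes, "bytes,", lag_segments, "segments retained for the slot")
```

A lag of several hundred segments means several gigabytes of pg_xlog that no checkpoint is allowed to remove until the slot advances or is dropped.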

To drop a slot:

select pg_drop_replication_slot('xxx');

After the slot is dropped, PostgreSQL cleans up the xlog files at the next checkpoint.

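3.2 A large wal_keep_segments

Check the parameter setting. Note that with this parameter set, xlog cleanup lags behind the .ready/.done status files: at least wal_keep_segments segments stay in pg_xlog even after they have been archived.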

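3.3 Recycling problems

If PostgreSQL's built-in archiving is not used and the database relies on an external program to process the .ready files, check that the external recycling process is working.

(archive_mode = on, archive_command = '')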
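One simple health check for the external process is to watch how many .ready markers are waiting. The sketch below lists them from an archive_status directory; the function name is invented, and the demonstration runs against a temporary directory with made-up marker files rather than a real pg_xlog/archive_status.

```python
import os
import tempfile


def ready_backlog(archive_status_dir):
    """Return the sorted segment names that still have a .ready marker."""
    suffix = ".ready"
    names = []
    for entry in os.listdir(archive_status_dir):
        if entry.endswith(suffix):
            names.append(entry[:-len(suffix)])
    return sorted(names)


# Demonstration against a throwaway directory with made-up marker files.
with tempfile.TemporaryDirectory() as status_dir:
    for marker in ("000000010000000000000003.ready",
                   "000000010000000000000002.ready",
                   "000000010000000000000001.done"):
        open(os.path.join(status_dir, marker), "w").close()
    backlog = ready_backlog(status_dir)
    print(len(backlog), "segments waiting, oldest:", backlog[0])
```

A backlog that keeps growing between checks suggests the external program has stopped turning .ready files into .done files, so checkpoints cannot remove the corresponding segments.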

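3.4 Checkpoint interval too long

Check the checkpoint parameter settings (checkpoint_timeout and checkpoint_segments, or max_wal_size on 9.5 and later).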

The above is based on personal experience and is offered as a reference. If anything is wrong or incomplete, corrections are welcome.

Original article: https://blog.csdn.net/jackgo73/article/details/90108958
