Friday, December 15, 2017

pg_hexedit now supports GiST, GIN, and hash indexes

I've added several enhancements to pg_hexedit, the experimental hex editor toolkit that allows you to open up raw PostgreSQL relation files with useful tags and annotations about the state and purpose of each field. The tool now supports annotations for GiST, GIN, and hash indexes, as well as sequences.

GIN "posting tree" leaf page. Compressed TIDs are in orange.

Sunday, November 26, 2017

pg_hexedit: Rich hex editor annotations for Postgres relfiles

I've written an experimental tool for presenting PostgreSQL relation files in a hex editor with annotations/tags and tooltips that show the structure of the data and its content, including bit field values. This tool is called pg_hexedit, and is available from:

https://github.com/petergeoghegan/pg_hexedit

pg_hexedit is built on top of the open source, cross-platform GUI hex editor wxHexEditor. Since it's an experimental tool that is primarily made available for educational purposes, you are well advised to not use it on any data directory that isn't entirely disposable. It may cause data corruption. Opening a Postgres relation file in a hex editor while the server is running is a fundamentally unsafe thing to do if you care about your data. Use of the tool should be limited to throwaway installations on users' personal machines.

Thursday, October 26, 2017

amcheck “table-matches-index” enhancement now available, detects "freeze-the-dead" corruption

I’m pleased to announce that v1.2 of amcheck, a tool for detecting that PostgreSQL relations are logically consistent (that they do not appear to be corrupt) is now generally available. This version adds a big enhancement - the optional ability to check if every tuple that should have an entry in the index does in fact have such an entry. Specifically, we check for a table entry with matching data, as well as a matching heap TID. This happens at the end of the existing tests, as an optional extra step.

This enhancement is significant because it seems much more likely to catch corruption in the wild. In general, inconsistencies between a table and its indexes are more likely to occur than inconsistencies between blocks within an index for many reasons. There is simply a much larger window for an inconsistency to arise when something is amiss with database storage that breaks the assumptions PostgreSQL makes during crash recovery, for example.

The enhancement is also significant because it played a role in identifyingPostgreSQL data corruption bug that will be fixed in the next point release, scheduled for November 9th, 2017. This bug affects all supported PostgreSQL versions. It was informally dubbed the “freeze-the-dead” bug.

November 6 2017 update: The fix was reverted due to additional concerns that came to light. The community is working on a new, more comprehensive fix for the next point release.

December 15 2017 update: A new fix has been committed, and will appear in 9.3.21, 9.4.16, 9.5.11, 9.6.7, and 10.2 point releases, scheduled for February 8th, 2018.

Packages for v1.2 are available from the community Debian/Ubuntu apt repository, as well as packages from the community Redhat/CentOS/SLES yum repository. Full details on installing these packages are available from the README:

https://github.com/petergeoghegan/amcheck/

Sunday, October 15, 2017

amcheck for Postgres 9.4+ now available from PGDG apt and yum repositories

amcheck, a tool for index corruption detection, now has packages available from the community Debian/Ubuntu apt repository, as well as packages from the community Redhat/CentOS/SLES yum repository.

This means that installations built on those community resources can easily install amcheck, even on PostgreSQL versions before PostgreSQL 10, the release that contrib/amcheck actually first appears in.

Full details on installing these packages are available from the README: https://github.com/petergeoghegan/amcheck/

It's also possible to install the packages on PostgreSQL 10, because the extension these packages install is actually named "amcheck_next" (not "amcheck"). Currently, it isn't really useful to install "amcheck_next" on PostgreSQL 10, because its functionality is identical to contrib/amcheck. That's expected to change soon, though. I will add a new enhancement to amcheck_next in the coming weeks, allowing verification functions to perform "heap matches index" verification on top of what is already possible.

Many thanks to Christoph Berg and Devrim Gündüz for their help with the packaging.

Tuesday, July 18, 2017

PostgreSQL Index bloat under a microscope

I've posted a snippet query to the PostgreSQL Wiki that "summarizes the keyspace" of a target B-Tree index. This means that it displays which range of indexed values belong on each page, starting from the root. It requires pageinspect. The query recursively performs a breadth-first search. Along the way, it also displays information about the space utilization of each page, and the number of distinct key values that actually exist on the page, allowing you to get a sense of how densely filled each page is relative to what might be expected.

The query is available from:

https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index