Sort vs. Hash: A Duality by Peter Geoghegan

Tuesday, November 15 at 11:20-12:15

PostgreSQL 9.5 and 9.6 significantly improved upon the performance of both hash joins, and sort operations. Sorts are often used as input to GroupAggregate nodes and merge joins. While both approaches have various strengths and weaknesses, and are essential components of the PostgreSQL executor, their relative importance has somewhat shifted over the years. This happened due to trends in CPU and storage performance characteristics, and various improvements that gradually made their way into Postgres.

Capabilities expected to be part of PostgreSQL 10 may further complicate this picture; parallel hash join and parallel sort add another dimension that must be considered. This may force us to further revise the "Sort vs. Hash" analysis in the coming years.

In this talk, I'll discuss:

  • Why merge joins may be faster than hash joins for particular cases, and vice-versa. (Nested-loop joins will be briefly discussed.)
  • Improvements that have been made in both areas, and improvements that are tentatively scheduled for the next Postgres release.
  • How to conceptualize both approaches, to understand why the optimizer may prefer one or the other of the two general approaches in practice.
  • A historical perspective: the waxing and waning of sort merge join since the 1970s.

About the Speaker

Peter has been involved in PostgreSQL development since 2011 as both a reviewer and feature implementer, and is a major contributor. His development projects tend to be centered on improving PostgreSQL's performance in various ways, including adding support for group commit, as well as a succession of improvements to the PostgreSQL sort code down through the years.

Originally from Dublin, Ireland, Peter now lives in San Francisco and works on Heroku's popular cloud database service.


Friday, September 16, 2016 - 12:15