Distributing Columns for Amazon Redshift

From ER/Studio Data Architect
Jump to: navigation, search

Go Up to Creating and Editing Tables

The Distribution tab of the Table Editor for the Amazon Redshift platform allows you to create the DISTSTYLE and DISTKEY clauses for a table. The Distribution Style allows you to choose between one of the following values:

  • AUTO
  • KEY: Distribution Keys (DISTKEY) means the data is distributed by the values in the DISTKEY column(s). When join columns of joining tables are set as distribution keys, the joining rows from both tables are collocated on the compute nodes. This allows the optimizer to perform joins more efficiently. When DISTSTYLE of KEY is specified, one or more DISTKEY columns must be specified for the table.
  • EVEN: EVEN means that data in the table spreads evenly across the nodes in a cluster in round-robin distribution determined by Row IDs. The result is the distribution of approximately the same number of rows to each node. EVEN is the default distribution style and assumed unless a different DISTSTYLE is specified.
  • ALL: ALL means that a copy of the entire table is distributed to every node. This distribution style ensures that all the rows required for any join to this table are available on every node. The downside is that it multiplies storage requirements, increases load time, and increases maintenance times for the table. ALL distribution can improve execution time when used with certain dimension tables where KEY distribution is not appropriate. It is generally suited to small tables used frequently in joins.

This translates to the DISTSTYLE clause of the DDL.

When the distribution style is chosen as KEY, the Distribution Keys selection is enabled and allows you to choose the distribution key column. This translates to the DISTKEY clause in the DDL.

See Also