On the Skewed By tab of the Table Editor for the Hive platform, you can select the table columns to skew by. Once at least one of the available columns has been added to the selection, the ON clause box becomes available.
Skewed By can improve performance for tables in which one or more columns have skewed values. When you specify the values that appear very often (heavy skew), Hive splits those values out into separate files, so that a query can skip or include a whole file where possible.
The following options are available:
- Available Columns: Displays all the columns available to add to the Skewed By selection. Select the column you want to skew by and move it to the Selected Columns box. Use the left and right arrows to move columns to and from the Selected Columns box.
- Selected Columns: Displays the columns that make up the Skewed By selection.
- Up / Down buttons: Let you reorder the columns in the Selected Columns list. The column order can affect the access speed. The most frequently accessed columns should be at the top of the Selected Columns list.
- ON clause box: Enter the skew values here, that is, the values of the selected columns that appear very often.
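
For orientation, the settings on this tab correspond to the SKEWED BY ... ON clause of a Hive CREATE TABLE statement. The following is a minimal sketch of that clause, not the exact DDL the Table Editor generates. The table names, column names, and skew values are hypothetical, and the exact text expected in the ON clause box is not shown here.

```sql
-- Hypothetical table: skew on a single column.
-- The ON list names the heavily repeated values of customer_id.
CREATE TABLE orders_skewed_single (
  customer_id INT,
  order_total DOUBLE
)
SKEWED BY (customer_id) ON (1, 5, 6);

-- Hypothetical table: skew on two columns.
-- Each tuple in the ON list gives one value per skewed column,
-- in the same order as the columns listed after SKEWED BY.
CREATE TABLE orders_skewed_multi (
  region      STRING,
  customer_id INT,
  order_total DOUBLE
)
SKEWED BY (region, customer_id) ON (('EMEA', 1), ('APAC', 5));
```

In the second statement, the order of the columns after SKEWED BY determines the order of the values inside each tuple, so reordering the selected columns also changes how the skew values need to be written.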