Bucketing

On the Bucketing tab of the Table Editor for the Hive platform, you can select the columns on which to cluster a table. Once at least one of the available columns has been added to the selection, the Sorted box becomes available, allowing you to choose which columns to sort within the cluster.

Bucketed tables allow more efficient sampling than non-bucketed tables, and may later allow for time-saving operations such as map-side joins.
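
As a rough illustration of why bucketing helps sampling, the sketch below assigns rows to buckets by taking the clustering key modulo the number of buckets. It assumes integer keys whose hash is the value itself; the hash Hive actually applies depends on the column type, and the function name `assign_buckets` is hypothetical.

```python
from collections import defaultdict


def assign_buckets(keys, num_buckets):
    """Group integer keys by bucket number (illustrative only).

    Assumption: the hash of an integer key is the key itself, so the
    bucket is simply key % num_buckets.
    """
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % num_buckets].append(key)
    return dict(buckets)


# Twelve keys spread over four buckets.
buckets = assign_buckets(range(1, 13), 4)

# Sampling a single bucket touches about a quarter of the rows
# instead of scanning the whole table.
sample = buckets[0]
print(sample)  # [4, 8, 12]
```

Because equal keys always land in the same bucket number, two tables bucketed on the same column with the same bucket count can be joined bucket by bucket, which is the idea behind map-side joins.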


[Figure: Bucketing Tab]

The following options are available:

  • Available Columns: Displays all the columns available to add to the bucket. Select the column you want to add to the organizing keys and move it to the Selected Columns box. Use the left and right arrows to move columns to and from the Selected Columns box.
  • Selected Columns: Displays the columns that make up the cluster.
  • Up / Down buttons: Let you reorder the columns in the bucket. Column order can affect access speed, so place the most frequently accessed columns at the top of the Selected Columns list.
  • INTO variableN BUCKETS: Lets you specify the number of buckets in the table.
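
The options above correspond to the `CLUSTERED BY ... SORTED BY ... INTO n BUCKETS` clause of a Hive `CREATE TABLE` statement. The following minimal sketch shows how the selections might translate into that clause. The helper `bucketing_clause` and the column names are hypothetical, and the DDL that ER/Studio actually generates may differ in layout.

```python
def bucketing_clause(selected, sorted_cols, num_buckets):
    """Build a Hive bucketing clause from tab-style selections.

    selected    -- ordered list of column names (Selected Columns)
    sorted_cols -- list of (column, "ASC" or "DESC") pairs (Sorted box)
    num_buckets -- positive integer (INTO ... BUCKETS)
    """
    if not selected:
        raise ValueError("select at least one column to cluster on")
    if num_buckets < 1:
        raise ValueError("number of buckets must be positive")

    # Column order is preserved, mirroring the Up / Down buttons.
    parts = ["CLUSTERED BY (" + ", ".join(selected) + ")"]

    # SORTED BY is optional and only emitted when sort columns are chosen.
    if sorted_cols:
        sort_list = ", ".join(f"{col} {direction}" for col, direction in sorted_cols)
        parts.append("SORTED BY (" + sort_list + ")")

    parts.append(f"INTO {num_buckets} BUCKETS")
    return " ".join(parts)


clause = bucketing_clause(["user_id", "region"], [("user_id", "ASC")], 32)
print(clause)
# CLUSTERED BY (user_id, region) SORTED BY (user_id ASC) INTO 32 BUCKETS
```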
