Annotating Data

In [1]:
import holoviews as hv
import holoviews.util
hv.extension('bokeh')

As introduced in the Getting Started guide, HoloViews relies heavily on semantic annotations, i.e., metadata you declare that lets HoloViews interpret what your data represents. With these annotations, HoloViews can perform complex tasks like visualization automatically.

There are three main kinds of annotation that can be associated with each element:

  1. Type, used to declare the sort of data you have, which is required before it can be visualized,
  2. Dimensions, used to specify the abstract space in which the data resides, allowing axis labeling and indexing, and
  3. Group/Label, used to declare a meaningful category and human-readable description of the element, allowing plot labeling and selecting related sets of elements.

This user guide explains each of these three types of annotation, describing why you would need or want to use them.

1. Specifying element type

Basic Python data structures like dataframes, arrays, lists, and dictionaries can represent almost any kind of data, so they cannot be rendered as a particular graphical representation until the user says what sort of data they are meant to hold. You declare this by selecting a suitable HoloViews element type from the many different ones available (see the Reference Gallery).

For instance, let's say you have two sequences of numbers:

In [2]:
xs = range(-10,11)
ys = [100-x**2 for x in xs]

As far as Python is concerned, xs and ys are just two arbitrary sequences of numbers, which could represent nearly anything imaginable. But we as humans can see that each of the ys is a value computed from one of the xs by evaluating the function $y=100-x^2$. We can convey some of that information to HoloViews by choosing a Curve element type, which is a convenient shorthand for "a discrete set of real-valued samples from a continuous function of one real-valued variable":

In [3]:
curve = hv.Curve((xs, ys))
curve
Out[3]:

As you can see, declaring the element type is the only required bit of annotation, instantly making your data visualizable. However, this initial visualization relies on various defaults that may not be appropriate for your data, and you can override these defaults by declaring additional annotations as described below.

2. Specifying element dimensionality

Each element type can process a certain number and type of dimensions, i.e., ways in which the data can vary. For instance, the Curve object above has two dimensions, $x$ and $y$. If you look at how we generated the data, you can see that these two dimensions are semantically different: we chose an arbitrary set of values for the xs, and then calculated a corresponding value to make each of the ys. In mathematical terms, $x$ is thus an independent variable (selected by the creator of the data), and $y$ is a dependent variable (typically measured or calculated from the independent variable(s)).

HoloViews elements call these two different types of variables key dimensions (kdims) and value dimensions (vdims). The key dimensions are the dimensions you can index by to get the values corresponding to the value dimensions. You can learn more about indexing data in the later Indexing and Selecting Data user guide.

Different elements have different numbers of required key dimensions and value dimensions. For instance, a Curve always has one key dimension and one value dimension. As we did not explicitly specify anything regarding dimensions when declaring the curve above, the kdims and vdims use their default names 'x' and 'y':

In [4]:
"Object 'curve' has kdims {kdims} and vdims {vdims}".format(kdims=curve.kdims, vdims=curve.vdims)
Out[4]:
"Object 'curve' has kdims [Dimension('x')] and vdims [Dimension('y')]"

The easiest way to override the default dimension names is to provide strings for the dimensions, where the second argument in the Element constructor will always be the kdims, and the third will always be the vdims:

In [5]:
trajectory = hv.Curve((xs, ys), 'distance', 'height')
trajectory
Out[5]:

In [6]:
"Object 'trajectory' has kdims {kdims} and vdims {vdims} ".format(kdims=trajectory.kdims, vdims=trajectory.vdims)
Out[6]:
"Object 'trajectory' has kdims [Dimension('distance')] and vdims [Dimension('height')] "

We can see that the strings we provided have been 'promoted' to dimension objects. The kdims and vdims always contain instances of the Dimension class, described in the following section. Here, the immediate effect is to use the new names for the displayed axis labels.

Dimension parameters

Dimensions are not just names; they are rich objects with numerous parameters that can be used to describe the space in which the data resides. Only two of these are considered core parameters that uniquely identify the dimension object; the rest are auxiliary metadata. The most important parameters are:


``name``
(core) A concise name for the dimension, which for convenient usage as a keyword argument should usually be a legal Python identifier.
``label``
(core) An optional longer description of the dimension, which is convenient if you want the displayed label to contain arbitrary spaces, symbols, or unicode.
``range``
The minimum and maximum allowable values for the dimension, for error checking and generating widgets when needed.
``soft_range``
Suggested minimum and maximum values within the allowed range, used to specify a useful portion of the range for widgets and animations.
``step``
Suggested interval for sampling a continuous range, if needed for a widget or animation.
``unit``
The name of the unit to be associated with the dimension, if any, for labelling.
``values``
Explicit list of allowed dimension values, for error checking, widgets, and animations.

For the full list of parameters, you can call hv.help(hv.Dimension).

Similar to how you can just use a string if all you want to specify is the name, you can provide a (name, label) tuple if you want to specify the name and the label to kdims and vdims without building an explicit Dimension:

In [7]:
wo_unit = hv.Curve((xs, ys), 
                   ('distance','Horizontal distance'), 
                   ('height','Height above sea level'))

distance = hv.Dimension('distance', label='Horizontal distance', unit='m')
height = hv.Dimension(('height','Height above sea level'), unit='m')
with_unit = hv.Curve((xs, ys), distance, height)

# (using + to compose elements is described in the next guide)
wo_unit + with_unit
Out[7]:

Note that after supplying the longer labels, you can still use the short name to specify the dimension in keyword arguments. For instance, try using with_unit.select(distance=(5,8)) in the cell above.

Setting properties with redim

Declaring dimension objects with appropriate parameters can be awkward and verbose if you only want to set a few specific parameters. You can often avoid declaring explicit dimension objects using the redim method, which returns a clone of the element: the same data, wrapped in a new instance of the same element type with the new dimension settings.

Let's use redim to swap out the 'height' dimension for an 'altitude' dimension:

In [8]:
renamed_height = trajectory.redim(height='altitude')
renamed_height
Out[8]:

The redim "method" is actually a utility that can be used to set any of the dimension parameters, such as the label, unit, range, or values. For instance, the label can be updated on an existing object by specifying the dimension name and then the new value for that parameter:

In [9]:
renamed_height.redim.label(altitude='Altitude above sea-level', distance='Horizontal distance')
Out[9]:

3. Organizing your elements with groups and labels

A complex visualization you build with HoloViews may include many instances of the same element type, each built from different bits of data and potentially representing categorically distinct types of information to you. To help you keep track of these distinctions when you need to, HoloViews provides a group parameter you can use to declare semantically distinct categories for elements, and a label parameter you can use to identify which specific item the element represents within that category:

In [10]:
low_ys = [25-(0.5*el)**2 for el in xs]
hv.Curve((xs, low_ys), group='Trajectory', label='Shallow') + \
hv.Curve((xs, ys), group='Trajectory', label='Medium')
Out[10]:

As you can see, the group and label information is used to generate sensible titles, here indicating that both sets of data represent trajectories and that two different specific trajectories are being shown. Once the group and/or label have been specified, they can be used for Customizing Plots (e.g. to make all trajectories have the same line width and style, or to customize one particular plot out of many of the same type). The group and label are also used for indexing, as we will see in the following Composing Elements guide.

