SoFunction
Updated on 2024-10-29

Tutorial: creating bullet and waterfall charts in Python with the Bokeh library

First, let's import what we need and have Bokeh render its output in the notebook:

from bokeh.io import show, output_notebook
from bokeh.palettes import PuBu4
from bokeh.plotting import figure
from bokeh.models import Label

output_notebook()

Bullet chart

In this example, we will populate the data with a Python list. We could adapt it to a pandas dataframe, but we'll stick with simple Python data types here:

data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]

limits = [0, 20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]
cats = [x[0] for x in data]

One tricky part of the code is building the list of y-axis categories in the cats variable.
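As a minimal sketch of how the rows relate to the per-rep series, the tuples can be transposed in one step with zip(*data). The names names, actuals and targets are introduced here for illustration only; the tutorial itself builds cats, perf and comp with list comprehensions, which gives the same lists.

```python
# Same sample data as in the tutorial: (name, actual, target).
data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]

# zip(*data) transposes the rows into three parallel tuples.
# names/actuals/targets are hypothetical names used only in this sketch.
names, actuals, targets = zip(*data)

cats = list(names)    # y-axis categories, in row order
perf = list(actuals)  # actual performance per rep
comp = list(targets)  # target value per rep

print(cats)
```

Because all three lists come from the same rows, position i in cats, perf and comp always refers to the same sales rep.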

The next step is to create the Bokeh figure and set a few options for the x-axis and the grid lines. As mentioned above, we pass the cats variable as y_range to define all the categories.

# Note: Bokeh 3.x renamed plot_height/plot_width to height/width.
p = figure(title="Sales Rep Performance", plot_height=350, plot_width=800, y_range=cats)
p.x_range.range_padding = 0
p.grid.grid_line_color = None
p.xaxis[0].ticker.num_minor_ticks = 0

The next section creates the colored range bars with Bokeh's hbar glyph. To do this, we need to define the left and right edge of each bar as well as its color. We can use Python's zip function to build the data structure we need.

list(zip(limits[:-1], limits[1:], PuBu4[::-1]))

# The results are as follows:
[(0, 20, '#f1eef6'),
 (20, 60, '#bdc9e1'),
 (60, 100, '#74a9cf'),
 (100, 160, '#0570b0')]
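The pairing works because limits[:-1] holds every boundary except the last and limits[1:] holds every boundary except the first, so zipping them yields consecutive (left, right) edges. Below is a small standalone sketch of that idea. The helper make_bands is hypothetical, and the palette is hard-coded from the output shown above so the snippet runs without Bokeh installed.

```python
limits = [0, 20, 60, 100, 160]

# Assumption: these four hex codes are copied from the result listed above,
# standing in for PuBu4[::-1] so that no Bokeh import is needed here.
palette = ['#f1eef6', '#bdc9e1', '#74a9cf', '#0570b0']


def make_bands(limits, colors):
    """Pair each boundary with the next one and attach a color (hypothetical helper)."""
    if len(colors) != len(limits) - 1:
        raise ValueError("need exactly one color per band")
    return list(zip(limits[:-1], limits[1:], colors))


bands = make_bands(limits, palette)
for left, right, color in bands:
    print(f"{left:>3} to {right:>3}: {color}")
```

With n boundaries there are always n - 1 bands, which is why five limits pair up with a four-color palette.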

Here's how to combine them to create a color range.

for left, right, color in zip(limits[:-1], limits[1:], PuBu4[::-1]):
    p.hbar(y=cats, left=left, right=right, height=0.8, color=color)

The results are as follows:

[Figure: the colored range bars for each sales rep]

We use a similar process to add a black bar showing each rep's actual performance.

perf = [x[1] for x in data]
p.hbar(y=cats, left=0, right=perf, height=0.3, color="black")

The last marker we need to add is a segment showing the target value.

comp = [x[2] for x in data]
p.segment(x0=comp, y0=[(x, -0.5) for x in cats], x1=comp,
          y1=[(x, 0.5) for x in cats], color="white", line_width=2)

The results are as follows:

[Figure: the chart with black performance bars and white target segments]

The final step is to add a label for each range. We can use zip to build the label structure we need and then add each label to the layout.

for start, label in zip(limits[:-1], labels):
    p.add_layout(Label(x=start, y=0, text=label, text_font_size="10pt",
                       text_color='black', y_offset=5, x_offset=15))

The results are as follows:

[Figure: the finished bullet chart with range labels]
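To make the relationship between limits and labels concrete, here is a small sketch that pairs each label with the start of its range and looks up which range a given value falls in. The helper rating_for is hypothetical and not part of the tutorial's code. It also assumes a value sitting exactly on a boundary belongs to the upper band, which the original text does not specify.

```python
from bisect import bisect_right

limits = [0, 20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]

# Each label is anchored at the left edge of its range.
label_starts = list(zip(limits[:-1], labels))


def rating_for(value):
    """Return the label of the range containing value (hypothetical helper).

    Assumption: a value exactly on a boundary counts toward the upper band,
    and values outside the limits are clamped to the first or last band.
    """
    idx = bisect_right(limits, value) - 1
    idx = max(0, min(idx, len(labels) - 1))
    return labels[idx]


actuals = {"John Smith": 105, "Jane Jones": 99, "Fred Flintstone": 109,
           "Barney Rubble": 135, "Mr T": 45}
ratings = {name: rating_for(value) for name, value in actuals.items()}
print(ratings)
```

This is the same reading a viewer makes visually: the black bar ends inside one of the shaded ranges, and the label at the start of that range names it.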

Waterfall chart

First, we build the dataframe that will serve as our demonstration data.

import pandas as pd

# Create the initial dataframe
index = ['sales','returns','credit fees','rebates','late charges','shipping']
data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]}
df = pd.DataFrame(data=data, index=index)

# Determine the total net value by adding the start and all additional transactions
net = df['amount'].sum()

The results are as follows:

[Figure: the initial dataframe]

The final waterfall code will require us to define several additional attributes for each segment, including:

Start position;

Bar color;

Label position;

Label text.

By adding these to a single dataframe, we can use Bokeh's built-in functionality to simplify the final code.

For the next step, we'll add the running total, segment start position, and the location of the label.

df['running_total'] = df['amount'].cumsum()
df['y_start'] = df['running_total'] - df['amount']

# Where do we want to place the label?
df['label_pos'] = df['running_total']
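The arithmetic behind these columns can be checked without pandas. The sketch below reproduces it with itertools.accumulate as a stand-in for cumsum; it is an illustration of the calculation, not a replacement for the dataframe code above.

```python
from itertools import accumulate

index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
amounts = [350000, -30000, -7500, -25000, 95000, -7000]

# Running total after each transaction (what cumsum computes).
running_total = list(accumulate(amounts))

# Each bar starts where the running total stood before this transaction.
y_start = [total - amount for total, amount in zip(running_total, amounts)]

# The last running total equals the net value.
net = running_total[-1]

for name, start, end in zip(index, y_start, running_total):
    print(f"{name:<13} {start:>8,} -> {end:>8,}")
print(f"net: {net:,}")
```

Note that every bar's start equals the previous bar's end, which is what makes the segments line up as a waterfall.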

Next, we add a row at the bottom of the dataframe that contains the net value.

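df_net = pd.DataFrame.from_records([(net, net, 0, net)],
                                   columns=['amount', 'running_total', 'y_start', 'label_pos'],
                                   index=["net"])
# DataFrame.append was removed in pandas 2.0; pd.concat appends the row instead
df = pd.concat([df, df_net])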

For this particular waterfall, I want to show the negative values in a different color and place their labels below the bars. Let's add columns to the dataframe with these values.

df['color'] = 'grey'
df.loc[df.amount < 0, 'color'] = 'red'
df.loc[df.amount < 0, 'label_pos'] = df.label_pos - 10000
df["bar_label"] = df["amount"].map('{:,.0f}'.format)
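As a rough, pandas-free sketch of the same per-row rules: negative amounts turn red and have their label shifted down by 10000, and every amount is formatted with thousands separators. The rows list and the LABEL_SHIFT constant are names made up for this illustration, and the net row is left out for brevity.

```python
from itertools import accumulate

amounts = [350000, -30000, -7500, -25000, 95000, -7000]
running_total = list(accumulate(amounts))

# Hypothetical constant: the same offset the tutorial subtracts for negative bars.
LABEL_SHIFT = 10000

rows = []
for amount, total in zip(amounts, running_total):
    negative = amount < 0
    rows.append({
        "color": "red" if negative else "grey",
        # Negative bars get their label moved below the end of the bar.
        "label_pos": total - LABEL_SHIFT if negative else total,
        # Thousands separators, no decimals.
        "bar_label": "{:,.0f}".format(amount),
    })

for row in rows:
    print(row)
```

Keeping color, label position and label text as plain per-row values is what later lets the plotting code refer to them by column name instead of looping.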

This is the final dataframe containing all the data we need. It does require some manipulation of the data to get to this state, but it's fairly standard Pandas code and easy to debug if something goes wrong.

[Figure: the final dataframe]

Creating the actual plot is fairly standard Bokeh code, since the data frame has all the values we need.

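from bokeh.models import ColumnDataSource, LabelSet, NumeralTickFormatter

TOOLS = "box_zoom,reset,save"
source = ColumnDataSource(df)
# Note: Bokeh 3.x renamed plot_width to width.
p = figure(tools=TOOLS, x_range=list(df.index), y_range=(0, net+40000),
           plot_width=800, title="Sales Waterfall")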

Because the ColumnDataSource is built from our dataframe, Bokeh takes care of creating all the segments and labels without any looping.

p.segment(x0='index', y0='y_start', x1="index", y1='running_total',
          source=source, color="color", line_width=55)

We'll do a little formatting to add an axis label and format the y-axis nicely.

p.grid.grid_line_alpha = 0.3
p.yaxis[0].formatter = NumeralTickFormatter(format="($ 0 a)")
p.xaxis.axis_label = "Transactions"

The final step is to add all the labels to the bars using LabelSet.

labels = LabelSet(x='index', y='label_pos', text='bar_label',
                  text_font_size="8pt", level='glyph',
                  x_offset=-20, y_offset=0, source=source)
p.add_layout(labels)

The results are as follows:

[Figure: the finished waterfall chart]

That concludes this tutorial on creating bullet and waterfall charts in Python with the Bokeh library.