First, let's import what we need and make Bokeh's output appear in our notebook:
from bokeh.io import show, output_notebook
from bokeh.palettes import PuBu4
from bokeh.plotting import figure
from bokeh.models import Label

output_notebook()
Bullet graph
In this example, we will populate the data with a Python list. We could adapt it to use a pandas dataframe, but we'll stick with simple Python data types for this example:
data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]
limits = [0, 20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]
cats = [x[0] for x in data]
The one tricky part of the code is constructing the list of categories in the cats variable, which will go on the y-axis.
The next step is to create the Bokeh figure and set a few options related to the x-axis and how the gridlines are displayed. As mentioned above, we use the cats variable to define all the categories in y_range.
p = figure(title="Sales Rep Performance", plot_height=350, plot_width=800, y_range=cats)
# On Bokeh 3.0+, use height= and width= instead of plot_height= and plot_width=.
p.x_range.range_padding = 0
p.grid.grid_line_color = None
p.xaxis[0].ticker.num_minor_ticks = 0
The next section creates the colored range bars using Bokeh's hbar glyph. To do this, we need to define the left and right edges of each bar as well as its color. We can use Python's zip function to create the data structure we need.
list(zip(limits[:-1], limits[1:], PuBu4[::-1]))
# The result is:
# [(0, 20, '#f1eef6'), (20, 60, '#bdc9e1'), (60, 100, '#74a9cf'), (100, 160, '#0570b0')]
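One thing to watch with this pattern: zip stops at the shortest input, so if the palette has fewer colors than there are ranges, the extra ranges are silently dropped. Below is a minimal sketch of a guard against that. The name build_ranges is a hypothetical helper, not part of Bokeh, and the colors are copied from the result shown above rather than imported, so the snippet runs without Bokeh installed.

```python
# Hypothetical helper (not part of Bokeh): pair consecutive limits with colors
# and fail loudly instead of letting zip() silently drop a range.

def build_ranges(limits, colors):
    """Return (left, right, color) tuples for consecutive limit pairs."""
    n_ranges = len(limits) - 1
    if len(colors) != n_ranges:
        raise ValueError(
            f"need {n_ranges} colors for {len(limits)} limits, got {len(colors)}"
        )
    return list(zip(limits[:-1], limits[1:], colors))


limits = [0, 20, 60, 100, 160]
# PuBu4 reversed (light to dark), copied from the result shown above.
colors = ("#f1eef6", "#bdc9e1", "#74a9cf", "#0570b0")

ranges = build_ranges(limits, colors)
for left, right, color in ranges:
    print(left, right, color)
```

The returned list can be looped over in the same way as the zip call in the next step.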
Here's how to pull it all together to draw the colored ranges:
for left, right, color in zip(limits[:-1], limits[1:], PuBu4[::-1]):
    p.hbar(y=cats, left=left, right=right, height=0.8, color=color)
The results are as follows:
We use a similar process to add a black bar for each performance measure.
perf = [x[1] for x in data]
p.hbar(y=cats, left=0, right=perf, height=0.3, color="black")
The last marker we need to add is a segment showing the target value.
comp = [x[2] for x in data]
p.segment(x0=comp, y0=[(x, -0.5) for x in cats],
          x1=comp, y1=[(x, 0.5) for x in cats],
          color="white", line_width=2)
The results are as follows:
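The y0 and y1 arguments in the segment call are lists of (category, offset) pairs. As I understand Bokeh's categorical offsets, each pair places one end of the segment below or above the center of that category's row. The sketch below uses plain Python only (no Bokeh calls) to show what those lists contain. It also pulls cats, perf, and comp out of data in one step with zip(*data), which is an alternative to the three list comprehensions used in this section, not the approach the code above takes.

```python
# Plain-Python look at the values passed to the segment call; no Bokeh needed.
data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]

# Unpack the three columns in one step instead of three list comprehensions.
cats, perf, comp = (list(col) for col in zip(*data))

# Each segment endpoint on the y-axis is a (category, offset) pair.
y0 = [(name, -0.5) for name in cats]
y1 = [(name, 0.5) for name in cats]

for target, start, end in zip(comp, y0, y1):
    print(f"target {target}: from {start} to {end}")
```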
The final step is to add a label to each range. We can use zip to create the label structure we need and then add each label to the layout.
for start, label in zip(limits[:-1], labels):
    p.add_layout(Label(x=start, y=0, text=label, text_font_size="10pt",
                       text_color='black', y_offset=5, x_offset=15))
The results are as follows:
Waterfall chart
First, build the dataframe that we will use as the demonstration data.
import pandas as pd

# Create the initial dataframe
index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}
df = pd.DataFrame(data=data, index=index)

# Determine the total net value by adding the start and all additional transactions
net = df['amount'].sum()
The results are as follows:
The final waterfall code will require us to define several additional attributes for each segment, including:
Starting position
Bar color
Label position
Label text
By adding these to a single dataframe, we can use Bokeh's built-in functionality to simplify the final code.
For the next step, we'll add the running total, segment start position, and the location of the label.
df['running_total'] = df['amount'].cumsum()
df['y_start'] = df['running_total'] - df['amount']

# Where do we want to place the label?
df['label_pos'] = df['running_total']
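To make the arithmetic behind these columns concrete, here is a small sketch that reproduces it with the standard library only. It uses itertools.accumulate in place of cumsum and plain lists in place of dataframe columns, so it is a way to check the numbers, not a replacement for the pandas code.

```python
from itertools import accumulate

index = ["sales", "returns", "credit fees", "rebates", "late charges", "shipping"]
amounts = [350000, -30000, -7500, -25000, 95000, -7000]

# Same idea as cumsum(): each entry is the sum of all amounts so far.
running_total = list(accumulate(amounts))

# Each bar starts where the previous one ended.
y_start = [total - amount for total, amount in zip(running_total, amounts)]

net = sum(amounts)

for name, start, end in zip(index, y_start, running_total):
    print(f"{name:<13} {start:>8,} -> {end:>8,}")
print("net", net)
```

Each segment runs from y_start to running_total, so a negative amount produces a segment that ends lower than it starts.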
Next, we add a row at the bottom of the dataframe containing the net value.
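df_net = pd.DataFrame.from_records([(net, net, 0, net)],
                                   columns=['amount', 'running_total', 'y_start', 'label_pos'],
                                   index=["net"])
# pd.concat adds the net row (DataFrame.append was removed in pandas 2.0).
df = pd.concat([df, df_net])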
For this particular waterfall, I want the negative values to have a different color and their labels to sit below the bars. Let's add columns to the dataframe with these values.
df['color'] = 'grey'
df.loc[df.amount < 0, 'color'] = 'red'
df.loc[df.amount < 0, 'label_pos'] = df.label_pos - 10000
df["bar_label"] = df["amount"].map('{:,.0f}'.format)
This is the final dataframe containing all the data we need. It does require some manipulation of the data to get to this state, but it's fairly standard Pandas code and easy to debug if something goes wrong.
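As a quick check of the color and label logic, the following plain-Python sketch applies the same rules row by row. The rows list and the LABEL_SHIFT constant are illustrative names introduced here, not part of the original code, and the net row is left out to keep the example short.

```python
amounts = {"sales": 350000, "returns": -30000, "credit fees": -7500,
           "rebates": -25000, "late charges": 95000, "shipping": -7000}

LABEL_SHIFT = 10000  # same offset the dataframe code subtracts from label_pos

running = 0
rows = []
for name, amount in amounts.items():
    running += amount
    negative = amount < 0
    rows.append({
        "name": name,
        # Negative transactions are drawn in red, everything else in grey.
        "color": "red" if negative else "grey",
        # Labels for negative transactions are shifted down by LABEL_SHIFT.
        "label_pos": running - LABEL_SHIFT if negative else running,
        # Thousands separators, no decimals.
        "bar_label": "{:,.0f}".format(amount),
    })

for row in rows:
    print(row)
```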
Creating the actual plot is fairly standard Bokeh code, since the dataframe has all the values we need.
from bokeh.models import ColumnDataSource, LabelSet, NumeralTickFormatter

TOOLS = "box_zoom,reset,save"
source = ColumnDataSource(df)
p = figure(tools=TOOLS, x_range=list(df.index), y_range=(0, net + 40000),
           plot_width=800, title="Sales Waterfall")
By building the ColumnDataSource from our dataframe, we let Bokeh take care of creating all the segments and labels without any looping.
p.segment(x0='index', y0='y_start', x1="index", y1='running_total',
          source=source, color="color", line_width=55)
We'll do a little formatting to add labels and format the y-axis nicely.
p.grid.grid_line_alpha = 0.3
p.yaxis[0].formatter = NumeralTickFormatter(format="($ 0 a)")
p.xaxis.axis_label = "Transactions"
The final step is to add all the labels to the bars using LabelSet.
labels = LabelSet(x='index', y='label_pos', text='bar_label',
                  text_font_size="8pt", level='glyph',
                  x_offset=-20, y_offset=0, source=source)
p.add_layout(labels)
The results are as follows: