What is this post about?
Rule
- When designing any graph that shows categorical data, be thoughtful about how your categories are ordered.
Common Problem
- Sometimes the default alphabetical order does not make sense for your audience.
- When your categories represent weekdays, they should appear in weekday order, not alphabetical order.
- How can we specify the category order on the axis when plotting the data (here, as a horizontal bar plot)?
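To make the problem concrete, here is a minimal sketch of the default behavior. The column names `weekday` and `value` and the numbers are made up for illustration. `groupby` sorts its group keys by default, so plain string labels come back in alphabetical order:

```python
import pandas as pd

# Hypothetical sample data: one value per weekday
df = pd.DataFrame({
    "weekday": ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    "value": [2, 9, 8, 7, 4, 5, 1],
})

# groupby sorts the group keys, so string labels are returned alphabetically
alphabetical = df.groupby("weekday")["value"].mean().index.tolist()
print(alphabetical)
```

A bar plot drawn from this result would start at `Fri` and end at `Wed`, which is rarely what a reader expects from weekday data.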
Goal
Code in practice
## import
import pandas as pd
import matplotlib.pyplot as plt
from pandas.api.types import CategoricalDtype
## sample data
df = pd.DataFrame({
    'col1': ['Sun', 'Sat', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sun', 'Sat', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'col2': [2, 1, 9, 8, 7, 4, 5, 8, 7, 4, 5, 8, 7, 4],
    'col3': [8, 7, 1, 1, 9, 0, 4, 9, 8, 9, 8, 7, 4, 9]})
## aggregation
df_agg = df.groupby('col1').agg({'col2':['mean', 'min', 'max'], 'col3':['mean', 'max']})
df_agg.columns = df_agg.columns.droplevel(1) + '_' + df_agg.columns.droplevel(0)
df_agg = df_agg.reset_index()
## specify the categorical order
weekday_order = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
weekday_type = CategoricalDtype(categories=weekday_order, ordered=True)
df_agg['col1'] = df_agg['col1'].astype(weekday_type)
df_agg.sort_values('col1', inplace=True)
## plot
fig, ax = plt.subplots()
ax.barh(df_agg.col1, df_agg.col2_mean, align='center')
ax.invert_yaxis() # labels read top-to-bottom
ax.axvline(df['col2'].mean(),  # overall mean of col2 as a reference line
           ls='--',
           color='r')
ax.text(x=df['col2'].mean()*1.02,
        y=5.5,
        s='- Metrics mean\n across weekdays')
ax.set(xlim=[0, 10], xlabel='Metrics', ylabel='Weekday',
       title='barh plot example')
plt.show()
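As a variation on the code above, the cast to the ordered categorical type can happen before the aggregation, so that `groupby` returns the groups in category order and no separate `sort_values` call is needed. This is a minimal sketch, not the only way to do it. It reuses `col1` and `col2` from the example, and `observed=True` is an assumption added here so that only categories present in the data are returned:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

weekday_order = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
weekday_type = CategoricalDtype(categories=weekday_order, ordered=True)

# Same col1/col2 values as in the example above
df = pd.DataFrame({
    "col1": ["Sun", "Sat", "Mon", "Tue", "Wed", "Thu", "Fri"] * 2,
    "col2": [2, 1, 9, 8, 7, 4, 5, 8, 7, 4, 5, 8, 7, 4],
})

# Cast first: groupby then sorts its keys by category order, not alphabetically
df["col1"] = df["col1"].astype(weekday_type)
col2_mean = df.groupby("col1", observed=True)["col2"].mean()

# Plain string labels, already in weekday order
labels = col2_mean.index.astype(str).tolist()
print(labels)
```

`labels` and `col2_mean.values` can then be passed to `ax.barh` in the same way as `df_agg.col1` and `df_agg.col2_mean` above.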