概要 | |
---|---|
目的 | Averageのとり方 - Weighted Average, Moving Average, Moving Weighted Average |
分類 | SQL Cookbook |
SQL | Standard SQL |
環境 | BigQuery |
Table of Contents
問題1: Weighted Averageの計算方法
以下のようなデータ(X, Y, W)を考えたとき、Xの値ごとにYの加重平均を取りたいとします. WeightはWカラムの値とします
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
X Y W
10 10 60
10 20 10
20 10 10
20 20 50
20 30 50
20 40 20
30 10 10
30 20 50
30 30 100
30 40 50
30 50 10
40 20 10
40 30 30
40 40 40
40 50 20
50 30 20
50 40 20
50 50 40
Data生成クエリ
1
2
3
4
5
6
7
8
9
10
11
12
13
WITH data AS (
SELECT
[10, 10, 20, 20, 20, 20, 30, 30, 30, 30, 30, 40, 40, 40, 40, 50, 50, 50] AS X,
[10, 20, 10, 20, 30, 40, 10, 20, 30, 40, 50, 20, 30, 40, 50, 30, 40, 50] AS Y,
[60, 10, 10, 50, 50, 20, 10, 50, 100, 50, 10, 10, 30, 40, 20, 20, 20, 40] AS W
)
SELECT
X,
COALESCE(Y[SAFE_OFFSET(offset)], 0) AS Y,
COALESCE(W[SAFE_OFFSET(offset)], 0) AS W
FROM
data,
UNNEST(X) AS X WITH OFFSET AS offset
Weighted Averageの計算
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
WITH data AS(
SELECT
X,
COALESCE(Y[SAFE_OFFSET(offset)], 0) AS Y,
COALESCE(W[SAFE_OFFSET(offset)], 0) AS W
FROM
(
SELECT
[10, 10, 20, 20, 20, 20, 30, 30, 30, 30, 30, 40, 40, 40, 40, 50, 50, 50] AS X,
[10, 20, 10, 20, 30, 40, 10, 20, 30, 40, 50, 20, 30, 40, 50, 30, 40, 50] AS Y,
[60, 10, 10, 50, 50, 20, 10, 50, 100, 50, 10, 10, 30, 40, 20, 20, 20, 40] AS W
) AS input,
UNNEST(X) AS X WITH OFFSET AS offset
)
SELECT
X,
AVG(Y) AS naive_mean_Y,
SUM(Y * W)/SUM(W) AS weighted_mean_Y,
FROM
data
GROUP BY
X
ORDER BY
X
結果
1
2
3
4
5
6
7
X naive_mean_Y weighted_mean_Y
10 15.0 11.428571428571429
20 25.0 26.153846153846153
30 30.0 30.0
40 35.0 37.0
50 40.0 42.5
問題2: Moving AverageとMoving Medianの計算方法
(observe_date, Y)のレコードから構成されるデータを考えます. observe_dateは観測日, Yはその観測日に対応する確率変数とします. ここで、過去7日間ごとの移動平均と直近7日の中央値を計算したいとします.
Data生成
1
2
3
4
5
6
7
8
9
10
11
WITH data AS (
SELECT
GENERATE_DATE_ARRAY(DATE('2020-10-01'), DATE('2021-09-30'), INTERVAL 1 DAY) AS time_index,
GENERATE_ARRAY(1, 365, 1) AS Y
)
SELECT
observe_date,
MOD(ABS(FARM_FINGERPRINT(CAST(Y[SAFE_OFFSET(offset)] AS STRING))), 100) AS Y
FROM
data,
UNNEST(time_index) AS observe_date WITH OFFSET AS offset
Moving Averageの計算
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
CREATE TEMPORARY FUNCTION ARRAY_INT64_SORT(arr ARRAY<INT64>)
RETURNS ARRAY<INT64> AS ((
SELECT ARRAY_AGG(x) FROM(
SELECT x FROM UNNEST(arr) AS x ORDER BY x
)
));
WITH
data AS (
SELECT
observe_date,
MOD(ABS(FARM_FINGERPRINT(CAST(Y[SAFE_OFFSET(offset)] AS STRING))), 100) AS Y
FROM
(
SELECT
GENERATE_DATE_ARRAY(DATE('2020-10-01'), DATE('2021-09-30'), INTERVAL 1 DAY) AS time_index,
GENERATE_ARRAY(1, 365, 1) AS Y
) AS raw_data,
UNNEST(time_index) AS observe_date WITH OFFSET AS offset
),
agg_data AS(
SELECT
observe_date,
Y,
AVG(Y) OVER (ORDER BY observe_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS weekly_moving_average,
ARRAY_INT64_SORT(ARRAY_AGG(Y) OVER (ORDER BY observe_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)) AS weekly_array
FROM
data
)
SELECT
observe_date,
Y,
weekly_moving_average,
weekly_array[ORDINAL(CAST(CEIL(ARRAY_LENGTH(weekly_array)*0.5)AS INT64))] AS weekly_median,
fhoffa.x.median(weekly_array) AS fhoffa_weekly_median
FROM
agg_data
fohoffa.x.meadianの挙動
1
2
3
4
5
SELECT
fhoffa.x.median([1,2,3]) median_1,
fhoffa.x.median([1,2,3,4]) median_2,
fhoffa.x.median([-10,2,3,4]) median_3,
fhoffa.x.median([-10,2,3,1000]) median_4
THEN
1
2
median_1 median_2 median_3 median_4
2.0 2.5 2.5 2.5
問題3: Moving Weighted Averageの計算方法
問題2を少し改良して、分析データに新たにW(0~9のランダムのINT64)というカラムを追加します. このWの値をweightとして用いて、YについてのMoving averageを計算します.
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
WITH
data AS (
SELECT
observe_date,
MOD(ABS(FARM_FINGERPRINT(CAST(Y[SAFE_OFFSET(offset)] AS STRING))), 100) AS Y,
MOD(ABS(FARM_FINGERPRINT(CAST(Y[SAFE_OFFSET(offset)] AS STRING))), 10) AS W
FROM
(
SELECT
GENERATE_DATE_ARRAY(DATE('2020-10-01'), DATE('2021-09-30'), INTERVAL 1 DAY) AS time_index,
GENERATE_ARRAY(1, 365, 1) AS Y
) AS raw_data,
UNNEST(time_index) AS observe_date WITH OFFSET AS offset
)
SELECT
observe_date,
Y,
AVG(Y) OVER (ORDER BY observe_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS weekly_moving_average,
SUM(Y*W) OVER (ORDER BY observe_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
/
SUM(W) OVER (ORDER BY observe_date ROWS BETWEEN 7 PRECEDING ANCURRENT ROW) AS weekly_weighted_moving_average
FROM
data
統計
Python
math
Linux
Ubuntu 20.04 LTS
Shell
English
git
方法論
Ubuntu 22.04 LTS
統計検定
競技プログラミング
フーリエ解析
前処理
SQL
coding
コミュニケーション
Network
ssh
将棋
Data visualization
Docker
Econometrics
VSCode
statistical inference
GitHub Pages
apt
development
システム管理
Coffee
cloud
数値計算
素数
Book
Font
Metrics
Poetry
Ubuntu 24.04 LTS
architecture
aws
shell
systemctl
テンプレート
データ構造
ポワソン分布
会計分析
文字コード
環境構築
論文
App
Bayesian
Dynamic Programming
Keyboard
Processing
R
Steam
filesystem
quarto
regex
(注意:GitHub Accountが必要となります)