Forecasting Electricity Usage

Loading libraries

In [1]:
import pandas as pd
import numpy as np
%matplotlib inline
In [2]:
import gc

Loading the temperature data

Download the temperature data from the following URL:

http://www.data.jma.go.jp/gmd/risk/obsdl/index.php

  • Stations: Okayama, Hiroshima
  • Items: hourly values, temperature
  • Period: 2016/3/31 to 2017/4/1

Save the files as "data_okayama-2016.csv" and "data_hiroshima-2016.csv".
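The read_temp helper defined below reads these files with encoding="Shift_JIS" and skiprows=4, because the JMA CSV is Shift_JIS encoded and starts with a few metadata lines before the data. A minimal sketch to peek at the raw file (assuming the file names above; the exact layout depends on the options chosen on the download page):

with open("data_okayama-2016.csv", encoding="shift_jis") as f:
    for _ in range(6):
        print(f.readline().rstrip())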

In [3]:
def read_temp(filename):
    df_temp = pd.read_csv(filename,encoding="Shift_JIS",skiprows=4)
    df_temp.columns = ["DATETIME","TEMP","品質情報","均質番号"]
    df_temp.DATETIME = df_temp.DATETIME.map(lambda _: pd.to_datetime(_))
    return df_temp
In [4]:
df_temp_okym = read_temp("data_okayama-2016.csv")
df_temp_okym.rename(columns = {'TEMP':'TEMP_okym'}, inplace=True)
df_temp_okym.shape
Out[4]:
(8808, 4)
In [5]:
df_temp_okym.head()
Out[5]:
DATETIME TEMP_okym 品質情報 均質番号
0 2016-03-31 01:00:00 11.0 8 1
1 2016-03-31 02:00:00 10.6 8 1
2 2016-03-31 03:00:00 10.0 8 1
3 2016-03-31 04:00:00 9.6 8 1
4 2016-03-31 05:00:00 8.7 8 1
In [6]:
df_temp_hrsm = read_temp("data_hiroshima-2016.csv")
df_temp_hrsm.rename(columns = {'TEMP':'TEMP_hrsm'}, inplace=True)
df_temp_hrsm.shape
Out[6]:
(8784, 4)
In [7]:
df_temp_hrsm.head()
Out[7]:
DATETIME TEMP_hrsm 品質情報 均質番号
0 2016-04-01 01:00:00 13.5 8 1
1 2016-04-01 02:00:00 13.2 8 1
2 2016-04-01 03:00:00 13.3 8 1
3 2016-04-01 04:00:00 13.3 8 1
4 2016-04-01 05:00:00 13.5 8 1
In [8]:
df_temp_okym.TEMP_okym.plot(figsize=(15,4))
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x107e5c2b0>
In [9]:
df_temp_hrsm.TEMP_hrsm.plot(figsize=(15,4))
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x1151ac208>

Loading the electricity demand data

Download the Chugoku Electric Power demand data from the following URL:

http://www.energia.co.jp/jukyuu/download.html
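The downloaded file is saved as "juyo-2016.csv". It is also Shift_JIS encoded, and its first line is a title row, which is why skiprows=1 is passed below. A quick look at the first two lines (a sketch):

with open("juyo-2016.csv", encoding="shift_jis") as f:
    print(f.readline().rstrip())  # title row, skipped via skiprows=1
    print(f.readline().rstrip())  # header row (DATE, TIME, 実績(万kW))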

In [10]:
df_kw = pd.read_csv("juyo-2016.csv",encoding="Shift_JIS",skiprows=1)
df_kw.shape
Out[10]:
(8760, 3)
In [11]:
df_kw.head()
#df_kw.tail()
Out[11]:
DATE TIME 実績(万kW)
0 2016/4/1 0:00 581
1 2016/4/1 1:00 600
2 2016/4/1 2:00 640
3 2016/4/1 3:00 665
4 2016/4/1 4:00 658
In [12]:
df_kw.columns = ["DATE","TIME","万kW"]
In [13]:
df_kw["MW"] = df_kw["万kW"] * 10
In [14]:
df_kw["DATETIME"] = df_kw.index.map(lambda _: pd.to_datetime(df_kw.DATE[_] + " " + df_kw.TIME[_]))
df_kw.head()
Out[14]:
DATE TIME 万kW MW DATETIME
0 2016/4/1 0:00 581 5810 2016-04-01 00:00:00
1 2016/4/1 1:00 600 6000 2016-04-01 01:00:00
2 2016/4/1 2:00 640 6400 2016-04-01 02:00:00
3 2016/4/1 3:00 665 6650 2016-04-01 03:00:00
4 2016/4/1 4:00 658 6580 2016-04-01 04:00:00
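As a side note, the same DATETIME column could also be built with a single vectorized call instead of the row-wise lambda above (an equivalent sketch):

df_kw["DATETIME"] = pd.to_datetime(df_kw["DATE"] + " " + df_kw["TIME"])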
In [15]:
df_kw["MW"].plot(figsize=(15,4))
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x114d7d780>

Data preparation

  • Add the temperature data to the demand data
In [16]:
df = df_kw.copy()
df.shape
Out[16]:
(8760, 5)
In [17]:
df = df.merge(df_temp_okym,how="inner", on="DATETIME")
df.head()
Out[17]:
DATE TIME 万kW MW DATETIME TEMP_okym 品質情報 均質番号
0 2016/4/1 0:00 581 5810 2016-04-01 00:00:00 13.7 8 1
1 2016/4/1 1:00 600 6000 2016-04-01 01:00:00 13.1 8 1
2 2016/4/1 2:00 640 6400 2016-04-01 02:00:00 13.6 8 1
3 2016/4/1 3:00 665 6650 2016-04-01 03:00:00 14.4 8 1
4 2016/4/1 4:00 658 6580 2016-04-01 04:00:00 14.9 8 1
In [18]:
df = df.merge(df_temp_hrsm,how="inner", on="DATETIME")
df.head()
Out[18]:
DATE TIME 万kW MW DATETIME TEMP_okym 品質情報_x 均質番号_x TEMP_hrsm 品質情報_y 均質番号_y
0 2016/4/1 1:00 600 6000 2016-04-01 01:00:00 13.1 8 1 13.5 8 1
1 2016/4/1 2:00 640 6400 2016-04-01 02:00:00 13.6 8 1 13.2 8 1
2 2016/4/1 3:00 665 6650 2016-04-01 03:00:00 14.4 8 1 13.3 8 1
3 2016/4/1 4:00 658 6580 2016-04-01 04:00:00 14.9 8 1 13.3 8 1
4 2016/4/1 5:00 622 6220 2016-04-01 05:00:00 15.0 8 1 13.5 8 1
  • Add a month feature
In [19]:
df["MONTH"] = df.DATETIME.map(lambda _: _.month)
  • Add a day-of-week feature
In [20]:
df["WEEK"] = df.DATETIME.map(lambda _: _.weekday())
  • Add an hour feature
In [21]:
df["HOUR"] = df.DATETIME.map(lambda _: _.hour)
In [22]:
df.head()
Out[22]:
DATE TIME 万kW MW DATETIME TEMP_okym 品質情報_x 均質番号_x TEMP_hrsm 品質情報_y 均質番号_y MONTH WEEK HOUR
0 2016/4/1 1:00 600 6000 2016-04-01 01:00:00 13.1 8 1 13.5 8 1 4 4 1
1 2016/4/1 2:00 640 6400 2016-04-01 02:00:00 13.6 8 1 13.2 8 1 4 4 2
2 2016/4/1 3:00 665 6650 2016-04-01 03:00:00 14.4 8 1 13.3 8 1 4 4 3
3 2016/4/1 4:00 658 6580 2016-04-01 04:00:00 14.9 8 1 13.3 8 1 4 4 4
4 2016/4/1 5:00 622 6220 2016-04-01 05:00:00 15.0 8 1 13.5 8 1 4 4 5
  • Check the data and drop rows with missing values
In [23]:
df[df.isnull().any(axis=1)].shape
Out[23]:
(2, 14)
In [24]:
df[df.isnull().any(axis=1)]
Out[24]:
DATE TIME 万kW MW DATETIME TEMP_okym 品質情報_x 均質番号_x TEMP_hrsm 品質情報_y 均質番号_y MONTH WEEK HOUR
973 2016/5/11 14:00 698 6980 2016-05-11 14:00:00 NaN 1 1 19.8 8 1 5 2 14
6598 2016/12/31 23:00 659 6590 2016-12-31 23:00:00 3.4 8 1 NaN 1 1 12 5 23
In [25]:
df = df.dropna()
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 8757 entries, 0 to 8758
Data columns (total 14 columns):
DATE         8757 non-null object
TIME         8757 non-null object
万kW          8757 non-null int64
MW           8757 non-null int64
DATETIME     8757 non-null datetime64[ns]
TEMP_okym    8757 non-null float64
品質情報_x       8757 non-null int64
均質番号_x       8757 non-null int64
TEMP_hrsm    8757 non-null float64
品質情報_y       8757 non-null int64
均質番号_y       8757 non-null int64
MONTH        8757 non-null int64
WEEK         8757 non-null int64
HOUR         8757 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(9), object(2)
memory usage: 1.0+ MB
In [26]:
#df.head()
In [27]:
cols = ["MONTH","WEEK","HOUR"]

for col in cols:
    df = df.join(pd.get_dummies(df[col], prefix=col))

#df.info()
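A quick sanity check on the one-hot encoding: get_dummies should add 12 MONTH_, 7 WEEK_ and 24 HOUR_ columns, i.e. 43 dummy columns, which together with TEMP_okym and TEMP_hrsm gives the 45 features used below (a sketch):

print(df.filter(regex="^(MONTH|WEEK|HOUR)_").shape[1])  # expected: 43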

Creating the training and test data

In [28]:
#x_cols = ["MONTH","WEEK","HOUR","TEMP_okym","TEMP_hrsm"]
x_cols = ["TEMP_okym","TEMP_hrsm"] + df.columns.tolist()[14:]
In [29]:
#X = df[x_cols].as_matrix().astype('float64')
X = df[x_cols]
X.shape
Out[29]:
(8757, 45)
In [30]:
#y = df["MW"].as_matrix().astype('int').flatten()
y = df["MW"]
y.shape
Out[30]:
(8757,)
  • Split the labeled data into a training set and a test set
In [31]:
# load the train/test split helper
from sklearn import model_selection

# split the labeled data into a training set (X_train, y_train) and a test set (X_test, y_test)
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=.2, random_state=42)
#X_train.std()
In [32]:
X_train.shape
Out[32]:
(7005, 45)

Standardization

In [33]:
#X_train.std()
In [34]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
Out[34]:
StandardScaler(copy=True, with_mean=True, with_std=True)
In [35]:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
In [36]:
X_train.std()
Out[36]:
1.0
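The scaler is fitted on the training set only and then reused to transform the test set (and, later, the 2017 prediction data). After the transform, the overall mean of the training matrix should also be close to zero; a quick check (a sketch):

print(round(float(X_train.mean()), 6))  # expected to be approximately 0.0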

Machine Learning with Keras

In [37]:
from keras.models import Sequential
from keras.layers import Dense
#from keras.layers import Input, Dense, Dropout, LSTM, BatchNormalization, Activation

from keras.wrappers.scikit_learn import KerasRegressor

#from sklearn.model_selection import cross_val_score
#from sklearn.model_selection import KFold
#from sklearn.model_selection import train_test_split
#from sklearn.preprocessing import StandardScaler

#from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
Using TensorFlow backend.
In [38]:
import keras
keras.__version__
Out[38]:
'2.1.2'
In [39]:
# create regression model
def reg_model():
    
    reg = Sequential()
    
    reg.add(Dense(10, input_dim=len(x_cols), activation='relu'))
    reg.add(Dense(16, activation='relu'))
    reg.add(Dense(1))

    # compile model
    reg.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    reg.summary()
    
    return reg
In [40]:
X_train.shape[1]
Out[40]:
45
In [41]:
model = KerasRegressor(build_fn=reg_model, epochs=200, batch_size=16, verbose=0)
In [42]:
model.fit(X_train, y_train)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 10)                460       
_________________________________________________________________
dense_2 (Dense)              (None, 16)                176       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 17        
=================================================================
Total params: 653
Trainable params: 653
Non-trainable params: 0
_________________________________________________________________
Out[42]:
<keras.callbacks.History at 0x121911978>
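The parameter counts in the summary follow from (inputs + 1 bias) × units for each Dense layer, so with the 45 input features: 45×10+10 = 460, 10×16+16 = 176 and 16×1+1 = 17, for 653 parameters in total. A quick check:

print(45 * 10 + 10, 10 * 16 + 16, 16 * 1 + 1)  # 460 176 17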
In [43]:
model.score(X_test, y_test)
Out[43]:
-122321.06786886415
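The score is negative because KerasRegressor.score returns the negated loss, which here is the mean squared error, so the RMSE computed below can also be recovered from it (a sketch):

print((-model.score(X_test, y_test)) ** 0.5)  # approximately the RMSE reported below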
In [44]:
#estimator.evaluate(X_test, y_test)
In [45]:
y_pred = model.predict(X_test)
In [46]:
# show its root mean square error
mse = mean_squared_error(y_test, y_pred)
print("KERAS REG RMSE : %.2f" % (mse ** 0.5))
KERAS REG RMSE : 349.74
In [47]:
pd.DataFrame({"pred":y_pred, "act":y_test})[:100].reset_index(drop=True).plot(figsize=(15,4))
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x121cfc390>

Prediction

Loading the data

In [48]:
df_temp_okym = read_temp("data_okayama-2017.csv")
df_temp_okym.rename(columns = {'TEMP':'TEMP_okym'}, inplace=True)
df_temp_okym.shape
Out[48]:
(5880, 4)
In [49]:
df_temp_hrsm = read_temp("data_hiroshima-2017.csv")
df_temp_hrsm.rename(columns = {'TEMP':'TEMP_hrsm'}, inplace=True)
df_temp_hrsm.shape
Out[49]:
(5880, 4)
In [50]:
df_kw = pd.read_csv("juyo-2017.csv",encoding="Shift_JIS",skiprows=1)
df_kw.shape
Out[50]:
(6072, 3)

Data preparation

In [51]:
# demand data
df_kw.columns = ["DATE","TIME","万kW"]
df_kw["MW"] = df_kw["万kW"] * 10
df_kw["DATETIME"] = df_kw.index.map(lambda _: pd.to_datetime(df_kw.DATE[_] + " " + df_kw.TIME[_]))

# merge in the temperature data
df = df_kw.copy()
df = df.merge(df_temp_okym,how="inner", on="DATETIME")
df = df.merge(df_temp_hrsm,how="inner", on="DATETIME")


# additional features (month, day of week, hour)
df["MONTH"] = df.DATETIME.map(lambda _: _.month)
df["WEEK"] = df.DATETIME.map(lambda _: _.weekday())
df["HOUR"] = df.DATETIME.map(lambda _: _.hour)

# drop rows with missing values
df = df.dropna()
In [52]:
df.index = df.DATETIME
In [53]:
df.head()
Out[53]:
DATE TIME 万kW MW DATETIME TEMP_okym 品質情報_x 均質番号_x TEMP_hrsm 品質情報_y 均質番号_y MONTH WEEK HOUR
DATETIME
2017-04-01 01:00:00 2017/4/1 1:00 660 6600 2017-04-01 01:00:00 4.7 8 1 6.3 8 1 4 5 1
2017-04-01 02:00:00 2017/4/1 2:00 685 6850 2017-04-01 02:00:00 4.8 8 1 6.4 8 1 4 5 2
2017-04-01 03:00:00 2017/4/1 3:00 706 7060 2017-04-01 03:00:00 4.9 8 1 6.2 8 1 4 5 3
2017-04-01 04:00:00 2017/4/1 4:00 696 6960 2017-04-01 04:00:00 5.1 8 1 6.0 8 1 4 5 4
2017-04-01 05:00:00 2017/4/1 5:00 667 6670 2017-04-01 05:00:00 5.0 8 1 5.8 8 1 4 5 5
In [54]:
cols = ["MONTH","WEEK","HOUR"]

for col in cols:
    df = df.join(pd.get_dummies(df[col], prefix=col))

#df.info()
In [55]:
# The 2017 data begins in April, so month dummies that never occur
# (MONTH_1 to MONTH_3) are missing; add them as all-zero columns so the
# feature layout matches the training data.
for col in x_cols[2:]:
    if col not in df.columns:
        print(col, 'none')
        df[col] = 0
MONTH_1 none
MONTH_2 none
MONTH_3 none

Creating the prediction data

In [56]:
df_pred = df[df.MONTH==7]

# data for prediction
X = df_pred[x_cols].as_matrix().astype('float64')
y = df_pred["MW"].as_matrix().astype('int').flatten()

# standardize with the scaler fitted on the training data
X = scaler.transform(X)

Getting the results with Keras

In [57]:
y_pred = model.predict(X)
In [58]:
# show its root mean square error
mse = mean_squared_error(y, y_pred)
print("KERAS REG RMSE : %.2f" % (mse ** 0.5))
KERAS REG RMSE : 370.55
In [59]:
pd.DataFrame({"pred":y_pred, "act":y}, index=df_pred.index).plot(figsize=(15,4),ylim=(4000,12000))
Out[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x121e57320>
In [60]:
gc.collect()
Out[60]:
325
In [ ]:
 
In [ ]: