\subsubsection{Non-Linear Model}
Non-linearity is introduced into the model by inserting non-linear activation functions between the linear layers. This improves the model's ability to learn more complex patterns in the data. The model is trained in the same way as the linear model for quantile regression, using the pinball loss. Because a non-linear model is more complex, it is more prone to overfitting the training data; dropout layers are therefore added to the model to prevent this.
The architecture of the non-linear model is illustrated in Table \ref{tab:non_linear_model_architecture}. The autoregressive model begins with an input layer that converts the quarter of the day into an embedding, which is then concatenated with the other input features. These combined features are processed through a sequence of layers:
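As a reminder of the training objective, the pinball loss for a quantile level $\tau$ penalizes under-predictions with weight $\tau$ and over-predictions with weight $1-\tau$. A minimal sketch (the array values are illustrative, not data from the thesis):

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    # Under-predictions (y_true > y_pred) are weighted by tau,
    # over-predictions by (1 - tau), so minimizing this loss
    # pushes y_pred toward the tau-quantile of y_true.
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

# Illustrative values: predictions for the 0.9-quantile
y = np.array([100.0, -50.0, 20.0])
q90_pred = np.array([150.0, 0.0, 60.0])
loss = pinball_loss(y, q90_pred, 0.9)
```

Summing this loss over all trained quantile levels gives the full training objective.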
\begin{itemize}
\item Linear layer: transforms the input features into a higher-dimensional space defined by the hidden size.
\item ReLU activation function: introduces non-linearity, enabling the model to learn complex patterns, and mitigates the vanishing-gradient problem in deep neural networks.
\item Dropout layer: regularizes the model to prevent overfitting by randomly setting neurons to zero during training.
\end{itemize}
This sequence of layers is repeated N times to increase the depth of the model and enhance its ability to learn complex patterns. The final layer of the network is a linear layer that outputs the quantiles of the NRV prediction. For an autoregressive model, these are the quantiles for a single quarter, whereas a non-autoregressive model outputs the quantiles for every quarter of the day. The number of outputs is then the number of quarters in a day multiplied by the number of quantiles used.
\begin{table}[H]
\centering
\begin{tabularx}{\textwidth}{Xr} % Set the table width to the text width
\toprule
\textbf{Layer (Type)} & \textbf{Output Shape} \\ \midrule
\multicolumn{2}{c}{\textit{Only for autoregressive model}} \\
Time Embedding (Embedding) & [B, Input Features Size + Time Embedding Size] \\
\midrule
% Repeated Block
\multicolumn{2}{c}{\textit{Repeated Block (N times)}} \\
Linear (Linear) & [B, Hidden Size] \\
ReLU (Activation) & [B, Hidden Size] \\
Dropout (Regularization) & [B, Hidden Size] \\
% End of Repeated Block
\midrule
Linear (Linear) & [B, Number of quantiles] \\
\bottomrule
\end{tabularx}
\caption{Non-linear Quantile Regression Model Architecture}
\label{tab:non_linear_model_architecture}
\end{table}
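The architecture in Table \ref{tab:non_linear_model_architecture} can be sketched in PyTorch as follows. The class name, the default values, and the assumption of 96 quarters per day and a quantile count passed as a parameter are illustrative, not the thesis implementation:

```python
import torch
import torch.nn as nn

class NonLinearQuantileModel(nn.Module):
    """Sketch of the non-linear quantile regression model:
    an optional quarter embedding, N repeated Linear/ReLU/Dropout
    blocks, and a final linear quantile head."""

    def __init__(self, num_features, num_quantiles, hidden_size=256,
                 num_layers=2, dropout=0.2, quarter_embedding_dim=5,
                 autoregressive=True, quarters_per_day=96):
        super().__init__()
        self.autoregressive = autoregressive
        if autoregressive:
            # Learned embedding of the quarter of the day,
            # concatenated with the other input features.
            self.quarter_embedding = nn.Embedding(quarters_per_day,
                                                  quarter_embedding_dim)
            in_size = num_features + quarter_embedding_dim
        else:
            in_size = num_features
        blocks = []
        for _ in range(num_layers):  # repeated block, N times
            blocks += [nn.Linear(in_size, hidden_size),
                       nn.ReLU(),
                       nn.Dropout(dropout)]
            in_size = hidden_size
        self.body = nn.Sequential(*blocks)
        # AR: quantiles for one quarter; NAR: quantiles for every quarter.
        out_size = (num_quantiles if autoregressive
                    else quarters_per_day * num_quantiles)
        self.head = nn.Linear(in_size, out_size)

    def forward(self, x, quarter=None):
        if self.autoregressive:
            x = torch.cat([x, self.quarter_embedding(quarter)], dim=-1)
        return self.head(self.body(x))
```

For a batch of size B, the autoregressive variant maps [B, features] plus a quarter index to [B, num\_quantiles], while the non-autoregressive variant maps [B, features] to [B, 96 $\times$ num\_quantiles], matching the output shapes in the table.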
While this non-linear model is still quite simple, it offers flexibility in tuning a limited set of hyperparameters. The hidden size of the linear layers and the number of layers can be varied, which can significantly influence the model's performance. The experiments use the same quantiles as the linear model. Multiple experiments are executed with different hyperparameters and input features; all results are shown in Table \ref{tab:non_linear_model_results}.
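The hyperparameter sweep over these two tuned settings can be enumerated with a simple grid, for example (the option lists mirror the values appearing in the results table; the dictionary keys are illustrative):

```python
from itertools import product

# Grid over the two tuned hyperparameters: network depth and width.
layer_options = [2, 4, 8, 16]
hidden_options = [256, 512]
configs = [{"num_layers": n, "hidden_size": h}
           for n, h in product(layer_options, hidden_options)]
```

Each configuration is then trained and evaluated separately per input-feature set.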
\begin{table}[H]
\centering
\begin{adjustbox}{width=\textwidth,center}
\begin{tabular}{@{}cccccccccc@{}}
\toprule
Features & Layers & Hidden Size & \multicolumn{2}{c}{MSE} & \multicolumn{2}{c}{MAE} & \multicolumn{2}{c}{CRPS} \\
\cmidrule(lr){4-5} \cmidrule(lr){6-7} \cmidrule(lr){8-9}
& & & AR & NAR & AR & NAR & AR & NAR \\
\midrule
NRV & & & & & & & & \\
& 2 & 256 & 38117.43 & 41574.38 & 147.55 & 153.83 & 86.42 & 75.61 \\
& 4 & 256 & 37817.78 & 40200.92 & 146.90 & 152.00 & 85.63 & 74.37 \\
& 8 & 256 & 36346.57 & 38746.81 & 144.80 & 148.82 & 84.51 & 74.55 \\
& 16 & 256 & 38624.83 & 39328.47 & 148.61 & 149.19 & 87.05 & 75.38 \\
\midrule
NRV + Load + PV\\ + Wind & & & & & & & & \\
& 2 & 256 & 42983.21 & 42950.17 & 156.65 & 156.88 & 92.15 & 76.21 \\
\midrule
NRV + Load + PV\\ + Wind + Net Position\\ + QE (dim 5) & & & & & & & & \\
& 2 & 256 & 37785.49 & 42828.61 & 146.99 & 157.03 & 85.22 & 76.36 \\
& 4 & 256 & 34232.57 & 42588.16 & 139.78 & 157.20 & 80.14 & 73.75 \\
& 8 & 256 & \textbf{32447.41} & 40541.92 & \textbf{137.24} & 151.60 & \textbf{79.22} & 75.52 \\
& 2 & 512 & 44281.20 & 44018.79 & 158.63 & 159.06 & 91.82 & 77.99 \\
& 4 & 512 & 34839.79 & 41999.79 & 140.67 & 154.86 & 80.21 & 75.70 \\
& 8 & 512 & 34925.46 & 39774.38 & 141.11 & 150.62 & 81.11 & 74.67 \\
\bottomrule
\end{tabular}
\end{adjustbox}
\caption{Non-linear quantile regression model results. All models used a dropout rate of 0.2.}
\label{tab:non_linear_model_results}
\end{table}
The same behavior as for the linear model is observed when comparing the metrics of the autoregressive and non-autoregressive models: the autoregressive model performs better in terms of MSE and MAE, while the non-autoregressive model performs better in terms of CRPS. The results also give insight into the importance of the input features and hyperparameters. Adding more input features improves the performance of the autoregressive model. The non-autoregressive model, on the other hand, does not benefit from this: its metrics worsen when more input features are added, as was also seen for the linear model. A reason for this behavior could be that the non-autoregressive model is unable to learn the complex patterns in the large set of input features. The non-autoregressive model is provided with the values for every quarter for which quantiles need to be predicted, so each additional forecast feature increases the input size by 96 values. Capturing patterns in this large input space can be a challenging task.
\begin{figure}[H]
\centering
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_non_linear_model_samples/AQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_864.png}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_non_linear_model_samples/NAQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_864.png}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_non_linear_model_samples/AQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_4320.png}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_non_linear_model_samples/NAQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_4320.png}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_non_linear_model_samples/AQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_6336.png}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_non_linear_model_samples/NAQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_6336.png}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/aqr_non_linear_model_samples/AQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_7008.png}
\caption{Autoregressive non-linear model}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/naqr_non_linear_model_samples/NAQR_NL_NRV_Load_Wind_PV_NP_QE-Sample_7008.png}
\caption{Non-autoregressive non-linear model}
\end{subfigure}
\caption{Comparison of the autoregressive and non-autoregressive non-linear model examples.}
\label{fig:non_linear_model_examples}
\end{figure}
Examples from the test set for the non-linear model are shown in Figure \ref{fig:non_linear_model_examples}. A clear difference can be observed between the examples of the autoregressive and non-autoregressive models. The autoregressive model's predictions follow the actual NRV trend much more closely than those of the non-autoregressive model. The mean of the samples generated by the non-autoregressive model is around zero for every quarter of the day, and no clear trend can be observed in the samples. This is a clear indication that the non-autoregressive model is not able to learn the patterns in the data, despite having a lower CRPS.
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/AQR_NL_Quantile_Performance_Training.jpeg}
\caption{AR - Train}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/AQR_NL_Quantile_Performance_Test.jpeg}
\caption{AR - Test}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/NAQR_NL_Quantile_Performance_Training.jpeg}
\caption{NAR - Train}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{images/quantile_regression/quantile_performance/NAQR_NL_Quantile_Performance_Test.jpeg}
\caption{NAR - Test}
\end{subfigure}
\caption{Over/underestimation of the quantiles for the autoregressive and non-autoregressive non-linear models. Both the quantile performance for the training and test set are shown. The plots are generated using the input features NRV, Load, Wind, PV, Net Position, and the quarter embedding (only for the autoregressive model).}
\label{fig:non-linear_model_quantile_over_underestimation}
\end{figure}
The plots in Figure \ref{fig:non-linear_model_quantile_over_underestimation} show the over/underestimation of the quantiles output by the non-linear models. For the autoregressive model, the fraction of real NRV values falling below each quantile is larger than the ideal fraction (the nominal quantile level) most of the time. This means the model sets the quantiles too high, so that a larger-than-expected fraction of NRV values lies below them: the model overestimates the quantiles. The non-autoregressive model suffers from the same problem on the training set. On the test set, the lower quantiles are estimated too high and the higher quantiles too low, while the quantiles in the middle are estimated quite accurately.
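The over/underestimation diagnostic behind these plots can be computed as the empirical coverage of each quantile, i.e., the fraction of observed values at or below the predicted quantile, compared against the nominal level. A minimal sketch with hypothetical values (not data from the thesis):

```python
import numpy as np

def empirical_coverage(y_true, quantile_preds, levels):
    """Fraction of observed values at or below each predicted quantile.
    Ideally the fraction equals the nominal level tau; a larger fraction
    means the quantile is overestimated (set too high), a smaller one
    means it is underestimated."""
    y_true = np.asarray(y_true)
    return {tau: float(np.mean(y_true <= np.asarray(preds)))
            for tau, preds in zip(levels, quantile_preds)}

# Hypothetical example: predicted medians sit above every observation,
# so the 0.5-quantile is overestimated (coverage 1.0 instead of 0.5).
y = np.array([0.0, 1.0, 2.0, 3.0])
q50 = np.array([2.5, 2.5, 2.5, 3.5])
cov = empirical_coverage(y, [q50], [0.5])
```

Plotting this empirical fraction against the nominal level for every trained quantile, on both the training and test sets, yields the calibration curves shown in the figure.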