") training_loop(ds_train)
test_batch % iter_next() encoded % round(5)) }
On to what we'll use as a baseline for comparison.
#### Vanilla LSTM
The vanilla LSTM again stacks two layers, each of size 32. Dropout and recurrent dropout were chosen individually
per dataset, as was the learning rate.
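A minimal sketch of such a model, written with the `keras` R package. The concrete dropout rates and learning rate below are placeholders (those values were tuned per dataset), and the sequence length matches the geyser setting discussed later:

```{r}
library(keras)

n_timesteps <- 60 # geyser setting; see data preparation below
n_features <- 1

model <- keras_model_sequential() %>%
  layer_lstm(
    units = 32,
    input_shape = c(n_timesteps, n_features),
    dropout = 0.1,           # placeholder; tuned per dataset
    recurrent_dropout = 0.1, # placeholder; tuned per dataset
    return_sequences = TRUE
  ) %>%
  layer_lstm(
    units = 32,
    dropout = 0.1,
    recurrent_dropout = 0.1,
    return_sequences = TRUE
  ) %>%
  time_distributed(layer_dense(units = 1)) # one prediction per future timestep

model %>% compile(
  loss = "mse",
  optimizer = optimizer_adam(learning_rate = 1e-3) # placeholder
)
```

The second LSTM keeps `return_sequences = TRUE` so that, via `time_distributed()`, the model emits one prediction per future timestep, matching the multi-step targets constructed in the data preparation below.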
### Data preparation
For all experiments, data were prepared in the same way.
In every case, we used the first 10000 measurements available in the respective `.pkl` files [provided by Gilpin in his GitHub
repository](https://github.com/williamgilpin/fnn/tree/master/datasets). To save on file size and not depend on an external
data source, we extracted those first 10000 entries to `.csv` files downloadable directly from this blog's repo.
Should you want to access the complete time series (of considerably greater lengths), just download them from Gilpin's repo
and load them using `reticulate`.
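For instance, a minimal sketch (the file name is the one used in Gilpin's repo; adjust the path to wherever you stored the download):

```{r}
library(reticulate)

# py_load_object() reads a Python pickle file into R
geyser_complete <- py_load_object("geyser_train_test.pkl")
```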
Data preparation followed the same pattern for every dataset; we illustrate it with the first one, `geyser`.
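Below is a condensed sketch of such a pipeline. The windowing scheme (an input of `n_timesteps` steps, a target consisting of the `n_timesteps` steps that follow), the 50/50 train/test split, and helper names like `gen_timesteps()` are illustrative assumptions, not taken verbatim from the original code:

```{r}
library(tidyverse)
library(tfdatasets)

# read the 10000-point extract and standardize it
# (assumes geyser.csv holds a single, unnamed column of measurements)
geyser <- read_csv("geyser.csv", col_names = FALSE)$X1 %>% scale()

n_timesteps <- 60 # chosen per dataset; see below
batch_size <- 32

# slide a window of the given length over the series, one step at a time
gen_timesteps <- function(x, window_size) {
  do.call(rbind,
          purrr::map(seq_len(length(x) - window_size + 1),
                     function(i) x[i:(i + window_size - 1)]))
}

n <- 10000
train <- gen_timesteps(geyser[1:(n / 2)], 2 * n_timesteps)
test <- gen_timesteps(geyser[(n / 2 + 1):n], 2 * n_timesteps)

# add a features dimension: [samples, timesteps, 1]
dim(train) <- c(dim(train), 1)
dim(test) <- c(dim(test), 1)

# the first half of each window is the input, the second half the target
x_train <- train[, 1:n_timesteps, , drop = FALSE]
y_train <- train[, (n_timesteps + 1):(2 * n_timesteps), , drop = FALSE]
x_test <- test[, 1:n_timesteps, , drop = FALSE]
y_test <- test[, (n_timesteps + 1):(2 * n_timesteps), , drop = FALSE]

ds_train <- tensor_slices_dataset(list(x_train, y_train)) %>%
  dataset_shuffle(nrow(x_train)) %>%
  dataset_batch(batch_size)

ds_test <- tensor_slices_dataset(list(x_test, y_test)) %>%
  dataset_batch(nrow(x_test))
```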
Now we're ready to look at how forecasting goes on our four datasets.
## Experiments
### Geyser dataset
People working with time series may have heard of [Old Faithful](https://en.wikipedia.org/wiki/Old_Faithful), a geyser in
Wyoming, US that has been erupting continually, every 44 minutes to two hours, since the year 2000. For the subset of data
Gilpin extracted[^3],
[^3]: see dataset descriptions in the [repository's README](https://github.com/williamgilpin/fnn)
> `geyser_train_test.pkl` corresponds to detrended temperature readings from the main runoff pool of the Old Faithful geyser
> in Yellowstone National Park, downloaded from the [GeyserTimes database](https://geysertimes.org/). Temperature measurements
> start on April 13, 2015 and occur in one-minute increments.
As we said above, `geyser.csv` is a subset of these measurements, comprising the first 10000 data points. To choose an
adequate timestep for the LSTMs, we inspect the series at various resolutions:
<div class="figure">
<img src="images/geyser_ts.png" alt="Geyer dataset. Top: First 1000 observations. Bottom: Zooming in on the first 200." width="600" />
<p class="caption">(\#fig:unnamed-chunk-5)Geyer dataset. Top: First 1000 observations. Bottom: Zooming in on the first 200.</p>
</div>
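A two-panel plot like this one could be produced, for instance, with `ggplot2` and `patchwork` (both package choices are ours; the values plotted here are the standardized series from the preparation step):

```{r}
library(ggplot2)
library(patchwork) # used only to stack the two panels

df <- data.frame(time = 1:1000, temp = as.numeric(geyser[1:1000]))

p1 <- ggplot(df, aes(time, temp)) + geom_line() +
  ggtitle("First 1000 observations")
p2 <- ggplot(df[1:200, ], aes(time, temp)) + geom_line() +
  ggtitle("Zooming in on the first 200")

p1 / p2 # patchwork: p1 on top, p2 below
```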
It seems like the behavior is periodic with a period of about 40-50; a timestep of 60 thus seemed like a good try.
Having trained both FNN-LSTM and the vanilla LSTM for 200 epochs, we first inspect the variances of the latent variables on
the test set. The value of `fnn_multiplier` corresponding to this run was `0.7`.
```{}
V1      V2      V3        V4          V5       V6       V7       V8       V9       V10
0.258   0.0262  0.0000627 0.000000600 0.000533 0.000362 0.000238 0.000121 0.000518 0.000365
```
There is a drop in importance between the first two variables and the rest; however, unlike in the Lorenz system, V1 and V2 variances also differ by an order of magnitude.
Now, it’s interesting to compare prediction errors ...