
I finally got the LSTM to a training and validation loss of < 0.05 for predicting the digits of a semiprime's factors.

I used SELU activation with LeCun normal initialization on a dense decoder, and compiled the model with Adam as the optimizer and mean squared error as the loss.

SELU is self-normalizing, meaning it pushes activations toward zero mean and unit variance, which goes a long way toward fixing the exploding/vanishing gradient problem. And I can get away with this specifically because the self-normalizing property *only* holds for dense layers, which is exactly what my decoder is.
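
Roughly, the wiring looks like this in Keras. Treat the layer widths and digit counts as placeholders, not my actual numbers:

```python
import tensorflow as tf

N_DIGITS_IN = 20   # placeholder: digits of the semiprime, one per timestep
N_DIGITS_OUT = 10  # placeholder: digits of the factor being predicted

model = tf.keras.Sequential([
    # LSTM encoder over the input digit sequence (width is a guess)
    tf.keras.layers.LSTM(128, input_shape=(N_DIGITS_IN, 1)),
    # dense decoder with SELU; the self-normalizing behavior expects
    # lecun_normal initialization on these layers
    tf.keras.layers.Dense(128, activation="selu",
                          kernel_initializer="lecun_normal"),
    tf.keras.layers.Dense(N_DIGITS_OUT, activation="linear"),
])

model.compile(optimizer="adam", loss="mse")
```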

I chose Adam, even though this isn't a sparse problem, because Adam excels on noisy problems and non-stationary objectives (definitely this), and because it typically doesn't require a lot of hyperparameter tuning, it's ideal here, especially since I don't know what the hyperparameters should be to begin with.
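
The stock Keras defaults are what I mean by "not a lot of tuning"; I just left them alone:

```python
from tensorflow.keras.optimizers import Adam

# Keras's default Adam hyperparameters, untouched
optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7)
```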

I did work out some general guidelines on training set size vs. validation split, etc.
The initial set wasn't huge or anything: roughly 110k pairs for training.
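
In code that's just the usual fit-with-a-holdout pattern (the 20% split, epoch count, and batch size here are generic choices, not my tuned values):

```python
# model from the sketch above; X_train/y_train are the ~110k encoded pairs
history = model.fit(
    X_train, y_train,
    validation_split=0.2,   # hold out 20% for validation
    epochs=50,
    batch_size=64,
)
```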

It converged pretty quickly, all things considered, and to the low loss I mentioned. But even so, the system always outputs the same result regardless of the input, so obviously I'm doing something incorrectly.
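
A quick way to see the collapse is to predict on a batch of different inputs and check the spread (X_val standing in for whatever held-out inputs you have):

```python
import numpy as np

preds = model.predict(X_val[:100])
# if the model has collapsed to a constant output, the per-output
# std across different inputs is ~0 and there's only one unique row
print("std across inputs:", np.std(preds, axis=0))
print("distinct outputs:", len(np.unique(np.round(preds, 3), axis=0)))
```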

That the loss looks this good in training and validation while the outputs are constant makes me question whether I've got something wildly wrong. Still exploring, though, and figuring out how to get my answers back out. I'm hoping I just fucked up the output, and not the input as well.
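
If the digit encoding turns out to be values scaled to [0, 1], which is one guess at what I'd need to invert, getting an answer back out would look roughly like this:

```python
import numpy as np

def decode_digits(pred_row):
    # assumes each output is one digit scaled to [0, 1]; round back to 0-9
    digits = np.clip(np.round(pred_row * 9), 0, 9).astype(int)
    return int("".join(map(str, digits)))
```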
