4

Final synopsis.
Neural Networks suck.
They just plain suck.
A 5% error rate on the best and most convoluted problem is still way too high.
It's amazing you can make something recognize an image it's been trained on, that's awesome...

But if I can't get a simple function approximator below a 0.07 error on a 0-to-1 scale, and the error on a fixed-point system is still pretty goddamn high, then even if most of the data sort of fits when spitting back inference values, it is unusable.

Even the turret aimer I trained successfully would sometimes skip around a full circle and pass the target, only lining up after another full circle.

There has to be something LIKE IT that actually works in premise.

I think my behavioral simulation might be a cool idea: primitive environment, primitive being, reward learning. However, with an attached DATABASE.

Comments
  • 3
    They're just *difficult* to get working effectively, and you tend to need a reasonably deep understanding to get them to work well (especially in scenarios that aren't a traditionally great "fit".) I haven't played with them much in recent times (last time I touched them was over a decade ago when the landscape was very different) but others like @Nomad certainly have.
  • 1
    @AlmondSauce a consideration I would have is that if there is no activation function per layer, you should have a much broader range of values possible to combine

    We already had this discussion, I think.
    Sorry, dirty treasonous garbage picking me up and kidnapping me a decade ago kind of screwed up my memory a tad.
  • 3
    They're difficult to get right. You don't just throw things together and get something working and expect it to be good. There's a reason people dedicate their whole careers to this.
  • 2
    @RememberMe well, my simplistic example was a 3D function:

    z = y^2 + x^2 + 2x^3 + 2y^3

    z was the output,
    x and y the inputs.

    so... with hidden layers I ended up with

    2 x 12 x 1024 x 12 x 1

    The initial training flies.
    n=400, epochs=20, optim=AdamW, loss=MSE
    lr = 1e-4 initially; I drop it to lr = 1e-7 by the time I'm done.

    I randomly generate the training data in each session

    The domains of x and y are 0 < x < 100 and 0 < y < 100.
    I divide the expected function values and the inputs by their max values.

    Eventually the values start oscillating.
    Before that there is steady convergence for thousands of training values, if at a low rate.

    I adjust the learning rate down more and more, but eventually that stops working.
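
    For reference, roughly what that setup looks like sketched in PyTorch; the tanh activations and the full-batch loop here are placeholder assumptions, not the exact code:

    import torch
    import torch.nn as nn

    def target(x, y):
        # the function being approximated
        return y**2 + x**2 + 2*x**3 + 2*y**3

    # fresh random training data each session, 0 < x, y < 100
    n = 400
    xy = torch.rand(n, 2) * 100.0
    z = target(xy[:, 0], xy[:, 1]).unsqueeze(1)

    # divide the inputs and the expected outputs by their max values
    xy_n = xy / 100.0
    z_n = z / z.max()

    # 2 x 12 x 1024 x 12 x 1
    model = nn.Sequential(
        nn.Linear(2, 12), nn.Tanh(),
        nn.Linear(12, 1024), nn.Tanh(),
        nn.Linear(1024, 12), nn.Tanh(),
        nn.Linear(12, 1),
    )

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()

    for epoch in range(20):
        opt.zero_grad()
        loss = loss_fn(model(xy_n), z_n)
        loss.backward()
        opt.step()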
  • 1
    @RememberMe with more layers the error remains about the same.

    I have passed the final values through tanh and sigmoid and the end fitting is about the same; I can only get so close.

    and here i am having been bumped into now to try and piss me off twice by the same little boy toucher.

    So obviously something is being missed.
    I almost feel like I just need to write my own.
    Take the backprop algorithm, for example. I tried to modify the way the data is trained: I spread the values out across partitions for the inputs, evenly spaced in where they start and end, etc. That increased the training speed, because it's an odd shape.

    I removed the ReLUs since they stopped training.
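
    If the ReLUs were dying (units stuck at zero that never recover), one common workaround, purely a suggestion here, is a leaky variant that keeps a small gradient on the negative side:

    import torch.nn as nn

    # same layer sizes as above, but LeakyReLU leaves a small slope for
    # negative inputs, so units cannot get permanently stuck at zero
    model = nn.Sequential(
        nn.Linear(2, 12), nn.LeakyReLU(0.01),
        nn.Linear(12, 1024), nn.LeakyReLU(0.01),
        nn.Linear(1024, 12), nn.LeakyReLU(0.01),
        nn.Linear(12, 1),
    )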
  • 1
    @AvatarOfKaine you're using tanh and sigmoid, but have you tried ReLU?
  • 1
    @Wisecrack no, no, I alternated between the two to fit things.

    And no, not as the output's activation function.
  • 1
    @AvatarOfKaine never tried it, but what about arbitrary, finely quantized functions or curves? Hell, you could set the points on the function *manually*, and then maybe interpolate as needed?
    After all, why only stick with well-defined functions like tanh and sigmoid?
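
    A rough sketch of what such a hand-set, interpolated activation could look like in PyTorch; the knot points below are arbitrary and purely illustrative:

    import torch
    import torch.nn as nn

    class PiecewiseLinear(nn.Module):
        """Activation defined by manually chosen (x, y) knot points,
        with linear interpolation in between."""
        def __init__(self, xs, ys):
            super().__init__()
            self.register_buffer("xs", torch.as_tensor(xs, dtype=torch.float32))
            self.register_buffer("ys", torch.as_tensor(ys, dtype=torch.float32))

        def forward(self, x):
            x = x.clamp(float(self.xs[0]), float(self.xs[-1]))
            idx = torch.bucketize(x, self.xs[1:-1])  # which segment each value falls in
            x0, x1 = self.xs[idx], self.xs[idx + 1]
            y0, y1 = self.ys[idx], self.ys[idx + 1]
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

    # e.g. a crude tanh-ish curve with hand-picked points
    act = PiecewiseLinear([-3.0, -1.0, 0.0, 1.0, 3.0],
                          [-1.0, -0.75, 0.0, 0.75, 1.0])
    print(act(torch.linspace(-4, 4, 9)))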
  • 1
    @atheist refining my error?
    Do you mean the backpropagation optimizer?

    The biggest network I tried had 4 hidden layers of 1024.

    It didn't make any difference; it just slowed everything down.
  • 1
    @atheist but if you mean how I'm trying to ensure more even training: the data I randomly generate I now arrange into evenly spaced partitions, so that each partition holds an equal number of randomly generated values between its start and end range. That way, when I run a training batch (which I shuffle at the beginning of each epoch), it pulls the weights in a way that I imagine would converge better. A sketch of what I mean is below.
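
    Something like this, as a sketch (the bin counts are just examples):

    import torch

    def stratified_samples(n_bins=20, per_bin=20, lo=0.0, hi=100.0):
        # split the range into equal partitions and draw the same number
        # of uniform random values inside each one
        edges = torch.linspace(lo, hi, n_bins + 1)
        starts = edges[:-1, None]
        widths = (edges[1:] - edges[:-1])[:, None]
        x = (starts + widths * torch.rand(n_bins, per_bin)).reshape(-1)
        y = (starts + widths * torch.rand(n_bins, per_bin)).reshape(-1)
        return torch.stack([x, y], dim=1)

    xy = stratified_samples()             # (400, 2), every partition represented
    xy = xy[torch.randperm(len(xy))]      # shuffled at the start of each epoch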
  • 1
    @atheist now, the last time we had this discussion you answered me, and apparently I wasn't certain what you meant by fixing error.

    But you did answer :P
    It was likely some time ago, though.
  • 0
    Don't try to get neural networks to do higher math.
    Humans have way more neurons than you can simulate, and still struggle at that after years of training...
  • 1
    @Oktokolo I'm not really trying to make it SOLVE an equation.
    And neurons don't really work that way in our brains.

    I’m testing the claim that neural nets can approximate any function.
  • 1
    @atheist amusingly, that first part about two separate nets is what I was actually thinking of.

    still kind of invalidates the claim though :P
  • 1
    @atheist again, on the partial differential I was thinking the same thing yesterday.

    It's the reason I can't understand why it isn't fitting better, since the two inputs do not serve as coefficients to one another; the relationship is a sum that is not very complex.

    As far as suggesting how layers are split up, that is what confuses me when people say that.

    Like, someone describing image recognition suggested that layers might take on purposes; that seems kind of automatic though, doesn't it?

    As for the 12 connections, I initially started out at 1024 to no avail and saw only slightly better results.

    In the end it shouldn't matter though, right?

    y_i = sum_j( w_ij * a_j ) + b_i

    with every single connection being multiplied and added at each layer, where the a_j are the previous layer's activations.

    And there are quite a few of them to choose from.
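
    Which is exactly what a fully connected layer computes; a quick check in PyTorch (the layer sizes here are arbitrary):

    import torch
    import torch.nn as nn

    layer = nn.Linear(3, 2)      # 3 inputs -> 2 outputs: 3*2 weights plus 2 biases
    a = torch.randn(3)           # previous layer's activations

    # y_i = sum_j( w_ij * a_j ) + b_i, one weight per connection
    by_hand = layer.weight @ a + layer.bias
    print(torch.allclose(layer(a), by_hand))   # True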
  • 1
    @atheist something confuses me slightly.
    I had thought that a lot of the power came from there being a separate weight value for each neuron's connection to every neuron in the previous and next layer.
  • 1
    @atheist I tried something kind of like what you're talking about too, alternating on one variable or the other, but I feel it skewed the results. That was in a different problem, though.

    What is still confusing me, now that I print more useful statistics than just a single epoch's loss value, is by what criteria to decide to adjust the learning rate: the error keeps adjusting downward over and over, but the overall error occasionally jumps high and then drops back down low within each epoch (one option is sketched at the end of this comment).

    However, like I said, sometimes the training behavior is strange. Certain values jump up and down, which, yeah, I can visualize as a kind of shared modifier to the overall equation that, passed through, affects a large chunk of the end value being adjusted along with several others.
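
    One possible criterion (a suggestion, not what I am doing yet): let a scheduler lower the rate only when the epoch loss has stopped improving. The model and data below are toy placeholders just to keep the snippet self-contained:

    import torch
    import torch.nn as nn

    model = nn.Linear(2, 1)                         # placeholder model
    xy, z = torch.rand(400, 2), torch.rand(400, 1)  # placeholder data

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # halve the lr whenever the loss has not improved for 10 epochs
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, mode="min", factor=0.5, patience=10)
    loss_fn = nn.MSELoss()

    for epoch in range(100):
        opt.zero_grad()
        loss = loss_fn(model(xy), z)
        loss.backward()
        opt.step()
        sched.step(loss.item())   # the scheduler decides, not a hand-tuned drop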
  • 1
    @atheist shouldn't introducing equally spaced data that differs each time produce better results?
    I feel like it should.
    I feel like overfitting would occur if I used a small dataset that matched those specifications but never changed it.
  • 3
    @AvatarOfKaine
    Of course neural nets can approximate any function given enough neurons, layers and interconnects.
    It is just absurdly hard to make them do that.

    Like @atheist wrote, having separate clusters of neurons to solve parts of the problem (like it is done for feature detection in convolutional neural networks) is probably the way to do it.

    You can definitely split the problem into single operations on two inputs and train them separately. Then you can combine the parts to get the full solution (see the sketch at the end of this comment).

    In general, keep the models as tiny as possible. More inputs, neurons, layers and interconnects means longer training and more overfitting.
    More of these also means more potential for the model to exploit unexpected relations between inputs. But you don't want that for known functions.

    Happy research.
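
    A sketch of that splitting for this particular function, where z = f(x) + f(y) with f(t) = t^2 + 2*t^3; the small network and training settings below are arbitrary choices:

    import torch
    import torch.nn as nn

    def f(t):                               # the shared one-variable piece
        return t**2 + 2*t**3

    t = torch.rand(400, 1)                  # inputs already scaled to 0..1
    target = f(t) / f(torch.tensor(1.0))    # normalise by the max on [0, 1]

    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.AdamW(net.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(2000):
        opt.zero_grad()
        loss = loss_fn(net(t), target)
        loss.backward()
        opt.step()

    # combine the parts: the full two-input function is just net(x) + net(y)
    x, y = torch.rand(5, 1), torch.rand(5, 1)
    z_hat = net(x) + net(y)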
  • 1
    @atheist by clusters of neurons, do you mean, in coding terms, several models? That's the part I don't get; looking at the available libraries there seems to be no way of fine-grained control, nor does it seem like it should work that way given the mathematical definition of how the concept works.
  • 0
    @atheist well, just the weight adjustment part and the concept of an activation layer.
    So, oh...
    Partially connected, interesting.
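
    For reference, a partially connected layer can be sketched as a fixed mask over an ordinary linear layer; the mask and sizes below are arbitrary:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedLinear(nn.Linear):
        """Linear layer where a fixed 0/1 mask removes some connections."""
        def __init__(self, in_features, out_features, mask):
            super().__init__(in_features, out_features)
            self.register_buffer("mask", mask)   # shape (out_features, in_features)

        def forward(self, x):
            return F.linear(x, self.weight * self.mask, self.bias)

    # e.g. the first output unit only sees input 0, the second only input 1
    mask = torch.tensor([[1.0, 0.0],
                         [0.0, 1.0]])
    layer = MaskedLinear(2, 2, mask)
    print(layer(torch.rand(4, 2)).shape)         # torch.Size([4, 2])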
  • 1
    @atheist I use pytorch lol