4

Final synopsis.
Neural Networks suck.
They just plain suck.
A 5% error rate on the best and most convoluted problem is still way too high.
It's amazing you can make something recognize an image it's been trained on, that's awesome...

But if I can't get a simple function approximator below a 0.07 error on a 0-to-1 scale, and the error on a fixed-point system is still pretty goddamn high, then even if most of the data sort of fits when spitting back inference values, it is unusable.

Even the turret aimer I trained successfully would sometimes skip around a full circle and pass the target, only lining up after another full circle.

There has to be something LIKE IT that actually works in premise.

I think my behavioral simulation might be a cool idea: primitive environment, primitive being, reward learning. However, with an attached DATABASE.

Comments
  • 3
    They're just *difficult* to get working effectively, and you tend to need a reasonably deep understanding to get them to work well (especially in scenarios that aren't a traditionally great "fit".) I haven't played with them much in recent times (last time I touched them was over a decade ago when the landscape was very different) but others like @Nomad certainly have.
  • 1
    @AlmondSauce a consideration I would have is that if there is no activation function per layer, you should have a much broader range of values possible to combine

    We already had this discussion, I think.
    Sorry, dirty treasonous garbage picking me up and kidnapping me a decade ago kind of screwed up my memory a tad.
  • 3
    They're difficult to get right. You don't just throw things together and get something working and expect it to be good. There's a reason people dedicate their whole careers to this.
  • 2
    @RememberMe well, my simplistic example was a 3D function:

    z = y^2 + x^2 + 2x^3 + 2y^3

    z was the output,
    x and y the inputs.

    so... with hidden layers I ended up with

    2 x 12 x 1024 x 12 x 1

    The initial training flies.
    n=400, epochs=20, optim=AdamW, loss=MSE
    lr = 1e-4 initially; I drop it to lr = 1e-7 by the time I'm done.

    I randomly generate the training data in each session

    The domains of x and y are 0 < x < 100 and 0 < y < 100.
    I divide the expected function values and the inputs by their max values.

    Eventually the values start oscillating.
    Before that there is steady convergence for thousands of training values, if at a low rate.

    I adjust the learning rate down more and more, but eventually that stops working.
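
    For reference, roughly what that setup looks like sketched in PyTorch; the tanh activations and the full-batch loop here are placeholder assumptions, not the exact code:

    import torch
    import torch.nn as nn

    def target(x, y):
        # the function being approximated
        return y**2 + x**2 + 2*x**3 + 2*y**3

    # fresh random training data each session, 0 < x, y < 100
    n = 400
    xy = torch.rand(n, 2) * 100.0
    z = target(xy[:, 0], xy[:, 1]).unsqueeze(1)

    # divide the inputs and the expected outputs by their max values
    xy_n = xy / 100.0
    z_n = z / z.max()

    # 2 x 12 x 1024 x 12 x 1
    model = nn.Sequential(
        nn.Linear(2, 12), nn.Tanh(),
        nn.Linear(12, 1024), nn.Tanh(),
        nn.Linear(1024, 12), nn.Tanh(),
        nn.Linear(12, 1),
    )

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()

    for epoch in range(20):
        opt.zero_grad()
        loss = loss_fn(model(xy_n), z_n)
        loss.backward()
        opt.step()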
  • 1
    @RememberMe with more layers the error remains about the same.

    I have passed the final values through tanh and sigmoid and the end fitting is about the same; I can only get so close.

    and here i am having been bumped into now to try and piss me off twice by the same little boy toucher.

    So obviously something is being missed.
    I almost feel like I just need to write my own.
    Take the backprop algorithm, for example. I tried to modify the way the data is trained: I spread the values out across partitions for the inputs, evenly spaced in where they start and end, etc. That increased the training speed, because it's an odd shape.

    I removed the ReLUs since they stopped training.
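
    If the ReLUs were dying (units stuck at zero that never recover), one common workaround, purely a suggestion here, is a leaky variant that keeps a small gradient on the negative side:

    import torch.nn as nn

    # same layer sizes as above, but LeakyReLU leaves a small slope for
    # negative inputs, so units cannot get permanently stuck at zero
    model = nn.Sequential(
        nn.Linear(2, 12), nn.LeakyReLU(0.01),
        nn.Linear(12, 1024), nn.LeakyReLU(0.01),
        nn.Linear(1024, 12), nn.LeakyReLU(0.01),
        nn.Linear(12, 1),
    )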
  • 1
    @AvatarOfKaine you're using tanh and sigmoid, but have you tried ReLU?
  • 1
    @Wisecrack no, no, I alternated between the two to fit things.

    And no, not as the output's activation function.
  • 1
    @AvatarOfKaine never tried it, but what about arbitrary, finely quantized functions or curves? Hell, you could set the points on the function *manually*, and then maybe interpolate as needed?
    After all, why only stick with well-defined functions like tanh and sigmoid?
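
    A rough sketch of what such a hand-set, interpolated activation could look like in PyTorch; the knot points below are arbitrary and purely illustrative:

    import torch
    import torch.nn as nn

    class PiecewiseLinear(nn.Module):
        """Activation defined by manually chosen (x, y) knot points,
        with linear interpolation in between."""
        def __init__(self, xs, ys):
            super().__init__()
            self.register_buffer("xs", torch.as_tensor(xs, dtype=torch.float32))
            self.register_buffer("ys", torch.as_tensor(ys, dtype=torch.float32))

        def forward(self, x):
            x = x.clamp(float(self.xs[0]), float(self.xs[-1]))
            idx = torch.bucketize(x, self.xs[1:-1])  # which segment each value falls in
            x0, x1 = self.xs[idx], self.xs[idx + 1]
            y0, y1 = self.ys[idx], self.ys[idx + 1]
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

    # e.g. a crude tanh-ish curve with hand-picked points
    act = PiecewiseLinear([-3.0, -1.0, 0.0, 1.0, 3.0],
                          [-1.0, -0.75, 0.0, 0.75, 1.0])
    print(act(torch.linspace(-4, 4, 9)))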
  • 1
    @atheist refining my error?
    Do you mean the backpropagation optimizer?

    The biggest network I tried had 4 hidden layers of 1024.

    It didn't make any difference; it just slowed everything down.
  • 1
    @atheist but if you mean how I'm trying to ensure more even training: the data I randomly generate I now arrange into evenly spaced partitions, so that each partition holds an equal number of randomly generated values between its start and end range. That way, when I run a training batch (which I shuffle at the beginning of each epoch), it pulls the weights in a way that I imagine would converge better. A sketch of what I mean is below.
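
    Something like this, as a sketch (the bin counts are just examples):

    import torch

    def stratified_samples(n_bins=20, per_bin=20, lo=0.0, hi=100.0):
        # split the range into equal partitions and draw the same number
        # of uniform random values inside each one
        edges = torch.linspace(lo, hi, n_bins + 1)
        starts = edges[:-1, None]
        widths = (edges[1:] - edges[:-1])[:, None]
        x = (starts + widths * torch.rand(n_bins, per_bin)).reshape(-1)
        y = (starts + widths * torch.rand(n_bins, per_bin)).reshape(-1)
        return torch.stack([x, y], dim=1)

    xy = stratified_samples()             # (400, 2), every partition represented
    xy = xy[torch.randperm(len(xy))]      # shuffled at the start of each epoch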
  • 1
    @atheist now, the last time we had this discussion you answered me, and apparently I wasn't certain what you meant by fixing error.

    But you did answer :P
    It was likely some time ago, though.
  • 0
    Don't try to get neural networks to do higher math.
    Humans have way more neurons than you can simulate, and still struggle at that after years of training...
  • 1
    @Oktokolo I'm not really trying to make it SOLVE an equation.
    And neurons don't really work that way in our brains.

    I’m testing the claim that neural nets can approximate any function.
  • 1
    @atheist amusingly, that first part about two separate nets is what I was actually thinking of.

    still kind of invalidates the claim though :P
  • 1
    @atheist again, on the partial differential I was thinking the same thing yesterday.

    It's the reason I can't understand why it isn't fitting better, since the two inputs do not serve as coefficients to one another; the relationship is a sum that is not very complex.

    As far as suggesting how layers are split up, that is what confuses me when people say that.

    Like, someone describing image recognition suggested that layers might take on purposes; that seems kind of automatic though, doesn't it?

    As for the 12 connections, I initially started out at 1024 to no avail and saw only slightly better results.

    In the end it shouldn't matter though, right?

    y_i = sum_j( w_ij * a_j ) + b_i

    with every single connection being multiplied and added at each layer, where the a_j are the previous layer's activations.

    And there are quite a few of them to choose from.
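
    Which is exactly what a fully connected layer computes; a quick check in PyTorch (the layer sizes here are arbitrary):

    import torch
    import torch.nn as nn

    layer = nn.Linear(3, 2)      # 3 inputs -> 2 outputs: 3*2 weights plus 2 biases
    a = torch.randn(3)           # previous layer's activations

    # y_i = sum_j( w_ij * a_j ) + b_i, one weight per connection
    by_hand = layer.weight @ a + layer.bias
    print(torch.allclose(layer(a), by_hand))   # True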
  • 1
    @atheist something confuses me slightly.
    I had thought that a lot of the power came from there being a separate weight value for each neuron's connection to every neuron in the previous and next layer.
  • 1
    @atheist I tried something kind of like what you're talking about too, alternating on one variable or the other, but I feel it skewed the results. That was in a different problem, though.

    What is still confusing me, now that I print more useful statistics than just a single epoch's loss value, is by what criteria to decide to adjust the learning rate: the error keeps adjusting downward over and over, but the overall error occasionally jumps high and then drops back down low within each epoch (one option is sketched at the end of this comment).

    However, like I said, sometimes the training behavior is strange. Certain values jump up and down, which, yeah, I can visualize as a kind of shared modifier to the overall equation that, passed through, affects a large chunk of the end value being adjusted along with several others.
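
    One possible criterion (a suggestion, not what I am doing yet): let a scheduler lower the rate only when the epoch loss has stopped improving. The model and data below are toy placeholders just to keep the snippet self-contained:

    import torch
    import torch.nn as nn

    model = nn.Linear(2, 1)                         # placeholder model
    xy, z = torch.rand(400, 2), torch.rand(400, 1)  # placeholder data

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # halve the lr whenever the loss has not improved for 10 epochs
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, mode="min", factor=0.5, patience=10)
    loss_fn = nn.MSELoss()

    for epoch in range(100):
        opt.zero_grad()
        loss = loss_fn(model(xy), z)
        loss.backward()
        opt.step()
        sched.step(loss.item())   # the scheduler decides, not a hand-tuned drop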
  • 1
    @atheist shouldn't introducing equally spaced data that differs each time produce better results?
    I feel like it should.
    I feel like overfitting would occur if I used a small dataset that matched those specifications but never changed it.
  • 3
    @AvatarOfKaine
    Of course neural nets can approximate any function given enough neurons, layers and interconnects.
    It is just absurdly hard to make them do that.

    Like @atheist wrote, having separate clusters of neurons to solve parts of the problem (like it is done for feature detection in convolutional neural networks) is probably the way to do it.

    You can definitely split the problem into single operations on two inputs and train them separately. Then you can combine the parts to get the full solution (see the sketch at the end of this comment).

    In general, keep the models as tiny as possible. More inputs, neurons, layers and interconnects means longer training and more overfitting.
    More of these also means more potential for the model to exploit unexpected relations between inputs. But you don't want that for known functions.

    Happy research.
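
    A sketch of that splitting for this particular function, where z = f(x) + f(y) with f(t) = t^2 + 2*t^3; the small network and training settings below are arbitrary choices:

    import torch
    import torch.nn as nn

    def f(t):                               # the shared one-variable piece
        return t**2 + 2*t**3

    t = torch.rand(400, 1)                  # inputs already scaled to 0..1
    target = f(t) / f(torch.tensor(1.0))    # normalise by the max on [0, 1]

    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.AdamW(net.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(2000):
        opt.zero_grad()
        loss = loss_fn(net(t), target)
        loss.backward()
        opt.step()

    # combine the parts: the full two-input function is just net(x) + net(y)
    x, y = torch.rand(5, 1), torch.rand(5, 1)
    z_hat = net(x) + net(y)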
  • 1
    @atheist by clusters of neurons, do you mean, in coding terms, several models? That's the part I don't get; looking at the available libraries there seems to be no way of fine-grained control, nor does it seem like it should work that way given the mathematical definition of how the concept works.
  • 0
    @atheist well, just the weight adjustment part and the concept of an activation layer.
    So, oh...
    Partially connected, interesting.
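
    For reference, a partially connected layer can be sketched as a fixed mask over an ordinary linear layer; the mask and sizes below are arbitrary:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedLinear(nn.Linear):
        """Linear layer where a fixed 0/1 mask removes some connections."""
        def __init__(self, in_features, out_features, mask):
            super().__init__(in_features, out_features)
            self.register_buffer("mask", mask)   # shape (out_features, in_features)

        def forward(self, x):
            return F.linear(x, self.weight * self.mask, self.bias)

    # e.g. the first output unit only sees input 0, the second only input 1
    mask = torch.tensor([[1.0, 0.0],
                         [0.0, 1.0]])
    layer = MaskedLinear(2, 2, mask)
    print(layer(torch.rand(4, 2)).shape)         # torch.Size([4, 2])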
  • 1
    @atheist I use pytorch lol