Comments
-
They're just *difficult* to get working effectively, and you tend to need a reasonably deep understanding to get them to work well (especially in scenarios that aren't a traditionally great "fit".) I haven't played with them much in recent times (last time I touched them was over a decade ago when the landscape was very different) but others like @Nomad certainly have.
-
@AlmondSauce a consideration I would have is that if there is not an activation function per layer you should have a lot broader range of values possible to combine
We already had this discussion I think
Sorry dirty treasonous garbage picking me up and kidnapping me a decade ago kind of screwed up my memory a tad -
They're difficult to get right. You don't just throw things together and get something working and expect it to be good. There's a reason people dedicate their whole careers to this.
-
@RememberMe well my simplistic example was a 3d function
z = y^2+x^2+2x^3+2y^3
z was the output
x and y the input
so... with hidden layers i ended up with
2x12x1024x12x1
the initial training flies.
n=400 e=20 optim=adamw loss=mse
lr = 0.0001 initially i drop it to lr=0.0000001 by the time i'm done.
I randomly generate the training data in each session
domains of x and y are 0 < x < 100, 0 < y < 100
i divide the values of the function expectation and the parameters by their max values.
eventually the values start oscillating.
steady convergence for 1000s of training values, if at a low rate.
i adjust the learning rate down more and more but eventually that stops working. -
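A minimal sketch of the data setup described above, assuming the target is z = y^2 + x^2 + 2x^3 + 2y^3 on 0 < x, y < 100 and that inputs and output are each divided by their maximum. The names (`target`, `make_sample`, `fraction_below`) are hypothetical and not the poster's code. It is only meant to show how the cubic terms shape the normalized targets: a sizable share of the domain ends up with targets smaller than the 0.07 error quoted at the end of the thread.

```python
import random

X_MAX = 100.0
Y_MAX = 100.0


def target(x, y):
    # The function from the thread: z = y^2 + x^2 + 2x^3 + 2y^3
    return y**2 + x**2 + 2 * x**3 + 2 * y**3


# Largest value on the domain, used to scale the output into (0, 1).
Z_MAX = target(X_MAX, Y_MAX)


def make_sample(rng):
    # Draw one random point and return normalized inputs and normalized target.
    x = rng.uniform(0.0, X_MAX)
    y = rng.uniform(0.0, Y_MAX)
    return (x / X_MAX, y / Y_MAX), target(x, y) / Z_MAX


def fraction_below(threshold, n=10_000, seed=0):
    # Share of random samples whose normalized target is below `threshold`.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if make_sample(rng)[1] < threshold)
    return hits / n
```

Calling `fraction_below(0.07)` gives a rough idea of how much of the domain an absolute error of 0.07 would swamp completely, which may be a more useful yardstick than a single MSE number.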
@RememberMe with more layers the error remains about the same.
i have passed the final values through tanh and sigmoid and the end fitting is about the same i can only get so close.
and here i am having been bumped into now to try and piss me off twice by the same little boy toucher.
so obviously something is being missed.
i almost feel like i just need to write my own.
like the backprop algorithm for example. i tried to modify the way data is trained, i spread the values out across partitions for the inputs that are evenly spaced in their beginnings and ends etc that increased the training speed because its an odd shape.
i removed the relus since they stopped training -
@Wisecrack no no i alternated between the two to fit things.
and no, not as the output's activation function no. -
@AvatarOfKaine never tried it, but what about arbitrary finely quantized functions or curves? Hell you could set the points on the function *manually*, and then maybe interpolate as needed?
After all, why only stick with well defined functions like tanh and sigmoid? -
@atheist refining my error ?
Do you mean the backpropagation optimizer ?
The biggest network I tried had 4 hidden layers of 1024
It didn’t make any difference it just slowed everything down -
@atheist but if you mean how am I trying to ensure a more even training, the data I randomly generate now I arrange into evenly spaced partitions, so that each partition has an equal number of randomly generated values in it between the start and end ranges of the partition. so at least when i run a training batch, which I shuffle at the beginning of each epoch, it pulls the values of the weights in a way that i imagine would converge better.
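The partitioning described above is usually called stratified sampling. A rough sketch, assuming equal-width partitions and a fixed number of random points per partition; the function names and parameters are made up for illustration and are not the poster's code:

```python
import random


def stratified_samples(lo, hi, partitions, per_partition, rng):
    # Split [lo, hi] into equal-width partitions and draw the same number
    # of uniform random values inside each one.
    width = (hi - lo) / partitions
    samples = []
    for i in range(partitions):
        start = lo + i * width
        samples.extend(rng.uniform(start, start + width)
                       for _ in range(per_partition))
    return samples


def stratified_grid(lo, hi, partitions, per_cell, rng):
    # 2D version: every (x-partition, y-partition) cell gets per_cell points.
    width = (hi - lo) / partitions
    points = []
    for i in range(partitions):
        for j in range(partitions):
            for _ in range(per_cell):
                x = rng.uniform(lo + i * width, lo + (i + 1) * width)
                y = rng.uniform(lo + j * width, lo + (j + 1) * width)
                points.append((x, y))
    rng.shuffle(points)  # shuffle so batches are not ordered by cell
    return points
```

Regenerating the points each session, as described, keeps the coverage even while still changing the exact values the network sees.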
-
@atheist now last time when we had this discussion you answered me and I wasn't apparently certain what you meant by fixing error.
but you did answer :P
but it was likely some time ago. -
Don't try to get neural networks to do higher math.
Humans have way more neurons than you can simulate and still struggle at that after years of training... -
@Oktokolo I’m not really trying to make it SOLVE an equation
And the neurons really don’t work this way in our brains
I’m testing the claim that neural nets can approximate any function. -
atheist: @AvatarOfKaine I think NNs "can" approximate any function. Getting them to is hard. If I was gonna hand craft something using ReLU based on your equation, you could do it with 2 separate nets for each input that are combined at the end, because the equation is separable. If you think of it in a similar way to Taylor series, lots of small bits that increase the accuracy over longer and longer sections.
Doing that manually with a complete understanding of the problem space is possible, going from random noise to that is much harder.
My guesses: making the first layer much wider might help. Otherwise you're vaguely quantizing your input to 12 sections. If you're looking for high accuracy over a 100x100 space, 100,000 should definitely be able to do it (manually, at least, but much slower). And maybe have a couple of layers only connected to each input, then join them. If you wanted to add something like (x^2)(y^3) then if the first couple of layers are split in 3, one for each input, one for both. In theory NNs can learn that relationship. In practice, getting them to is hard.
They won't be able to extrapolate very well, neither can Taylor series. This is where the whole "big data" comes in, if you've seen enough real world examples, almost everything is interpolation.
Not AGI of course, but I think we as a society are missing something academic there.
I'm just eyeballing based on the tiny amount of information I can remember.
I think the "deep" net stuff is of greater benefit when output is based on a combination of features, eg in images, no one pixel tells you a lot about the answer, whereas in your case, each input can be used to calculate part of the answer.
And partial differentials, Your net could alternate training on changes in x then y coz the partial differential of each variable doesn't include the other. -
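To make the hand-crafted ReLU idea concrete, here is a sketch under stated assumptions: the target is split as z = g(x) + g(y) with g(t) = t^2 + 2t^3, and each branch is a single hidden layer of ReLU units whose weights are computed directly instead of trained. The helper names are hypothetical. Each unit adds a change in slope at one knot, so the sum is exactly the piecewise-linear interpolant of g.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def g(t):
    # One separable piece of the thread's target: z = g(x) + g(y).
    return t**2 + 2 * t**3


def build_relu_units(f, lo, hi, segments):
    # Return (bias, [(knot, coeff), ...]) such that
    # bias + sum(coeff * relu(t - knot)) interpolates f linearly
    # between `segments + 1` evenly spaced knots on [lo, hi].
    knots = [lo + (hi - lo) * k / segments for k in range(segments + 1)]
    slopes = [(f(knots[k + 1]) - f(knots[k])) / (knots[k + 1] - knots[k])
              for k in range(segments)]
    units = [(knots[0], slopes[0])]
    for k in range(1, segments):
        # Each later unit contributes only the change in slope at its knot.
        units.append((knots[k], slopes[k] - slopes[k - 1]))
    return f(lo), units


def relu_net(t, bias, units):
    # One-input, one-hidden-layer ReLU network with hand-set weights.
    return bias + sum(c * relu(t - k) for k, c in units)


def approx_z(x, y, bias, units):
    # Two identical one-input branches, summed at the end.
    return relu_net(x, bias, units) + relu_net(y, bias, units)
```

With `bias, units = build_relu_units(g, 0.0, 100.0, 1000)`, `approx_z(x, y, bias, units)` tracks the true function closely inside the domain. Outside [0, 100] it just continues the last linear segment, which matches the point above about extrapolation.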
@atheist amusingly that first part about two separate nets i was thinking of actually.
still kind of invalidates the claim though :P -
@atheist again with the partial differential was thinking the same thing yesterday.
its the reason i can't understand how its not figuring out better since the two inputs do not serve as coefficients to one another so the relationship is a sum that is not very complex.
as far as suggesting how layers are split up that is what confuses me when people say that.
like an individual describing image recognition suggested that layers might take on purposes, that seems kind of automatic though doesn't it ?
so far as the 12 connections i initially started out at 1024 to no avail and saw only slightly better results.
in the end shouldn't matter though right ?
y_i = sum( w_x*A_(x-1))+b_x
with every single connection being multiplied and added at each layer.
and there being quite a few to choose from. -
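The weighted-sum formula above, written out as a sketch in plain Python. The index names (i for the output neuron, j for the input) are my own notation, not the poster's. The second function illustrates a standard fact that matters when activations are removed: two fully connected layers with nothing nonlinear between them compute exactly what a single layer could.

```python
def dense(inputs, weights, biases):
    # y_i = sum_j(w_ij * a_j) + b_i for each output neuron i.
    return [sum(w * a for w, a in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def compose_linear(w1, b1, w2, b2):
    # Collapse two activation-free layers into one equivalent layer:
    # W = W2 @ W1 and b = W2 @ b1 + b2.
    w = [[sum(w2[i][k] * w1[k][j] for k in range(len(w1)))
          for j in range(len(w1[0]))]
         for i in range(len(w2))]
    b = [sum(w2[i][k] * b1[k] for k in range(len(b1))) + b2[i]
         for i in range(len(w2))]
    return w, b
```

So the extra weights only add expressive power when an activation such as ReLU, tanh or sigmoid sits between the layers.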
@atheist something confuses me slightly.
i had thought that a lot of the power came from there being a separate weight value per neuron's connection to every other neuron in the previous and next layer. -
@atheist I tried something kind of what you're talking about too, but i feel it skewed the results, alternating that is on one variable or another but that was in a different problem.
what is still confusing me now that i print more valid statistics than just a single epoch's loss value is by what criteria to decide to adjust the learning rate. the error adjusts down over and over but the universal error occasionally jumps high and then drops back down low in each epoch.
however like i said sometimes the behavior of training is strange. certain values jump up and down which yeah I can visualize as a kind of shared modifier to the overall equation which, passed through, affects a large chunk of the end value being adjusted along with several others.. -
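One common criterion for the learning rate question: watch the mean loss over a whole epoch (or a held-out set) rather than individual batches, and only cut the rate after several epochs without improvement. PyTorch provides `torch.optim.lr_scheduler.ReduceLROnPlateau` along these lines; the class below is a hypothetical stand-in written from scratch, not that API, and its defaults are illustrative only.

```python
class PlateauLR:
    """Cut the learning rate when the epoch-mean loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=5, min_delta=1e-6, min_lr=1e-7):
        self.lr = lr
        self.factor = factor        # multiply lr by this on a plateau
        self.patience = patience    # epochs without improvement to tolerate
        self.min_delta = min_delta  # smallest change that counts as improvement
        self.min_lr = min_lr        # never go below this rate
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, epoch_loss):
        # Call once per epoch with the mean loss; returns the rate to use next.
        if epoch_loss < self.best - self.min_delta:
            self.best = epoch_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

Because it compares against the best loss seen so far, an occasional spike that drops back down does not trigger a cut by itself.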
@atheist shouldn't introducing equally spaced data that differs each time produce better results ?
I feel like it should.
I feel like overfitting would occur if I used a small dataset that matched those specifications but never changed it. -
atheist: @AvatarOfKaine I think with regard to separating layers, in theory NNs can learn that relation (it's a zero weight on one of the inputs), but it's a harder relation to learn.
-
atheist: Re bigger first layers, if your function is 2 separable functions the net may either learn the whole space, or learn the individual functions. The whole space is doable, but would require a bigger network/more data/more time etc.
-
atheist: If you look at each layer as "extracting features from the previous layer", the answer is a function of the 2 input variables, but it's not much of a combination of the two, so it doesn't benefit so much from the layers because each layer combines information from the previous and outputs the combination to the next. So the first layer, conceptually at least, extracts information from the inputs, then you lose the discrete separation of the information you need. The information can still be passed forward, just harder.
It's possible for a NN to extract the information, you end up with something like lots of small planes (surfaces, not aero) being added together to give the final result, but the problem space can be simplified, and the simpler problem is easier to learn.
I'm gonna have to brush up on this stuff, aren't I? -
@AvatarOfKaine
Of course neural nets can approximate any function given enough neurons, layers and interconnects.
It is just absurdly hard to make them to do that.
Like @atheist wrote, having separate clusters of neurons to solve parts of the problem (like it is done for feature detection in convolutional neural networks) is probably the way to do it.
You definitely can split the problem into single operations on two inputs and train them separately. Then you can combine the parts to get the full solution.
In general, keep the models as tiny as possible. More inputs, neurons, layers and interconnects means longer training and more overfitting.
More of these also means more potential for the model to exploit unexpected relations between inputs. But you don't want that for known functions.
Happy research. -
@atheist by clusters of neurons you mean in coding terms several models ? That’s the part I don’t get. Looking at the available libraries there seems to be no way of fine grained control, nor does it seem like it should work that way given the mathematical definition of how the concept works
-
atheist: @AvatarOfKaine some googling later: as I understand it, you are using fully connected layers? Keras seems to support "partially" connected layers, eg https://stackoverflow.com/questions...
With regards to the mathematical definition, not sure what you mean. -
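One generic way to get a "partially connected" layer, sketched here in plain Python: multiply the weights by a fixed 0/1 mask so that some connections are permanently removed. This is not the approach from the linked answer (the URL is truncated, so its contents are unknown) and not a Keras API; `masked_dense` and `MASK` are hypothetical names. It is the same thing as the "zero weight on one of the inputs" mentioned above, except the zero is enforced rather than learned.

```python
def masked_dense(inputs, weights, biases, mask):
    # Dense layer where mask[i][j] == 0 removes the connection j -> i.
    return [sum(w * m * a for w, m, a in zip(w_row, m_row, inputs)) + b
            for w_row, m_row, b in zip(weights, mask, biases)]


# Hypothetical first layer with 4 neurons: two see only x, two see only y.
MASK = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
]
```

With this mask the first two neurons respond only to x and the last two only to y, so later layers can sum two independent branches. In a training framework the same mask would also have to be applied to the gradients or re-applied after each weight update, otherwise the masked weights drift away from zero.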
@atheist well just the weight adjustment part and the concept of an activation layer
So oh
Partially connected interesting
Final synopsis.
Neural Networks suck.
They just plain suck.
5% error rate on the best and most convoluted problem is still way too high
It's amazing you can make something see an image it's been trained on, that's awesome....
But if I can't get a simple function approximator down to lower than 0.07 on a scale of 0 to 1 difference and the error value on a fixed point system is still pretty goddamn high, even if most of the data sort of fits when spitting back inference values, it is unusable.
Even the trained turret aimer I made successfully would sometimes skip around full circle and pass the target before lining up after another full circle.
There has to be something LIKE IT that actually works in premise.
I think my behavioral simulation might be a cool idea, primitive environment, primitive being, reward learning. however with an attached DATABASE.