
So here is a good question.
Suppose I train a neural network on handwriting.
And that handwriting is mostly contained within a small area in the center of a 28x28 pixel block.

Wouldn't a shift left or right fuck up its ability to predict accurately? Pretty sure it would!

You'd think you'd have to crop the image borders down as tight as possible for it to even work in more natural settings, where someone might draw a slightly longer or wider character.

Because from what I'm seeing, these things aren't searching for subshapes; in reality they're just shifting a bunch of numbers around that statistically seem to correspond.

Comments
  • 5
    Good question.

    The shifting is supposed to prevent overfitting as well as to help the network generalize. It should be able to predict the letter even if pieces of the character are obscured, missing, or misaligned.

    Doing random shifts and movements is also a good way to increase your training set. The NN has to go pixel by pixel (or in kernels), so even shifting by a pixel or two represents a valid new sample for the network.
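
    Something like this, roughly (a minimal numpy-only sketch; the helper name is just made up for illustration):

    import numpy as np

    def make_shifted_copies(img, max_shift=2):
        """Return copies of img shifted by up to max_shift pixels in each direction."""
        copies = []
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                # np.roll wraps pixels around the edge, which is fine while the borders are empty
                copies.append(np.roll(np.roll(img, dy, axis=0), dx, axis=1))
        return copies

    # toy 28x28 "character": every shifted copy keeps the same label,
    # so one sample becomes 25 training samples
    sample = np.zeros((28, 28))
    sample[10:18, 12:16] = 1.0
    augmented = make_shifted_copies(sample)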
  • 0
    @Hazarth precisely what I mean, the last part.
    The smallest difference is still a difference where a comparison operator is concerned.

    And imagine your network was trained so the weights and biases were adjusted for input neurons 64 through 128, but you shifted those same pixels to 55 through 117.

    It's the same character, but the weights would all be mismatched, producing junk.

    So you take that character, move it all over the place, and retrain... wouldn't that severely fuck up all the other character samples?

    That's what I have difficulty understanding. In essence, shouldn't you have to train one neural net for each very limited set of things to cover all the bases?

    Like say I took my favorite object, breasts, and rotated them or trained with different sizes. That object type varies so much it seems like it should need its own model. 'Tis why it's hard for me to imagine one classifier that detects everything. I'd think they'd chain them (this one didn't work, try the next one).
  • 0
    btw it's funny how someone thinks my liking boobs makes me sound goofy :P when they invented a fucked up system that makes me straight up laugh at them and want to kill them all lol
  • 3
    @MadMadMadMrMim Which is why you train by augmenting your data with shifted/rotated/partial images like @Hazarth said.

    Certain network architectures are also more resistant to shifts/rotations to some degree, like CNNs. In general, architectures with a deeper, more "hierarchical" structure are more resistant to changes in the data, because the "lower" layers filter out those changes and the "higher" layers then work on the resulting invariant representations.
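
    Very rough sketch of what a small CNN for 28x28 digits looks like (Keras, with placeholder layer sizes, not a tuned architecture):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, 3, activation="relu"),  # "lower" layers pick up local strokes/edges
        layers.MaxPooling2D(2),                   # pooling throws away exact positions -> some shift tolerance
        layers.Conv2D(32, 3, activation="relu"),  # "higher" layers combine those local features
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),   # one output per digit class
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")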

    Another great way is to "squeeze out" trivial changes in the information by encoding it, using the coded representation, and then decoding it, which e.g. autoencoder networks do really well. In the process of encoding and then decoding an input, the network learns what's important about the input and what it can safely throw away, because the coded representation necessarily has less information in it, so it has to learn to be selective.
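
    A minimal dense autoencoder sketch in the same vein (Keras; the 32-unit bottleneck is an arbitrary choice for illustration):

    from tensorflow.keras import layers, models

    autoencoder = models.Sequential([
        layers.Input(shape=(784,)),               # flattened 28x28 image
        layers.Dense(128, activation="relu"),
        layers.Dense(32, activation="relu"),      # bottleneck: the "coded" representation
        layers.Dense(128, activation="relu"),
        layers.Dense(784, activation="sigmoid"),  # reconstruction of the input
    ])
    autoencoder.compile(optimizer="adam", loss="mse")
    # trained with the input as its own target, e.g. autoencoder.fit(x, x, epochs=...)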

    https://en.m.wikipedia.org/wiki/...
  • 0
    @RememberMe you're not getting me. Stretched images as well, as in the core parts, not the whole canvas.

    I would think after moving shit around enough you'd end up with a numerical mess that didn't work, just because of what it's doing, which is adjusting a 'firenow' value at multiple locations plus a 'jam this into the trigger and if it's heavy enough it will exceed the firenow force required'.
  • 1
    @MadMadMadMrMim that's very simplistic. There's a lot of detail in how that firenow signal is both calculated and calibrated. It's not as simple as "if sum of inputs > 0.5, fire"; that barely does anything. Based on what you want to do, you calculate activations in a whole bunch of different ways.
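
    For example, a few of the common activation functions, written out in plain numpy (illustration only; real frameworks ship these built in):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)         # pass positive signal through, clamp the rest to 0

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))   # squash any value into (0, 1)

    def softmax(x):
        e = np.exp(x - np.max(x))         # subtract the max for numerical stability
        return e / e.sum()                # turn raw scores into probabilities that sum to 1

    z = np.array([0.2, -1.3, 2.1])        # pretend weighted sums + biases for three neurons
    print(relu(z), sigmoid(z), softmax(z))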

    And how you connect these things together and how you stack them up in layers affects what the network does and how resistant it is to changes in data, too. That's what "architecture" means for a neural network; it's been under heavy research for more than a decade now for modern DNNs, and there are some really robust systems. The stuff I talked about above covers two ways of handling noise and/or distortion in data.

    I assume you've only looked at a simple fully connected architecture trained on the MNIST data set for digit recognition. That's like the hello world of deep learning; there's a lot, lot more that goes into modern state-of-the-art nets. Even a really old architecture like LeNet from 1989 is worth a look: it was once used (or at least investigated) for zip code recognition by USPS iirc, so it's fairly robust and would make a good study for you: https://en.m.wikipedia.org/wiki/...

    tl;dr there are lots of ways of dealing with distortions/rotations/shifts/scaling/whatever.
  • 0
    @RememberMe I've only been playing about with object classification using OpenCV and MobileNet initially, but yes, I've been looking at the 'hello world' implemented in Python.

    It's using libsvm.

    So the answer to your question is that this application developer is hoping this technology isn't light years away from useful implementation atm :P but yes, I am interested in the internal workings.

    Maybe you could comment more on architecture.

    Presently my goal is just to find a lib that lets me interface with a bunch of pre-trained models to accomplish various tasks.
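
    E.g. something like this is the kind of thing I mean (a sketch using Keras's built-in pretrained MobileNetV2; "cat.jpg" is just a placeholder path):

    import numpy as np
    from tensorflow.keras.applications import MobileNetV2
    from tensorflow.keras.applications.mobilenet_v2 import decode_predictions, preprocess_input
    from tensorflow.keras.preprocessing import image

    model = MobileNetV2(weights="imagenet")                   # downloads ImageNet weights on first use

    img = image.load_img("cat.jpg", target_size=(224, 224))   # resize to the model's expected input
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    preds = model.predict(x)
    print(decode_predictions(preds, top=3)[0])                # top 3 (class id, label, score) guesses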
  • 0
    Praise be to gradient descent.
  • 0
    @off-point you know, something is occurring to me: with YouTube and other toys and the like and a connected world, who's replacing all the OS developers responsible for, like, everything?
  • 0
    I mean even Microsoft fixed some of its crap by having dumb college students fix their kernel for free