5

"Fast" random-number/sample based estimation of logarithms:
https://pastebin.com/niVB57Ay

The result of rAvg(p) is usually "pretty close" to log(p)/2

rAvg(p, 1000) seems to be the golden number. 100 is a little low, but I've already pasted the code. Eh.

Don't know why it works, or if average results are actually considered "close" for logarithms of e.

Comments
  • 1
    FindRuns is calculating an approximate base 2 log.

    On average it's going to half the range on each iteration.
  • 0
    @atheist

    What you're saying is a little over my head. Would you explain?

    I wasn't sure if it as genuinely calculating the log or if it was a fluke. I'm interested to know why a set of seemingly random numbers seems to converge on the log though.

    Also is this actually fast or is the standard logarithm algorithm faster (aside from hardware acceleration)?
  • 2
    @Wisecrack if you count the number of times a number, n, is divisible by 2 to get to 1, you've got a rough value for log base 2.

    The while loop in find runs, each iteration picks a random number between i and w. The mean of a random number from a uniform range will be the middle of the range. That means each iteration is "on average" halving that range, similar to the divide by 2 above.

    The divide by 2 is a simple approach to get a result accurate to the nearest whole number, I'm guessing the randomness gives the benefit that it can get results that are accurate to sub integer precision.

    You could use a bisection search to calculate the log algorithmically, would outperform this by an order of magnitude I'd guess (roughly log n time complexity, this has something like n * log n * number of iterations time complexity).

    Hardware algorithms are even faster, you could also use log rules to expand and simplify. And I suspect there's a series approximation of the log function that's pretty fast too.
  • 1
    @atheist

    > You could use a bisection search to calculate the log algorithmically, would outperform this by an order of magnitude I'd guess

    Just read about bisection search (thanks). Thats an interesting approach, if it works the way I think it works. Its sort of like the old lookup tables for logarithms, the ones that used fractions to decide intervals for interpolation yeah?

    > I'm guessing the randomness gives the benefit that it can get results that are accurate to sub integer precision.

    Yeah, but it diverges after a while from the actual logarithm for some n, and it makes me wonder why.

    I was thinking of the quake fast inverse square hack, the one with the magic number and how they derived it. If we generated a large enough set of N's and calculated the average of their error from the actual log of each, we could come up with a rough constant that minimizes the error for most values of N here.
  • 1
    It also makes me wonder why, regardless of the series generated, why each N settles on a finer and finer precision number that reproduceable.

    You ould think that given the same N, and given a unique stream of integers, that the estimated log would always vary slightly between runs, but it doesn't. It just increases the precision of the output with the number of runs.

    I'm not familiar with statistics though so eh.
  • 1
    > The bisection search approach, assuming you've got a quick way to calculate powers, you just start with a power of 1 over your base (eg 2), double your power until the result is above the target value, giving you upper and lower bounds, then split the range until you get the result as the actual value. If we're talking about no fancy algorithms, integer powers are fairly quick to calculate, and fractional powers can be represented IIRC as ratio of 2 integer powers (power rules).

    > it diverges after a while

    Probably not, it's just a probabilistic estimation of the true value. The mean value of "FindRuns" will be the log2 (maybe out by a scale value, which will cause larger errors for larger n in log(n) ), but the probability distribution of the estimated log will be a gaussian (multiple uniform distributions tend to produce a gaussian), eg for some smaller number of iterations, you could plot the distribution for multiple calls to the function (don't reset the random seed each time otherwise you'll always get the same result for the same input). It's not really a unique stream of integers, it's a random number selected from a uniform distribution, the range of that distribution roughly halves each time, so

    > The quake hack is cool. That kinda comes down to float numbers being represented with an exponent in binary, the log is about calculating that exponent, so a basic approximation is just using that value (mantissa IIRC but possibly wrong)
  • 1
    @atheist Thanks again for the ELI5.
Add Comment