1
jestdotty
15d

need a random number

AI says just use system time and modulus it. I'm wondering if I can get performance down lower cuz I'm doing this maybe like thousands of times a second (im too lazy to do the math rn)
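That suggestion can be sketched like this (a minimal sketch; the range of 100 and the sub-second-nanos trick are my assumptions, not the AI's literal advice):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// "System time and modulus" random: take the sub-second nanosecond
/// count of now and reduce it into the wanted range. Trivially
/// guessable, so never for anything security-related.
fn time_rand(n: u32) -> u32 {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock set before 1970")
        .subsec_nanos(); // 0..1_000_000_000
    nanos % n
}

fn main() {
    println!("{}", time_rand(100)); // one clock read per call
}
```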

found a crate called fastrand. they're all like "this isn't secure for cryptography" and yada yada. peek inside, curious how they do it. not too sure, seems like they have a predetermined hash and they do some bitwise or something. kind of a lot to read so I don't wanna. either way it seems like they're not using system time
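From memory, what a fastrand-style generator does per call is a Wyrand step: one wrapping add of a fixed constant, a 128-bit multiply, and an xor-fold; no clock access per draw. The constants below are the wyhash ones as I recall them, so treat them as an assumption; the point is just the shape of the state update.

```rust
/// Wyrand-style step, roughly what fastrand does per draw (constants
/// recalled from wyhash; an assumption, not fastrand's verbatim code).
struct Wyrand(u64);

impl Wyrand {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0xa076_1d64_78bd_642f);
        let t = u128::from(self.0) * u128::from(self.0 ^ 0xe703_7ed1_a0b4_28db);
        (t as u64) ^ ((t >> 64) as u64)
    }
}

fn main() {
    // only the seed would ever touch the clock; draws are pure arithmetic
    let mut rng = Wyrand(42);
    println!("{}", rng.next_u64() % 100);
}
```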

make a test to benchmark, 10k rounds how fast is it?

430 nanoseconds for system time
460 nanoseconds for fastrand
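For reference, a minimal harness along the lines of that test (my sketch, not the actual benchmark; the running sum is there because a loop whose result is unused can be optimized away entirely):

```rust
use std::time::{Duration, Instant};

/// Time `rounds` calls of `f` and return the total elapsed duration.
fn bench(rounds: u32, mut f: impl FnMut() -> u32) -> Duration {
    let start = Instant::now();
    let mut sink = 0u32;
    for _ in 0..rounds {
        sink = sink.wrapping_add(f());
    }
    let elapsed = start.elapsed();
    println!("sink={sink} elapsed={elapsed:?}"); // using sink keeps the loop alive
    elapsed
}

fn main() {
    bench(10_000, || 42);
}
```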

lol
all that typing and you end up slower than system time. I'm assuming system time can be guessed as well but what's the point of fastrand if it's slower 🤔

I mean maybe on some OSes looking up the system time might be slower? no clue

Comments
  • 1
    ok I found in their code they're using Instant instead of SystemTime, seems they just do this to generate the initial seed

    wrote a random function using that (so it initializes over and over again)

    bruh what it's 1.3 Ms

    so maybe the more runs you do the better the performance savings? there's overhead for the seed but then you're good?

    going from 10k rounds to 100k
    systemTime is 2.7 Ms
    fastrand is 4.6 Ms

    WHY IS IT CALLED FAST
  • 2
    If it's just for RNG then you don't need accurate time; read the cycle counter on the CPU instead. The modulo part is the same.

    This looks like one way to do it in rust https://github.com/gnzlbg/tsc/...

    The _rdtsc() call is the important part (see the rdtsc instruction for x86: https://felixcloutier.com/x86/...). Everything else seems to be about translating the instruction to different architectures, if I'm interpreting the code correctly.

    Performance wise, reading the counter should take around 40 cycles on average, at 3GHz that should be 15 nanoseconds I think. Your mileage may vary.

    Applying modulo depends on whether you're dividing by a constant, as the division would be optimized out by the compiler in that case and replaced with cheaper instructions. Anyway, even in the worst case it's somewhere around 100 cycles, so 35 nanoseconds at 3GHz.

    Estimated total: 50-80 nanoseconds. Time it, I could be wrong.
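    Liebranca's suggestion, sketched in Rust: `_rdtsc` is a real intrinsic in `std::arch::x86_64`; the fallback to system time on other targets is my own addition so the sketch runs anywhere, and the range of 6 is just an example value.

    ```rust
    /// Cycle-counter "random": read the TSC and reduce it mod n.
    /// The low bits of the counter churn fast enough for casual use;
    /// like system time, this is trivially guessable, so never for crypto.
    fn tsc_rand(n: u64) -> u64 {
        #[cfg(target_arch = "x86_64")]
        let ticks = unsafe { std::arch::x86_64::_rdtsc() };
        // fallback for non-x86 targets (my addition, not part of the suggestion)
        #[cfg(not(target_arch = "x86_64"))]
        let ticks = std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .subsec_nanos() as u64;
        ticks % n
    }

    fn main() {
        println!("{}", tsc_rand(6)); // a dice-style roll in 0..6
    }
    ```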
  • 1
    @Liebranca hehehe

    it's like you're doing my homework for me. nerd

    that's like the most helpful comment I've received since I was browsing reverse engineer forums as a child

    leme test. so exciting.
  • 0
    @Liebranca ok it's actually slower. still kind of cool

    450-600 nanoseconds for 10k
    3.4-5 Ms for 100k

    man this jumps a lot, the other methods don't
  • 0
    tried with constant and no effect

    I'd think the compiler would just convert stuff to a constant if it can in most cases, probably checks if anyone changes a value and if not then it's a constant. or if I was writing it I'd do that
  • 0
    oh I'm fucking dumb nevermind

    150 Ms for 10k
    1.4 Ms for 100k
  • 1
    lmao, rustc strikes again.

    We may have to look at the disassembly to get to the bottom of it; this goes pretty fast on my dice roll util (https://gist.github.com/Liebranca/...), so why in all of the world's fuck would compiled code be slower than that, no idea.

    Also did you just call me nerd. I'll have you know I'm a man of science!!!11
  • 2
    @Liebranca ok sorry I got distracted. x10'ed some money somewhere, woo not being a failure

    fixed up my test to not be retarded / made benchmark fn

    and yeah it's like x5 as fast with the CPU timer turns out

    when I first ran my code I was accidentally running some other random method cuz I forgot to comment it out lol

    revised numbers:
    system:
    - 10k: 400-450 nanos
    - 100k: 2500-3600 nanos
    instant:
    - 10k: 900-1300 nanos
    - 100k: 9600-11000 nanos
    cpu:
    - 10k: 65-75 nanos
    - 100k: 650-750 nanos
  • 1
    also apparently the weird character isn't nanos (micron? idk)

    anyway CPU method is exactly 7 nanos per lookup and everything else is higher

    *sudden bout of dementia and goes to suffer*
  • 2
    @jestdotty yeah, I went to write my own benchmark as the numbers seemed too low. It didn't add up at all haha.

    "ms" is for milliseconds (1/1,000), "μs" is for microseconds (1/1,000,000) and "ns" is for nanoseconds (1/1,000,000,000). Note that the consensus is that nowadays it's pointless to use nanoseconds as resolution for measuring runtime of software, but I still do it because I'm counting cycles.

    If I give you a number in seconds, with nine decimal places, then you can read it as such: 0s.000ms,000us,000ns

    Anyway, highest times I'm seeing when timing my handrolled fasm:

    - 0.001|985|073s for a thousand iterations.

    - 0.002|364|874s for ten thousand.

    - 0.004|305|840s for a hundred thousand.

    - 0.025|751|829s for a million.

    - 0.106|004|000s for ten million.

    - 0.928|516|150s for a hundred million.

    This includes the overhead of forking to run the program. But even the most extreme test is still somehow under a second, fucking shit.
  • 0
    @Liebranca ye thought the in/reverse pi symbol was nano , not micro. stupid math people eeee

    neener, neener, 4 Ms for 100k?

    guess I got a fast cpu. do think so
  • 1
    @jestdotty Well, if I were to run the benchmark on the assembly code directly, rather than measuring time spent inside the program from a shell script, then I would get a more precise result...

    Let's try that. I'll be back in a few hours!
  • 2
    Alright, remind me to never do that again.

    As expected, lower numbers of iterations (say 30-40K downwards) run in under a millisecond, because I'm not spawning a new process in between starting the clock. But results are about the same for everything else.
  • 2
    Why not just incremental.

    I thought this would be fast:

    ```
    int unique_id() {
        static unsigned long id = 0;
        return id++;
    }
    ```

    But it's 81.01µs for 10,000 rounds.

    1.9ms for 100,000 rounds
  • 2
    @retoor You're only reading from memory and writing back, so context switches cause cache misses. At some point your process lives long enough for an interrupt to happen.

    But now pair the id with say a 256 long array of pseudo random numbers. Return array[id++ AND $FF] and you've got the PRNG from the original DOOM.

    Would that be faster? No. But it's cooler.
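    The DOOM trick described above, sketched: the real game uses a hardcoded `rndtable` of 256 bytes; the LCG fill below is a stand-in of my own, not DOOM's actual values.

    ```rust
    /// Table-walk PRNG in the style of DOOM's M_Random: a fixed 256-entry
    /// table indexed by a wrapping counter (the `id++ AND $FF` part).
    struct TableRng {
        table: [u8; 256],
        idx: u8,
    }

    impl TableRng {
        fn new() -> Self {
            let mut table = [0u8; 256];
            let mut s: u32 = 1;
            for b in table.iter_mut() {
                // stand-in fill via a small LCG; DOOM ships fixed bytes instead
                s = s.wrapping_mul(1664525).wrapping_add(1013904223);
                *b = (s >> 24) as u8;
            }
            TableRng { table, idx: 0 }
        }

        fn next(&mut self) -> u8 {
            let v = self.table[self.idx as usize];
            self.idx = self.idx.wrapping_add(1); // u8 wrap == AND $FF
            v
        }
    }

    fn main() {
        let mut rng = TableRng::new();
        println!("{} {} {}", rng.next(), rng.next(), rng.next());
    }
    ```

    The sequence repeats every 256 draws, which was fine for DOOM and is exactly why it's cooler than it is random.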
  • 0
    Benchmarking is addictive. I found out that wrapping functions in an OOP object takes quite a performance hit in the wren language.

    @Liebranca you should check out wren too so you can forget about perl ;) wren is a complete language having only the basics. It kinda misses a stdlib so it's nice to work on
  • 1
    @retoor I have no idea why I didn't think of a counter lmao
  • 1
    @jestdotty yeah, it's what I mostly do. It's guaranteed unique in a way you can be sure of. A UUID4 is only as unique as all the grains of sand in the world. That means collision is possible! Oh no!
  • 1
    @retoor the kicker is I used that in an OS I wrote for the process IDs so I'm like kicking myself rn
  • 0
    @jestdotty worked on an OS? Cool. It does fit my type of project, in the sense of doing everything myself, but the time span is a bit high and I doubt I'd ever get the networking implemented. But @awesomemeest offered help with implementing the network :)
  • 1
    @retoor it was a faux OS for a game I was playing. I just needed a way for JavaScript not to crash everything so I created processes, therefore if something new I wrote crashed it just crashed the process and not the whole runtime. and then I went around and picked up random OS concepts to stick into the codebase. like keeping track of process errors and performance, prioritizing, figuring out order execution and interdependencies between different processes, CPU management, caching / faux RAM or disk I guess, deferring work to be done later if the system is overtaxed now or starting new work if underutilized, interprocess communication, system-wide events, drivers that interfaced with game things

    it was pretty cool. wasn't a study of OSes though it was just a useful model for what I wanted to achieve

    ---

    just tested the counter. it's faster than the CPU actually =]

    cpu is 7 ns each roll, id counter is 4 ns each roll

    counter isn't really random but works for my purposes actually
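    For the record, the counter "roll" looks roughly like this in Rust (my sketch, not the exact test code; the atomic just makes `id++` safe if more than one thread rolls):

    ```rust
    use std::sync::atomic::{AtomicU64, Ordering};

    // one global counter; Relaxed ordering is enough for uniqueness
    static NEXT_ID: AtomicU64 = AtomicU64::new(0);

    /// Not random at all, just unique and monotonically increasing:
    /// a single atomic add, no clock and no PRNG state.
    fn unique_id() -> u64 {
        NEXT_ID.fetch_add(1, Ordering::Relaxed)
    }

    fn main() {
        println!("{} {} {}", unique_id(), unique_id(), unique_id());
    }
    ```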
  • 1
    How do you benchmark something that takes 7ns? The code to calculate the duration already costs 100ns using the language of angels (C):

    ```
    nsecs_t start = nsecs();
    nsecs_t end = nsecs();
    ```

    Retrieving nsecs() is a few lines tho. I do have a decade-old laptop tho. But still. These are the times my check comes up with:

    471ns
    (っ◔◡◔)っ ♥ ./unique
    250ns
    (っ◔◡◔)っ ♥ ./unique
    827ns
    (っ◔◡◔)っ ♥ ./unique
    230ns
    (っ◔◡◔)っ ♥ ./unique
    433ns

    Yes, that's how my bash prefix looks. Deal with it :P
  • 0
    @jestdotty the CPU takes the same amount of time to solve a sudoku puzzle. It's hard to measure smth like that i guess. How consistent is your 7ns?
  • 0
    @soldierofcode don't flip, but there's someone here repeating the same code with same parameters to measure speed and it wasn't me this time. It's the dottii!
  • 1
    @retoor it's consistent oddly enough, the other ones move around

    I ran it maybe 5 times, with the 10k and 100k, so 10 runs I guess. once it displayed 6ns

    rust has something called Instant which is for precision timing I think (I also tried to use it to generate a random number and it was slower than system lookup -- it's probably doing something to increase precision somehow)

    pastebin: https://paste.hackliberty.org//...

    actually I guess in retrospect I could be eating up that loop's overhead also, but good enough I guess

    my problem I think was not related to the random number anyway 😁, but still cool to know all this so

    also I guess I hashed the instant which could've been why that was slower. I didn't find a way to get a number from it but I didn't try very hard. fastrand used instant to generate a seed like that tho
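    Deriving a seed by hashing an Instant, as described above, could look like this (a sketch; fastrand's exact seeding recipe may differ, and the hashing is part of why doing this per call was slow):

    ```rust
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    use std::time::Instant;

    /// One-off u64 seed from the current Instant. Instant has no public
    /// "as a number" accessor, but it implements Hash, so feeding it to
    /// a hasher is one way to squeeze a u64 out of it.
    fn seed_from_instant() -> u64 {
        let mut h = DefaultHasher::new();
        Instant::now().hash(&mut h);
        h.finish()
    }

    fn main() {
        println!("seed = {}", seed_from_instant());
    }
    ```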
  • 1
    @retoor that's a very cute bash prefix 🤣
  • 1
    My benchmark works like this:

    RBENCH(times, { /* code */ }). It's a C macro. The only way to benchmark even more rigorously is to generate one huge file with the code duplicated in it and compile that once. You're indeed canceling out the loop overhead.

    I iterated many times to cancel out the luck factor, like in poker: it takes about 3000 hands to converge on the mathematical win %. For example, if math says 7%, you need to play those cards about 3000 times to actually reach it. The poker-card simulator was a beautiful piece of code that I lost. It contained the whole game. You could also give the number of players as a parameter. Didn't matter, random is random. Still, I'm sure that a computer's random doesn't come anywhere near real random.

    My friend who's visiting me today calculated the three doors problem in code and said it's better to switch. Sure, with the "random" function you use. I should replicate that code and try different random fns. MySQL's random was once proven not to be so random.
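    The three-doors claim is easy to check in code. A minimal sketch, with a tiny LCG standing in for the RNG (the multiplier and increment are just well-known 64-bit LCG constants, my choice, not anyone's code from this thread):

    ```rust
    /// Simulate `trials` Monty Hall games and count wins for the
    /// always-switch and always-stay strategies.
    fn monty_hall(trials: u32) -> (u32, u32) {
        let mut s: u64 = 0x1234_5678;
        // small LCG; take high bits, reduce to a door in 0..3
        let mut door = || {
            s = s
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            ((s >> 33) % 3) as u32
        };
        let (mut switch_wins, mut stay_wins) = (0, 0);
        for _ in 0..trials {
            let car = door();
            let pick = door();
            // staying wins iff the first pick was the car;
            // switching wins in exactly the complementary case
            if pick == car { stay_wins += 1 } else { switch_wins += 1 }
        }
        (switch_wins, stay_wins)
    }

    fn main() {
        let (sw, st) = monty_hall(100_000);
        println!("switch: {sw}, stay: {st}"); // switch should land near 2/3
    }
    ```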