Ranter
Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Comments
-
jestdotty434915dok I found in their code they're using Instant instead of SystemTime, seems they just do this to generate the initial seed
wrote a random function using that (so it initializes over and over again)
bruh what it's 1.3 Ms
so maybe the more runs you do the better the performance savings? there's overhead for the seed but then you're good?
going from 10k rounds to 100k
systemTime is 2.7 Ms
fastrand is 4.6 Ms
WHY IS IT CALLED FAST -
Liebranca70515dIf its just for RNG then you dont need accurate time, read the cycle counter on the CPU instead. The modulo part is the same.
This looks like one way to do it in rust https://github.com/gnzlbg/tsc/...
The _rdtsc() call is the important part (see rdstc instruction for the x86: https://felixcloutier.com/x86/...). Everything else seems to be about translating the instruction to different architectures, if Im interpreting the code correctly.
Performance wise, reading the counter should take around 40 cycles on average, at 3GHz that should be 15 nanoseconds I think. Your mileage may vary.
Applying modulo depends on whether youre dividing by a constant, as the division would be optimized out by the compiler in that case and replaced with cheaper instructions. Anyway, even in the worst case its somewhere around a 100 cycles, so 35 nanoseconds at 3GHz.
Estimated total: 50-80 nanoseconds. Time it, I could be wrong. -
jestdotty434915d@Liebranca hehehe
it's like you're doing my homework for me. nerd
that's like the most helpful comment I've received since I was browsing reverse engineer forums as a child
leme test. so exciting. -
jestdotty434915d@Liebranca ok it's actually slower. still kind of cool
450-600 nano seconds for 10k
3.4-5 Ms for 100k
man this jumps a lot, the other methods don't -
jestdotty434915dtried with constant and no effect
I'd think the compiler would just convert stuff to a constant if it can in most cases, probably checks if anyone changes a value and if not then it's a constant. or if I was writing it I'd do that -
Liebranca70515dlmao, rustc strikes again.
We may have to look at the disassembly to get to the bottom of it; this goes pretty fast on my dice roll util (https://gist.github.com/Liebranca/...), so why in all of the world's fuck would compiled code be slower than that, no idea.
Also did you just call me nerd. I'll have you know I'm a man of science!!!11 -
jestdotty434915d@Liebranca ok sorry I got distracted. x10'ed some money somewhere, woo not being a failure
fixed up my test to not be retarded / made benchmark fn
and yeah it's like x5 as fast with the CPU timer turns out
when I first ran my code I was accidentally running some other random method cuz I forgot to comment it out lol
revised numbers:
system:
- 10k: 400-450 nanos
- 100k: 2500-3600 nanos
instant:
- 10k: 900-1300 nanos
- 100k: 9600-11000 nanos
cpu:
- 10k: 65-75 nanos
- 100k: 650-750 nanos -
jestdotty434915dalso apparently the weird character isn't nanos (micron? idk)
anyway CPU method is exactly 7 nanos per lookup and everything else is higher
*sudden bout of dementia and goes to suffer* -
Liebranca70515d@jestdotty yeah, I went to write my own benchmark as the numbers seemed too low. It didn't add up at all haha.
"ms" is for milliseconds (1/1,000), "μs" is for microseconds (1/1,000,000) and "ns" is for nanoseconds (1/1,000,000,000). Note that the consensus is that nowadays it's pointless to use nanoseconds as resolution for measuring runtime of software, but I still do it because I'm counting cycles.
If I give you a number in seconds, with nine decimal places, then you can read it as such: 0s.000ms,000us,000ns
Anyway, highest times I'm seeing when timing my handrolled fasm:
- 0.001|985|073s for a thousand iterations.
- 0.002|364|874s for ten thousand.
- 0.004|305|840s for a hundred thousand.
- 0.025|751|829s for a million.
- 0.106|004|000s for ten million.
- 0.928|516|150s for a hundred million.
This includes the overhead of forking to run the program. But even the most extreme test is still somehow under a second, fucking shit. -
jestdotty434915d@Liebranca ye thought the in/reverse pi symbol was nano , not micro. stupid math people eeee
neener, neener, 4 Ms for 100k?
guess I got a fast cpu. do think so -
Liebranca70515d@jestdotty Well, if I were to run the benchmark on the assembly code directly, rather than measuring time spent inside the program from a shell script, then I would get a more precise result...
Let's try that. I'll be back in a few hours! -
Liebranca70515dAlright, remind me to never do that again.
As expected, lower number of iterations (say 30-40K downwards) run in under a millisecond, because I'm not spawning a new process inbetween starting the clock. But results are about the same for everything else. -
retoor981515dWhy not just incremental.
I thought this would be fast:
```
int unique_id(){
static unsigned long id = 0;
return id++;
}
```
But it's 81.01µs for 10.000 rounds.
1.9ms for 100.000 rounds -
Liebranca70515d@retoor You're only reading from memory and writing back, so context switches causing cache misses. At some point your process lives long enough for an interrupt to happen.
But now pair the id with say a 256 long array of pseudo random numbers. Return array[id++ AND $FF] and you've got the PRNG from the original DOOM.
Would that be faster? No. But it's cooler. -
retoor981515dBenchmarking is addicting. I found out that wrapping functions to an OOP object takes quite a performance hit in wren language.
@Liebranca you should check wren too so you can forget about perl ;) wren is a complete language having only the basics. It kinda misses stdlib so it's nice to work on -
retoor981514d@jestdotty yeah, it's what i mostly do. It's guaranteed unique in a way you're sure. A UUID4 is only as unique as all the grains of sand in the world. That means collision is possible! Oh no!
-
jestdotty434914d@retoor the kicker is I used that in an OS I wrote for the process IDs so I'm like kicking myself rn
-
retoor981514d@jestdotty worked on an OS? Cool. It does fit in my type of projects in case of doing everything myself but the time span is a bit high and doubt if i ever get the networking implemented. But for implementing network @awesomemeest offered help :)
-
jestdotty434914d@retoor it was a faux OS for a game I was playing. I just needed a way for JavaScript not to crash everything so I created processes, therefore if something new I wrote crashed it just crashed the process and not the whole runtime. and then I went around and picked up random OS concepts to stick into the codebase. like keeping track of process errors and performance, prioritizing, figuring out order execution and interdependencies between different processes, CPU management, caching / faux RAM or disk I guess, deferring work to be done later if the system is overtaxed now or starting new work if underutilized, interprocess communication, system-wide events, drivers that interfaced with game things
it was pretty cool. wasn't a study of OSes though it was just a useful model for what I wanted to achieve
---
just tested the counter. it's faster than the CPU actually =]
cpu is 7 ns each roll, id counter is 4 ns each roll
counter isn't really random but works for my purposes actually -
retoor981514dHow do you benchmark something with 7ns? The code to calculate the duration already costs 100ns using the language of angels (C):
nsecs_t start = nsecs();
nsecs_t end = nsecs();
Retrieving nsecs() is a few lines tho. I do have a decade old laptop tho. But still. These are the times my check comes with:
471ns
(ใฃโโกโ)ใฃ ♥ ./unique
250ns
(ใฃโโกโ)ใฃ ♥ ./unique
827ns
(ใฃโโกโ)ใฃ ♥ ./unique
230ns
(ใฃโโกโ)ใฃ ♥ ./unique
433ns
Yes, that's how my bash prefix look like. Deal with it :P -
retoor981514d@jestdotty the CPU takes the same amount of time to solve a sudoku puzzle. It's hard to measure smth like that i guess. How consistent is your 7ns?
-
retoor981514d@soldierofcode don't flip, but there's someone here repeating the same code with same parameters to measure speed and it wasn't me this time. It's the dottii!
-
jestdotty434914d@retoor it's consistent oddly enough, the other ones move around
I ran it maybe 5 times, with the 10k and 100k, so 10 runs I guess. once it displayed 6ns
rust has something called Instant which is for precision timing I think (I also tried to use it to generate a random number and it was slower than system lookup -- it's probably doing something to increase precision somehow)
pastebin: https://paste.hackliberty.org//...
actually I guess in retrospect I could be eating up that loop's overhead also, but good enough I guess
my problem I think was not related to the random number anyway ๐, but still cool to know all this so
also I guess I hashed the instant which could've been why that was slower. I didn't find a way to get a number from it but I didn't try very hard. fastrand used instant to generate a seed like that tho -
retoor981514dMy bench mark works like this:
RBENCH(times, {//code}}). It's a c macro. The only way to benchmark it even more secure is to generate one huge file with the code duplicated on it and compile it once. You're indeed canceling out the loop overhead.
I iterated many times to cancel out luck of poker. It takes 3000 times playing to ensure you have the mathematical % of winning. For example, if math says 7%, it requires you 3000 times playing with those cards to actually reach it. The pokercard simulator was a beautiful piece of code that I lost. It contained the whole game. You could also give amount of players as parameter. Didn't matter, random is random. Still, I'm sure that computers random doesn't come in neighborhood of real random.
My friend that visits me today calculated the three doors problem using code and said, it's better to switch. Sure, with the "random" function you use. I should replicate that code and use different Random fns. Mysql's random is once proven not to be so
Related Rants
need a random number
AI says just use system time and modulus it. I'm wondering if I can get performance down lower cuz I'm doing this maybe like thousands of times a second (im too lazy to do the math rn)
found a crate called fastrand. they're all like this isn't secure for cryptography and yada yada. peak inside curious how they do it. not too sure, seems like they have a predetermined hash and they do some bitwise or something. kind of a lot to read so I don't wanna. either case seems like they're not using system time
make a test to benchmark, 10k rounds how fast is it?
430 nano seconds for system time
460 nano second for fastrand
lol
all that typing and you end up slower than system time. I'm assuming system time can be guessed as well but what's the point of fastrand if it's slower ๐ค
I mean maybe on some OS systems looking up the system time might be slower? no clue
rant
rust
benchmark
random