12
halfflat
25d

Jesus, Python. Hash randomization?! I now have non-deterministic fucking behaviour _that I cannot disable within the python script_.

Oh, we have a crafted-input DoS scenario. What will we do? Make an interface to allow programs to supply their own hash function perhaps? No! That wouldn't break reproducibility!

Floating point addition is not commutative? Summing over float values in a set may give randomly different answers? Who cares? It's not like anyone is using python in science or anything.

Comments
  • 1
    What are you using python for?
    And you can wrap some carefully written C functions for that.
  • 2
    I don't think I can make a C function 'void reproducible_baby()', chuck it in an extension module and off we go. [*]

    Reproducible behaviour allows people to find bugs when something else goes wrong. Are you getting different answers? Did we introduce an error in the process? The software? The data? Or is Python just making shit up?

    If I want random behaviour, I'll be explicit. I'll use MPI or, I dunno, the Intel C compiler.

    [*] Actually, if I can do that, let me know. That would be awesome.
  • 5
    @magicMirror Oh hell yeah let's write C because my extremely high level scripting language sucks
  • 8
    The floating point thing isn't a Python issue. You seriously need to read up on how floating point even works.
  • 0
    @Fast-Nop, @Irene Please have some charity here, I am well aware of floating point issues.

    The problem isn't that fp arithmetic is not commutative. The problem is that it is not deterministic, if something like Python comes along and randomly changes the order of operations.
  • 3
    Just to elaborate. Programs generally pretend to be functions: they take some input, and map it to some output.

    So when the same program takes the same inputs and produces different outputs, this normally means that either the program wasn't actually the same (e.g. different compiler or different code), or the inputs weren't actually the same (e.g. different operating environment, or different data produced further up the chain of processing.) Or something more fundamental has gone wrong, like a fault in the hardware.

    This is why reproducibility is important: it allows people to put complex systems in boxes and reason about their behaviour, so we have some damned hope of working out what went wrong when things go wrong.
  • 4
    The issue is misusing floating point. It is never supposed to give an exact result. You never test for equality (except e.g. before divisions to avoid a null division). You always test for ranges instead.

    And BAMM you will get reproducible results, provided that you don't have a problem that is numerically badly conditioned.
  • 3
    @Fast-Nop You are totally missing the point.

    Consider this scenario: a series of programs transforms some data into some numerical results. One step in the processing pipeline is changed, and the results differ, but they also differ from invocation to invocation.

    Is this change innocuous? Is it a faulty bit in memory somewhere? Is something somehow reading uninitialized state that can take random values?

    Faulty memory: this is a real problem, let's diagnose where in the chain of processing this happens. Uninitialized state? It's a software bug that we need to track down.

    All of this debugging only works if the components operate reproducibly.
  • 1
    A similar error scenario in Ancient Times led me to diagnose a UDP packet corruption issue: a network router had been configured with UDP checksumming off, and we were running NFS over UDP.
  • 1
    @halfflat Can you provide a minimal example? I never saw that kind of behavior and want to see this in action.

    https://pyfiddle.io/
  • 4
    @halfflat The results differ because you do test for equality, which (as I wrote) you should never do with float, it's that easy. Write your checks so that they work on ranges, then you don't have issues.

    If you work with floats, it's your duty as dev to make a rounding error propagation analysis and determine how exact your results even are. If all are within the interval that your numeric analysis has given as error interval, AND your requirements are OK with that amount of error, then they are equal.
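
    A minimal sketch of that kind of range check, using only the stdlib (the tolerances are placeholders; pick them from your own error analysis):

    import math

    a = 0.1 + 0.2                               # 0.30000000000000004 in binary floating point
    print(a == 0.3)                             # False: exact equality is the wrong test
    print(math.isclose(a, 0.3, rel_tol=1e-9))   # True: compare within a tolerance instead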
  • 1
    @Demolishun https://pyfiddle.io/fiddle/...

    May have to run it quite a few times to see the different output.
  • 2
    @Fast-Nop I can't understand why you don't understand this. If it were really just an innocuous floating point reorder, then yeah, of course it's fine. We don't care; no one cares.

    The problem is that if operations are being randomly reordered, we can't tell that scenario apart from some much more horrible scenarios that demand urgent intervention.
  • 3
    @halfflat That is really fucked up. Nice simple example too.
  • 2
    @halfflat Well it seems that you don't understand what "test for equality" even means. This is what your tests are doing, from one run to another. Rewrite them properly, and you don't have issues.

    Also, you need to read up on numerics because your example qualifies as very badly conditioned. If you have real cases like that, then you have completely different problems and need different solutions.
  • 0
    Jesus @Fast-Nop, it's not about the numerics, it's about fucking reproducibility.
  • 2
    @halfflat It looks like the ordering comes from the hashing used by the dictionary. Is that what you are talking about?

    I noticed it doesn't do this with a list. I assume because they are ordered.

    https://pyfiddle.io/fiddle/...

    @Fast-Nop I think everyone here understands floating point comparisons. What exactly are you talking about? Please provide some code to elucidate.
  • 2
    @halfflat Yeah and that's why your tests should check whether the result is X +/- delta, with very small delta. If your tests don't do it like this, they are unsuited for floating point.

    Also, a set is, by definition, an UNORDERED list. Unordered as in "order doesn't matter" so that even instruction level parallelism could kick in.

    @Demolishun I don't think that the idea of a range check with a +/- delta should need any code.
  • 2
    Yep, it's a change that Python introduced in 3.3. Apparently the fact that hash values were predictable meant that some net-facing services in python were vulnerable to DoS through quadratic behaviour with crafted keys.

    So the fix was to make the hashing random. In the process, this broke anything that assumes that the same input and the same python program will produce the same output.
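
    A tiny illustration of what changed (a sketch; only str and bytes hashes are randomized, small ints are not):

    # Run this twice on Python 3.3+ with default settings: the printed hash, and
    # hence the iteration order of the set, will usually differ between the runs.
    print(hash('apple'))
    print(list({'apple', 'banana', 'kumquat', 'durian'}))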
  • 1
    @Fast-Nop That is not what we are talking about. Read the code he provided. It's really easy to see there are zero comparisons in there.
  • 2
    @Demolishun The comparison will be done between runs of the test obviously, which is what we're talking about in the whole thread.
  • 2
    @Fast-Nop There are two directions of inference here.

    1. I make a change in the code. It reorders a floating point calculation. If this actually causes a problem somewhere, then my code sucks and I should give back my maths degree.

    2. A chain of processing starts giving different answers, and I need to work out why. If I can safely assume that each step of the process has reproducible behaviour, I am going to find out much more quickly if 1) this is actually problematic and 2) what is causing the problem.

    Python's behaviour means that I cannot presume reproducible behaviour from a component written in python (unless I can get my mitts on the environment variable that controls this behaviour.)
  • 2
    @halfflat The fundamental coding error is assuming an exact result of any kind of floating point operation. If the delta is irrelevant, then the results are the same for all practical purposes.

    But well, just keep on ranting about how much Python sucks, keep your botched up tests, I don't care and I'm out. Some people really don't want to improve.
  • 0
    Christ almighty. Use your imagination, @Fast-Nop. Put yourself in scenario 2. Floating point has nothing to do with it. It could be text output.

    That said, IEEE floating point can be pinned down. The same operations on the same data give the same results _deterministically_. This is really fucking useful for debugging the sort of crap I am talking about here.
  • 3
    @irene No, that's not true.

    I totally expect (a+b)+(c+d) to give me a different answer to a + (b+c) + d. That's fine.

    I don't expect f(a, b, c, d) for some function f to give me different answers depending on the time of day, outside of some very well-defined situations.
  • 1
    I might have mentioned about 19 times above that yes, fp operations are not commutative. WE ALL KNOW THIS.
  • 1
    Next people will be saying that ECC is a waste of time.
  • 1
    @halfflat Don't use a dict. It is rearranging things. Use a list, OrderedDict, or something else with a deterministic order.
  • 2
    The concrete example for me was: code on one system was generating different output from another. This can indicate there is a problem, and deserves some investigation.

    I discovered then that on system B, it was producing different code each invocation. (Turns out the python versions differed between the two systems.)

    Not knowing that Python had embraced Eris, the Goddess of Strife and Discord, I then spent half an hour trying to find out wtf was going on. Narrowed it down to an iteration over a set, and then SO led me to enlightenment.

    In this case, no real harm done. But the principle that correct code can produce literally randomized output from the same input breaks long established assumptions about how code works, and is a nightmare from the point of view of debugging problems in complex systems.
  • 1
    @Demolishun Yes, I know this NOW. Argh.

    You know, I did look at the Python language documentation. It was only after learning (via SO) about the magic environment variable that I was able to find the paragraph in the description of object.__hash__ that explained this behaviour.
  • 2
    @halfflat We have a system that uses a hash for the entries in a config file. It literally produces a different order for the config file on each run. So the people modifying the config have to hunt to find what the strings are for editing. It was never really meant to be human readable, but somehow they end up needing to edit it by hand all the time. I know how to fix it, but there are lots of installed systems in the field. Those won't be fixed.
  • 2
    @halfflat NOW is the winter of our discontent!
  • 2
    @Demolishun Haha! Yes, so winter. Very discontent.
  • 1
    @irene He did, that is what we have been talking about. That is what those links were.
  • 0
    @irene I admit, it will be tedious to extract a schematic example from the real code. But to be honest, it doesn't matter: nothing broke as such.

    The problem was that the non-deterministic behaviour made it _look_ as if something was broken.
  • 1
    I've also just realized that with all my raving about commutativity of fp, I actually meant associativity. IEEE fp addition is totally commutative :)
  • 0
    @irene Heh, I forget that people use this on phones. I use this in a browser. Well, if you can find one that does work with your phone I will paste it there.
  • 0
    The demo was really just a tiny example that exhibits the class of behaviour.

    In Python 3.6 or 3.7:

    d = {('apple', 1), ('banana', 1e-20), ('kumquat', -1e-20), ('durian', -1)}

    print(sum((x[1] for x in d)))

    And I have to preemptively stress: I know that the output will be dependent upon the order of operations; the numbers here are chosen to make this very clear. What was a problem, was that the output changed between runs.
  • 0
    @irene That link is actually a python execution environment with easy selection of versions of python and the ability to store snippets. It makes it so the viewer doesn't have to spin up any local environments. But yeah, text is sometimes simpler.
  • 1
    But it could well just as easily have been:

    d = {('apple', 'aardvark'), ('banana', 'beetle')}

    print ('\n'.join((x[1] for x in d)))

    It's not the numerics, it's the irreproducibility. The difference is between an arbitrary-but-fixed order, and a random order. I don't care (in this instance) what the order is, but if I see different results *between runs* on the same machine, this is a cause for concern.
  • 0
    I feel I need to belabour this point. There's a lot of numerical code out there that uses pseudorandom numbers, right? But usually you'll have a way to provide explicitly the seed that is used for those numbers.

    If the code is robust, why would you ever need that?

    The answer is reproducibility. It makes it possible to hone in on errors and diagnose a fault. It makes it possible for others to reproduce your results. It's a really important feature.

    Python provides an environment variable that allows reproducibility. But they changed from arbitrary ordering to random ordering in a point release, and it's a really fucking surprising default behaviour.
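
    The seeding analogy in code (a minimal sketch using the stdlib random module):

    import random

    random.seed(12345)                            # explicit seed: the "random" sequence is reproducible
    print([random.random() for _ in range(3)])    # same three numbers on every run with this seed
    # Drop the seed() call and every run gives a different sequence - fine when you
    # want randomness, useless when you're trying to reproduce a result.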
  • 2
    @halfflat I think this thread shows there are people more interested in belittling the coding abilities of others than actually helping, or trying to understand something. I get it, this site is built upon quips alluding to personal insults. It can be fun! God knows I like making quips. I just don't think the majority of people take it seriously enough for any serious discussion. I could be wrong, yes "someone might be wrong on the internet". GASP!!!
  • 2
    Depending on the python version you're using, there might be different results.

    https://stackoverflow.com/questions...
  • 3
    @vane Indeed. It was the change of behaviour between Python versions that made me learn this.
  • 2
    @halfflat I found out the hard way, when switching between languages, that it's good practice to treat dictionaries/maps as unordered key/values unless the documentation explicitly says otherwise. I see you got to that too now.

    Also, another one is to use the exact same software versions on your local computer and in the production environment for error reproduction; that's where containers found their place.
  • 0
    @vane It's not that they are unordered that's the problem — an unordered map is a really common data structure, which we're all familiar with.

    Usually (i.e. for me, every instance I'd so far come across, before python), if one can iterate over an unordered map, that iteration is performed in an arbitrary *but deterministic* sequence. This is the first time I've seen that arbitrary order in fact be random.
  • 2
    I found your error: You are using a set (= a collection of unique objects, which does not retain order). For performance reasons it uses the objects' hashes. As hashes differ across invocations (and for a general purpose language the vulnerability to hash collisions is a really bad one), the order of the set is not defined - the alternative (e.g. forcing the order or sorting every time when iterating over it) would be a huge performance loss for many cases.
    If you want to retain order, use lists or tuples (or use sorted before iterating over a set).

    Even the documentation says so: "Since sets only define partial ordering (subset relationships), the output of the list.sort() method is undefined for lists of sets."

    Oh, and if you want to have precise floats, the decimal library helps (or you might consider the fractions module). But as I said before, the concrete issue is not the use of floats, but of sets and assuming the order.
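
    For example, applied to the earlier demo (a sketch; sorting imposes a deterministic iteration order regardless of the hash seed):

    d = {('apple', 1), ('banana', 1e-20), ('kumquat', -1e-20), ('durian', -1)}
    print(sum(x[1] for x in sorted(d)))    # same order, hence the same sum, on every run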
  • 1
    @sbiewald I mean, I know _now_, after being led on a goose chase, that Python's hashes have been changed to be literally random.

    But I never expected things to be in one particular order, nor was the code buggy in relying upon there being a particular order.

    The *only thing* that has me aggrieved, and which surprisingly no one else seems to care about, is that this ordering is random at run time rather than merely arbitrary.

    Changing semantics from deterministic (but arbitrary) to random for all Python code in order to alleviate a problem with one particular application of hashing in net-facing python code just seems utterly crazy to me. It's a hack. If hash collisions are a problem — and yeah, in this issue they were a problem — they should provide a mechanism for providing one's own hash function.
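
    Roughly the kind of thing I mean, as a sketch only (this is not an existing Python API; stable_hash and StableKey are just illustrative names): wrap the keys so that a set or dict effectively uses a deterministic hash of my choosing.

    import hashlib

    def stable_hash(s):
        # Deterministic across runs and machines, unlike the randomized built-in str hash.
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], 'little')

    class StableKey:
        __slots__ = ('key',)
        def __init__(self, key): self.key = key
        def __hash__(self): return stable_hash(self.key)
        def __eq__(self, other): return self.key == other.key

    # With deterministic hashes and the same insertions, iteration order is reproducible.
    s = {StableKey(k) for k in ('apple', 'banana', 'kumquat', 'durian')}
    print([w.key for w in s])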
  • 0
    @halfflat No they shouldn't, because most people would fail providing a proper one.

    And if you want to force a deterministic behaviour, use the PYTHONHASHSEED environment variable (documented at https://docs.python.org/3/using/...), so I guess this is something in between.

    Also hash collisions and their prevention are relevant for many applications: They can be problematic in any application facing external input (e.g. web applications), which isn't atypical for a general purpose language, so the secure way was chosen as the default (oh, and a recent Python version fixed the non-deterministic ordering for at least dictionaries).
  • 1
    @sbiewald The python library could have provided a random_hash function that did the job.

    Yes, there is an environment variable work-around (note that you can't fix this in the python script itself).

    I am just astonished that reproducibility is dismissed as being a minor concern. And that changing from deterministic to random behaviour is regarded with a collective 'meh'.

    Reproducibility is such an important feature of a program; it has to be sacrificed sometimes, of course, but it should not be done casually!
  • 0
    @irene It does have an order! We iterate over them: it has an order.

    We can't rely on that order being any one thing. It's arbitrary. I thought — and apparently so did many other people based on SO questions — that one could rely upon that order being deterministic. Like pretty much every other language.
  • 0
    @halfflat

    I would say it would be worse if you could change the interpreter internals (e.g. the hash function) from inside the script.

    Because:

    - All previously hashed content (e.g. dictionary keys) would have to be rehashed - that would result in a *huge* drop in performance.

    - I would constantly have to check whether the wanted behavior is still active in my code, or whether it has changed. (Let's say a web application wants hash randomization, but some mathematical functions disable it and as a result the whole application is vulnerable to DoS).

    (continued)
  • 0
    @irene Iterators in python iterate through some collection one by one. This imposes an order. It is the order that the iterator spits things out in.

    Same in C++, of course, absent the arbitrary randomness.
  • 0
    @sbiewald I think it's safe to say that an alternative hash should be associated with the container, not the object. For the reasons you describe among others.
  • 0
    @halfflat

    There is no "correct" way to solve it - as there are only a few data types (and those are not ordered by design and that behavior is even documented) that are affected by that issue! And why would one assume that an unordered data type would be ordered by time of insertion? Or have any order at all? The order may break as soon as you add an element to the dict for example.

    Additionally, the "ordering" of unordered data types can change between minor (like 3.8.x->3.9.y) versions, so the randomization is even better, because it teaches everyone not to rely on any accidental but seemingly consistent ordering - only seemingly, because between minor versions it will break.

    People who need reproducibility use ordered data types.

    And the issue is even fixed for dictionaries now (just not for sets like in your example, you did not even use a dict there)!

    Look there: https://pyfiddle.io/fiddle/...
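
    A quick way to see the difference (a sketch, Python 3.7+; the dict line prints the same thing on every run, the set line may not):

    items = ['apple', 'banana', 'kumquat', 'durian']
    print(list(dict.fromkeys(items)))    # dicts preserve insertion order since 3.7
    print(list(set(items)))              # set order still depends on the randomized hashes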
  • 0
    @halfflat So each object would consume more memory than necessary for most operations?

    And you already have that choice: Use an OrderedDict. (https://docs.python.org/3.8/...)

    By the way, try my code in the Fiddle in Python 2 and 3. While you get the same results between invocations, the output differs between the two versions.

    So if you rely on the order of such an unordered datatype you create non-portable code.
  • 0
    @sbiewald Just on the efficiency side of things: associating a non-default hash function with a container seems to be an overhead of exactly one object reference. I don't think this can really be used as an argument against such an implementation.

    But look, I know that Python will keep doing what Python wants; it's not going to change. But we can at least be clear about the distinctions.

    One might want there to be an exact alignment between insertion order and iteration order: OrderedDict provides this guarantee, and it's a strong requirement.

    One might not care what order things come out of a dictionary or set, but might care that, given the same collection of items, the iteration order is the same. This is a weaker requirement. Certainly useful, but not what I was assuming. Weaker still would be to allow it to change between different versions of the language or library.

    [...]
  • 0
    Continued:

    Weaker still: that the iteration order can differ from iteration to iteration, but that the same input to a program produces the same output, if run in the same environment. This is the reproducibility requirement.

    Finally, there is the hands in the air like we don't care option: iteration order can be anything. Program behaviour is not reproducible unless an order is imposed upon the collection by some other means.

    In short: the requirement for reproducibility is much much weaker than what OrderedDict provides. Python offers (now) an all-or-nothing approach in its standard library.
  • 0
    And of course it's the non-reproducible, weakest possible guarantee that is offered by the native syntax for sets. We have to do extra work with extra syntax for something which frankly, should be a default property of a program.

    Edit: I'd be more sympathetic if Python was a multithreaded language, where asynchronous behaviour was built in to its very design. But Python is threading-hostile.
  • 0
    @halfflat "Iterators in python iterate through some collection one by one. This imposes an order. It is the order that the iterator spits things out in."

    The order of a hash map is the order of the hashes. The hashes are randomized, so that is the root of the problem...

    You are just making too many assumptions about the environment.
  • 0
    @sbiewald Yeah, I know man. This is what I'm complaining about. The decision to randomize something that breaks reproducibility.
  • 0
    @halfflat They are still given: You just do not consider all inputs.

    The input you are missing is the PYTHONHASHSEED. If that stays the same, the program is reproducible. And that's it.

    'Explicit is better than implicit (and implicit is better than nothing).' - Zen of Python

    Python chose the most explicit one (e.g. all or nothing), and leaves out most implicit guarantees (because I'm somehow sure, people would start to rely on implicit guarantees).

    AND BY THE WAY: Dicts have a deterministic, reproducible order now.

    If you want a deterministic order for sets (or know how to easily provide one) open a bug report / feature request on Python bug tracker.

    I agree, that reproducibility in that case would be better (besides setting the hash seed), but I have to say: It unfortunately has other drawbacks.
  • 0
    @sbiewald I'm beating a dead horse here, I know. But the environment variable pushes semantics of the program to the environment. If one is going to rely upon the promises of PYTHONHASHSEED, then one is going to need not just a python program, but that program plus some external script or procedure for setting up that environment.

    I don't think it's a good solution. At least, not without there being some way to enforce it also from within the python source too.
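
    The closest thing to an in-source fix I can see is a re-exec hack (a sketch only; PYTHONHASHSEED is read once at interpreter start-up, so the script has to relaunch itself):

    import os, sys

    if os.environ.get('PYTHONHASHSEED') != '0':
        # Hash randomization can't be changed in a running interpreter, so pin
        # the seed and replace this process with a fresh interpreter.
        os.environ['PYTHONHASHSEED'] = '0'
        os.execv(sys.executable, [sys.executable] + sys.argv)

    print(list({'apple', 'banana', 'kumquat', 'durian'}))   # now reproducible from run to run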
  • 0
    @halfflat I thought the docs basically said for dicts that you cannot rely on it being in any order. I read that back in the 2.x docs years ago. I get what you are saying, but dont use dicts. dicts aint gonna change. Just say no to dict!
  • 1
    @halfflat It already depends on something out of control of the script: The whole Python interpreter, which cannot be changed by the script either.
  • 0
    @Demolishun The statement without elaboration is still ambiguous. I'm certainly not alone in reading that as meaning you can't presume that the order is any given order, rather than as meaning the order you do get is going to change between program invocations.

    Looking through the Python 3.7 official documentation, it's really not at all clear that this is what happens. You have to read the info for object.__hash__() to learn this.
  • 0
    @sbiewald Yep, that's another Python problem. Writing python version agnostic code is a huge pain.
  • 0
    @halfflat C++ is worse. Really, I can't use const in a class anymore? I gotta use constexpr? Fuck you C++.
  • 0
    @halfflat Huh? What?

    In Python 3.7 (I think) the semantics of the import statement changed a bit, but what other things are forward incompatible (and not announced at least 3 versions earlier)?

    If you still mean Python 2 and 3 agnostic code, well, in less than half a year blame customers (or your company) for using a language version which is not officially supported anymore.
  • 0
    @Demolishun At least most C++ environments allow you to pick the standard against which your source is written. And even then, backwards incompatible changes are fairly infrequent.
  • 0
    @sbiewald

    print "Hello world"

    vs

    print("Hello world")

    ;-)

    You are right, it's not that bad.
  • 0
    @sbiewald Apart from set and dict behaviour, ha ha.

    Nah, it's the 2 vs 3 thing. I'm looking forward to it dropping off support. Maybe finally we'll see vendors forced to migrate their ancient shitty environments.
  • 0
    @halfflat You don't choose what versions of python you are running? What? How is this a thing?
  • 0
    Oh, and of course build ecosystem changes. What a wonderfully shitty mess.
  • 0
    @Demolishun Got to write code that runs on systems I don't control.
  • 1
    @Demolishun Writing code for CentOS 6. Python 3? Nope, requires explicit permission by the customer to install a system-wide package. And this has to be documented, done on all machines of the customer (of course no automation tool is present) and of course the customer won't pay for it... So the support is kept. Guess we will have to bundle our own Python version starting next year, because most of the dependencies will drop Python 2 support.
  • 0