15

Why does MySql least and greatest functions return null if any of the arguments are null? Who was the genius behind the idea that it's the best way to implement it, and to change this behavior in a minor version?

Why does MySql return bigint when i convert a value to integer?

Why does MySql exist?

Comments
  • 1
    Changed in a minor versions :s this is bad
  • 0
    Because it's been bought by oracle.
  • 2
    Since NULL represents an unknown value, if you’re trying to work out which number is greatest in a set that includes, for example, NULL, 4, 7, and 9... the answer has to be NULL (ie you don’t know) because you can’t be sure that the unknown value is smaller than 9.
  • 0
    @rhubarbcode null is not unknown. It represents the absence of a value. A proper implementation would be that, when given N parameters, one of which is null, it would return the greatest or least of the existing N-1 values, since one is absent.

    It also makes sense in a practical sense. Why would you be interested in a null? Say you're joining 6 tables on a common key and you want to find the greatest modification date of all records, but some tables might not have a corresponding record. How will that look? Well in mysql 5.0 is would have been

    Least (a,b,c,d,e,f)

    In mysql 5.0.something it would have to be this:

    Least (
    Coalesce (a,b,c,d,e,f),
    Coalesce (b,c,d,e,f,a),
    Coalesce (c,d,e,f,a,b),
    Coalesce (d,e,f,a,b,c),
    Coalesce (e,f,a,b,c,d),
    Coalesce (f,a,b,c,d,e),
    )

    Which one is nicer? And even if you change this, ... minor version?!
  • 0
    @AndSoWeCode

    “NULL indicates that the value is unknown. A null value is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a null value and any other value, return unknown because the value of each NULL is unknown.”

    https://docs.microsoft.com/en-us/...

    SELECT LEAST(
    IFNULL(a, ~0 >> 1),
    IFNULL(b, ~0 >> 1),...
    )

    I agree that it’s kinda bad, but I think the reason it was changed in a minor version was that the previous implementation of the functions was wrong, because it didn’t match the results you would get in PL/SQL.

    Why can’t you just use MIN()?
  • 1
    @rhubarbcode null, technically is an empty value everywhere, including rdbms.
    https://en.m.wikipedia.org/wiki/...

    In certain cases it can be treated as unknown. However, generally there is no "unknown" in rdbms or any computer systems, because they operate on the closed works hypothesis. The closest we can get to unknown is to know that we have no information about something.
  • 0
    @rhubarbcode as for the alternative you proposed - it would work only for fixed precision, fixed size integer numbers. Not floating point, strings, dates or other comparables.

    As for making it the same as Oracle - it's a terrible idea. Oracle is a nightmare. And this only proves that.
  • 0
    @AndSoWeCode the page you linked states that NULL is a state, not a value, and that “SQL Null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information".”

    The page also explicitly states that NULL violates the closed world assumption: “Another point of conflict concerning Nulls is that they violate the closed world assumption model of relational databases... [and instead] operate under the open world assumption, in which some items stored in the database are considered unknown, making the database's stored knowledge of the world incomplete.”
  • 0
    @AndSoWeCode I definitely agree about Oracle.
  • 0
    @rhubarbcode read the definition, not the interpretation.

    "A null value indicates a lack of a value - a lack of a value is not the same thing as a value of zero in the same way that a lack of an answer is not the same thing as an answer of "no""

    Computers lack information about lots of stuff. It doesn't mean that they violate the closed world assumption. Computers do know that they don't have info about something.

    When you query for a null variable, you get null, as in "i know that it's not set". Same in databases. Not set. Absent information.
  • 0
    @AndSoWeCode you said earlier that “null is not unknown, it represents the absence of a value”. Those two things are not inconsistent, since null can represent that a value is either missing or inapplicable. The former is what is being referred to as an unknown value.

    I’m not sure what point you’re trying to make by talking about the difference between null and 0, because it doesn’t sound like we disagree on that.

    “Computers lack information about lots of stuff. It doesn’t mean that they violate the closed works assumption.” It often does. The closed world assumption is that everything the system does not currently know to be true must therefore be false. This is clearly incorrect when it comes to databases, for example if you have a column called “alive” in a table of people, the fact that one of the rows has a null value instead of a truth value does not necessarily mean the person is dead, it just means you don’t know whether they’re alive or dead.
  • 0
    @rhubarbcode my point was that the value is missing. You can interpret it as unknown, but that's just one particular case.

    As for the "alive" column, it is a design flaw. It doesn't make sense to store that. One must just store the date if death. Or, since we're mostly dealing with people that are alive, we would have a separate table storing deaths only.
    But overall in a db, one should store exceptions, rather than a common rule. That way you don't violate the closed world assumption, and your db is smaller and faster, because it stores less data.
  • 0
    @AndSoWeCode it is just one particular case, but the functions’ implementations can’t ignore that case, because it is part of how the database is defined. If a value is missing, the function cannot assume that this is because it is inapplicable, it has to allow for the possibility that it is applicable but unknown.

    If you store the date of death, a similar problem still arises: the fact that the date of death is missing does not imply that the person is alive. We may just not know when they died, or not know that they have died.
  • 0
    @rhubarbcode the point is that it's the developer's task to design, so that the closed world assumption would hold correct. And in all the cases that I came across, it's also more efficient, faster, easier to operate.

    A proper design would make the assumption correct.

    As for the functions in question, I explained above why it makes total sense not to return null. Because it's executed on a set of values, just like aggregate functions, that ignore null values, because those values are not there.

    You don't need to assume that a null is bigger or smaller, because it's none of those things. Null is the absence of a value, so it is never the greatest or least if the set. It's simply not in the set.

    There is no case where this behavior would make sense.
  • 0
    @AndSoWeCode the point is that the only way to design a database that doesn’t violate the closed world assumption is to never allow the use of NULL anywhere within it, because the act of allowing NULL to be in the database violates the assumption.

    If I have Anna (170cm tall), Bob (180cm tall), and Chris (175cm tall) in my “person” table, and there is a column for their heights, but Anna’s height is missing, does that mean that Chris is the shortest person in the table? No. If I use LEAST() on their heights, the answer should be unknown, because the database doesn’t know whether Anna is shorter than Chris.
  • 0
    @rhubarbcode "the point is that the only way to design a database that doesn’t violate the closed world assumption is to never allow the use of NULL anywhere within it"

    no.
    Why do you say that? A simple reading of my previous comments would give you simple examples of the opposite.

    When designing a database, one just has to think from a computer's perspective. Simply design it in a way that would make it really mean that if a value is absent, it can be safely assumed that it simply does not exist. There are hordes of examples out there, there is nothing outlandish about it, and everything works just fine, so that thousands upon thousands (millions maybe?) design databases that hold this assumption valid.

    As for the example of heights:
    1. That would be min()
    2. You also don't have in the DB Christopher's height, or name. Even if you had Anna's, does that mean that min() and max() should always be null?
    3. What would be the point in such an implementation? You would get nothing useful
  • 0
    @rhubarbcode let me put it in an illustration.

    You have a parking lot with 100 places. There are only 80 cars present. When someone asks which one is the biggest car in the parking lot, do you say "neither"?

    The parking lot is the table, that has a placeholder for a value, that is sometimes absent (null). The car is a value.
Add Comment