5

A bit longer rant, somehow triggered by the end of this rant:

https://devrant.com/rants/7145365/...

The discussion revolved around strpos returning false or a positive integer.

Instead of an Option or an Exception.

I said I'm a sucker for exceptions, but I'm also a sucker for typing.

Which is something most languages lack - except the lower level ones like C / C++.

I always loved languages which have unsigned and signed types.

There, I said it... :) I know that signed / unsigned is controversial. Google immediately leads to blog entries screaming bloody murder because unsigned can overflow – or underflow, if someone tries to use a -1 on an unsigned integer.
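What those blog entries scream about can be sketched in a few lines of C. This is only an illustration: `to_unsigned` and `wraparound_demo` are made-up names, and the asserts simply document what the C rules for unsigned arithmetic produce.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not from any library: convert a signed value to unsigned.
   C defines this conversion as reduction modulo UINT_MAX + 1. */
unsigned int to_unsigned(int value)
{
    return (unsigned int)value;
}

/* The asserts document what wraparound does to unsigned values. */
void wraparound_demo(void)
{
    unsigned int zero = 0u;
    unsigned int max = UINT_MAX;

    assert(to_unsigned(-1) == UINT_MAX); /* -1 silently becomes the largest value */
    assert(zero - 1u == UINT_MAX);       /* "underflow" wraps around to the top */
    assert(max + 1u == 0u);              /* overflow wraps around to zero */
}
```

None of this is undefined behaviour for unsigned types, which is exactly why it goes unnoticed so easily.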

Note that my love is only meant for numeric types, unsigned / signed char is ... a whole can of insanity on its own.

https://phoronix.com/news/...

If you wanna know more.

Back to the strpos problem, now with my secret love exposed:

strpos works on a single string, where a string is a sequence of chars indexed starting at 0.

0 is a positive integer.

In case the needle (the substring that should be looked up in the string) cannot be found in the haystack (the string), PHP returns "false".

This makes an explicit type check necessary, because 0 (the beginning of the string) is a valid string position... So strpos(...) !== false.

PHP interprets 0 as false, any other integer value is true.

In the discussion, the suggestion came up to return -1 if a value could not be found – which some languages do, for example Scala.

Now I said I have a love for unsigned & signed integers vs. just signed integers...

Can you guess why the -1 bothers me very much?

Because it's a value that's illogical.

A search in a sequence that is indexed from 0 can only yield positions of 0 or more, never less than zero.

-1 refers to a position in the sequence that *cannot* exist.

Which is - of course - the reason -1 was chosen as a return value for false, but it still annoys me.

An unsigned integer with an exception would be my love as a return value, mostly because an unsigned integer represents the return value *best*. After all, the sequence can only return a value of 0 ... X.

*sigh*

Yes, I know I'm weird.

I'm also missing unsigned in Postgres, which was more or less not implemented because it's not in the SQL standard...

*sob*

Comments
  • 2
    So my 2 cents about this here instead of the other thread:

    False

    The function's return type is int|false. So it's kind of clear for me and my IDE what is happening. So I don't have much of a problem here tbh. But it's semantically weird that a function might return false, but not true, yes, admittedly.

    Exception raised

    I get the point but I would not find it very practical. I don't know if I would like to wrap all usages of strpos and the like in try/catch blocks. If blocks are more flexible, so I could put
    if (($str == "blockedem@mail") or (strpos($str, '@') !== false)) {...
    in one block instead of two.

    -1

    I think this is way worse than false tbh.
    1. There won't be a type error in static checking when I forget to check for the case and continue with the value as int. The bug would only be seen at runtime.
    2. The return type would be just 'int' and I could not guess what happens when the string is not found and have to consult the docs. int|false or int|null are kind of obvious in comparison.
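Point 1 can be sketched in C, where the same problem exists. This is a minimal illustration under assumptions: `index_of` and `unchecked_index` are hypothetical functions, written in the -1 sentinel style under discussion, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical -1 sentinel API: position of needle, or -1 when it is missing. */
int index_of(const char *haystack, char needle)
{
    assert(haystack != NULL);
    const char *match = strchr(haystack, needle);
    return match == NULL ? -1 : (int)(match - haystack);
}

/* Forgets the -1 check. Nothing in the types objects (an implicit conversion
   would be accepted too), yet a miss turns into SIZE_MAX. */
size_t unchecked_index(const char *haystack, char needle)
{
    return (size_t)index_of(haystack, needle);
}
```

A caller of `unchecked_index("ab", '@')` gets a huge "position" back, and only finds out at runtime.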

    Null
    I think null would have been best, yes.
  • 5
    I agree with everything except for throwing exceptions/errors.

    Semantically, it’s not an error. The function succeeded and found nothing. So it shouldn’t throw an error.

    Also, handling thrown exceptions is syntactically expensive in most languages.
  • 1
    Yeah, I agree with @Lensflare that it is not an exceptional situation by any means. It's a perfectly valid outcome.

    In C it returns -1 because it has to return something that, as you said, is unambiguous and must still be an int. (And you have no exceptions anyway.)

    I guess the PHP devs' rationale was along your lines, but the pitfall lies instead in implicit conversions.

    A C compiler would warn about such a situation, and a PHP linter should too. That is why strict comparison (===) exists.

    In modern C++, if it didn't break C compatibility, which is a bigger can of worms, I'd fully support strpos returning std::optional<unsigned int>.

    As for the unsigned/signed fiasco, it's the same with all lower level languages. Great power comes with great responsibility. The army teaches you how to hold your rifle so you don't shoot yourself in the foot but some klutzes still do 😂
  • 1
    @CoreFusionX You do have exceptions in C, sort of: setjmp / longjmp. Also, the C strstr function returns the position of a match as a pointer, not as an offset, so a null pointer for "not found" makes sense.
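A small sketch of that pointer-based convention, using the real `strstr` from `<string.h>`. The wrapper names `contains` and `strstr_demo` are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if needle occurs in haystack, 0 otherwise. */
int contains(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}

/* The asserts document how strstr reports its result. */
void strstr_demo(void)
{
    const char *haystack = "haystack";

    /* A match at position 0 is a valid, non-null pointer to the start. */
    assert(strstr(haystack, "hay") == haystack);
    /* Pointer difference recovers the offset if one is needed. */
    assert(strstr(haystack, "stack") - haystack == 3);
    /* Only "not found" yields the null pointer. */
    assert(strstr(haystack, "needle") == NULL);
}
```

Because position 0 is a perfectly good non-null pointer, the "0 versus not found" ambiguity from PHP does not come up here.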
  • 1
    The problem is that the return value indicates two different things in a single value: was the string in the string, and if yes, where was it.
  • 1
    @electrineer Which wouldn’t be a problem if it was properly typed with an algebraic sum type like optional.
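C has no real sum types, but the idea can be approximated with a struct that keeps the two pieces of information apart. A rough sketch, assuming a made-up `struct maybe_pos` and `find_pos`; unlike a true optional, the compiler will not force anyone to look at `found` first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical optional-like result: position is only meaningful when found is true. */
struct maybe_pos {
    bool found;
    size_t position;
};

/* Hypothetical search that answers "was it there?" and "where?" separately. */
struct maybe_pos find_pos(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);

    struct maybe_pos result = { false, 0 };
    const char *match = strstr(haystack, needle);
    if (match != NULL) {
        result.found = true;
        result.position = (size_t)(match - haystack);
    }
    return result;
}
```

With this shape, a match at position 0 and a miss can no longer be confused, and the position can stay unsigned.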
  • 1
    @Lensflare I find myself in the slightly uncomfortable position of having disagreed with someone who understands the issues entirely! Most PHP devs get/got familiar with the strpos === (or !==) false thing early on, shortly after going looking for a 'contains' function. Anyway, there *is* now a str_contains function, in PHP 8, which ofc returns true or false. The strpos workaround should drop out of use.
  • 0
    I see your point. I agree, false and 0 are the exact same thing. But I don't think throwing an exception with stack unwinding is a good solution.

    Exceptions are bad, especially for this situation. It isn't an error situation. Some languages and platforms don't have exception handling by stack unwinding, or it is hard to implement there. Exceptions are also expensive in most other languages since the compiler needs to make sure heap allocations get freed.

    Instead of returning an integer, why not return a pointer? Return NULL when not found. This is done in strchr() and strstr().
  • 1
    @Fast-Nop Not all platforms support stack unwinding, and it is hard to add this to the standard library since you have a lot of additional work and overhead. (Creating the stack frame copy, passing it to the functions, making sure all allocs get freed by hand, ...)
  • 0
    @happygimp0 setjmp/longjmp are part of the C standard, so most environments should support it. That said, it's exceedingly rare that I have ever seen this being actually used.
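For the curious, this is roughly what an "exception" on a failed search could look like with setjmp/longjmp. It is only a sketch: `not_found_handler`, `find_or_jump` and `find_with_fallback` are invented names, and real code would need much more care about cleanup.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

static jmp_buf not_found_handler;   /* hypothetical "catch" target */

/* Hypothetical "throwing" search: on a miss it jumps to the handler
   instead of returning a value. */
size_t find_or_jump(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);

    const char *match = strstr(haystack, needle);
    if (match == NULL) {
        longjmp(not_found_handler, 1);   /* the "throw"; does not return */
    }
    return (size_t)(match - haystack);
}

/* Returns the position, or fallback if the search "threw". */
size_t find_with_fallback(const char *haystack, const char *needle, size_t fallback)
{
    if (setjmp(not_found_handler) != 0) {
        return fallback;                 /* the "catch" branch */
    }
    return find_or_jump(haystack, needle);
}
```

The return type can stay unsigned, but the control flow is much harder to follow than a plain null check, which fits the observation that this is rarely used.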
  • 2
    I can understand most of the opinions here and agree, at least partially.

    Now just because I'm half assed drunk and enjoy having a non primal discussion (as in an intellectual discussion vs the usual screaming apes throwing shit at work)...

    Exceptions are not errors. At least not in the strictest sense.

    I would go so far as to say it's a common misconception to think of an exception as an error.

    Java is interestingly the language where there are explicitly two different classes: Error and Exception, plus the superclass Throwable.

    Error - and thus implicitly Throwable, as it is the super class - is meant for Errors as in "shit's burning… terminate".

    Exception just means that something unexpected happened....

    Given that strpos should return the position of the needle in the haystack, it is unexpected to not find it.

    Again: unexpected. not an error.

    I guess I'm walking a very thin line here, can understand the performance argument...

    Nevertheless... Returning unsigned and an exception if it wasn't found makes the most sense to me.

    @electrineer summarized it pretty nicely.

    If it should return a value, I think I would definitely be a friend of an Optional.

    NULL is in most languages a very "fishy" thing.

    Interesting tidbit by the way

    https://github.com/php/php-src/...

    Had to look it up, it just piqued my curiosity too much... XD
  • 0
    @IntrusionCM Then I suppose all strpos functions should be renamed to maybeStrpos to denote that finding a substring is equally expected as not finding it (which is how most people treat strpos anyway). But that's just nit-picking about pointless semantics, which is utterly irrelevant to the problem at hand.
  • 0
    NULL, for me, has always been 'an absence of information'. You can't infer anything from it - other than precisely that, that you have no information?
  • 1
    @spongegeoff null isn’t the absence of information. It’s the information that there is no value.
  • 1
    @spongegeoff

    Exactly. But it depends on how the language treats it.

    E.g. in PHP

    var_export(null == 0);

    will print true.

    Yup.

    Or in C++ nullptr vs NULL.

    Java gained optionals because of the whole null pointer exception debacle - with Valhalla it will hopefully get to the point of declaring a variable strictly non-null.

    Null has its purpose - yes. But the purpose was misused in many languages like exceptions... Sadly.

    What @hitko said summarizes imho the problem completely.

    He suggests renaming the function to maybeStrpos, but that's not the point of the discussion at all.

    Reason I wrote the rant is because this is one topic which I really really love.

    Error and exception handling and conveying meaning in a function is an underappreciated art imho.

    It's completely valid that a function says: "oops that didn't work out well".

    But as is obvious from the rant, the "how" is a really interesting point, because languages differ vastly and the interpretation of things differs vastly, too.

    I like these kinds of discussions where it's less of a "yes" or "no"...

    It might seem a pointless / strictly philosophical discussion, but imho it's not. It just shows that such a trivial thing isn't as trivial as one might think.

    I really loved the different inputs here, despite me having another opinion.

    After all, that's the (true) purpose of the discussion - to see what's possible, gain some knowledge about yet unknown things and scratch curiosity's chin.

    If there's one thing I know for sure after years of programming ... Error handling is one of the seemingly easiest topics...
    Yet its complexity and nuances - and sadly its misimplementations - always prove this point wrong. It's the hardest thing.
  • 0
    @IntrusionCM I disagree 😅
    Exception IS an error. And often it is abused for control flow. Even in standard libs.

    Afaik, Java has only Exception but there are checked and unchecked exceptions. Are you sure that the unchecked ones are called Errors?

    In C++, everything can be thrown.
    In C# only Exceptions can be thrown.
    In Swift only Errors can be thrown.
    So for me, Exception and Error is the same thing with different names.

    There is nothing exceptional about the function not finding a position. And in the example from the other rant the function was specifically used to find out if there is a position. So it was a completely expected and non exceptional result.

    (This is the answer to your second last comment 😄)
  • 1
    @Lensflare
    https://docs.oracle.com/javase/8/...

    Vs

    https://docs.oracle.com/javase/8/...

    Checked Exception vs Unchecked is yet another differentiation.

    Reason I mentioned Throwable and Error is because it's a common code smell to catch Throwable explicitly - because it can include things that should *never* be caught regularly.

    And yeah... Misusing exceptions for control flow is muy bad.
  • 1
    @IntrusionCM Thanks. I must have forgotten that part about Java. It’s been really too long since I used it 😅

    Btw, I also love these kinds of discussions.
  • 0
    @Fast-Nop Some of the software I write is written for platforms that don't support stack unwinding because of hardware limitations (stack is not accessible except for the call and ret instructions, variables can not be placed on stack).

    Don't forget strchr() and strstr() are standard functions, that should work on any platform.
  • 0
    @happygimp0 Yeah, e.g. the Keil compiler for 8051 only puts the addresses on the call stack, but maps local variables to the global address space. Still, that allows stack unwinding in that sense. Mostly at least, though local variables and function arguments have to be declared with volatile if they are meant to be restored.

    I'm not advocating setjmp/longjmp for strstr and strchr, these have the null pointer return mechanism instead anyways. The setjmp/longjmp remark was just to point out that C does have some sort of exception mechanism. It's just rarely used because in most cases, it makes the control flow difficult to understand.
  • 0
    @IntrusionCM The thing is that errors and exceptions aren't something computers would be aware of. The computer will by definition always succeed at performing a task; then, someone will have to look at the result and decide whether it is a failure or not, and what to do about it. It's ultimately a matter of deciding where those checks are performed - at the hardware level (e.g. overflow flag in register), in the compiled code, within some function, or whether to just pass the result all the way to the top and let the end user decide what to do with it.

    Errors and exceptions are just a special way of passing information about those checks across the code. Some checks have to be performed at the OS level to prevent unauthorised access, some should be added by the compiler to provide memory safety or other intended features, but a lot of them are in this grey zone where it's only a matter of who, when, and how decides to do it depending on what they're trying to achieve.
  • 0
    @Fast-Nop

    Exception mechanisms are way more convoluted than just longjmp.

    Also, with modern compilers, at least C++ exceptions have no runtime cost if they aren't actually thrown.

    @IntrusionCM

    C++ doesn't suffer from implicit conversion from nullptr to int. That's precisely why nullptr was introduced.
  • 1
    @Lensflare re NULL being 'the information that there is no value' - I have to say that feels like a good definition, certainly fits the case where you've been to the database and found that a value has not been set. I don't think it's a good match to 'that substring/char does not have a position in the string' though. NULL would be an appropriate state before the test is done, but false seems more appropriate after? You're right though; knowing that a value is unset is information, not an absence of information. Perhaps leaves us needing an ULTRANULL, meaning 'we don't even know if the value is set or not - it might be', for initialisation?
  • 0
    @spongegeoff Not set is not set and not NULL. Not set means an undetermined value, and accessing it as anything other than a char or an integer without padding causes UB.

    NULL means no object. In the case of strchr() and strstr() a pointer to the first matching object is returned. If there is no matching object, what would be a better choice than NULL? But since NULL, 0 and false are the same, except for the cast to (void*) which forbids any arithmetic with NULL, they are equal. (At least after returning, since then you can only have a single type.)
  • 0
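    Another solution would be: Return whether it was found or not and use a pointer passed as a pointer (pointer to pointer) to inform the caller about the position.

    #include <stdbool.h>

    /* Returns true and stores a pointer to the first match in *position;
       returns false and leaves *position untouched when nothing matches. */
    bool getCharPosition(const char *string, int characterToSearch, const char **position)
    {
        do
        {
            /* Compare as unsigned char; the terminating '\0' is searchable too. */
            if(*((const unsigned char*)string)==characterToSearch)
            { *position=string; return true; }
        }
        while(*string++);
        return false;
    }

    Not a good solution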
  • 0
    @HappyGimp "since NULL, 0 and false is the same..."
    Er...
  • 0
    @spongegeoff Yes. false is literally an integer constant with the value 0. So false and 0 are the same. For NULL, it is 0 cast to a pointer, most likely (always?) a void pointer. But since you can only return one type, there is no difference between 0, NULL or false. The code return 0; return NULL; and return false; all most likely generate exactly the same machine code.

    The only difference is that you can do 0+5 and false+5 (both resulting in an int with the value 5) but not NULL+5.
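A minimal C sketch of these claims, using the `false` from `<stdbool.h>`. The function name `zero_values_demo` is made up, and the asserts only cover what the C language guarantees; nothing here says anything about other languages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The asserts document how 0, false and NULL relate in C. */
void zero_values_demo(void)
{
    const char *no_match = NULL;

    assert(false == 0);      /* false compares equal to the integer 0 */
    assert(false + 5 == 5);  /* so integer arithmetic on it works */
    assert(no_match == 0);   /* 0 is a null pointer constant, so this compares equal */
    assert(!no_match);       /* a null pointer is "falsy" in a condition */
    /* Arithmetic like NULL + 5 is deliberately left out: it is not portable C. */
}
```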
  • 1
    @happygimp0

    This only holds in C though.

    In C++, false is already a bool (even if implicitly convertible) and NULL is (rightly so) superseded by nullptr.

    In all higher level languages your statement simply does not hold outside of boolean contexts with, again, implicit conversions.
  • 0
    @CoreFusionX Almost every architecture has 0==NULL==false on an assembly level.
  • 0
    @happygimp0 higher level constructs like optionals with null don't necessarily translate to 0 in assembly. Some languages use algebraic types for those kinds of abstractions.