Data Disinformation: the Next Big Problem

Code-generating LLMs like ChatGPT can produce SQL snippets. Regardless of quality, those snippets can retrieve data (from prepared datasets) based on user prompts.
That data may, however, be garbage, and it will lead to garbage decisions by barely data-literate stakeholders.
As with network neutrality and PII/PSI ownership, we must act now to avoid yet another calamity.

Imagine a scenario where a data-illiterate middle manager barks some prompts at the corporate AI, and it writes and runs an SQL query against company databases.
The AI outputs some interactive charts that show that the average worker spends 92.4 minutes on lunch daily.
The middle manager gets furious and enacts an Orwellian policy of facial recognition punch clock in the office.
Two months and millions of dollars in contractors later, the middle manager runs the same prompt again... and the average lunch time is now 107.2 minutes!
Finally the middle manager gets a literate person to check the data... and the piece of shit SQL behind the number is sourcing from the "off-site scheduled meetings" database.
Why? Because the dataset that does have the data for lunch breaks is labeled "labour board compliance 3", and the LLM thought that the metadata for the wrong dataset better matched the user's prompt.
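
To make the failure mode concrete, here is a toy sketch of metadata matching gone wrong. It is not how any real LLM picks tables (those use learned representations, not word overlap); plain token overlap stands in for "best metadata match". The two dataset names come from the story above, but the descriptions, the prompt text and every function name are made up for illustration.

```python
import re

# Hypothetical data catalog. The names are from the story; the
# descriptions are invented for this sketch.
CATALOG = [
    {"name": "off-site scheduled meetings",
     "description": "time employees spend at off-site meetings including lunch meetings"},
    {"name": "labour board compliance 3",  # the dataset that really holds lunch breaks
     "description": "mandated break records per shift"},
]

def tokens(text):
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(prompt, entry):
    """Count the words shared by the prompt and the dataset's metadata."""
    metadata = entry["name"] + " " + entry["description"]
    return len(tokens(prompt) & tokens(metadata))

def pick_dataset(prompt, catalog):
    """Return the catalog entry whose metadata shares the most words with the prompt."""
    return max(catalog, key=lambda entry: overlap_score(prompt, entry))

prompt = "average time employees spend at lunch each day"
chosen = pick_dataset(prompt, CATALOG)
print(chosen["name"])
```

The matcher happily picks the meetings dataset because its metadata shares words with the prompt, while the correctly sourced but badly labeled dataset shares none. Nothing in the output tells the person asking that a choice was even made.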

Given the very real-world scenario of mislabeled data, LLMs' inability to understand what they are saying or accessing, and the average manager's complete data illiteracy, we might have to wrangle some actions to prepare for this type of tomfoolery.

I don't think that access restriction will save our souls here; decision-flumberers usually have the authority to overrule RACI/ACL restrictions anyway.
Making "data analysis" an AI-GMO-Free zone is laughable; that is simply not how the tech market works. Auto tools are coming to make our jobs harder and less productive, tech people!
I thought about detecting new automation-enhanced data access and visualization, and enacting awareness policies. But that would be of little help: once a shithead middle manager gets hooked on a surreal indicator value, it is nigh impossible to yank them out of it.
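
For what it's worth, the "detecting" half is at least technically doable. Below is a minimal sketch, assuming the generated SQL runs through a wrapper we control: it uses the authorizer hook in Python's standard `sqlite3` module to record which tables a query actually reads, so the source could be stamped on every chart. The table names, columns, sample rows and the `run_with_provenance` helper are all hypothetical, and a real warehouse would need its own equivalent of this hook.

```python
import sqlite3

def run_with_provenance(conn, sql):
    """Run sql and return (rows, sorted list of tables the query read)."""
    tables_read = set()

    def record_reads(action, arg1, arg2, db_name, trigger_name):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if action == sqlite3.SQLITE_READ and arg1:
            tables_read.add(arg1)
        return sqlite3.SQLITE_OK  # observe only, never block

    conn.set_authorizer(record_reads)
    try:
        rows = conn.execute(sql).fetchall()
    finally:
        # Swap in a permissive authorizer so later queries are not recorded.
        conn.set_authorizer(lambda *args: sqlite3.SQLITE_OK)
    return rows, sorted(tables_read)

# Hypothetical demo data, loosely modeled on the story above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offsite_scheduled_meetings (minutes REAL)")
conn.execute("CREATE TABLE labour_board_compliance_3 (break_minutes REAL)")
conn.execute("INSERT INTO offsite_scheduled_meetings VALUES (90), (95)")
conn.execute("INSERT INTO labour_board_compliance_3 VALUES (30), (40)")

# Pretend this query came from the AI answering a "lunch time" prompt.
generated_sql = "SELECT AVG(minutes) FROM offsite_scheduled_meetings"
rows, sources = run_with_provenance(conn, generated_sql)
print(rows, "sourced from:", sources)
```

This only surfaces where the number came from. It does nothing about the manager who ignores the label.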

Gotta get this snowball rolling: we must have some idea of future AI housetraining best practices if we are to avoid a complete social-media-style meltdown of data-driven processes.
Anyone care to pitch in?

  • 8
    I would rather have a strong union in the company that, upon such a decision from a manager, would just punch that manager in the face and continue beatings until the decision would be reconsidered.
  • 4
    @msdsk the good ol' frenchRev approach. can't say that it ain't satisfactory.
  • 1
    exponential growth is a rare thing in nature on a macro level. Even when it occurs, it is quickly stopped by an exponential rise in resources needed.

    We're safe.
  • 3
    @kiki problems grow exponentially, and so does network complexity, with very little increase in resources.
    But my point is not AI LLMs getting more powerful, it is dumber people using them in increasingly important situations. Powerful dumber people.
  • 5
    @JsonBoa You have to see ChatGPT as a jackhammer.

    It can make the life/work of those who know how to use it easier.

    Give that hammer to someone who cannot use it and they will destroy something.

    And that is something teachers at school should start learning. IT should warn users and managers about the AI and the danger of using it without knowledge about the subject.

    What is next? A bunch of doctors who got their degree using ChatGPT and then just remove stuff as told by an AI?
  • 3
    @Grumm yes, that is the problem I see on the horizon. People using hammers where a scalpel should be carefully applied.
    We need some type of guidelines to prevent it from happening, but I have no idea how to formulate or implement them.
    And yet it seems quite clear what type of problems we are to brace for without some framework for AI-usage etiquette.
    For organic stupidity is a real threat that is being enhanced by artificial intelligence.
  • 2
    @msdsk That would be an extreme example, but 99% of the time errors and consequences are much more subtle, and can take years to become apparent. There's a good chance that whoever made the mistake won't be with the company anymore by the time the mistake becomes apparent.

    Another thing is the "garbage in, garbage out" rule, i.e. if the original data is noisy and random, then even the best-effort insight on such data will also be noisy and random. When even your best-effort insights are of limited value, there's pretty much a 50-50 chance some random AI code will seemingly outperform them, and you won't notice until something goes completely wrong.
  • 0
    What is the difference between AI and clueless junior in this scenario?
  • 2
    @Demolishun If you buy a cheap shitty car and it runs "fine", you'll be happy because you got a functioning car for cheap. If you buy a significantly discounted high-end car and it doesn't run perfectly, you'll go ape shit even though it's still way better than anything else you could get for the same price. Similarly, a clueless junior will get much more scrutiny for even the tiniest mistakes, while some half-decent code generated by an AI model will be "fine" as long as things don't break too much.
  • 0
    @JsonBoa no they don't. Going from one petaflop to two is way harder than going from one teraflop to two.
  • 1
    That's why unions should dispense beatings to any manager who, unprompted, would dare to interfere in the development process. In the time of Garbage Data, the democratic process becomes even more important.
  • 3
    AI is like fire: a good servant but bad if it happens inside your colon.
  • 3
    @hitko another big problem (based on ChatGPT) is that the AI will give you the answer as if it is true.

    The AI does not doubt what it gives you. A human junior will hesitate or ask questions about something he doesn't know.

    The Chatbot doesn't do that.
  • 1
    @Demolishun plenty of people have replied, but the subject here is the automation of poor data sourcing.
    An inexperienced intern can make but a few mistakes and takes some days doing it, all while some seniors take notice and can intervene.
    A shitty SQL/BI made by an AI allows a shitty boss to get from prompt to stupid decisions in minutes, and undetected.