The more I work with performance, the less I like generated queries (incl. ORM-driven generators).

Like this other team came to me complaining that some query takes >3 minutes to execute (an OLTP query) and the HTTP timeout is 60 seconds, so.... there's a problem.

Sure, a simple EXPLAIN ANALYZE shows that some UIDPK index is probed repeatedly, ~1M times (the query plan was generated expecting ~300k invocations), and each Index Scan lasts 0.15ms. So there you go.. Ofc I'd really like to see more decimal zeroes, rather than just 0.15, but still..

Rewriting the query with a CTE cut the execution time down to a pathetic 0.04sec (40ms), with no loops in the plan.
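The shape of the rewrite, for the curious - all table and column names below are made up, since the real schema isn't mine to share:

    -- Before (sketch): the subquery ended up as a nested loop,
    -- probing the UIDPK index once per outer row (~1M probes):
    SELECT o.uidpk, o.total
    FROM orders o
    WHERE o.customer_uid IN (SELECT c.uidpk
                             FROM customers c
                             WHERE c.region = 'EU');

    -- After (sketch): materialize the inner set once in a CTE and
    -- join against it - no per-row index probes:
    WITH eu_customers AS MATERIALIZED (
        SELECT uidpk FROM customers WHERE region = 'EU'
    )
    SELECT o.uidpk, o.total
    FROM orders o
    JOIN eu_customers ec ON ec.uidpk = o.customer_uid;

(The MATERIALIZED keyword is PostgreSQL 12+; before 12, CTEs were always materialized anyway.)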

I suggest that change to the team and get a big fat NO - they cannot make any query changes, since they have no control over their queries

but down to 0.04sec from 3+ minutes....

Alright, let's try VACUUM ANALYZE, although I doubt it will be of any help. IDK what I'll do if that doesn't change the execution plan :/ Prolly suggest finding a DBA (which they won't, as the client has no € for a DBA).
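For the record, the stats-refresh attempt looks like this (table and column names are stand-ins):

    -- Refresh planner statistics; the row estimate was off by ~3x
    -- (300k expected vs ~1M actual), which stale stats can cause:
    VACUUM (ANALYZE) orders;

    -- If a column's distribution is skewed, more detailed stats
    -- sometimes nudge the planner toward a better plan:
    ALTER TABLE orders ALTER COLUMN customer_uid SET STATISTICS 1000;
    ANALYZE orders;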

All this because developers, the very people who should have COMPLETE control over the product's code, have no control over the SQLs.

This sucks!

  • 1
    Why did they even ask then?
  • 3
    @N00bPancakes probably to make some magic and run the

    ALTER TABLE products SET ( boost_performance = true )

    or add the

  • 1

    That makes sense.

    Sounds like they've reached the edge of their ORM / company policies envelope there.
  • 2
    The day I became the captain now is the day FatOrms went away at our company.
  • 2
    It's the problem ignored by many developers. When you need to improve performance, usually all you can do is add indexes to make the generated queries faster
  • 1
    @Gxost aaand when the indexes become too slow and you don't see a right way to tickle the DB into changing the exec plan - .... The end?

    Or perhaps there IS a way to ask Postgres to change its exec plan? I like how Oracle OEM does it - its analyser can suggest a few exec plans for a DBA to select from. Not sure how that's done in Postgres
  • 1
    @netikras you can - but only with extensions


    With ORMs and autogenerated SQL statements it can be fickle.

    Next level: auto-rewriting with a proxy
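    To name one such extension: pg_hint_plan accepts comment hints that pin the planner's choices. A sketch, with hypothetical tables and aliases:

    -- Requires the pg_hint_plan extension (not part of vanilla Postgres):
    LOAD 'pg_hint_plan';

    -- The /*+ ... */ hint overrides the planner's cost-based choice
    -- of join strategy and scan method for the listed aliases:
    /*+ HashJoin(o c) SeqScan(c) */
    SELECT o.uidpk, o.total
    FROM orders o
    JOIN customers c ON c.uidpk = o.customer_uid;

    A hint-injecting proxy works on the same idea: it prepends such comments to queries from an app whose SQL you can't touch.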
  • 1
    @netikras In EF Core it's possible to execute raw SQL, but you can't load related data with a JOIN that way. Anyway, raw SQL is the least preferred solution, because LINQ is always checked by the compiler and SQL is not
  • 0
    @Gxost I get the feeling that when you're working with a large amount of entries, type checking might not pay off what LINQ costs.
  • 0
    My experience with MS is pretty low.

    Mostly because MS is a red flag to me.

    But I've seen some LINQ stuff on devRant...

    And it is like it always is: Yes, to a certain point it's useful.

    But: Raw SQL must be an option. This is not for debate.

    You can test SQL. You can seed a database, even fuzz it and do tests.

    And till now I've never seen an ORM that makes it impossible to run a raw SQL query.

    Compiler checking sounds great.

    Except it is not. The compiler can check whether it's valid SQL and maybe run some static checks.

    But only a human can see the complexity of e.g. different inputs for search queries, estimate the resulting rows and put all that together to generate an optimized SQL query.

    From experience, it's sometimes impossible to get an ORM to produce the SQL you need, unless you force choke the inner guts, which makes version upgrades a PITA.

    Raw SQL should always be an option.
  • 0
    Entity framework can get you pretty far but you need to understand what queries it's coming up with and why. So you can't rely on it as an abstraction for all that nasty sql or relational databases in general.

    But here's the problem with just using sql: The sql you execute is often determined at runtime. So *you* don't write sql. You write a program that generates sql at runtime. And that is all kinds of scary. I'm not talking about just shoving a parameter into a where clause. I'm talking about dynamically determining group bys and havings and wheres and fucking froms and mother fucking joins.

    I've seen things, man. I've worked on long lived projects that have sent invalid queries to their database. So just go fix the sql, right? Wrong. The query isn't in the source code. There's no one place where the query is. This generator function calls generator functions which call generator functions.

  • 0
    So just use stored procedures, right? Wrong. That doesn't magic away the need for dynamic sql. You're just gonna end up using sql to generate sql.
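    In Postgres, that "SQL generating SQL" typically looks like a plpgsql function building a statement with format() - a sketch with made-up names:

    -- A filter column chosen at runtime: the identifier is quoted
    -- safely with %I and the value passed as a bind parameter.
    -- Still dynamic SQL, just generated inside the database
    -- instead of the application:
    CREATE OR REPLACE FUNCTION search_products(filter_col text, filter_val text)
    RETURNS SETOF products AS $$
    BEGIN
        RETURN QUERY EXECUTE format(
            'SELECT * FROM products WHERE %I = $1', filter_col)
        USING filter_val;
    END;
    $$ LANGUAGE plpgsql;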

    So yeah, maybe it's not the worst thing to use a library to generate sql rather than try to roll your own. Ya know, some kind of object-relational mapper you might call it.

    At the end of the day, there is no perfect solution for connecting your application logic to your relational database. If you ever say "just use X" you're wrong. I don't care what X is.
  • 1
    @M3m35terJ05h If anything in your query that isn't a value comes from anywhere other than a compile-time constant, you fucked up. The point of a relational database is to solidify the structure. By dynamically determining fields or tables you're screwing up this structure.
  • 0
    @Lor-inc The complete list of column names shouldn't come from somewhere obscure. It should be a rather obvious array somewhere very close to the generator function for the specific operator for the specific table.
  • 0
    Table names shouldn't even be selectable, as in a relational database tables with overlapping schemas are merely a coincidence.
  • 1
    When performance starts to matter, it doesn't matter how much you fine-tune it; at some point you're going to end up removing it and going raw.

    This can be said for almost anything.
  • 0
    If you get any more complex than that, you need to either get an ORM or create views or generated tables or whatever your DBMS calls them.
  • 0
    @Lor-inc Have you ever seen an advanced search screen? How do you think all those optional filters and sorts work?

    Or maybe a generated report with all kinds of configuration options?

    All of what I mentioned would be a massive fuck up for simple crud. But simple crud isn't what I was talking about
  • 1
    @M3m35terJ05h I didn't consider the kind of advanced search that's basically a GUI over SQL, I assumed we were talking about things you do with an ORM, i.e. complex queries behind simple actions. I don't think LINQ can even be used to dynamically specify columns, as you do that with C# and the SQL is generated at compile time.
  • 1
    @Lor-inc Entity framework won't generate a query until it needs to. It'll convert your lambda expressions into *something* at compile time, but it doesn't know what SQL it needs to generate until it's time to generate it.

    You can do something like:

    IQueryable<Foo> query = context.Foos;
    if (request.HasBarFilter)
    {
        var value = request.BarFilter;
        query = query.Where(foo => foo.Bar > value);
    }
    return query.ToList(); // database call happens here
  • 1
    @M3m35terJ05h Still I don't think dynamically specifying columns is possible. The Reflection API would have to be deeply interconnected with LINQ, with a lot of tricky optimization on top.
  • 3
    @Lor-inc What you can do is request the whole entity of the "main" table you're querying and then dynamically tell it which of its relationships to include.

    Like if Bar has a one-to-many relationship with Foo, you're querying Foo and the user might sometimes request data from Bar, you can write, say:

    IQueryable<Foo> query = context.Foos;
    if (request.IncludeBar)
        query = query.Include(f => f.Bar);

    and that's a join that only happens sometimes. Entity framework will go and modify the select to include all the columns from Bar. You'll get a sequence of Foos, and their Bar will be null or an object depending on whether you ran that include. You're probably getting more columns back than you need, but that's probably not the worst thing in the world for something like a report or search.

    Of course you can narrow the select to specific columns but you do have to know what those columns are at compile time.
  • 1
    @Lor-inc I haven't dealt with really huge databases, but on my generated test database the SQL generated by LINQ performed well. With a million records and a lot of related entities it worked as fast as with less than a hundred records. Sure, I needed to add indexes, but as a result it worked fine. On MS SQL Server Express. Tried the same setup on MS SQL Server Enterprise and it worked faster, even without indexes.
  • 2
    @Gxost @M3m35terJ05h
    If a company uses EF or Hibernate, the DBAs/devs with access should periodically do an evaluation of the queries that are being sent as a sanity check. Issues with buffered round-tripping and deletes, and unintentional n+1 queries, are well known.

  • 1
    @SortOfTested Though generated queries must be checked time after time, the mentioned issue is irrelevant nowadays. EF Core 3 translates a single LINQ query to a single SQL query: https://docs.microsoft.com/en-us/...
  • 1
    @Lor-inc AutoMapper's ProjectTo allows fetching only the columns required for mapping. Since LINQ works with IQueryable<T>, it's possible to add any conditions and includes dynamically, and with custom extension methods for LINQ such a query can look pretty good.
  • 1
    Grouping and lazy loading are still broken, however. There's more than one n+1 issue.

    I'm not crazy about using AutoMapper for this use case. It has a fair amount of overhead, so using it in place of a select statement, or Dapper, is leaving a lot of perf on the table for boilerplate that can be generated by IDEs.