13
donuts
5y

Random question that apparently stumped even Google...

how does an IDE evaluate (and format) code and errors so quickly?

Was thinking maybe building a compiler/programming language would be the best way to learn/practice algorithms... Is it?

Comments
  • 3
    I actually want to know, like, exactly how they detect errors, because I have an idea for basically a rubber duck that'll fucking glow red if my code has any errors. Obviously I'd need to know how to detect errors if I want to do it though.
  • 1
    @24th-Dragon so how do you code one and how does it work/parse/identify issues so quickly?
  • 1
    @M1sf3t hmm... It's in C... And, well, lots of pieces. Sorta want to know how IDEs can parse, highlight, understand, and format the code so quickly.
  • 9
    It's basically the first half of a compiler: the parser, which reads the code and generates an AST (abstract syntax tree).

    While doing this it can detect errors, and depending on how good it is, and whether it has rules for how to recover after the first error (for example, restarting at the next line or word until it finds something that looks like valid structure), it can detect multiple errors.

    On top of that it can apply rules to the tree to detect things like uninitialized variables, by walking the tree and checking whether any variable is used before it is set, and so on (there's a small sketch of that below).

    It's not trivial, but it's not magic either :)

    Also, the parsing categorizes all elements, which can be used for highlighting.

    Some simple highlighting can even be done with plain pattern matching, but that's rarely enough by today's standards.
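
    To make the tree walking concrete, here is a minimal sketch in Python, using the standard ast module as a stand-in for an IDE's own syntax tree, that flags names read before they are assigned. It only handles plain assignments inside a single function; a real IDE also tracks scopes, branches, imports and so on.

```python
import ast

class UseBeforeAssign(ast.NodeVisitor):
    """Walk a function body and record names that are read before being set."""

    def __init__(self):
        self.assigned = set()
        self.problems = []            # (line number, variable name)

    def visit_FunctionDef(self, node):
        # Parameters count as already assigned.
        self.assigned = {a.arg for a in node.args.args}
        self.generic_visit(node)

    def visit_Assign(self, node):
        self.visit(node.value)        # check the right-hand side first
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.assigned.add(target.id)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id not in self.assigned:
            self.problems.append((node.lineno, node.id))

source = """
def demo():
    total = count + 1   # 'count' is read before it is set
    count = 0
    return total
"""
checker = UseBeforeAssign()
checker.visit(ast.parse(source))
for line, name in checker.problems:
    print(f"line {line}: '{name}' may be used before assignment")
```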
  • 0
    @Voxera yes, I was thinking about a parser at first as well. How would a program check whether some text is valid HTML, JSON, etc. and parse it into a tree... so quickly? It has to match all the closing tags and locate and parse all the children, which also have children...
  • 0
    @24th-Dragon well then the question would be: how does a linter work? It has to understand and validate the text.
  • 3
    @billgates computers are fast ;)

    I have built my own scripting language using a handwritten lexer and parser, and it parses a page of text in a few milliseconds.

    It's not as powerful when it comes to error reporting, it basically just returns the first error, but for my purposes that's enough.

    I also know that IDEs usually use some extra tricks to avoid reparsing all the text; they just reparse the current statement or line, depending on the language.

    And they can often highlight just from the lexer, which is much simpler and works on each token (sketched below).
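
    As a rough illustration of highlighting straight from the lexer (not any particular IDE's implementation), here's a tiny Python sketch: one regular expression splits the text into tokens and each token kind is mapped to a terminal colour. The token names and colours are invented for the example.

```python
import re

# One regex splits the code into tokens; each token kind then gets a colour.
TOKEN_RE = re.compile(r"""
    (?P<keyword>\b(?:if|else|while|return|var)\b)
  | (?P<number>\d+)
  | (?P<string>"[^"]*")
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>[+\-*/=<>!(){};])
  | (?P<space>\s+)
""", re.VERBOSE)

COLOURS = {"keyword": "\033[35m", "number": "\033[36m",
           "string": "\033[32m", "name": "\033[0m", "op": "\033[33m"}
RESET = "\033[0m"

def highlight(code):
    parts = []
    for match in TOKEN_RE.finditer(code):
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            parts.append(text)                 # whitespace passes through untouched
        else:
            parts.append(COLOURS[kind] + text + RESET)
    return "".join(parts)

print(highlight('if (count > 10) { return "done"; }'))
```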
  • 3
    A lexer “tokenizes” the input into words and operators (+, -, . and such).

    The parser then looks for patterns like:

    An ‘if’ should be followed by a parenthesized expression, then a body.

    An expression ..

    And so on.

    An add expression is a pair of numbers or variables (identifiers) with a + sign in the middle.

    Any whitespace is usually removed by the lexer (a small lexer + parser sketch follows below).
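
    A sketch of those two stages in Python, for a toy grammar invented for the example (not any real language): the lexer turns the text into tokens and throws whitespace away, and the recursive-descent parser checks the patterns described above, including 'if' followed by a parenthesized expression and a body, and add expressions built from numbers, names and +.

```python
import re

# Toy grammar for the sketch:
#   statement  := "if" "(" expression ")" "{" statement* "}"
#               | name "=" expression ";"
#   expression := term { "+" term }
#   term       := number | name

def lex(code):
    """Split the source into (kind, text) tokens, dropping whitespace."""
    spec = r"(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[+=(){};])|(?P<ws>\s+)"
    for m in re.finditer(spec, code):
        if m.lastgroup != "ws":
            kind = "kw" if m.group() == "if" else m.lastgroup
            yield (kind, m.group())
    yield ("eof", "")

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, text):
        _, found = self.tokens[self.pos]
        if found != text:
            raise SyntaxError(f"expected {text!r}, got {found!r}")
        self.pos += 1

    def statement(self):
        if self.peek()[1] == "if":                  # if ( expression ) { statement* }
            self.expect("if"); self.expect("(")
            condition = self.expression()
            self.expect(")"); self.expect("{")
            body = []
            while self.peek()[1] != "}":
                body.append(self.statement())
            self.expect("}")
            return ("if", condition, body)
        name = self.peek()[1]                       # name = expression ;
        self.pos += 1
        self.expect("=")
        value = self.expression()
        self.expect(";")
        return ("assign", name, value)

    def expression(self):                           # term { + term }
        node = self.term()
        while self.peek()[1] == "+":
            self.expect("+")
            node = ("+", node, self.term())
        return node

    def term(self):                                 # a number or a name
        kind, text = self.peek()
        if kind not in ("num", "name"):
            raise SyntaxError(f"expected a number or a name, got {text!r}")
        self.pos += 1
        return (kind, text)

print(Parser(lex("if (x + 1) { y = x + 2; }")).statement())
```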
  • 0
    @Voxera yes, I've been googling it. So, taking HTML/XML as an example: first it just tokenizes everything in a flat way, and then, based on the ordering of each token, it "sorts" the tokens into a parent/child structure?
  • 2
    @billgates more or less (a stack does the "sorting", sketched at the end of this comment), but HTML has a few extra problems.

    First, it's not as strictly structured; many elements can be either self-closing or paired with content, like the p tag.

    Also, many pages have broken HTML with tags out of order or missing, so there is a whole set of common practices that HTML parsers use to repair and clean them up.

    Also, it's not just HTML but CSS and JavaScript too.

    And on top of that, unstructured text, which presents its own set of problems.

    So I would recommend using something like htmlagilitypack or similar for parsing HTML.

    Or use it to repair the document before parsing it with your own parser, so that you can assume it's all valid and well structured.

    I use it and pull out the tree halfway ;) then apply my parser to all attribute values and plain text.

    My old one just ignored any HTML, but I still had to handle unstructured text, and that's best handled early on with a good set of separators around your code; that's why most template libraries use {{ }} or [[ ]].
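
    A minimal sketch of that flat-tokens-to-tree step in Python, assuming well-formed markup (real HTML parsers also have to repair missing, self-closing and out-of-order tags): a stack tracks the current parent, opening tags push a new node, closing tags pop back to the parent.

```python
import re

def tag_tokens(markup):
    """Yield ("open", name), ("close", name) or ("text", content) in document order."""
    for m in re.finditer(r"<(/?)([A-Za-z][\w-]*)[^>]*>|([^<]+)", markup):
        closing, name, text = m.groups()
        if text is not None:
            if text.strip():
                yield ("text", text.strip())
        else:
            yield ("close" if closing else "open", name)

def build_tree(markup):
    root = {"tag": "#root", "children": []}
    stack = [root]                                   # top of the stack = current parent
    for kind, value in tag_tokens(markup):
        if kind == "open":
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)       # attach to the current parent
            stack.append(node)                       # children now go inside this node
        elif kind == "close":
            if stack[-1]["tag"] != value:
                raise ValueError(f"mismatched closing tag </{value}>")
            stack.pop()                              # step back out to the parent
        else:
            stack[-1]["children"].append({"text": value})
    return root

print(build_tree("<ul><li>one</li><li>two</li></ul>"))
```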
  • 0
    @Voxera yes, I usually use parsers like htmlagilitypack, gson, Json.NET, but I sorta had a thought that this might be something interesting to explore and learn. Sorta thinking about what separates me from those Google/CS guys... What exactly can they do intuitively that I can't?
  • 0
    Sorta thinking maybe this is the way for me to learn algorithms and data structures. I don't like toy problems, but if I'm actually building something that's usable and needs this stuff, then I'll learn it because I need to, rather than just to answer some stupid technical interview question.
  • 2
    @billgates go for it, it’s an interesting set of problems with many layers of improvements possible :)
  • 1
    The first step is to define the grammar of the language (its syntax).
    Then you build a parser; you can write one from scratch yourself or use a generator like GNU Bison.
    Then, once you have your source code parsed and stored, you can write rules on it to search for common errors; that is the basis of a linter (there's a small example below).

    If you want the good stuff, then there's static analysis, which adds a whole lot of theory behind it and allows you to find bugs that you cannot describe with a syntactic rule.

    It's a very interesting field in my opinion, and it's quite important when you're writing critical code, like the code for planes, spacecraft, or very big high-quality projects, although it can be very helpful even on smaller ones. Take a look at Infer from Facebook, for example.
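
    To show what a "rule on the parsed tree" can look like, here's a minimal Python sketch using the standard ast module (not any particular linter's code). It flags comparisons written as == None, a classic lint warning; real linters ship hundreds of such rules, and static analyzers like Infer go much further with data-flow reasoning.

```python
import ast

def find_eq_none(source):
    """Flag comparisons written as '== None' (the usual advice is 'is None')."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            uses_eq = any(isinstance(op, ast.Eq) for op in node.ops)
            against_none = any(isinstance(c, ast.Constant) and c.value is None
                               for c in node.comparators)
            if uses_eq and against_none:
                findings.append(f"line {node.lineno}: use 'is None' instead of '== None'")
    return findings

code = """
def lookup(table, key):
    if table.get(key) == None:
        return "missing"
    return table[key]
"""
for message in find_eq_none(code):
    print(message)
```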
  • 0
    @Voxera ++ for actually answering the question :)
  • 1
    You would be served well by a book that covers the subject of programming language translation.
  • 0
    @M1sf3t I never got through the source. The code/project structure is hard to go through and I couldn't locate what I was looking for (a parser/linter).

    I tried learning C++ but had sort of the same issue: I had no use for it, since whatever I wanted to code I could already do in languages I already know.
  • 2
    @billgates may I recommend the Compilers book by Aho and Ullman? It answers stuff like this (in addition to @Voxera's excellent answers).
  • 1
    @M1sf3t heh
    Well, I would recommend doing that then, because there's a lot of knowledge in these books that just isn't available otherwise. Your average internet tutorial or whatever doesn't even come close to the depth and rigour these books reach. They can be pretty hardcore though, so expect to budget a fair amount of time.

    Just putting it here in case anyone's interested, other good books on this topic would be Formal Languages and Automata Theory by Linz (the theoretical framework behind grammars etc.), Engineering a Compiler by Cooper & Torczon (modern compiler techniques), and Linkers and Loaders by Levine (old but pretty awesome).