15

Very excited to announce I've started another project I will never finish

Comments
  • 4
    Just stop acting as a human linter and it will be fine. What is the project? But seriously, program like nobody is gonna see it instead of chasing high quality. First check if the project is worth it.

    With me - such stuff always starts with a domain name - here we go again! :P
  • 3
    @retoor I'm going to write an HTTP/HTML ecosystem using Zig and, later, Lua.

    I'm going to compete with PHP, .Net, and Node.JS.

    It will start with HTTP server and HTML parsers in Zig, then eventually move into a whole razor-like preprocessor language where Lua is the "guest" language (I don't really want to mess around with compiling something like zig on-the-fly for use in preprocessor statements)

    Eventually it would be nice to create an API where you're writing controllers and everything in Lua instead of Zig.

    However you and me both know it will barely get past the "creating a socket and parsing the incoming http request" phase.
  • 1
    @AlgoRythm I wouldn't worry about the server itself at all. It's more the parser for the templating. Parsing is an art. My advice is to use a lexer, char for char, no regex shit. It seems like a lot of work, but in the end, I think it will make you happier. I wrote a regex parser and found out how crazy regex often is and how easy it is to make a mistake. Half of the time that I was debugging my parser, it was the regex that was faulty. Here is a nice example of a lexer I wrote a few months ago: https://molodetz.nl/retoor/Dobre/.... The project is still in progress and goes slowly. I already finished one language that you could make an HTTP server with. It got lost, but it had some garbage collection issues anyway. I was a bit stuck.

    What do you think are the hardest parts of your project? I think it's a cool idea to challenge those languages. Wren is a nice educational example of interpreter source code. You can read it in a few hours.
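A minimal sketch of the char-for-char lexing idea from the comment above, with no regex involved. The `{{` and `}}` delimiters and the token names are assumptions made up for illustration; they are not taken from the linked lexer or from any planned template syntax.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # "OPEN", "CLOSE", or "TEXT" (hypothetical token kinds)
    text: str


def lex(source: str) -> Iterator[Token]:
    """Scan a template one character at a time, without regex."""
    i, n = 0, len(source)
    start = 0  # start of the pending TEXT run
    while i < n:
        two = source[i:i + 2]
        if two in ("{{", "}}"):
            if start < i:
                # Emit the plain text collected before the delimiter.
                yield Token("TEXT", source[start:i])
            yield Token("OPEN" if two == "{{" else "CLOSE", two)
            i += 2
            start = i
        else:
            i += 1
    if start < n:
        yield Token("TEXT", source[start:])


tokens = list(lex("Hello {{ name }}!"))
```

Every branch is an explicit comparison on the input, so a wrong token can be traced to one line of the loop instead of to a pattern.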
  • 1
    The tsoding episode about C template rendering can also be a good resource to study. What he did was genius. I also wanted to make templating in C because I saw that generating templates was a bottleneck in my Python app. But then I saw tsoding's solution and realized that I won't ever beat that. He played the end game :P
  • 1
    @retoor We've had the parser discussion before. I've written an HTML parser in C++.

    The hardest part will be finding time and motivation for it. I've already got my real job and a side hustle that takes up like 70 hours of my week but I want to do more challenging work with more challenging languages.

    I tried writing a Markdown parser in zig and got decently far with a very efficient design, but stopped short of a working product because Markdown is actually a pain in the FUCKING ass to parse (due to its lax rules around syntax and the importance of newlines)

    It's just a ton of work that I'll be excited about for maybe two weeks
  • 1
    @AlgoRythm you are right about Markdown. I didn't want to spend time on it for a certain project, so I ripped Markdown formatters from a few places, and every one of them had issues with corrupt Markdown (crashing). I was streaming from an LLM, so I was rendering a lot of half-finished responses. Writing my own Markdown parser is on my list. It will follow the same idea as my JSON parser -> it handles corrupt JSON / extra data well, so you can stream with it.
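One way to get the "tolerate extra or half-arrived data" behaviour described above, sketched with Python's standard `json.JSONDecoder.raw_decode`. This is not the JSON parser mentioned in the comment; the function name `decode_stream_prefix` is hypothetical, and the sketch simply keeps whatever it cannot decode yet for the next chunk.

```python
import json


def decode_stream_prefix(buffer: str):
    """Return (values, rest): every complete JSON value at the start of
    buffer, plus the unparsed remainder to retry once more data arrives."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos >= len(buffer):
            break
        try:
            value, pos = decoder.raw_decode(buffer, pos)
        except json.JSONDecodeError:
            break  # incomplete or corrupt tail: keep it instead of crashing
        values.append(value)
    return values, buffer[pos:]


values, rest = decode_stream_prefix('{"a": 1} {"b": 2} {"c": ')
```

A caveat of this approach: a bare top-level number such as `12` decodes as complete even if more digits are still on the way, so it suits streams of objects or arrays better than streams of scalars.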
  • 1
    @retoor Markdown really is a pain. My advice to you is to treat all whitespace as significant tokens while lexing and then discard the insignificant ones while parsing. I went in with the same mentality as I had from my HTML parser ("Whitespace is almost never significant") and it really fucked me from the get-go.

    The parsing states also need to be recursive in some respects due to the fact that the Markdown syntax is indeed recursive (you can have a code block inside a multi-line quote) so you'll need to have a *stack* of states which can all talk to each other - which really blurs the lines between lexing and parsing.

    A *good* MD parser is actually incredibly difficult and I will end up giving it another go sometime in the future, but for now, I'm still feeling burned by it and I'm just gonna let it be.
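A small sketch of the whitespace advice above: the lexer keeps every run of spaces as a token, and only the parser decides which runs matter. The token names are assumptions, and the "four leading spaces means code" rule is a simplified stand-in for CommonMark's indented code blocks that ignores surrounding context.

```python
def lex_line(line):
    """Split a line into ("WS", run) and ("WORD", run) tokens, keeping every space."""
    tokens = []
    i = 0
    while i < len(line):
        j = i
        is_space = line[i] == " "
        # Extend the run while characters stay on the same side (space or not).
        while j < len(line) and (line[j] == " ") == is_space:
            j += 1
        tokens.append(("WS" if is_space else "WORD", line[i:j]))
        i = j
    return tokens


def parse_line(tokens):
    """Leading whitespace of 4+ spaces is significant; other whitespace is dropped."""
    if tokens and tokens[0][0] == "WS" and len(tokens[0][1]) >= 4:
        # Code keeps its inner spacing exactly as written.
        return ("code", "".join(text for _, text in tokens)[4:])
    words = [text for kind, text in tokens if kind == "WORD"]
    return ("paragraph", " ".join(words))


code = parse_line(lex_line("    x  =  1"))
para = parse_line(lex_line("  hello   world "))
```

Because the lexer never throws whitespace away, the decision about significance lives in one place, the parser, where the context is known.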
  • 1
    @retoor One problem I really struggled with at the time and indeed still haven't quite cracked with regards to MD is what needed to be done at lexing level and what needed to be done at parsing level. Honestly looking back, maybe I just needed to go very basic at the lexing level and do most of the work while parsing. Unfortunately, tokens are extremely context-based and the context is extremely non-linear. For example something as simple as a "#" character has an extreme number of rules based on what the context around it is.
  • 1
    @AlgoRythm hmm, maybe it's an idea to create an intermediate format and parse that. Worth a try. Python, for example, is known to be impossible to parse with regex because its indentation matters. Markdown is kind of the same, I guess. It will become a serious interpreter, I guess.
  • 1
    @retoor You know what, maybe I will try again with MD instead of starting this huge time sink. I really don't have the hours to put into a .Net alternative, but the markdown parser might just scratch my itch AND I can use it in my side hustle.
  • 1
    @AlgoRythm please write it in zig so it has c bindings and I can use it 😂
  • 1
    @retoor My original plan was to make an executable that you could start up as a child process and feed characters into stdin and read them out of stdout, this way you can use it from any language that supports basic file operations. But of course it will be in Zig and it should be binary compatible with C if compiled as a static/dynamic library
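The child-process design above could be driven from a host language roughly like this. No such parser executable exists in the thread, so the `CHILD` command below is a stand-in that just upper-cases its input; only the stdin/stdout plumbing is the point.

```python
import subprocess
import sys

# Stand-in for the hypothetical parser executable: a child that upper-cases stdin.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_filter(command, text):
    """Feed text to a child's stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return result.stdout


output = run_filter(CHILD, "# hello\n")
```

Any language that can spawn a process and read and write pipes can use the same contract, which is the portability argument made in the comment.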
  • 1
    @retoor I just went out and had an idea by the time I got back. To solve my lexer/parser problem, why don't I just establish communication between the two?

    My regular design is to have the lexer produce tokens from text which are then passed to the parser, one at a time - but they never, ever communicate. They are two separate entities. But what if the lexer and the parser worked together to establish the context that the lexer needed to lex with? That way, the lexer can still follow primitive lexing rules but with context established by the complicated syntax that the parser can understand?

    It seems above my pay-grade but I'm going to try and prototype one over the course of this week. Maybe it could be really good and efficient and solve a lot of the "whose job is it anyways?" problems.
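One possible shape for the lexer/parser cooperation described above, as a rough sketch rather than the design that was prototyped. The parser owns the context (`in_code`) and hands it to the lexer on every call, so the lexer's rule for "#" stays primitive. All names are hypothetical, and the lexer works per line to keep the example short.

```python
class Lexer:
    def __init__(self, text):
        self.lines = text.split("\n")
        self.index = 0

    def next_line(self, in_code):
        """Lex the next line using context supplied by the parser."""
        if self.index >= len(self.lines):
            return None
        line = self.lines[self.index]
        self.index += 1
        if line.startswith("```"):
            return ("FENCE", line)
        if in_code:
            return ("CODE", line)  # '#' is plain text inside a code block
        hashes = len(line) - len(line.lstrip("#"))
        if 1 <= hashes <= 6 and line[hashes:hashes + 1] == " ":
            return ("HEADING", hashes, line[hashes + 1:])
        return ("TEXT", line)


def parse(text):
    lexer = Lexer(text)
    in_code = False  # context the parser understands and the lexer needs
    out = []
    while (token := lexer.next_line(in_code)) is not None:
        if token[0] == "FENCE":
            in_code = not in_code
        else:
            out.append(token)
    return out


nodes = parse("# Title\n```\n# not a heading\n```\n#hashtag")
```

The same "#" character comes back as a heading, as code, or as plain text depending on what the parser told the lexer, which is the "whose job is it anyways?" split made explicit.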
  • 2
    @AlgoRythm if it works with stdin, it works with any file descriptor, I guess, and there's no better API for connecting with other applications if you want to communicate with all languages. I think it's a good idea.

    Making a straight connection between lexer and parser can be an interesting idea. Why lex first and then parse? But aren't one-pass compilers doing that in general, or am I completely wrong? It's possible that going from lexer to parser every time could cause more communication in the end than doing both separately, but in the collaborative case the parser has direct context. If you lex everything first, the parser has to discover for itself what was lexed.

    It's very interesting, I can't work out in my head whether it's a great idea or not. We'd have to see it in practice. Maybe @lorentz has an idea if it's a good idea to combine those two. Maybe there's a word for it. He studies languages.
  • 2
    @AlgoRythm this is what I love about interpreters: you can think, think, think until you can't anymore and you have to make a PoC. When it comes to unfinished projects, interpreters are my #1 category. I never made one that was good quality without issues AND functional. The one I "finished" had technical debt regarding garbage collection. I thought meh, its previous version had great garbage collection, so I thought it was doable afterwards. I was so wrong. Sad that I lost the project, but if I have to be honest, it had come to an end anyway; it needed a rewrite. So success in the interpreter department: 0. The JSON interpreter doesn't count, and my regex interpreter does a few things differently than most, things I prefer. I made a new regex flavor. Success in the interpreter department: -1 in that case 😂 it's as appreciated as a new JS framework 😂
  • 0
    Depends on the format and how masochistic you are, but unless you're parsing something with zero flexibility like regex or a deliberately highly principled data format like JSON or XML there should always be an intermediate datastructure which has less structure and more information than your final output, such as a token tree with tokens representing all the whitespace information that has any chance to be relevant depending on context.
  • 0
    In principle a single-pass parser can just be a normal N-stage parser with lazy datastructures, compilers are perfectly capable of nesting the stages inside each other so you just have to write the lazy-evaluated store once.

    The main design constraint then is that your datastructures have to be partitioned nicely so that you don't spend too much time following pointers but also don't calculate unused results.
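A loose analogy for the "N-stage parser with lazy datastructures" idea above, using Python generators as the lazy store. The stages and token names are assumptions, the output is not HTML-escaped, and generators only approximate what a compiler does when it nests stages inside each other.

```python
from typing import Iterable, Iterator


def lex(lines: Iterable[str]) -> Iterator[tuple]:
    """Stage 1: lines to tokens, produced on demand."""
    for line in lines:
        if line.startswith("# "):
            yield ("HEADING", line[2:])
        elif line.strip():
            yield ("TEXT", line.strip())


def parse(tokens: Iterable[tuple]) -> Iterator[dict]:
    """Stage 2: tokens to nodes, produced on demand."""
    for kind, text in tokens:
        yield {"tag": "h1" if kind == "HEADING" else "p", "text": text}


def render(nodes: Iterable[dict]) -> Iterator[str]:
    """Stage 3: nodes to markup, produced on demand."""
    for node in nodes:
        yield f"<{node['tag']}>{node['text']}</{node['tag']}>"


consumed = []  # records which input lines were actually read


def source():
    for line in ["# Title", "", "hello", "never read"]:
        consumed.append(line)
        yield line


html = render(parse(lex(source())))
first = next(html)
```

The code is written as three separate stages, yet asking for one output pulls exactly one line through all of them, so it behaves like a single pass.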
  • 0
    Either way I think optimizing parsers for code specifically is rarely a good choice because they have other conflicting objectives and much of the work is just inherently slow if you want to do it perfectly right.

    My strategy now is to focus on good errors when writing the parser and then make sure that I never have to run it again by caching ASTs all over the place.
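The "never run the parser again" strategy above can be sketched as a cache keyed by a hash of the source text. The `parse` function here is a trivial placeholder for an expensive parser, and an in-memory dict stands in for whatever cache storage is actually used.

```python
import hashlib

_ast_cache = {}
parse_calls = 0  # counts how often the real parser runs


def parse(source):
    """Placeholder for an expensive parser; returns a toy AST."""
    global parse_calls
    parse_calls += 1
    return [("line", text) for text in source.splitlines()]


def cached_parse(source):
    """Parse each distinct source text at most once."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key not in _ast_cache:
        _ast_cache[key] = parse(source)
    return _ast_cache[key]


first = cached_parse("# Title\ntext")
second = cached_parse("# Title\ntext")
```

Callers share the cached AST object, so this only works if the AST is treated as immutable once it has been built.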
  • 0
    In particular, it shouldn't be possible for a code parser to hard crash or overflow the stack on any input no matter how silly. If you add a virtual stack and every possible bounds check to normal code, all of a sudden simplicity seems a lot more valuable.
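A sketch of the "virtual stack" point above: nesting is tracked on an explicit list instead of the call stack, and every failure is returned as an error value. The bracket syntax is a hypothetical stand-in for nested blocks, and `max_depth` is an assumed limit.

```python
def parse_nested(source, max_depth=10_000):
    """Parse bracket-nested words into nested lists using an explicit stack.

    Returns (tree, None) on success or (None, message) on bad input.
    """
    root = []
    stack = [root]  # the virtual stack of currently open lists
    word = []

    def flush():
        # Finish the word being collected, if any.
        if word:
            stack[-1].append("".join(word))
            word.clear()

    for pos, ch in enumerate(source):
        if ch == "[":
            flush()
            if len(stack) > max_depth:
                return None, f"nesting too deep at offset {pos}"
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == "]":
            flush()
            if len(stack) == 1:
                return None, f"unexpected ']' at offset {pos}"
            stack.pop()
        elif ch.isspace():
            flush()
        else:
            word.append(ch)
    flush()
    if len(stack) > 1:
        return None, "unclosed '['"
    return root, None


tree, error = parse_nested("a [b [c] d] e")
```

Silly input such as fifty thousand opening brackets yields an error message rather than a stack overflow, at the cost of the extra bookkeeping the comment alludes to.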
  • 0
    @retoor Now you're confusing me because you always need to lex before you parse. You're typically parsing tokens which are produced by your lexer

    @lorentz my goal isn't to bring a new markdown parser to market, it's just to create one and learn/understand lexer/parsers more. Plus, as I've been saying, Markdown is actually very complicated in terms of syntax. I'm trying to do a single-pass where the lexer produces tokens that the parser is able to convert directly into an AST which can then be rendered directly into HTML by a renderer
  • 0
    @AlgoRythm I'm not sure I understand. When you say that the parser converts tokens directly into an AST, what sort of indirection are you avoiding? Are you saying that the parser can consume tokens one by one without organizing them into a structure first?
  • 0
    @lorentz you were talking about "[...] an intermediate datastructure which has less structure and more information than your final output, such as a token tree with tokens representing all the whitespace information that has any chance to be relevant depending on context."