parser

Ranter

typosaurus

Comments

typosaurus

275d

The advantage over a websocket is that you don't have to know the length of the JSON object up front, so you can fully stream it instead of preparing the whole payload in memory on the client side, which sucks if you query a huge result set from the server. My server-side memory usage went down by 99% or so.
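The rliza source isn't shown in this thread, so here is only a minimal Python sketch of the general idea: frame top-level JSON objects and arrays by tracking nesting depth and string/escape state across chunks, so no length prefix is needed and each byte is looked at once. The class name `JsonStreamSplitter` is hypothetical, bare top-level scalars are not framed, and real validation is left to `json.loads` on each framed value.

```python
class JsonStreamSplitter:
    """Hypothetical sketch: find complete top-level JSON objects/arrays in a byte stream.

    Only objects and arrays are framed. State survives between chunks, so a
    value may be split anywhere, and every byte is inspected exactly once.
    """

    def __init__(self) -> None:
        self._buf = bytearray()   # bytes of the value currently in flight
        self._depth = 0           # nesting depth of { and [
        self._in_string = False   # currently inside a "..." string?
        self._escape = False      # previous byte was a backslash inside a string?

    def feed(self, chunk: bytes) -> list[bytes]:
        """Consume one chunk and return every value that became complete."""
        done: list[bytes] = []
        for b in chunk:
            if self._depth == 0 and b in b" \t\r\n":
                continue  # whitespace between top-level values
            self._buf.append(b)
            if self._in_string:
                if self._escape:
                    self._escape = False
                elif b == 0x5C:  # backslash: the next byte is escaped
                    self._escape = True
                elif b == 0x22:  # unescaped quote closes the string
                    self._in_string = False
            elif b == 0x22:
                self._in_string = True
            elif b in b"{[":
                self._depth += 1
            elif b in b"}]":
                self._depth -= 1
                if self._depth == 0:
                    done.append(bytes(self._buf))
                    self._buf.clear()
        return done
```

Only the bytes of the value currently in flight are buffered, which is where the memory savings come from compared to materializing a whole result set before sending or parsing it.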

typosaurus

275d

While writing this, I realize that I could've split the JSON requests on a byte that isn't allowed in JSON and chunked them that way. It would be even faster. But at least my protocol ensures valid data.
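The NUL byte (0x00) is one byte that can never occur in valid UTF-8 JSON text: RFC 8259 requires control characters inside strings to be escaped (e.g. `\u0000`), and only space, tab, CR and LF may appear between tokens. A minimal sketch of NUL-delimited framing, assuming UTF-8 and a hypothetical `NulFramer` class:

```python
import json


class NulFramer:
    """Hypothetical sketch: split an incoming byte stream into NUL-terminated JSON frames."""

    def __init__(self) -> None:
        self._pending = bytearray()  # bytes after the last NUL seen so far

    def feed(self, chunk: bytes) -> list:
        self._pending += chunk
        *complete, rest = self._pending.split(b"\0")
        self._pending = bytearray(rest)
        # Each frame is parsed once, in one go; json.loads also validates it
        # and raises ValueError on malformed input.
        return [json.loads(frame) for frame in complete if frame.strip()]


def encode_frame(value) -> bytes:
    # json.dumps escapes U+0000 as \u0000, so the only raw NUL is the terminator.
    return json.dumps(value).encode("utf-8") + b"\0"
```

Splitting on a single byte is cheap because nothing is re-validated until a whole frame has arrived.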

Lensflare

275d

I like the idea.
If I may propose a name:
rejsoor

typosaurus

275d

@Lensflare sadly, the parser is already named rliza. I gave it that name because it wasn't supposed to be a JSON parser at the beginning; it was supposed to be JSON with benefits, so I didn't want to call it json(++). Rliza is just a brainfart. I didn't take the project seriously enough to think of a serious name, and now it's already used in three other side projects: a DB server, a pubsub server and a Python client.

Chewbanacas

275d

Hell yea! Sounds awesome, got a link to the git repo? What about fault tolerance?

typosaurus

275d

@Chewbanacas I say too much weird shit on this platform to share my GitHub with my full name 😂 A bit sad tho, I have around 50 repos of my own, no forks. I'm a mass producer.

What do you mean by fault tolerance in this case? If the JSON isn't valid (yet), the parser just expects another chunk of data to make it valid. It doesn't block and wait: it reads up to 4096 bytes of your JSON and then hops to the next client to read from. I use the select() system call in a loop to check whether a socket is readable, and if it is, I call that client's read function again, which still remembers its state and the data already received. Even with a 5-byte buffer it works perfectly; that just means every 5 bytes it's the next client's turn. I'll do some performance testing with different buffer sizes for fun, but I'm sure 4096 is best since most messages are smaller than that, so they only require one validation. With 5 bytes, you're validating like a maniac.
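A rough Python sketch of the select() loop described above, assuming one hypothetical `Client` object per socket that keeps its unparsed bytes between rounds. Unlike rliza, it simply retries `json.JSONDecoder.raw_decode` on the whole buffer after every read, which is exactly the "validating like a maniac" cost of small chunk sizes. It also assumes top-level objects or arrays, since a bare number such as `12` could be decoded early as `1`.

```python
import json
import select
import socket

_decoder = json.JSONDecoder()


class Client:
    """Hypothetical per-connection state that survives between select() rounds."""

    def __init__(self, sock: socket.socket, chunk_size: int = 4096) -> None:
        self.sock = sock
        self.chunk_size = chunk_size  # 4096 as in the comment; 5 works too, just slower
        self.buf = b""                # received bytes not parsed yet
        self.messages = []            # decoded values, in arrival order

    def on_readable(self) -> bool:
        """Read one chunk and parse whatever is complete; return False on EOF."""
        chunk = self.sock.recv(self.chunk_size)
        if not chunk:
            return False
        self.buf += chunk
        self._drain()
        return True

    def _drain(self) -> None:
        while True:
            try:
                text = self.buf.decode("utf-8")
            except UnicodeDecodeError:
                return  # a multi-byte character is split across reads
            text = text.lstrip(" \t\r\n")
            if not text:
                self.buf = b""
                return
            try:
                value, end = _decoder.raw_decode(text)
            except json.JSONDecodeError:
                return  # not valid yet: keep the bytes and wait for more
            self.messages.append(value)
            self.buf = text[end:].encode("utf-8")


def serve(clients: list) -> None:
    """Give each readable client one read per round until all reach EOF."""
    active = {c.sock: c for c in clients}
    while active:
        readable, _, _ = select.select(list(active), [], [])
        for sock in readable:
            if not active[sock].on_readable():
                del active[sock]
```

`raw_decode` cannot tell incomplete input from invalid input, so a client sending garbage would simply be waited on forever; a real server would also cap the buffer size.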

Lensflare

275d

@retoor but you already shared your youtube channel 😅

typosaurus

275d

@jestdotty you must read the values anyway; you can't read the keys without reading past the values. If you extracted the keys first, you'd have to make a second pass for the values, and to get all the values you'd have to read all the keys again. So you'd end up literally doing everything twice. Running forward and peeking as little as possible is the most efficient way to parse.

typosaurus

275d

@Lensflare I wonder how many views my video got because of the advertising here. I think most people who take the effort to look at my profile will watch it; why not, you were already interested enough to look at the profile, so a video of me would be even better. Maybe I should change it to just "video" of me instead of "bikini video". I think in this case just "video" would work better: the "bikini" part would make some people expect something inappropriate, so they don't click.

typosaurus

275d

Here's someone who made a JSON parser in Python. The performance difference is almost a factor of 40: https://pypi.org/project/...

I will check the performance of my parser against the Python one. I think my parser can't be faster, but we'll see.

typosaurus

274d

@jestdotty I'm parsing directly; I don't have to read until a bracket. It stops automatically: if the end of the content is reached it returns NULL, and if the expected end is met it returns the object. Whatever you do, in the end you must always touch every byte; the art is touching each one as few times as possible. But it's impossible to skip parts. If you parse a JSON file of 1000 bytes, you'll have at least 1000 iterations (I do); the rest of the time is spent on checks and duplicating data. I copy content into the object so the calling function doesn't have to remember and free the resource. A faster way would be to keep the JSON around, remember the positions of keys and values, and read from the big string every time a property is requested. You could consider this lazy. You still have to duplicate data every time a string is requested, since you need to add a \0 terminator to the end.

Chewbanacas

274d

@retoor DM it to me

Hazarth

274d

Nice! Reminds me of the XML streaming used by XMPP, except there you shouldn't close the stream tag; you keep it open and just stream sub-elements into it, which the other side of course parses as perfectly valid XML. To end the communication stream you just close the </stream> tag and that's the end. Cool tech in general when you want to avoid the sync and overhead of HTTP and other protocols.
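Python's standard library can consume such a never-closed XML stream incrementally with `xml.etree.ElementTree.XMLPullParser`. A minimal sketch, assuming a `<stream>` root whose direct children are the stanzas; the `stanza_reader` helper is hypothetical:

```python
import xml.etree.ElementTree as ET


def stanza_reader():
    """Return a hypothetical feed(data) function for a never-closed <stream> root.

    Each call returns (tag, text) for every direct child of the root completed
    so far, plus (root_tag, None) once the root itself closes.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0

    def feed(data: str) -> list:
        nonlocal depth
        parser.feed(data)
        if hasattr(parser, "flush"):
            parser.flush()  # newer Pythons: make Expat >= 2.6 parse data it would defer
        done = []
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                continue
            depth -= 1
            if depth == 1:
                done.append((elem.tag, elem.text))  # a stanza just closed
            elif depth == 0:
                done.append((elem.tag, None))  # </stream>: the peer ended the session
        return done

    return feed
```

In a long-lived session the finished children stay attached to the root element, so a real client would also detach them to keep memory flat.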

Lensflare

274d

@Demolishun Deutsche Mark

netikras

274d

So a sequential JSON parser? I had made one for XML before I started my career. Hell of a lot of fun!
There's one drawback though... Validation :) you won't be able to discard such a payload if it turns out to be corrupted further down the line, e.g. dupe keys.

typosaurus

274d

@netikras yes, it's a sequential parser indeed. Why would data be corrupted, like with dupe keys, when using TCP? Or do you mean the keys contained in the XML, so you can't insert it into a database? Mine doesn't accept data that isn't expected, but half of it is OK. So {}{ is valid because it's valid as far as it goes.

And yes! It's super fun. Did you do HTTP chunking? I did that before, but it's only one direction. Or wait, I realize it doesn't have to be. Why did I think that? You can accept a chunked request stream and send a chunked stream as the response. I could use an official HTTP client anyway, and my database server already uses chunking for its responses; I could just steal it from there. But what's the win? The current solution uses the least resources. It costs a bit more CPU than HTTP chunking because of the many failed validations, but since the network is kinda slow it doesn't matter. It validates easily and the server doesn't use noticeable CPU at all.
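For comparison, HTTP/1.1 chunked transfer coding frames data by length rather than by syntax: each chunk is its size in hex, CRLF, the bytes, CRLF, and a zero-size chunk ends the body. A minimal sketch; trailers and error handling are left out, and the helper names are hypothetical:

```python
def encode_chunked(parts) -> bytes:
    """Encode byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for part in parts:
        if part:  # a zero-size chunk would signal the end, so skip empty parts
            out += f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, no trailers
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a complete chunked body (sketch: assumes well-formed input)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol].split(b";")[0], 16)  # ignore chunk extensions
        pos = eol + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

The receiver always knows how many bytes to read next, so it never has to re-validate a partial payload; the trade-off is that the sender must know each chunk's length before writing it.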

typosaurus

274d

@Hazarth neither XML nor JSON is a very efficient streaming format. A 4 takes four bytes in binary (or whatever the int size is), while it could've been two bytes: an identifier/start byte, then value bytes until an end byte. 100 would then be 102 bytes instead of 400.

But yes, that's exactly what I've built. XML, on the other hand, does have a real end; I would never know whether it's finished unless the connection gets closed in a valid way.

My pubsub will be used to synchronize chat communication between multiple site instances. Sessions go through the database and that's fine. I've also written an sqlite3 REST API server, and I've built a session storage adapter for the existing aiohttp_session library; it works great. After boot, my web app runs on three ports, all perfectly in sync using my own tech. I used aggressive polling as the sync method before, but it wasn't snappy enough for chat and cost a lot of resources client side, even when idle. My current sync uses 0% CPU with a few chatting clients.

netikras

274d

@retoor suppose you're ingesting an array. The first item is a string, the second one an object, in the same array. I think the spec doesn't allow that, i.e. it's corrupted JSON.

An object can only have distinct keys. If you're parsing an object that has two 'name' keys, it's invalid JSON, and you won't catch it without pre-reading it all for validation.
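For reference, RFC 8259 says object names SHOULD be unique and notes that receivers behave unpredictably with duplicates, while array elements may freely mix types. Python's `json` module keeps the last duplicate by default; an `object_pairs_hook` can instead reject duplicates or keep the first one (as rliza reportedly does). A small sketch with hypothetical hook names:

```python
import json


class DuplicateKeyError(ValueError):
    """Raised by the hypothetical strict hook below."""


def reject_duplicates(pairs):
    # pairs is the list of (key, value) tuples of one decoded object
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


def first_wins(pairs):
    obj = {}
    for key, value in pairs:
        obj.setdefault(key, value)  # later duplicates are ignored
    return obj


doc = '{"name": "a", "name": "b"}'
print(json.loads(doc))                                # {'name': 'b'}
print(json.loads(doc, object_pairs_hook=first_wins))  # {'name': 'a'}
```

The hook only ever sees one object's pairs, so a sequential parser could run the same check per object with a set of seen keys instead of pre-reading the whole payload.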

netikras

274d

@Demolishun good point, thx. The object key restriction still stands, I think

typosaurus

274d

@netikras if there are duplicate keys, it just picks the first value. Different types in an array are allowed. I have database field values and they can be any type; they're in a rows array like [["a", true, 3.33]]. Turning the rows and columns into a nice dict is work for the client. As long as the syntax is correct, it won't crash. Values don't matter. It's a user problem, not a protocol issue.
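Turning such a rows array into dicts on the client side takes only a few lines. A minimal sketch, assuming the column names arrive separately; the `columns` argument and the `rows_to_dicts` helper are hypothetical:

```python
import json


def rows_to_dicts(columns: list[str], rows_json: str) -> list[dict]:
    """Zip a hypothetical column list with each row of a JSON rows array."""
    rows = json.loads(rows_json)
    result = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
        result.append(dict(zip(columns, row)))
    return result
```

Mixed value types need no special handling here, because each row is just a JSON array.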

typosaurus

274d

Edit: @Demolishun already checked the array statement

typosaurus

274d

@Demolishun we're checking stuff like that with a linter, but a formatter would allow duplicate variable names too. I also think duplicate keys are not allowed by the spec.

typosaurus

274d

This seems to be the JSON logo; I've never seen it before.

typosaurus

274d

@Demolishun / @netikras I would also have expected duplicate names to be forbidden, since JSON originates from JS objects, which don't support duplicate keys. But it's allowed by the notation at least. Prolly not by the JS parser.

typosaurus

274d

@chatgpt is just the word true or null, or a number, considered valid JSON with no curly braces or array brackets around it?

chatgpt

274d

@retoor
No, a standalone word, true or null, or a number is not considered valid JSON without curly braces or array brackets. In JSON, data should be structured using key-value pairs enclosed in curly braces or arrays.

typosaurus

274d

Meh, I support it anyway. I don't see why, logically, you would treat an object, an array or another type differently. An object or array has a closing bracket, but in the case of a bare value, the end of the value is the closing. Both would work with a \0 terminator. My protocol would see truetruetrue as three records of true. 3.003.00 would go fine if the client \0-terminates each value, which should be the case. \0 termination isn't required for JSON itself, since after } parsing simply ends, but for my protocol it would be if I want to support 3.003.00. I would like to be able to send just true as a response.
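Python's `json.JSONDecoder.raw_decode` returns the index where the parsed value ends, which makes it easy to sketch reading back-to-back values with no separator. It shows both sides of the argument: `truetruetrue` splits cleanly into three values, but adjacent numbers are ambiguous (`12` is one number, not `1` and `2`), which is what a `\0` terminator would resolve. The helper name is hypothetical:

```python
import json

_decoder = json.JSONDecoder()
_WHITESPACE = " \t\r\n"


def split_concatenated(text: str) -> list:
    """Decode consecutive JSON values from one string (hypothetical helper)."""
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos] in _WHITESPACE:
            pos += 1
        if pos == len(text):
            return values
        value, pos = _decoder.raw_decode(text, pos)
        values.append(value)
```

Literals and bracketed values end unambiguously, so only numbers need an explicit delimiter between them.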

netikras

274d

@retoor I think chatgpt is outdated

https://datatracker.ietf.org/doc/...

typosaurus

274d

@netikras it's Wikipedia. I don't understand what you're trying to tell me with that page. That true as the only word IS valid JSON?

netikras

274d

@retoor Yes. According to the RFC and a few BNFs I found online, the following are all valid JSON texts:

true

false

null

[]

{}

17

"hello"

{"hello":"world"}

[false]
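Python's `json.loads` agrees with this list. The older RFC 4627 required an object or array at the top level; RFC 7159 and its successor RFC 8259 allow any value, which may explain the outdated answer above. A quick check:

```python
import json

# The examples listed above, each parsed as a complete JSON text.
texts = ["true", "false", "null", "[]", "{}", "17", '"hello"', '{"hello":"world"}', "[false]"]
parsed = [json.loads(t) for t in texts]
print(parsed)  # [True, False, None, [], {}, 17, 'hello', {'hello': 'world'}, [False]]
```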

typosaurus

274d

@netikras ah cool, I already thought "why not?". So it seems my protocol handles the JSON syntax properly after all. I'm also thinking about a binary JSON format; it's possible as long as you know the full length, i.e. how far it can parse, so it can read past \0 bytes. In that case I can insert binary blobs into the DB.
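A length prefix is what lets a reader skip over arbitrary bytes, including `\0`, without scanning them. A toy sketch of a tagged, length-prefixed binary encoding; the tag values and layout are invented for illustration and are not rliza's format (existing formats such as BSON, CBOR and MessagePack work along similar lines):

```python
import struct

# Hypothetical one-byte type tags for this toy format.
TAG_NULL, TAG_TRUE, TAG_FALSE, TAG_INT, TAG_STR, TAG_BLOB = range(6)


def encode(value) -> bytes:
    if value is None:
        return bytes([TAG_NULL])
    if value is True:  # checked before int, since bool is a subclass of int
        return bytes([TAG_TRUE])
    if value is False:
        return bytes([TAG_FALSE])
    if isinstance(value, int):
        return bytes([TAG_INT]) + struct.pack("<q", value)  # 8-byte little-endian
    if isinstance(value, str):
        data = value.encode("utf-8")
        return bytes([TAG_STR]) + struct.pack("<I", len(data)) + data
    if isinstance(value, (bytes, bytearray)):
        return bytes([TAG_BLOB]) + struct.pack("<I", len(value)) + bytes(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf: bytes, pos: int = 0):
    """Decode one value starting at pos; return (value, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_NULL:
        return None, pos
    if tag == TAG_TRUE:
        return True, pos
    if tag == TAG_FALSE:
        return False, pos
    if tag == TAG_INT:
        (n,) = struct.unpack_from("<q", buf, pos)
        return n, pos + 8
    if tag in (TAG_STR, TAG_BLOB):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if pos + length > len(buf):
            raise ValueError("truncated value")
        data = bytes(buf[pos:pos + length])  # may contain \0; the length marks the end
        return (data.decode("utf-8") if tag == TAG_STR else data), pos + length
    raise ValueError(f"unknown tag {tag}")
```

The price is that the writer must know each value's length before emitting it, which is the same trade-off as HTTP chunking.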

random

python

stream

extending

json protocol