filesystem

Ranter

webketje

2163

Comments

3

IntrusionCM

13947

4y

"Without reading them in memory at once" - I guess you mean without reading the whole file into memory?

Usually you have low-level implementations that work on a stream that gets tokenized.

Tokenized as in "reading char by char", while using a stack or similar structure to keep only the current token alive and the necessary information for validation.

E.g.
http://tutorials.jenkov.com/java-js...
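
To make the "char by char with a stack" idea concrete, here is a minimal sketch in plain JavaScript. The name `createBracketScanner` is made up for illustration. It only checks that brackets balance and that strings are closed; it is not a full JSON validator, and it assumes chunks arrive as strings (e.g. a Node readable stream after `setEncoding('utf8')`).

```javascript
// Hypothetical helper: scans JSON text chunk by chunk and keeps only a stack
// of open brackets plus a little string state, never the whole document.
function createBracketScanner() {
  const stack = [];      // currently open '{' and '[' characters
  let inString = false;  // true while inside a "..." literal
  let escaped = false;   // true right after a backslash inside a string
  let ok = true;         // flips to false on the first mismatched bracket

  return {
    // Feed the next chunk; chunk boundaries may fall anywhere.
    write(chunk) {
      for (const ch of chunk) {
        if (!ok) return;
        if (inString) {
          if (escaped) escaped = false;
          else if (ch === '\\') escaped = true;
          else if (ch === '"') inString = false;
        } else if (ch === '"') {
          inString = true;
        } else if (ch === '{' || ch === '[') {
          stack.push(ch);
        } else if (ch === '}' || ch === ']') {
          const open = stack.pop();
          const expected = ch === '}' ? '{' : '[';
          if (open !== expected) ok = false;
        }
      }
    },
    // True if every bracket was closed and no string was left open.
    end() {
      return ok && stack.length === 0 && !inString;
    },
  };
}
```

Memory use is bounded by the nesting depth, not the file size, which is the whole point of tokenizing instead of calling `JSON.parse` on everything.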

Modifying gets trickier, but not impossible. The tricky part is removing _nested_ structures, as you have to advance the input stream to skip to the end of the nested structure. All other content can be written from the input stream to e.g. a file, token by token.
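
A rough sketch of the "skip to the end of the nested structure" step, assuming the text is already in a string buffer and `start` points at the opening `{` or `[`. The function name `skipNested` is hypothetical, and a real streaming version would have to carry the depth and string state over to the next chunk whenever it returns -1.

```javascript
// Hypothetical helper: returns the index just past the object or array that
// opens at text[start], or -1 if the structure is not closed in this buffer.
// Assumes text[start] is '{' or '['.
function skipNested(text, start) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Brackets inside string literals must not change the depth.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{' || ch === '[') depth++;
    else if (ch === '}' || ch === ']') {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}
```

Everything before `start` and after the returned index can be copied to the output untouched; only the skipped span is dropped.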

JsonBoa

3048

4y

You were right in guessing that you would have to operate on the file in chunks that are not by themselves valid JSON.

However, some string manipulations could be implemented with a seek pointer, reading overlapping windows of several chars per chunk.

Let's say you want to replace the word "pony" with "bronco" in all keys that include it.
You would need a sliding window at least 2 times the length of the string "pony" (so, 8 chars) that slides by at most half the length of the string "pony" (so, 2 chars). You would also need a regex to detect that a given window contains a key declaration that includes the word "pony".
Finally, you would need to pipe the altered output to another file, updating a pointer to avoid duplicating the overlapping part of consecutive windows.
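
A simplified sketch of the overlap idea in JavaScript. Instead of a fixed 8-char window it carries over the last `needle.length - 1` chars of each chunk, which is enough to catch a match that straddles a chunk boundary without emitting anything twice. The name `createReplacer` is made up, and this version replaces every occurrence of the word, not only those inside keys; restricting it to keys would need the regex check described above.

```javascript
// Hypothetical helper: replaces `needle` with `replacement` across chunk
// boundaries while holding back at most needle.length - 1 characters.
function createReplacer(needle, replacement) {
  if (needle.length === 0) throw new Error('needle must not be empty');
  let tail = ''; // end of the previous chunk that may start a match

  return {
    // Returns the part of the output that is safe to write now.
    write(chunk) {
      const text = tail + chunk;
      let out = '';
      let pos = 0;
      let idx;
      while ((idx = text.indexOf(needle, pos)) !== -1) {
        out += text.slice(pos, idx) + replacement;
        pos = idx + needle.length;
      }
      // Hold back a possible partial match at the very end of the buffer.
      const keepFrom = Math.max(pos, text.length - (needle.length - 1));
      out += text.slice(pos, keepFrom);
      tail = text.slice(keepFrom);
      return out;
    },
    // Flushes what is left; it is shorter than the needle, so it cannot match.
    end() {
      const rest = tail;
      tail = '';
      return rest;
    },
  };
}
```

With a Node stream you would call `write` for each `data` chunk and `end` once the input is finished, piping the returned strings to the output file.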

It works especially well for NLP.

webketje

2163

4y

@IntrusionCM @JsonBoa thx for the detailed replies! I read the linked article. I learned something, but afaik this is all too low-level for a JS static site gen.
If I end up using any readable or writable stream, it will have to be only for the advantages other than "lower memory footprint"

IntrusionCM

13947

4y

@webketje

Yes, it's only useful if you are severely memory constrained or have true "beasts" in terms of size...

E.g. log analysis becomes so much easier when a file of several gigs is JSON and can be tokenized for a quick peek-a-boo.

rant

question

streams

node