Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "string parsing"
-
Started being a Teaching Assistant for Intro to Programming at the uni I study at a while ago and, although it's not entirely my piece of cake, here are some "highlights":
* students were asked to use functions, so someone was ingenious (laughed my ass off for this one):
def all_lines(input):
all_lines =input
return all_lines
* "you need to use functions" part 2
*moves the whole code from main to a function*
* for Math-related coding assignments, someone was always reading the input as a string and parsing it, instead of reading it as numbers, and was incredibly surprised that he can do the latter "I always thought you can't read numbers! Technology has gone so far!"
* for an assignment requiring a class with 3 private variables, someone actually declared each variable needed as a vector and was handling all these 3 vectors as 3D matrices
* because the lecturer specified that the length of the program does not matter, as long as it does its job and is well-written, someone wrote a 100-lines program on one single line
* someone was spamming me with emails to tell me that the grade I gave them was unfair (on the reason that it was directly crashing when run), because it was running on their machine (they included pictures), but was not running on mine, because "my Python version was expired". They sent at least 20 emails in less than 2h
* "But if it works, why do I still have to make it look better and more understandable?"
* "can't we assume the input is always going to be correct? Who'd want to type in garbage?"
* *writes 10 if-statements that could be basically replaced by one for-loop*
"okay, here, you can use a for-loop"
*writes the for loop, includes all the if-statements from before, one for each of the 10 values the for-loop variable gets*
* this picture
N.B.: depending on how many others I remember, I may include them in the comments afterwards19 -
Waisting some times on codewars.com
~~~~
3 kyu challenge:
Given a string with mathematical operations like this: ‘3+5*7*(10-45)’, compute the result
~~~~~
*Does a quick and easy one liner in python using eval()*
*sees people actually writing some 100 lines parsing the string and calculating using priority of operation*
Poor them...
(Btw, passed to lvl 4 kyu thx to this)14 -
Recently saw a piece of code left by a former coworker where he converted an int to a long, by converting it to a string, then parsing the string.5
-
Super excited to get my first ML project on, been working on this on - off for a couple of years. Now that I had sometime to spare could get it completed :) And along the way learned a lot of things as well :)
Let me know what you guys think.
https://github.com/Prabandham/...
https://github.com/Prabandham/...
A lot of thanks to this Repo that helped in parsing the image to string.
https://github.com/BenNG/...
Of course PR's and suggestions really welcome and appreciated.9 -
A dev team has been spending the past couple of weeks working on a 'generic rule engine' to validate a marketing process. The “Buy 5, get 10% off” kind of promotions.
The UI has all the great bits, drop-downs, various data lookups, etc etc..
What the dev is storing the database is the actual string representation FieldA=“Buy 5, get 10% off” that is “built” from the UI.
Might be OK, but now they want to apply that string to an actual order. Extract ‘5’, the word ‘Buy’ to apply to the purchase quantity rule, ‘10%’ and the word ‘off’ to subtract from the total.
Dev asked me:
Dev: “How can I use reflection to parse the string and determine what are integers, decimals, and percents?”
Me: “That sounds complicated. Why would you do that?”
Dev: “It’s only a string. Parsing it was easy. First we need to know how to extract numbers and be able to compare them.”
Me: “I’ve seen the data structures, wouldn’t it be easier to serialize the objects to JSON and store the string in the database? When you deserialize, you won’t have to parse or do any kind of reflection. You should try to keep the rule behavior as simple as possible. Developing your own tokenizer that relies on reflection and hoping the UI doesn’t change isn’t going to be reliable.”
Dev: “Tokens!...yea…tokens…that’s what we want. I’ll come up with a tokenizing algorithm that can utilize recursion and reflection to extract all the comparable data structures.”
Me: “Wow…uh…no, don’t do that. The UI already has to map the data, just make it easy on yourself and serialize that object. It’s like one line of code to serialize and deserialize.”
Dev: “I don’t know…sounds like magic. Using tokens seems like the more straightforward O-O approach. Thanks anyway.”
I probably getting too old to keep up with these kids, I have no idea what the frack he was talking about. Not sure if they are too smart or I’m too stupid/lazy. Either way, I keeping my name as far away from that project as possible.4 -
I am much too tired to go into details, probably because I left the office at 11:15pm, but I finally finished a feature. It doesn't even sound like a particularly large or complicated feature. It sounds like a simple, 1-2 day feature until you look at it closely.
It took me an entire fucking week. and all the while I was coaching a junior dev who had just picked up Rails and was building something very similar.
It's the model, controller, and UI for creating a parent object along with 0-n child objects, with default children suggestions, a fancy ui including the ability to dynamically add/remove children via buttons. and have the entire happy family save nicely and atomically on the backend. Plus a detailed-but-simple listing for non-technicals including some absolutely nontrivial css acrobatics.
After getting about 90% of everything built and working and beautiful, I learned that Rails does quite a bit of this for you, through `accepts_nested_params_for :collection`. But that requires very specific form input namespacing, and building that out correctly is flipping difficult. It's not like I could find good examples anywhere, either. I looked for hours. I finally found a rails tutorial vide linked from a comment on a SO answer from five years ago, and mashed its oversimplified and dated examples with the newer documentation, and worked around the issues that of course arose from that disasterous paring.
like.
I needed to store a template of the child object markup somewhere, yeah? The video had me trying to store all of the markup in a `data-fields=" "` attrib. wth? I tried storing it as a string and injecting it into javascript, but that didn't work either. parsing errors! yay! good job, you two.
So I ended up storing the markup (rendered from a rails partial) in an html comment of all things, and pulling the markup out of the comment and gsubbing its IDs on document load. This has the annoying effect of preventing me from using html comments in that partial (not that i really use them anyway, but.)
Just.
Every step of the way on building this was another mountain climb.
* singular vs plural naming and routing, and named routes. and dealing with issues arising from existing incorrect pluralization.
* reverse polymorphic relation (child -> x parent)
* The testing suite is incompatible with the new rails6. There is no fix. None. I checked. Nope. Not happening.
* Rails6 randomly and constantly crashes and/or caches random things (including arbitrary code changes) in development mode (and only development mode) when working with multiple databases.
* nested form builders
* styling a fucking checkbox
* Making that checkbox (rather, its label and container div) into a sexy animated slider
* passing data and locals to and between partials
* misleading documentation
* building the partials to be self-contained and reusable
* coercing form builders into namespacing nested html inputs the way Rails expects
* input namespacing redux, now with nested form builders too!
* Figuring out how to generate markup for an empty child when I'm no longer rendering the children myself
* Figuring out where the fuck to put the blank child template markup so it's accessible, has the right namespacing, and is not submitted with everything else
* Figuring out how the fuck to read an html comment with JS
* nested strong params
* nested strong params
* nested fucking strong params
* caching parsed children's data on parent when the whole thing is bloody atomic.
* Converting datetimes from/to milliseconds on save/load
* CSS and bootstrap collisions
* CSS and bootstrap stupidity
* Reinventing the entire multi-child / nested params / atomic creating/updating/deleting feature on my own before discovering Rails can do that for you.
Just.
I am so glad it's working.
I don't even feel relieved. I just feel exhausted.
But it's done.
finally.
and it's done well. It's all self-contained and reusable, it's easy to read, has separate styling and reusable partials, etc. It's a two line copy/paste drop-in for any other model that needs it. Two lines and it just works, and even tells you if you screwed up.
I'm incredibly proud of everything that went into this.
But mostly I'm just incredibly tired.
Time for some well-deserved sleep.7 -
Imagine saving Integers and Floats in a MySQL table as strings containing locale based thousand sepatators...
man... fickt das hart!
Wait, there's more!
Imagine storing a field containing list of object data as a CSV in a single table column instead of using JSON format or a separate DB table.... and later parsing it by splitting the CSV string on ";"...7 -
I am currently in a bit of a (well-deserved) lull at work, both of my projects are finishing up/ finished, so tomorrow should be pretty light, as the latter half of today was.
And I have really gotten interested in the HTTP protocol. It's so interesting learning how it all works under the hood.
So I think I'm going to be researching/ messing around with creating a cpp project that essentially implements cURL from the ground up, creating sockets, reading from them, parsing the HTTP requests... all that. I don't expect to actually get it done, but it should be an immense learning experience. I have a clear goal: implement this function:
std::string get(const std::string&);
Once I'm able to just GET as simple as that, I know I have achieved my goal!3 -
Writing a function to take a string of delimited entities, parse each character to find the separators, capture the characters in between separators, and return an array of entities.
I used this for about a year before I learned about String.split()
Yeah.1 -
I have to write an xml configuration parser for an in-house data acquisition system that I've been tasked with developing.
I hate doing string parsing in C++... Blegh!16 -
I have quite a few of these so I'm doing a series.
(2 of 3) Flexi Lexi
A backend developer was tired of building data for the templates. So he created a macro/filter for our in house template lexer. This filter allowed the web designers (didn't really call them frond end devs yet back then) could just at an SQL statement in the templates.
The macro had no safe argument parsing and the designers knew basic SQL but did not know about SQL Injection and used string concatination to insert all kinds of user and request data in the queries.
Two months after this novel feature was introduced we had SQL injections all over the place when some piece of input was missing but worse the whole product was riddled with SQLi vulnerabilities.2 -
Should I be excited or concerned?
Newbie dev(babydev) who just learned string vs int and the word "boolean", is SUPER into data parsing, extrapolation and recursion... without knowing what any of those terms.
2 ½ hrs later. still nothing... assuming he was confused, I set up a 'quick' call...near 3 hrs later I think he got that it was only meant so I could see if/where he didnt understand... not dive into building extensive data arch... hopefully.
So, we need some basic af PHP forms for some public-provided input into a mySQL db. I figured I'd have him look up mySQL variables/fields, teach him a bit about proper db/field setup and give him something to practice on his currently untouched linux container I just set up so he could have a static ipv4 and cli on our new block (yea... he's spoiled, but has no clue).
I asked him to list some traits of X that he thinks could be relevant. Then to essentially briefly explain the logic to deciding/returning the values/how to store in the db... essentially basic conditionals and for loops... which is also quite new to him.
I love databases; I know I'm not in the majority... I assumed he'd get a couple traits in his mind and exhaust himself breaking them down. I was wrong. He was/likely is in his sleep now, over complicating something that was just meant as a basic af.
Fyi, the company is currently weighted towards more autistics (him and myself included) than neurotypicals.
I know I was(still am) extremely abnormal, especially when it comes to things like data.
So, should I be concerned/have him focus elsewhere for a bit?... I dont want to have him burnout before he even gets to installing mySQL44 -
I like my log messages to indicate automatically where in the code something happened, so that I can easily identify where a message originated from while tracking down problems.
In C/C++ this is nice and easy - write a logging routine, wrap it in macros for the different log levels and have that automatically output __FILE__, __LINE__ etc.
I wanted to do something similar in NodeJS, as I'd found myself manually writing the file name in the log message and then splitting functionality out into new files and it became a mess.
The only way I found to be able to do this was to create an "Error" object and access the "stack" member of it. This is a string containing a stack backtrace, suitable for writing to console/file. I just wanted the filename/line/routine.
So I ended up splitting the string into lines, then for each of the lines, trimming the surrounding spaces (or tabs?), and parsing them to see if the stack entry is inside my logger module. The first entry outside of that module must therefore be the thing that called it, so I then parse out the routine or object and method, filename and line number.
It's a lot of clumsy work but the output is pretty neat. I just wish it were simpler!2 -
Best:
Got a role change to automation engineer, which is sort of a 'just fix problems' position, like with tooling, get rid of manual work, remove as many spreadsheet as we can.
I started looking into rust.
Worst:
People think we depend heavily in javascript because our products are extensions, our golden product is an extension, so a few members of my team insist in depending in our core team and use their javascript stuff, even for string parsing, even if we do have a python package that does rhe same thing that is officially maintained too.
I refuse.
The good again:
My boss let's me refuse, I am not forced into javascript, they let me use whatever I want as long as it is reasonable.2 -
I wasn't hired to do a dev's job (handled sales) but they asked me to help the non-HQ end with sorting transaction records (a country's worth) for an audit.
Asked HQ if they could send the data they took so I wouldn't need to request the data. We get told sure, you can have it. Waits for a month. Nothing. Apparently, they've forgotten.
Asks for data again. They churn it out in 24 hours. Badly Parsed. Apparently they just put a mask of a UI and stored all fields as one entire string (with no separators). The horror!
Ended up wasting most of a week simply fixing the parsing by brute force since we had no time.
Good news(?): We ended up training the front desk people to ending their fields with semi-colons to force backend into a possibly parsed state. -
I had an interesting mystery the other day. I work in the UK, but I'm working remotely from the US for a while. First day, I made some changes, ran the tests and they failed. Weird part was the failing test was for a component I hadn't touched. I took a closer look, and realized it was a date off by several hours. The test was checking that a passed in date appears in the output. But it was creating the date by parsing a string. The library I was using defaults to local time, but the component uses UTC. So, I had inadvertently created a unit test that only passes when run from UTC. But I had never noticed before because my work is in that timezone. Yikes!
-
DAILY LARAVEL PROBLEMS
I need to parse a JWT with some custom claims. There's a JWT library with Laravel; documentation really lacking, kinda hardcoded to work with Laravel but whatever; it's already installed, let's see what can I do with it.
It turns out I can't say something like "take this token, parse it, tell me it's valid". Let's see how that goes.
You need to build a parsing class with a manager, some auth stuff, a parser.
To build said manager you need a provider that implements a contract, a blacklist, a factory (of what?)
To build the factory (of what?) you need a claim factory and a payload validator
To build the claim factory you need a request
To build the blacklist you need a Storage
To build the storage you need a CacheContract
To build a CacheContract you need IDK it's a mess
To build the contract you need... IDK for real
WHY LARAVEL IS SHIT: 'cause only in this framework it seems reasonable to build this clusterfuck to parse a base64 encoded string, throw some json_decode and check a signature. And have it work only to authenticate a user.1 -
!rant
...so I started learning F#... again, second attempt, first one failed like 3 years ago because I had no real usecase appropriate for the language, so I didn't get it.
...but now i'm trying to make my own language, meaning also (wanting) parser/interpreter/compiler, and I found a lecture where dude shows off THREE he wrote (within 24 hours) for three different languages...
...and it showed me that doing my parser/interpreter/compiler in F#, using pattern matching, is going to be incredibly awesome, as opposed to doing it as string parsing in C#...6 -
Spend several months designing an alternative approach to string matching/parsing not based on parser combination or Regex. Benchmark and profile the ever living hell out of it. Find out the performance is highly competitive with FParsec and far easier on memory than either approach. Discover FParsec responds very badly when failing. Document all of this, do a presentation on the design. Upload video of presentation with links to it on a few sites. Presentation gets downvoted, and receive hatemail. Yay.18
-
I use the ICU format often for translation because it's simple enough and supported on many platforms. It's something of a standard so I can use the same translation string format and similar library functions everywhere.
ICU is like a really simple templating language, somewhere between printf and something like smarty or twig simplified and specifically intended for internationalisation.
I updated a library providing ICU compatible parsing and formatting for one of the platforms I'm using and find tests break. I assume that only thing to change is the API. ICU very rarely changes and if it did it would be unexpected for it to break the syntax in a major way without big news of a new syntax.
The main contributor of the library has changed since some time last year. Someone else picked up the project from previous contributors.
Though the library is heavily advertised as using ICU it has now switched to using a custom extended format that's not fully compatible and that is being driven by use case demand rather than standardisation.
Seems like a nice chap but has also decided for a major paradigm shift for the library.
The ICU format only parses ICU templates for string substitution and formatting. The new format tries to parse anything that looks XML like as well but with much more strict rules only supporting a tiny subset of XML and failing to preserve what would otherwise be string literals.
Has anyone else seen this happen after the handover of an opensource library where the paradigm shifts?3 -
Aka... How NOT to design a build system.
I must say that the winning award in that category goes without any question to SBT.
SBT is like trying to use a claymore mine to put some nails in a wall. It most likely will work somehow, but the collateral damage is extensive.
If you ask what build tool would possibly do this... It was probably SBT. Rant applies in general, but my arch nemesis is definitely SBT.
Let's start with the simplest thing: The data format you use to store.
Well. Data format. So use sth that can represent data or settings. Do *not* use a programming language, as this can neither be parsed / modified without an foreign interface or using the programming language itself...
Which is painful as fuck for automatisation, scripting and thus CI/CD.
Most important regarding the data format - keep it simple and stupid, yet precise and clean. Do not try to e.g. implement complex types - pain without gain. Plain old objects / structs, arrays, primitive types, simple as that.
No (severely) nested types, no lazy evaluation, just keep it as simple as possible. Build tools are complex enough, no need to feed the nightmare.
Data formats *must* have btw a proper encoding, looking at you Mr. XML. It should be standardized, so no crazy mfucking shit eating dev gets the idea to use whatever encoding they like.
Workflows. You know, things like
- update dependency
- compile stuff
- test run
- ...
Keep. Them. Simple.
Especially regarding settings and multiprojects.
http://lihaoyi.com/post/...
If you want to know how to absolutely never ever do it.
Again - keep. it. simple.
Make stuff configurable, allow the CLI tool used for building to pass this configuration in / allow setting of env variables. As simple as that.
Allow project settings - e.g. like repositories - to be set globally vs project wide.
Not simple are those tools who have...
- more knobs than documentation
- more layers than a wedding cake
- inheritance / merging of settings :(
- CLI and ENV have different names.
- CLI and ENV use different quoting
...
Which brings me to the CLI.
If your build tool has no CLI, it sucks. It just sucks. No discussion. It sucks, hmkay?
If your build tool has a CLI, but...
- it uses undocumented exit codes
- requires absurd or non-quoting (e.g. cannot parse quoted string)
- has unconfigurable logging
- output doesn't allow parsing
- CLI cannot be used for automatisation
It sucks, too... Again, no discussion.
Last point: Plugins and versioning.
I love plugins. And versioning.
Plugins can be a good choice to extend stuff, to scratch some specific itches.
Plugins are NOT an excuse to say: hey, we don't integrate any features or offer plugins by ourselves, go implement your own plugins for that.
That's just absurd.
(precondition: feature makes sense, like e.g. listing dependencies, checking for updates, etc - stuff that most likely anyone wants)
Versioning. Well. Here goes number one award to Node with it's broken concept of just installing multiple versions for the fuck of it.
Another award goes to tools without a locking file.
Another award goes to tools who do not support version ranges.
Yet another award goes to tools who do not support private repositories / mirrors via global configuration - makes fun bombing public mirrors to check for new versions available and getting rate limited to death.
In case someone has read so far and wonders why this rant came to be...
I've implemented a sort of on premise bot for updating dependencies for multiple build tools.
Won't be open sourced, as it is company property - but let me tell ya... Pain and pain are two different things. That was beyond pain.
That was getting your skin peeled off while being set on fire pain.
-.-5 -
My dumbass thinking it would be easy to get a string value of an exported symbol in a .so and now I'm manually parsing and applying symbol relocations3
-
There's a moment when you're looking at a reStructuredText parser, you have a basic idea of the spec, and you know with absolute certainty it's not parsing it correctly. It gets the basics right, sure. But now you know why the tools that use it have disclaimers like "Any para lists or other directives must be at the end of the string". And you're thinking "how hard would it *really* be to write one?". After all, I'm now extracting all the docstrings from the code... Is it really that much more? Shit...
-
I am trying to extract data from the PubSub subscription and finally, once the data is extracted I want to do some transformation. Currently, it's in bytes format. I have tried multiple ways to extract the data in JSON format using custom schema it fails with an error
TypeError: __main__.MySchema() argument after ** must be a mapping, not str [while running 'Map to MySchema']
**readPubSub.py**
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import json
import typing
class MySchema(typing.NamedTuple):
user_id:str
event_ts:str
create_ts:str
event_id:str
ifa:str
ifv:str
country:str
chip_balance:str
game:str
user_group:str
user_condition:str
device_type:str
device_model:str
user_name:str
fb_connect:bool
is_active_event:bool
event_payload:str
TOPIC_PATH = "projects/nectar-259905/topics/events"
def run(pubsub_topic):
options = PipelineOptions(
streaming=True
)
runner = 'DirectRunner'
print("I reached before pipeline")
with beam.Pipeline(runner, options=options) as pipeline:
message=(
pipeline
| "Read from Pub/Sub topic" >> beam.io.ReadFromPubSub(subscription='projects/triple-nectar-259905/subscriptions/bq_subscribe')#.with_output_types(bytes)
| 'UTF-8 bytes to string' >> beam.Map(lambda msg: msg.decode('utf-8'))
| 'Map to MySchema' >> beam.Map(lambda msg: MySchema(**msg)).with_output_types(MySchema)
| "Writing to console" >> beam.Map(print))
print("I reached after pipeline")
result = message.run()
result.wait_until_finish()
run(TOPIC_PATH)
If I use it directly below
message=(
pipeline
| "Read from Pub/Sub topic" >> beam.io.ReadFromPubSub(subscription='projects/triple-nectar-259905/subscriptions/bq_subscribe')#.with_output_types(bytes)
| 'UTF-8 bytes to string' >> beam.Map(lambda msg: msg.decode('utf-8'))
| "Writing to console" >> beam.Map(print))
I get output as
{
'user_id': '102105290400258488',
'event_ts': '2021-05-29 20:42:52.283 UTC',
'event_id': 'Game_Request_Declined',
'ifa': '6090a6c7-4422-49b5-8757-ccfdbad',
'ifv': '3fc6eb8b4d0cf096c47e2252f41',
'country': 'US',
'chip_balance': '9140',
'game': 'gru',
'user_group': '[1, 36, 529702]',
'user_condition': '[1, 36]',
'device_type': 'phone',
'device_model': 'TCL 5007Z',
'user_name': 'Minnie',
'fb_connect': True,
'event_payload': '{"competition_type":"normal","game_started_from":"result_flow_rematch","variant":"target"}',
'is_active_event': True
}
{
'user_id': '102105290400258488',
'event_ts': '2021-05-29 20:54:38.297 UTC',
'event_id': 'Decline_Game_Request',
'ifa': '6090a6c7-4422-49b5-8757-ccfdbad',
'ifv': '3fc6eb8b4d0cf096c47e2252f41',
'country': 'US',
'chip_balance': '9905',
'game': 'gru',
'user_group': '[1, 36, 529702]',
'user_condition': '[1, 36]',
'device_type': 'phone',
'device_model': 'TCL 5007Z',
'user_name': 'Minnie',
'fb_connect': True,
'event_payload': '{"competition_type":"normal","game_started_from":"result_flow_rematch","variant":"target"}',
'is_active_event': True
}
Please let me know if I m doing something wrong while parsing the data to JSON. Also, I am looking for examples to do data masking and run some SQL within Apache Beam4