devRant - A fun community for developers to connect over code, tech & life as a programmer

Search - "string parsing"

122

shinypotato

1410

8y

Started being a Teaching Assistant for Intro to Programming at the uni I study at a while ago and, although it's not entirely my piece of cake, here are some "highlights":

* students were asked to use functions, so someone was ingenious (laughed my ass off for this one):
def all_lines(input):
    all_lines = input
    return all_lines

* "you need to use functions" part 2
*moves the whole code from main to a function*

* for Math-related coding assignments, someone was always reading the input as a string and parsing it, instead of reading it as numbers, and was incredibly surprised that he can do the latter "I always thought you can't read numbers! Technology has gone so far!"

* for an assignment requiring a class with 3 private variables, someone actually declared each variable needed as a vector and was handling all these 3 vectors as 3D matrices

* because the lecturer specified that the length of the program does not matter, as long as it does its job and is well-written, someone wrote a 100-line program on one single line

* someone was spamming me with emails to tell me that the grade I gave them was unfair (on the reason that it was directly crashing when run), because it was running on their machine (they included pictures), but was not running on mine, because "my Python version was expired". They sent at least 20 emails in less than 2h

* "But if it works, why do I still have to make it look better and more understandable?"

* "can't we assume the input is always going to be correct? Who'd want to type in garbage?"

* *writes 10 if-statements that could be basically replaced by one for-loop*
"okay, here, you can use a for-loop"
*writes the for loop, includes all the if-statements from before, one for each of the 10 values the for-loop variable gets*

* this picture

N.B.: depending on how many others I remember, I may include them in the comments afterwards

rant highlights funny sad sigh coding teaching assistant students

19
53

-vim-

3083

8y

Wasting some time on codewars.com

~~~~
3 kyu challenge:

Given a string with mathematical operations like this: ‘3+5*7*(10-45)’, compute the result
~~~~~

*Does a quick and easy one liner in python using eval()*

*sees people actually writing some 100 lines parsing the string and calculating using priority of operation*

Poor them...

(Btw, passed to lvl 4 kyu thx to this)
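For anyone nervous about calling eval() on a string they don't control, here is a minimal sketch of a middle ground: let Python's own `ast` module do the parsing and operator precedence, then walk the tree and allow only arithmetic. The name `safe_calc` and the operator whitelist are assumptions made for this illustration, not part of the kata.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calc(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses without calling eval()."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_calc("3+5*7*(10-45)"))  # -1222
```

Still nowhere near 100 lines, and a function call or attribute access in the input raises ValueError instead of running.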

random one liner always think before coding poor guy

12
41

AshMountDigger

920

10y

Recently saw a piece of code left by a former coworker where he converted an int to a long, by converting it to a string, then parsing the string.

undefined language

5
31

prabandham

259

7y

Super excited to get my first ML project on, been working on this on and off for a couple of years. Now that I had some time to spare, I could get it completed :) And along the way I learned a lot of things as well :)

Let me know what you guys think.
https://github.com/Prabandham/...
https://github.com/Prabandham/...

A lot of thanks to this Repo that helped in parsing the image to string.
https://github.com/BenNG/...

Of course PR's and suggestions really welcome and appreciated.

rant

9
28

PaperTrail

10389

9y

A dev team has been spending the past couple of weeks working on a 'generic rule engine' to validate a marketing process. The “Buy 5, get 10% off” kind of promotions.

The UI has all the great bits, drop-downs, various data lookups, etc etc..

What the dev is storing in the database is the actual string representation FieldA=“Buy 5, get 10% off” that is “built” from the UI.

Might be OK, but now they want to apply that string to an actual order. Extract ‘5’, the word ‘Buy’ to apply to the purchase quantity rule, ‘10%’ and the word ‘off’ to subtract from the total.

Dev asked me:

Dev: “How can I use reflection to parse the string and determine what are integers, decimals, and percents?”
Me: “That sounds complicated. Why would you do that?”
Dev: “It’s only a string. Parsing it was easy. First we need to know how to extract numbers and be able to compare them.”
Me: “I’ve seen the data structures, wouldn’t it be easier to serialize the objects to JSON and store the string in the database? When you deserialize, you won’t have to parse or do any kind of reflection. You should try to keep the rule behavior as simple as possible. Developing your own tokenizer that relies on reflection and hoping the UI doesn’t change isn’t going to be reliable.”
Dev: “Tokens!...yea…tokens…that’s what we want. I’ll come up with a tokenizing algorithm that can utilize recursion and reflection to extract all the comparable data structures.”
Me: “Wow…uh…no, don’t do that. The UI already has to map the data, just make it easy on yourself and serialize that object. It’s like one line of code to serialize and deserialize.”
Dev: “I don’t know…sounds like magic. Using tokens seems like the more straightforward O-O approach. Thanks anyway.”

I'm probably getting too old to keep up with these kids, I have no idea what the frack he was talking about. Not sure if they are too smart or I’m too stupid/lazy. Either way, I'm keeping my name as far away from that project as possible.
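A minimal sketch of the "just serialize the object" suggestion, written in Python for illustration. The `PromotionRule` class and its two fields are assumptions standing in for whatever the UI actually maps; the point is only that the stored string round-trips back into a typed object with no tokenizer.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PromotionRule:  # hypothetical shape, not the team's real model
    min_quantity: int
    percent_off: float

    def apply(self, quantity: int, total: float) -> float:
        """Return the order total after the discount, if it applies."""
        if quantity >= self.min_quantity:
            return round(total * (1 - self.percent_off / 100), 2)
        return total


rule = PromotionRule(min_quantity=5, percent_off=10.0)
stored = json.dumps(asdict(rule))             # the string that goes in the database
loaded = PromotionRule(**json.loads(stored))  # no tokenizer, no reflection

print(stored)
print(loaded.apply(quantity=5, total=200.0))
```

The display text “Buy 5, get 10% off” can then be generated from the object whenever the UI needs it, instead of being the thing that gets parsed.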

undefined get off my lawn darn kids

4
25

Root

77073

7y

I am much too tired to go into details, probably because I left the office at 11:15pm, but I finally finished a feature. It doesn't even sound like a particularly large or complicated feature. It sounds like a simple, 1-2 day feature until you look at it closely.

It took me an entire fucking week. and all the while I was coaching a junior dev who had just picked up Rails and was building something very similar.

It's the model, controller, and UI for creating a parent object along with 0-n child objects, with default children suggestions, a fancy ui including the ability to dynamically add/remove children via buttons. and have the entire happy family save nicely and atomically on the backend. Plus a detailed-but-simple listing for non-technicals including some absolutely nontrivial css acrobatics.

After getting about 90% of everything built and working and beautiful, I learned that Rails does quite a bit of this for you, through `accepts_nested_attributes_for :collection`. But that requires very specific form input namespacing, and building that out correctly is flipping difficult. It's not like I could find good examples anywhere, either. I looked for hours. I finally found a rails tutorial video linked from a comment on a SO answer from five years ago, and mashed its oversimplified and dated examples with the newer documentation, and worked around the issues that of course arose from that disastrous pairing.

like.
I needed to store a template of the child object markup somewhere, yeah? The video had me trying to store all of the markup in a `data-fields=" "` attrib. wth? I tried storing it as a string and injecting it into javascript, but that didn't work either. parsing errors! yay! good job, you two.

So I ended up storing the markup (rendered from a rails partial) in an html comment of all things, and pulling the markup out of the comment and gsubbing its IDs on document load. This has the annoying effect of preventing me from using html comments in that partial (not that i really use them anyway, but.)

Just.
Every step of the way on building this was another mountain climb.
* singular vs plural naming and routing, and named routes. and dealing with issues arising from existing incorrect pluralization.
* reverse polymorphic relation (child -> x parent)
* The testing suite is incompatible with the new rails6. There is no fix. None. I checked. Nope. Not happening.
* Rails6 randomly and constantly crashes and/or caches random things (including arbitrary code changes) in development mode (and only development mode) when working with multiple databases.
* nested form builders
* styling a fucking checkbox
* Making that checkbox (rather, its label and container div) into a sexy animated slider
* passing data and locals to and between partials
* misleading documentation
* building the partials to be self-contained and reusable
* coercing form builders into namespacing nested html inputs the way Rails expects
* input namespacing redux, now with nested form builders too!
* Figuring out how to generate markup for an empty child when I'm no longer rendering the children myself
* Figuring out where the fuck to put the blank child template markup so it's accessible, has the right namespacing, and is not submitted with everything else
* Figuring out how the fuck to read an html comment with JS
* nested strong params
* nested strong params
* nested fucking strong params
* caching parsed children's data on parent when the whole thing is bloody atomic.
* Converting datetimes from/to milliseconds on save/load
* CSS and bootstrap collisions
* CSS and bootstrap stupidity
* Reinventing the entire multi-child / nested params / atomic creating/updating/deleting feature on my own before discovering Rails can do that for you.

Just.
I am so glad it's working.
I don't even feel relieved. I just feel exhausted.

But it's done.
finally.

and it's done well. It's all self-contained and reusable, it's easy to read, has separate styling and reusable partials, etc. It's a two line copy/paste drop-in for any other model that needs it. Two lines and it just works, and even tells you if you screwed up.

I'm incredibly proud of everything that went into this.
But mostly I'm just incredibly tired.

Time for some well-deserved sleep.

rant root made lots of things root made a thing they were hard things rails

7
22

PonySlaystation

20863

4y

Imagine saving Integers and Floats in a MySQL table as strings containing locale based thousand separators...
man... that fucks you hard!

Wait, there's more!
Imagine storing a field containing list of object data as a CSV in a single table column instead of using JSON format or a separate DB table.... and later parsing it by splitting the CSV string on ";"...

rant cmsofdoom phptsd

6
18

Jantho1990

2137

8y

Writing a function to take a string of delimited entities, parse each character to find the separators, capture the characters in between separators, and return an array of entities.

I used this for about a year before I learned about String.split()

Yeah.

rant string parsing i was still learning how to code look for the existing answers first wk118

1
17

Gogeta70

2627

8y

I have to write an xml configuration parser for an in-house data acquisition system that I've been tasked with developing.

I hate doing string parsing in C++... Blegh!

rant i hate writing xml parsers

16
13

hjk101

5591

5y

I have quite a few of these so I'm doing a series.
(2 of 3) Flexi Lexi

A backend developer was tired of building data for the templates. So he created a macro/filter for our in house template lexer. This filter allowed the web designers (didn't really call them front end devs yet back then) to just add an SQL statement in the templates.

The macro had no safe argument parsing and the designers knew basic SQL but did not know about SQL Injection and used string concatenation to insert all kinds of user and request data in the queries.
Two months after this novel feature was introduced we had SQL injections all over the place when some piece of input was missing but worse the whole product was riddled with SQLi vulnerabilities.
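To make the difference concrete, here is a small sketch using Python's built-in `sqlite3` module. The table, the names and the hostile input are invented for the example; the original system was an in-house template lexer, not Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "designer")],
)

user_input = "nobody' OR '1'='1"  # hostile request parameter

# Vulnerable: the input becomes part of the SQL text itself.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the driver passes the value separately from the SQL text.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

The concatenated query returns every row, while the parameterized one treats the whole input as a (non-matching) name.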

rant wk234 sql injection

2
11

awesomeest

1033

2y

Should I be excited or concerned?

Newbie dev (babydev) who just learned string vs int and the word "boolean", is SUPER into data parsing, extrapolation and recursion... without knowing what any of those terms mean.

2 ½ hrs later. still nothing... assuming he was confused, I set up a 'quick' call...near 3 hrs later I think he got that it was only meant so I could see if/where he didnt understand... not dive into building extensive data arch... hopefully.

So, we need some basic af PHP forms for some public-provided input into a mySQL db. I figured I'd have him look up mySQL variables/fields, teach him a bit about proper db/field setup and give him something to practice on his currently untouched linux container I just set up so he could have a static ipv4 and cli on our new block (yea... he's spoiled, but has no clue).

I asked him to list some traits of X that he thinks could be relevant. Then to essentially briefly explain the logic to deciding/returning the values/how to store in the db... essentially basic conditionals and for loops... which is also quite new to him.

I love databases; I know I'm not in the majority... I assumed he'd get a couple traits in his mind and exhaust himself breaking them down. I was wrong. He was/likely is in his sleep now, over complicating something that was just meant as a basic af.

Fyi, the company is currently weighted towards more autistics (him and myself included) than neurotypicals.

I know I was(still am) extremely abnormal, especially when it comes to things like data.

So, should I be concerned/have him focus elsewhere for a bit?... I dont want to have him burnout before he even gets to installing mySQL

question devrant databases conditionals

44
9

mundo03

4828

5y

Best:
Got a role change to automation engineer, which is sort of a 'just fix problems' position, like with tooling, get rid of manual work, remove as many spreadsheet as we can.
I started looking into rust.

Worst:
People think we depend heavily on javascript because our products are extensions, our golden product is an extension, so a few members of my team insist on depending on our core team and using their javascript stuff, even for string parsing, even though we have a python package that does the same thing and is officially maintained too.
I refuse.

The good again:
My boss lets me refuse, I am not forced into javascript, they let me use whatever I want as long as it is reasonable.

rant wk241

2
8

anon0330

108

9y

I wasn't hired to do a dev's job (handled sales) but they asked me to help the non-HQ end with sorting transaction records (a country's worth) for an audit.

Asked HQ if they could send the data they took so I wouldn't need to request the data. We get told sure, you can have it. Waits for a month. Nothing. Apparently, they've forgotten.

Asks for data again. They churn it out in 24 hours. Badly Parsed. Apparently they just put a mask of a UI and stored all fields as one entire string (with no separators). The horror!

Ended up wasting most of a week simply fixing the parsing by brute force since we had no time.

Good news(?): We ended up training the front desk people to end their fields with semi-colons to force backend into a possibly parsed state.

undefined not really dev but close wk45 semi-colon love
8

nightowl

686

8y

I like my log messages to indicate automatically where in the code something happened, so that I can easily identify where a message originated from while tracking down problems.

In C/C++ this is nice and easy - write a logging routine, wrap it in macros for the different log levels and have that automatically output __FILE__, __LINE__ etc.

I wanted to do something similar in NodeJS, as I'd found myself manually writing the file name in the log message and then splitting functionality out into new files and it became a mess.

The only way I found to be able to do this was to create an "Error" object and access the "stack" member of it. This is a string containing a stack backtrace, suitable for writing to console/file. I just wanted the filename/line/routine.

So I ended up splitting the string into lines, then for each of the lines, trimming the surrounding spaces (or tabs?), and parsing them to see if the stack entry is inside my logger module. The first entry outside of that module must therefore be the thing that called it, so I then parse out the routine or object and method, filename and line number.

It's a lot of clumsy work but the output is pretty neat. I just wish it were simpler!
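The walk described above, sketched in Python over a hard-coded stack string so it can run anywhere. It assumes the V8 frame layout (`at fn (file:line:col)`, or `at file:line:col` for anonymous frames); the paths, the `find_caller` name and the `Caller` tuple are made up for the illustration.

```python
import re
from typing import NamedTuple, Optional

# Matches "at fn (file:line:col)" and the anonymous form "at file:line:col".
FRAME = re.compile(
    r"^at (?:(?P<fn>.+?) \()?(?P<file>.+?):(?P<line>\d+):(?P<col>\d+)\)?$"
)


class Caller(NamedTuple):
    function: Optional[str]
    filename: str
    line: int


def find_caller(stack: str, logger_file: str) -> Optional[Caller]:
    """Return the first stack frame that is not inside the logger module."""
    for raw in stack.splitlines():
        match = FRAME.match(raw.strip())
        if match is None:  # e.g. the leading "Error" line
            continue
        if match["file"].endswith(logger_file):
            continue  # still inside the logger, keep walking
        return Caller(match["fn"], match["file"], int(match["line"]))
    return None


stack = """Error
    at Logger.info (/app/logger.js:12:17)
    at handleRequest (/app/server.js:48:10)
    at /app/index.js:5:3"""

print(find_caller(stack, "logger.js"))
```

Here the first frame outside `logger.js` is `handleRequest` in `/app/server.js`, line 48, which is what the log line would be tagged with.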

rant stack logging nodejs who called me

2
6

cat-on-keyboard

1595

8y

I had an interesting mystery the other day. I work in the UK, but I'm working remotely from the US for a while. First day, I made some changes, ran the tests and they failed. Weird part was the failing test was for a component I hadn't touched. I took a closer look, and realized it was a date off by several hours. The test was checking that a passed in date appears in the output. But it was creating the date by parsing a string. The library I was using defaults to local time, but the component uses UTC. So, I had inadvertently created a unit test that only passes when run from UTC. But I had never noticed before because my work is in that timezone. Yikes!
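The same trap, sketched with Python's standard `datetime` module (the post doesn't name the library involved, so this is only an analogue, and the timestamp is invented). A string without a zone parses to a naive value; what it means depends on which zone you attach afterwards.

```python
from datetime import datetime, timezone

STAMP = "2017-03-01 12:00:00"
FORMAT = "%Y-%m-%d %H:%M:%S"

# Naive parse: no zone attached yet.
naive = datetime.strptime(STAMP, FORMAT)

# astimezone() on a naive value assumes the machine's local zone,
# so this result differs between a UK and a US workstation.
as_local = naive.astimezone()

# Pinning the zone explicitly gives the same instant on every machine.
as_utc = naive.replace(tzinfo=timezone.utc)

print(as_utc.isoformat())    # 2017-03-01T12:00:00+00:00
print(as_local.utcoffset())  # depends on where the test runs
```

A test that builds its expected date the `as_utc` way passes regardless of where it runs.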

rant mtz oopsies unit tests
6

Midnight-shcode

4406

5y

!rant
...so I started learning F#... again, second attempt, first one failed like 3 years ago because I had no real usecase appropriate for the language, so I didn't get it.

...but now i'm trying to make my own language, meaning also (wanting) parser/interpreter/compiler, and I found a lecture where dude shows off THREE he wrote (within 24 hours) for three different languages...

...and it showed me that doing my parser/interpreter/compiler in F#, using pattern matching, is going to be incredibly awesome, as opposed to doing it as string parsing in C#...

rant f# seems awesome

6
6

IHateForALiving

2927

5y

DAILY LARAVEL PROBLEMS

I need to parse a JWT with some custom claims. There's a JWT library with Laravel; documentation really lacking, kinda hardcoded to work with Laravel but whatever; it's already installed, let's see what can I do with it.

It turns out I can't say something like "take this token, parse it, tell me it's valid". Let's see how that goes.
You need to build a parsing class with a manager, some auth stuff, a parser.
To build said manager you need a provider that implements a contract, a blacklist, a factory (of what?)
To build the factory (of what?) you need a claim factory and a payload validator
To build the claim factory you need a request
To build the blacklist you need a Storage
To build the storage you need a CacheContract
To build a CacheContract you need IDK it's a mess
To build the contract you need... IDK for real

WHY LARAVEL IS SHIT: 'cause only in this framework it seems reasonable to build this clusterfuck to parse a base64 encoded string, throw some json_decode and check a signature. And have it work only to authenticate a user.
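For scale, here is roughly what "parse a base64 encoded string, decode some JSON and check a signature" looks like with nothing but a standard library, sketched in Python rather than PHP. It assumes an HS256-signed token and deliberately skips claim validation such as `exp`; the helper names and the secret are invented for the example, and a maintained JWT library is still the sensible choice in production.

```python
import base64
import hashlib
import hmac
import json


def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a token; only here so the example is self-contained."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    mac = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(mac)}"


def parse_hs256(token: str, secret: bytes) -> dict:
    """Return the claims if the HS256 signature checks out, else raise."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


token = sign_hs256({"sub": "42", "role": "admin"}, b"not-a-real-secret")
print(parse_hs256(token, b"not-a-real-secret"))
```

Custom claims come back as plain dictionary keys, no manager, provider or factory required.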

rant jwt shit laravel

1
5

pk76

1140

7y

Spend several months designing an alternative approach to string matching/parsing not based on parser combination or Regex. Benchmark and profile the ever living hell out of it. Find out the performance is highly competitive with FParsec and far easier on memory than either approach. Discover FParsec responds very badly when failing. Document all of this, do a presentation on the design. Upload video of presentation with links to it on a few sites. Presentation gets downvoted, and receive hatemail. Yay.

rant

17
5

Lensflare

21314

1y

Why the hell do languages like Kotlin (Java) and C# handle dates and datetimes so needlessly complicated?
There are multiple types with different implementations and concepts like local time or time zones represented by those types. Some of them have capabilities like serialization, some of them don’t.
Parsing and encoding is tied to the types.

Why? Take Swift as an example:
It has one single Date type (including time) which represents a point in time independent of any calendar, time zone, encoding or format.
There is a DateFormatter to parse from APIs from iso or timestamps or whatever and to format to UI as a string in any language (localization), for any region, in any format.
If you just want a container for the date time components themselves (which the concept of local date time seems to be in those languages), you can use the DateComponents type. If you are interested in dates from the perspective of a calendar, there is a Calendar type.

Everything makes sense and the different concepts are decoupled from each other as they should be.

Damn! My memory about C# is a bit hazy but Kotlin, I’m disappointed in you! Date handling is a horrible mess!
Ok, I guess I can blame it on Java and JVM.

rant kotlin date time wtf

6
4

IntrusionCM

13702

4y

Aka... How NOT to design a build system.

I must say that the winning award in that category goes without any question to SBT.

SBT is like trying to use a claymore mine to put some nails in a wall. It most likely will work somehow, but the collateral damage is extensive.

If you ask what build tool would possibly do this... It was probably SBT. Rant applies in general, but my arch nemesis is definitely SBT.

Let's start with the simplest thing: The data format you use to store.

Well. Data format. So use sth that can represent data or settings. Do *not* use a programming language, as this can neither be parsed nor modified without a foreign interface or using the programming language itself...

Which is painful as fuck for automatisation, scripting and thus CI/CD.

Most important regarding the data format - keep it simple and stupid, yet precise and clean. Do not try to e.g. implement complex types - pain without gain. Plain old objects / structs, arrays, primitive types, simple as that.

No (severely) nested types, no lazy evaluation, just keep it as simple as possible. Build tools are complex enough, no need to feed the nightmare.

Data formats *must* have btw a proper encoding, looking at you Mr. XML. It should be standardized, so no crazy mfucking shit eating dev gets the idea to use whatever encoding they like.

Workflows. You know, things like
- update dependency
- compile stuff
- test run
- ...

Keep. Them. Simple.
Especially regarding settings and multiprojects.

http://lihaoyi.com/post/...

If you want to know how to absolutely never ever do it.

Again - keep. it. simple.

Make stuff configurable, allow the CLI tool used for building to pass this configuration in / allow setting of env variables. As simple as that.

Allow project settings - e.g. like repositories - to be set globally vs project wide.

Not simple are those tools who have...
- more knobs than documentation
- more layers than a wedding cake
- inheritance / merging of settings :(
- CLI and ENV have different names.
- CLI and ENV use different quoting
...

Which brings me to the CLI.

If your build tool has no CLI, it sucks. It just sucks. No discussion. It sucks, hmkay?

If your build tool has a CLI, but...
- it uses undocumented exit codes
- requires absurd or non-quoting (e.g. cannot parse quoted string)
- has unconfigurable logging
- output doesn't allow parsing
- CLI cannot be used for automatisation

It sucks, too... Again, no discussion.

Last point: Plugins and versioning.

I love plugins. And versioning.

Plugins can be a good choice to extend stuff, to scratch some specific itches.

Plugins are NOT an excuse to say: hey, we don't integrate any features or offer plugins by ourselves, go implement your own plugins for that.
That's just absurd.
(precondition: feature makes sense, like e.g. listing dependencies, checking for updates, etc - stuff that most likely anyone wants)

Versioning. Well. Here goes number one award to Node with its broken concept of just installing multiple versions for the fuck of it.
Another award goes to tools without a locking file.
Another award goes to tools who do not support version ranges.
Yet another award goes to tools who do not support private repositories / mirrors via global configuration - makes fun bombing public mirrors to check for new versions available and getting rate limited to death.

In case someone has read so far and wonders why this rant came to be...

I've implemented a sort of on premise bot for updating dependencies for multiple build tools.

Won't be open sourced, as it is company property - but let me tell ya... Pain and pain are two different things. That was beyond pain.

That was getting your skin peeled off while being set on fire pain.

-.-

rant build systems that suck

5
4

RANTSMCPANTS

392

6y

I use the ICU format often for translation because it's simple enough and supported on many platforms. It's something of a standard so I can use the same translation string format and similar library functions everywhere.

ICU is like a really simple templating language, somewhere between printf and something like smarty or twig simplified and specifically intended for internationalisation.

I updated a library providing ICU compatible parsing and formatting for one of the platforms I'm using and found that tests broke. I assumed the only thing that had changed was the API. ICU very rarely changes and if it did it would be unexpected for it to break the syntax in a major way without big news of a new syntax.

The main contributor of the library has changed since some time last year. Someone else picked up the project from previous contributors.

Though the library is heavily advertised as using ICU it has now switched to using a custom extended format that's not fully compatible and that is being driven by use case demand rather than standardisation.

Seems like a nice chap but has also decided for a major paradigm shift for the library.

The ICU format only parses ICU templates for string substitution and formatting. The new format tries to parse anything that looks XML like as well but with much more strict rules only supporting a tiny subset of XML and failing to preserve what would otherwise be string literals.

Has anyone else seen this happen after the handover of an opensource library where the paradigm shifts?

rant handoff standards icu

3
2

12bitfloat

10817

2y

My dumbass thinking it would be easy to get a string value of an exported symbol in a .so and now I'm manually parsing and applying symbol relocations

rant elf fml

2
2

Liebranca

1248

252d

Almost finished with latest preprocessor.
Why am I always working on preprocessors tho? Shit...

Anyway, almost finished ok.

Idea is, basically, that inside a C source or header you can write a perl subroutine instead of `#define ...`.

The mechanism is rather simple:

```C (wat?)
macro mymacro($expr) {
· // perl code goes here
· return "$expr;"
};
```

`$expr` is just a string holding whatever block of code comes after an invocation of `mymacro`. You can use the builtins `tokenshift` and `tokenpop` on a string to get the first and last token, respectively, and then `tokensplit` gives you *all* the tokens.

Whatever string you return is what the expression you received is replaced by:
- You can just give back the expression as-is to get the exact same thing you wrote -- so `mymacro char* wat;` gives you `char* wat;`.
- But if you return a galaxy's worth of C code, then bam. Macro expanded into it, just like that. It's a perl subroutine, so let your imagination fly. Wanna run some scripts at (pre)compile time? Then you can.
- If you return an empty string, then puff. No code. Input consumed.
- If you give the name of another macro (eg "another_macro $expr;"), the expansion recurses.
- If you return the name of the currently executing macro, no recursion happens. This lets you wrap C keywords without (too much) fear.

It's kind of cool because a separate perl module is built from the macros themselves. So then you can include those in another C file. Syntax is basically more perl because why not:

```C (yes)
package mypkg;
· use lib "path/to/myshit/";
· use pm funk qw(mymacro);
```

The `lib` bit actually translates to `-I(path)` for gcc. But for some reason the way you add an include path in perl is `use lib "path"`, so yep. I get it's confusing but just go with the ::~ f l o w ~:: ok.

Then the `pm` stuff is not valid perl (i think), but I took the easy way out and invented it to ensure there is a way to say "OK I don't give a single shit about the C stuff, just give me these qw()'d funky macros from this file." If you simply `use funk qw(mymacro)` then you also get an `#include "funk.h"`.

Speaking of which, headers are automatically generated. Yeah, fuck you, I added `public` to C, bite me. It's actually quite sexy as I defined it using the preprocessor:

```C (yes but actually perl)
macro public($expr) {
· my $dst=cmamout()->{export};\
· tokentidy $expr;
· push @$dst,$expr;
· return "$expr;";
};
```

Where `cmamout()` is a hash from which the output is generated. Oh, and `tokentidy` is just a random builtin that cleans up extra whitespace, don't mind it.

So now the bad stuff: I have to fix a few things. For instance, notice how I had to escape a new line there? Yeah. It's called dumb fix to shit parsing, of course.

But overall I'm quite satisfied with this. And the reason why may not be so obvious so I'ma spill it out: backticks, motherfucker.

That's right. Have a source emitter written in an esoteric language?

```C (yes really but not really)
macro bashit($expr) {
· my ($exe,@args)=tokensplit $expr;
· return `$exe @args`;
};
```

So now you can fork off into parallel dimensions; what can I say pass the pipe brother.

MAMmoth in the room is yes, this depends on MAM. What is MAM? MAMMI. It's the original name of my infamous picture of an ouroboros eating its own ass while stuck in limbo contemplating terrible life decisions of a build tool, avtomat (go ARSLASH <AR/> [habibi]).

So what's the deal with that? avtomat is a good build tool _for me_, not... ugh, you. I made it for *myself* baby things are not going to work out between us I'm sorry. MAM just does lots of things I wanted build tools to do in the __EXACT__ way I wanted them done. I'd say you should go use it too maybe, but actually don't and you shouldn't because I broke main some weeks ago to fix some other shit and then implement this. Yeah, pretty stupid, but what the hell. I'm the only user after all!

In conclusion, I am fully expecting to receive my mad props and street cred in the mail along with your marriage proposals en masse, effective immediately.

Further reading: https://youtube.com/watch/...

rant libranka y u no dev branch?!!

5
1

data4

1

3y

I am trying to extract data from the PubSub subscription and, once the data is extracted, I want to do some transformation. Currently, it's in bytes format. I have tried multiple ways to extract the data in JSON format using a custom schema, but it fails with an error:

TypeError: __main__.MySchema() argument after ** must be a mapping, not str [while running 'Map to MySchema']

**readPubSub.py**

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import json
import typing

class MySchema(typing.NamedTuple):
    user_id:str
    event_ts:str
    create_ts:str
    event_id:str
    ifa:str
    ifv:str
    country:str
    chip_balance:str
    game:str
    user_group:str
    user_condition:str
    device_type:str
    device_model:str
    user_name:str
    fb_connect:bool
    is_active_event:bool
    event_payload:str

TOPIC_PATH = "projects/nectar-259905/topics/events"

def run(pubsub_topic):
    options = PipelineOptions(
        streaming=True
    )
    runner = 'DirectRunner'

    print("I reached before pipeline")

    with beam.Pipeline(runner, options=options) as pipeline:
        message=(
            pipeline
            | "Read from Pub/Sub topic" >> beam.io.ReadFromPubSub(subscription='projects/triple-nectar-259905/subscriptions/bq_subscribe')#.with_output_types(bytes)
            | 'UTF-8 bytes to string' >> beam.Map(lambda msg: msg.decode('utf-8'))
            | 'Map to MySchema' >> beam.Map(lambda msg: MySchema(**msg)).with_output_types(MySchema)
            | "Writing to console" >> beam.Map(print))

    print("I reached after pipeline")
    result = message.run()
    result.wait_until_finish()

run(TOPIC_PATH)

If I use it directly below

message=(
    pipeline
    | "Read from Pub/Sub topic" >> beam.io.ReadFromPubSub(subscription='projects/triple-nectar-259905/subscriptions/bq_subscribe')#.with_output_types(bytes)
    | 'UTF-8 bytes to string' >> beam.Map(lambda msg: msg.decode('utf-8'))
    | "Writing to console" >> beam.Map(print))

I get output as

{
'user_id': '102105290400258488',
'event_ts': '2021-05-29 20:42:52.283 UTC',
'event_id': 'Game_Request_Declined',
'ifa': '6090a6c7-4422-49b5-8757-ccfdbad',
'ifv': '3fc6eb8b4d0cf096c47e2252f41',
'country': 'US',
'chip_balance': '9140',
'game': 'gru',
'user_group': '[1, 36, 529702]',
'user_condition': '[1, 36]',
'device_type': 'phone',
'device_model': 'TCL 5007Z',
'user_name': 'Minnie',
'fb_connect': True,
'event_payload': '{"competition_type":"normal","game_started_from":"result_flow_rematch","variant":"target"}',
'is_active_event': True
}

{
'user_id': '102105290400258488',
'event_ts': '2021-05-29 20:54:38.297 UTC',
'event_id': 'Decline_Game_Request',
'ifa': '6090a6c7-4422-49b5-8757-ccfdbad',
'ifv': '3fc6eb8b4d0cf096c47e2252f41',
'country': 'US',
'chip_balance': '9905',
'game': 'gru',
'user_group': '[1, 36, 529702]',
'user_condition': '[1, 36]',
'device_type': 'phone',
'device_model': 'TCL 5007Z',
'user_name': 'Minnie',
'fb_connect': True,
'event_payload': '{"competition_type":"normal","game_started_from":"result_flow_rematch","variant":"target"}',
'is_active_event': True
}

Please let me know if I'm doing something wrong while parsing the data to JSON. Also, I am looking for examples to do data masking and run some SQL within Apache Beam.
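A possible reading of the TypeError: after the decode step each element is still a `str`, and `MySchema(**msg)` needs a mapping, so the text has to be parsed into a dict first. The sketch below is plain Python so it runs without Beam. It assumes the payload really is a Python-style dict literal, as the printed output suggests (single quotes, `True`); for genuine JSON, `json.loads` would be the call to use. The schema is trimmed to three fields, and `to_schema` is a name invented here.

```python
import ast
import typing


class MySchema(typing.NamedTuple):
    # Trimmed to three fields for brevity; the real class has many more.
    user_id: str
    event_id: str
    create_ts: typing.Optional[str] = None  # absent from the sample output


def to_schema(raw: bytes) -> MySchema:
    """Decode one Pub/Sub payload and build a MySchema from it."""
    # Assumption: the text is a Python dict literal, not strict JSON.
    record = ast.literal_eval(raw.decode("utf-8"))
    # Keep only known fields; fields missing from the message become None.
    known = {name: record.get(name) for name in MySchema._fields}
    return MySchema(**known)


sample = (
    b"{'user_id': '102105290400258488', "
    b"'event_id': 'Game_Request_Declined', 'fb_connect': True}"
)
print(to_schema(sample))
```

In the pipeline, a single `beam.Map(to_schema)` step could stand in for the decode and 'Map to MySchema' steps. Note that the sample output has no `create_ts` key, so a schema that requires it would fail even once the string is parsed.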

question apache beam google-cloud-pubsub python

4


devRant © 2021 Hexical Labs LLC