8

Why is Python supposedly something big data people use? That sounds like R and stats territory, and I don’t see much adoption of that, though I note Python is used somewhat in a lot of Linux apps and utilities.

It just seems strange to me that an interpreted language would be used that way, or am I an idiot?

Comments
  • 2
  • 2
    @Python a little more elaboration would be nice lol
  • 7
    There are multiple causes:
    1. The language is easy to understand;
    2. There are multiple libs that help to process data in various forms;
    3. You don't need to worry about datatypes;
    4. If you need to speed up the program, you can run Cython over the most expensive parts of it and get a faster C version out of them.
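Points 2 and 3 can be sketched with nothing but the standard library. This is a minimal illustration, not anyone's production code: the sample data and the `summarize` helper are made up for the example.

```python
import csv
import io
import statistics
from collections import Counter

# Made-up sample data standing in for a real file.
RAW = """city,temp_c
Oslo,4
Cairo,31.5
Oslo,6
Lima,19
Cairo,29
"""

def summarize(raw_csv):
    """Return (mean temperature, number of readings per city)."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # float() accepts both "4" and "31.5"; no type declarations needed.
    temps = [float(row["temp_c"]) for row in rows]
    per_city = Counter(row["city"] for row in rows)
    return statistics.mean(temps), per_city

mean_temp, per_city = summarize(RAW)
print(f"mean temperature: {mean_temp:.1f} C")
print(per_city.most_common())
```

Third-party libs such as numpy and pandas follow the same pattern with far more functionality and speed.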
  • 1
    @stop so what are some of these libs?
  • 2
    Ease of use beats performance any day of the week
  • 2
    @EpicofGilgamesh not really, especially once you start to scale up usage and the expense grows considerably. I would agree for once-and-done single-user utilities, however. Context matters.
    I would NEVER use Node.js Express for a large web app. I’d use Apache and some combination of backend technologies that run as a new process per request on a Linux system.
  • 4
    @MadMadMadMrMim
    Nodejs vastly outperforms an Apache+PHP or whatever setup.
    Spawning a process produces overhead, so does IPC.
    Async programming in node is far more resource efficient and it eliminates the overhead of context switching.
    That's why nginx outperforms Apache.
    Nginx handles requests asynchronously and distributes them across multiple threads if necessary.
    Apache starts a new thread, if not a new process for every request, which increases overhead and resource usage.
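The async model described above can be sketched in Python with `asyncio`, which uses the same single-threaded event loop idea as Node. This is only an illustration under assumptions: `handle_request` and `serve` are hypothetical names, and `asyncio.sleep` stands in for real I/O waits. It shows the model; it is not a benchmark.

```python
import asyncio
import threading

async def handle_request(request_id):
    """Pretend to wait on I/O (database, disk, network) for one request."""
    await asyncio.sleep(0.01)  # control returns to the event loop while waiting
    return request_id, threading.get_ident()

async def serve(n_requests):
    # All handlers are in flight at once; none gets its own thread or process.
    return await asyncio.gather(*(handle_request(i) for i in range(n_requests)))

results = asyncio.run(serve(100))
thread_ids = {tid for _, tid in results}
print(f"{len(results)} requests handled on {len(thread_ids)} thread(s)")
```

Every handler reports the same thread id, so the concurrency comes from interleaving the waits and not from spawning anything per request.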
  • 1
    Since you can wrap/include C libraries in Python and vice versa, you basically have the best of both worlds. And honestly, no one will reinvent the wheel if it's not necessary, so they might as well just use a few C libs with some Python on top.

    @EpicofGilgamesh
    @MadMadMadMrMim
    I agree that performance in big data can be a significant factor, though ease of use can be as well, maybe both for the same part? Outside of big data and other compute-intensive areas, usability wins by far though.
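Wrapping a C library from Python, as mentioned above, might look like this with the standard `ctypes` module. It is a minimal sketch that assumes a POSIX system (Linux or macOS); `c_strlen` is a hypothetical helper name, and `strlen` is just a conveniently small C function to call.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None)
# still exposes the C library symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data: bytes) -> int:
    """Thin Python wrapper around the C function."""
    return libc.strlen(data)

print(c_strlen(b"big data"))
```

The heavy lifting happens in compiled C while the calling code stays plain Python.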
  • 2
    @EpicofGilgamesh were you being sarcastic? I often feel you are being sarcastic and it doesn’t carry well in this medium lol
  • 1
    It's just that people don't feel like waiting for something to compile, especially if it works relatively fast anyway.
  • 1
    Builds on msvc update 😋
  • 1
    When you’re debugging i mean
  • 1
    @MadMadMadMrMim nah, I'm not being sarcastic, I'm just voicing my opinion
  • 1
    I have heard that Python also wins because it is a very good glue language. Businesses don't do big data for the sake of doing big data; they have their businesses to run. Being versatile, Python serves as very good glue between the business stuff and the big data stuff.

    Also, C++ is a nightmare for non-full-time developers. I am not sure about other languages, but Python can be picked up by non-technical people pretty fast.

    Also, Python has C extensions, which can come in useful if you are really concerned about performance. Or even better, Cython.
  • 6
    Python is good, 'mkay?
  • 2
    I would imagine most people dealing with data analysis are good at statistics but may not have a huge background in computer programming, so it would be prudent to pick Python, which is super easy to learn.
  • 2
    @EpicofGilgamesh but data science is a bit different than statistical analysis surely
  • 1
    @MadMadMadMrMim and as we sit here acting like more thoughtful sophisticated machines that lost their bookmark i watch the equivalent of carnival ducks in a shooting game
    Continue to walk back and forth again this is
    Maddening
  • 3
    Python has a very fast typing-to-results ratio, a bunch of stellar libraries (numpy, pandas, sklearn, matplotlib etc.) which perform pretty well, and very convenient work environments (Jupyter, Spyder). You don't need to wait for stuff to compile. In a data science pipeline you'd use Python for the stuff which needs a lot of iteration and experimentation: making graphs, testing models and hypotheses, sharing ideas etc. It's more than fast enough for all that. R and Matlab are also heavily used for this.

    For deployment, you generally want efficiency and scalability, and there are a bunch of ways to get that, e.g. via Spark, C++ libs, TVM, etc. You don't want to iterate the same way you deploy, that's too slow for data science work, and you don't want to deploy the same way you iterate, that's too slow performance-wise.
  • 1
    @metamourge Node.js runs on a single thread, and Linux is designed to create processes over threads. I’ll believe that when I see the benchmarks.

    As for IPC, Apache communicates via environment variables and the stdio streams when it runs PHP, C++, .NET, Python, Perl or whatever.
  • 0
    @MadMadMadMrMim
    Every process schedule forces a full context switch, in addition to all the partial context switches while creating/destroying FDs.
    Just to give you an impression of how much overhead a new process is compared to async: if you exchanged all partial context switches for full context switches, you'd cut your system's performance by 1/3.
    Running everything in userspace threads on a single system thread is cheap compared to that.
  • 0
    @metamourge but that single thread is running like a task scheduler, the same way the Linux OS schedules processes.

    You'd have to say 'ok, I've got this list of waiting routines that executed to this point, so I have to let them all execute just a little bit, so I have to either stop them at a point or let them all execute to completion and move on to the next'... This kind of sounds like a process scheduler, which the OS already provides and which Node is mimicking.

    Yes, I know resource handles are created, but here is the question: every time a NEW web client calls my Node.js program, what is happening? And if the app fails, does my whole web server crash? No, Apache does not crash if one webpage does.
    Node is prone to doing that.
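The "task scheduler on one thread" idea in this comment can be sketched with Python generators. This is a toy under assumptions, not how Node is actually implemented: `handler`, `failing_handler` and `run_round_robin` are made-up names. Unlike the OS scheduler, tasks here are never interrupted; each one gives up control voluntarily at a `yield`. The `except` branch shows one way a loop can survive a crashing handler.

```python
from collections import deque

def handler(name, steps):
    """A toy request handler that yields wherever it would wait on I/O."""
    for step in range(steps):
        yield f"{name}: step {step}"

def failing_handler(name):
    """A handler that does one step of work and then blows up."""
    yield f"{name}: step 0"
    raise RuntimeError(f"{name} crashed")

def run_round_robin(tasks):
    """Advance each task by one step in turn, all on one thread."""
    log = []
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            log.append(next(task))
        except StopIteration:
            continue  # task finished; drop it
        except Exception as exc:
            # Catching here keeps one bad handler from taking the loop down.
            log.append(f"error: {exc}")
            continue
        queue.append(task)  # not done yet; back of the line
    return log

log = run_round_robin([handler("A", 2), failing_handler("B"), handler("C", 1)])
print("\n".join(log))
```

Without that `except` branch the error would propagate and stop every other task, which is the failure mode the comment worries about.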
  • 0
    I mean, there are benchmarks. Several of them. Most of them stating that nodejs is incredibly faster than apache+whatever when using it correctly.

    - https://zgadzaj.com/development/...
    - https://dev.to/emiliosp/...
    - https://thirdrocktechkno.com/blog/...
  • 0
    lmao you don’t have to like nodejs but it’s ignorant to say it’s not faster than Apache. Just because you don’t understand it doesn’t mean it’s slow or not useful.
  • 1
    @Python oh no no, you get me wrong, I like the syntax for Express.

    I think Node is useful for a specific set of limited purposes like web services, but firstly it's JavaScript, and JavaScript is a tad spaghetti-like since it uses a lot of async crap that yes, you can await on, but yeah.. still annoying.

    But Apache is robust, featured and incredibly mature. It's like a working version of IIS :P
  • 1
    > “javascript is a tad spaghetti-like”

    me 🤝 @MadMadMadMrMim
  • 0
    btw yeah josh this is me and so sorry that remembering that the world is so horrible now is difficult i was not stopped here to keep this consistent they have to offer me things that fixes my mind or stop blowing it
  • 0
    @Python i do not understand that emoji :P sorry lol
  • 0
    @Python oh thats scary and you mentioned it before, a handshake..
    i forgot they added emojis to unicode.. i was able to search by the character... now there is a frightening sign of the times lol
  • 0
    @MadMadMadMrMim here is an example of a terminated pathway a finality which is encouraged surely but also naturally occurring as the communicated point was previously elaborated enough to not require additional interaction
  • 0
    @stop didn’t you have to write those parts in C/C++ though? I looked that up again
  • 0
    So I was looking closer at data libraries for python and I was feeling very... numpy
  • 0
    @MadMadMadMrMim no, you run Cython to generate the C source code. Cython is a transpiler.
  • 0
    @stop nothing on the numpy joke ? fine then. anyway I did actually read
    https://cython.org/

    Can the tool be used for what you're saying? Because this page says it's a superset language.

    I am presently refusing to continue writing my project if I can't keep it without some jackass stealing it.

    which after the grinning monkey kind of indicates has been the case before.
  • 0
    @MadMadMadMrMim if you don't want other people to steal it, I think Python is the wrong choice. Cython has its own superset of Python, but it can also transpile plain Python code to C or C++. That code can be compiled to a binary or a lib that can be distributed. To harden this, compile with the optimizer set to max and strip out any debug info. Or change the language.