atheist
2y

I'm writing an ML course that explains concepts by walking through (and getting the reader to write) simple implementations of them. I've written a decision tree in 250 lines of code (including plotting it) that is 100 times faster than another (hilariously bad) attempt at a simple decision tree, and far more readable than anything else I've seen.

I'm having a good day.

Comments
  • 2
    Who’s the audience for this course?
  • 3
    I've been drinking wine and it's hot as balls right now. So my brain may not be on top form.

    @platypus Basically, it's the course I think would have helped me most when I was getting into data science. I've had the idea brewing for a while, and I've done bits and pieces, but between being ill and life being hell, I haven't got much done. I'm now a bit better and starting to make progress in a structured way. I'm nearly done with "lesson 1".

    Basically, the target audience is people who can already code and want to learn data science. They don't need much of a maths background.

    There are a few similar things I've seen and will now heartily recommend:

    Chris Albon, Director of ML at Wikimedia, has created https://machinelearningflashcards.com/... (which I've bought; well worth the money). They're really simple explainers of ML concepts. Each card is fairly self-contained and doesn't require much maths knowledge.

    Harrison Kinsley has created https://nnfs.io/ (I've bought the hardback + ePub; also well worth the money). This is *really close* to what I want to create, but it focuses very much on a single topic (neural networks) and is very much "passive learning".

    I learn best by getting my hands dirty. I have an intuitive understanding of how various ML algorithms work because I've built custom variants of them (mostly in regression, clustering and interpolation). And I want a course that does *that*.

    Something that explains a problem to you and asks you to solve a little bit of it. Then a little bit more. Then some more. Then puts it all together, with unit tests along the way. NNFS and the flashcards both lack that interactivity.
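
    A rough sketch of what one of those small steps might look like, assuming Python and pytest-style tests. The function names, hints and tests are illustrative only, not taken from the actual course. The reader would get the signature, the docstring hint and the tests; the bodies shown here are the reference solutions they'd be asked to write.

```python
import math


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels.

    Hint: zip the two lists together and count the positions that agree.
    """
    # Reference solution. In the exercise, the reader writes this body.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def recall(y_true, y_pred, positive):
    """Of the examples whose true label is `positive`, return the fraction predicted as `positive`.

    Hint: throw away every pair whose true label is not `positive` first.
    """
    # Reference solution. In the exercise, the reader writes this body.
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("y_true contains no positive examples")
    hits = sum(1 for p in preds_for_positives if p == positive)
    return hits / len(preds_for_positives)


# Tests handed to the reader alongside the stubs (runnable with pytest).
def test_accuracy():
    assert accuracy([1, 0, 1, 1], [1, 0, 0, 1]) == 0.75


def test_recall_only_counts_real_positives():
    # Three real positives, two of them found.
    assert math.isclose(recall([1, 1, 1, 0], [1, 0, 1, 0], positive=1), 2 / 3)


def test_always_no_model():
    # Predicting "no" every time looks accurate on imbalanced data but finds nothing.
    y_true = [0, 0, 0, 0, 1]
    y_pred = [0, 0, 0, 0, 0]
    assert accuracy(y_true, y_pred) == 0.8
    assert recall(y_true, y_pred, positive=1) == 0.0
```

    The last test is the kind of "aha" a reader gets for free by writing the code themselves: a lazy model scores 80% accuracy while finding none of the cases that matter.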

    Andrew Ng has an ML course with this interactivity, but it's written in a horrible language and doesn't bother explaining half of the concepts.
  • 1
    I've also spoken to people from a local data science meetup, professional data scientists by training, and from talking to them, what they don't get taught is "how to take the concepts and put them into production". They don't get taught how to deploy, e.g., a prediction service as a Lambda function, or how to get real-world data, clean it, build a data pipeline, retrain models and deploy updates. My background is as much software engineering as data science, so I want to teach some of that too, even if it's just the basics. Also version control of data and models, and reproducibility, which has been a real thorn in my side when working with other data scientists who share a model with a really fragile setup that can't be reproduced.
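
    As a very basic illustration of the reproducibility point, here is a sketch (an assumption, not a specific tool or the course's approach) of recording a hash of the exact training data alongside the random seed and hyperparameters, so someone else can check they're training on the same thing. `fingerprint_rows` and `training_manifest` are hypothetical names made up for this example.

```python
import hashlib
import json


def fingerprint_rows(rows):
    """Return a SHA-256 hex digest that changes whenever any row changes."""
    digest = hashlib.sha256()
    for row in rows:
        # Serialise each row deterministically so dict key order doesn't matter.
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def training_manifest(rows, seed, hyperparams):
    """Bundle what's needed to reproduce a (hypothetical) training run as JSON."""
    return json.dumps(
        {
            "data_sha256": fingerprint_rows(rows),
            "seed": seed,
            "hyperparams": hyperparams,
        },
        sort_keys=True,
        indent=2,
    )


rows = [{"height": 1.7, "label": "a"}, {"height": 1.2, "label": "b"}]
print(training_manifest(rows, seed=42, hyperparams={"max_depth": 3}))
```

    Committing a manifest like this next to the model means "it works on my machine" turns into a checkable claim: same hash, same seed, same settings.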

    What I don't want is some bullshit lecture that shows you university-level algebra and waves it away by saying "but you don't need to understand that". I fucking hate that. And I'm looking directly at Andrew Ng here: while he's super clever and has achieved great stuff, he can't teach. He doesn't bother explaining the hard stuff, which annoys me because the hard stuff isn't that hard. Most mathematical concepts have a comp sci parallel. A function? A sum? The rules of algebra? Just explain how to implement these concepts and automate the boring bit of "actually solving the problem". We're software engineers; automating the solution to a problem is by definition our job, and we solve way harder problems than that.

    And this is speaking as someone with a degree in physics who's *good* at the boring maths stuff.
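
    As an example of the kind of translation meant here: the mean squared error used in regression, MSE = (1/n) Σ (y_i - ŷ_i)², summing over i = 1..n, is just a loop with a running total. A minimal Python sketch (the function name is illustrative):

```python
def mean_squared_error(y_true, y_pred):
    # The Σ over i = 1..n is a for loop with an accumulator.
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        total += (y - y_hat) ** 2
    # The 1/n in front is a single division by the number of terms at the end.
    return total / len(y_true)


print(mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # 1.6666666666666667, i.e. 5/3
```

    The notation stops being scary once you see that Σ is `for` and the subscript i is the loop variable.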
  • 1
    So... That went on a bit. Apologies. I'm going back to my wine. Since people are interested, I'll try to get something into a shape I'm comfortable sharing, then post a link this weekend.
  • 1
    My aim for "lesson 1" at the moment is:

    What is classification?

    What makes a good prediction? Compare two black-box models; evaluate accuracy and recall

    What is a decision tree? A simple binary tree with a threshold at each split

    How does it describe data and create predictions? Nested branches partition the data; leaves contain labels

    How do you create a simple decision tree? The CART algorithm

    What's good and what's bad about this decision tree? Overfitting; training data vs test data

    At every point where there's a question, I've implemented the solution for now, but I'm going to replace it with a function signature, some tests and hints for the reader to fill in, plus some questions to guide them.
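
    A minimal sketch of the tree part of that plan, under stated assumptions: numeric features, Gini impurity as the split criterion (the usual choice for CART classification), a `max_depth` stopping rule, and leaves holding the majority label. `Node`, `gini`, `best_split`, `fit_tree` and `predict` are hypothetical names; this is not the author's 250-line implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal nodes set feature/threshold/left/right; leaves only set label.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: object = None


def gini(labels):
    """Gini impurity: 0 for a pure group, higher as the classes mix."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try every feature and every observed value as a threshold (x <= t goes left).

    Returns (feature, threshold) with the lowest weighted Gini, or None if no split helps.
    """
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def fit_tree(X, y, max_depth=3):
    """Recursively partition the data; stop when pure, out of depth, or unsplittable."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return Node(label=majority)
    split = best_split(X, y)
    if split is None:
        return Node(label=majority)
    f, t = split
    left_rows = [(row, label) for row, label in zip(X, y) if row[f] <= t]
    right_rows = [(row, label) for row, label in zip(X, y) if row[f] > t]
    return Node(
        feature=f,
        threshold=t,
        left=fit_tree([r for r, _ in left_rows], [l for _, l in left_rows], max_depth - 1),
        right=fit_tree([r for r, _ in right_rows], [l for _, l in right_rows], max_depth - 1),
    )


def predict(node, row):
    """Follow the nested branches until reaching a leaf, then return its label."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


X = [[2.0, 1.0], [3.0, 1.5], [7.0, 3.0], [8.0, 2.5], [1.0, 0.5], [9.0, 3.5]]
y = ["cat", "cat", "dog", "dog", "cat", "dog"]
tree = fit_tree(X, y)
print([predict(tree, row) for row in [[2.5, 1.0], [8.5, 3.0]]])  # ['cat', 'dog']
```

    Each function is a natural place for a "fill this in" exercise. Raising `max_depth` lets the tree carve out ever smaller leaves until it fits the training data perfectly, which is exactly what the overfitting step exposes by scoring on held-out test data instead.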
  • 1
    And there's stuff like Kaggle, but that doesn't really teach you data science. It teaches you how to use scikit-learn, etc. IMHO you don't get the same intuition or understanding.