Comments
-
Is the data publicly accessible without ANY login?
If yes: no problem.
If not: it's classified as hacking. -
bad-frog5464y@NoToJavaScript glad to have stumbled on this thread.
is it still hacking if you scrape information from your own account? -
@NoToJavaScript the second part should be clarified:
do you have a legitimate login? if yes, you can scrape.
TOS and rate limiting are also a factor here. -
Whether a site wants to allow bots should be stated in the robots.txt on the website.
http://www.robotstxt.org/ -
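A minimal sketch of checking robots.txt rules before scraping, using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the bot name, and the sample rules are made up for illustration; a real check would load the site's actual robots.txt (for example with `RobotFileParser.set_url` and `read`).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live download.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/data"):
        print(url, is_allowed(EXAMPLE_ROBOTS, "my-scraper", url))
```

Note that robots.txt only expresses what the site asks of bots; it says nothing about TOS or legal limits.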
bad-frog5464y@Demolishun cool thing.
just checked. devrant has one, however it's a placeholder.
apparently it has been "moved permanently"
are there alternative names under which i should look for those? -
bad-frog5464y@Demolishun and i have the ambition of starting making a kinda trading bot in 2 months:)))
well, first stage at least: scrape the webz for all relevant info, in forums and actual quotes, automate everything so that it spews out the essence.
once i get decision making right then i will automate it all the way, but i have no date for that stage
fun thing is that it kinda comes together all by itself. -
@bad-frog I already have code which scrapes all TSX symbols in real time every 15 seconds haha
Good luck with the bot tho. After 2 days the best I could do is "not losing money"
I'm using https://fr.investing.com/equities/... as the source and scrape the table
I then use this data to "play" with bot settings. -
@bad-frog it's very quick and dirty as i was mostly doing it for funzies, but if it can help :
https://pastebin.com/3GzETLab
Also : last time I tested was about 7 weeks ago, so maybe there are some layout / html changes -
bad-frog5464y@NoToJavaScript 1000 thx
but isn't trading equities expensive?
tbh i thought more about crypto because trading fees are basically nonexistent, and my plan for crypto was:
scrape 4chan/biz and count the occurrences of crypto names.
prolly build a sentiment analyzer, maybe tinker with it until i get a real tool
cross-reference that with crypto quotes
record statistics so as to see mooning
i should also scrape reddit and tweets and the like
extend that to names of companies so as to auto-find and verify if there is a squeeze going on. by then i should have enough capital to ignore the trading fees of playing with the big boys -
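The "count occurrences of crypto names" step could look roughly like this in Python. It is only a sketch: the function name `count_mentions`, the sample posts, and the ticker list are invented, and fetching the posts is left out entirely.

```python
import re
from collections import Counter

def count_mentions(posts, names):
    """Count whole-word, case-insensitive mentions of each name across posts."""
    counts = Counter({name: 0 for name in names})
    for name in names:
        # \b keeps "ETH" from matching inside words like "method".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for post in posts:
            counts[name] += len(pattern.findall(post))
    return counts

if __name__ == "__main__":
    sample_posts = [  # made-up forum posts
        "BTC to the moon, btc!",
        "ETH is fine, my method works",
        "nothing relevant here",
    ]
    print(count_mentions(sample_posts, ["BTC", "ETH", "DOGE"]))
```

Recording these counts per time window would give the raw series to cross-reference with quotes.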
@bad-frog Well, if the provider (let's say reddit) has an API, just use the API. It's more efficient and easier, and it doesn't depend on html changes.
As for trading providers, there are some that allow free trades (no fees).
Robinhood (US only I think)
Wealthsimple Trade (Canada, this is the one I use)
Questrade
and more.
Wealthsimple doesn't have trading APIs, BUT they have a website. Sending an order should not be difficult to reverse-engineer with a couple of F12 sessions in the browser.
Fair warning, HTTP is not as fast as you think it is :)
If you want to scrape all forums and blogs when your bot makes a decision, it's already too late. Look at aggregated data sources -
bad-frog5464y@NoToJavaScript 1000 thanks bro, you advanced the whole project by a week at least
i supposed i had to work with js (which i dont know yet) at a certain point, and now i have a working example -
bad-frog5464y@NoToJavaScript oh, it doesnt have to happen in an instant. also my internet wouldnt allow for a tradebot in the true sense:)
i will be perfectly content if i get my analysis on a daily basis at first. then maybe increase the frequency to see where i can get, and what i can get away with...
i doubt many servers would like being submerged by requests...
but if i have a 10 second resolution, it's good enough to even make statistics about the market's response to news, cross-referenced with forums etc...
the idea is to have a tool to understand trends and follow them -
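One way to avoid submerging a server with requests is to enforce a minimum delay between them. A rough Python sketch, assuming a 10 second interval as mentioned above; the `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the logic can be tried without actually waiting.

```python
import time

class Throttle:
    """Block so that consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

if __name__ == "__main__":
    throttle = Throttle(10.0)
    for i in range(3):
        throttle.wait()  # call this right before each request
        print("request", i, "at", time.monotonic())
```

The right interval depends on the site; its TOS or robots.txt may state one.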
bad-frog5464y@NoToJavaScript "but thats C#"
thats exactly what im saying:p
if its not C, C++ or python you got me lost
honorary title: bash -
@bad-frog /agree
Anyway it's a fun project! I don't have enough motivation to work on it daily, but every couple of months I add a brick :)
It uses the lib https://html-agility-pack.net/ which I find very good for html parsing. It even handles "broken" html (to some degree) -
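For anyone who prefers Python over C#, pulling rows out of an HTML table can be sketched with the standard library's `html.parser`. This is not the Html Agility Pack approach from the pastebin, just a comparable illustration; the class `TableRows`, the helper `extract_rows`, and the sample markup are invented.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def extract_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows

if __name__ == "__main__":
    sample = ("<table><tr><th>Symbol</th><th>Last</th></tr>"
              "<tr><td>ABC</td><td><b>1.23</b></td></tr></table>")
    print(extract_rows(sample))
```

Like any scraper, this breaks when the site changes its layout, so the cell positions should be treated as assumptions.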
bad-frog5464y@NoToJavaScript that was my intent too.
the first step will be in two months for me because it ties in with my learning curriculum
but otherwise i have a few ideas on the backburner too. also to tie in in time. -
bad-frog5464y@NoToJavaScript niiiiice
c# is also on my personal list so i might start right away
even tho parsing isnt hard with C like.
however i see that it builds requests for you and all
but then i will have to learn how to build those myself soon...
ill have to build a server in c++. only std maybe some other one or two, selected by the school
they really want us to know networking in and out for sure... -
@bad-frog And 2 rules for scraping data :
1. Always provide a user agent
2. Always use cookies
Some sites will reject requests without these two.
The most difficult one I ever did was LinkedIn. THAT SHIT Changes something in layout almost every 2 weeks. -
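The two rules above can be sketched in Python with only the standard library: an opener that sends a User-Agent header on every request and keeps cookies between requests. The helper name `make_opener` and the user agent string are placeholders, not anything from the thread.

```python
import urllib.request
from http.cookiejar import CookieJar

def make_opener(user_agent):
    """Build an opener that sends user_agent and stores cookies across requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)  # rule 2: always use cookies
    )
    # rule 1: always provide a user agent (replaces the default Python one)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar

if __name__ == "__main__":
    opener, jar = make_opener("my-scraper/0.1 (contact: me@example.com)")
    # opener.open("https://example.com/") would now send the header and
    # store any cookies the server sets in `jar` for later requests.
    print(opener.addheaders, len(jar))
```

Reusing the same opener for a whole session is what makes the cookies carry over.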
@Nanos I would think yes, but I don't see how to prove it.
I would do it personally -
I would just build a "virtual marketplace" for "practicing trading".
Like a game.
And then x amount of fake dollars translates into a y sub-percentage of real dollars.
So maybe 100k in the game market translates into $10.
and then the traders that are good, we aggregate their trades and execute them for real.
Of course the players don't need to know that and couldn't know that anyway.
Why invent effective AI when you can just crowdsource from people? I figure some small percentage of users are gonna be super predictors or naturally good at what they do.
Highly unethical of course if they're not informed. -
mundo0349014yHave you heard about robots.txt?
That file will tell you what the site wants you to grab and what they don't.
You can choose to ignore it.
Also, depending on where you are, there are copyright and privacy regulations that can get you in trouble.
Talk to a lawyer.
While scraping web sources to build datasets, has legality ever been a concern?
Is there a standard practice for checking whether a site prohibits scraping?
rant
data science
ai
legal
scraping
web scraping
data
ml
dataset