338

A super creepy webcrawler I built with a friend in Haskell. It uses social media, various reverse image searches on images and strategically picked video/gif frames, image EXIF data, user names, location data, etc. to cross-reference everything there is to know about someone. It builds weighted graphs in a database over time, trying to verify information through multiple pathways, although most searches complete in seconds.

I originally built it for two reasons. The first: a manager walks into the office for a meeting, and during the meeting I can ask him how his ski holiday with his wife and kids was, or casually mention how much I would like to learn his favorite hobby.

The other reason was porn of course.

I put further development in the freezer because it's already too creepy. I'd run it on some porn gif, and after a long search it had built a graph pointing to a residence in rural Russia with pictures of a local volleyball club.

It's insane to think that intelligence agencies probably have much better gathering tools.

Comments
  • 54
    Oh shit.
  • 64
    You mind open sourcing it? Sounds like a really interesting project for infosec
  • 30
    Found the next NSA :p
  • 14
    I want source. Please!
  • 40
    "Talk is cheap, show me the code!" :D
  • 6
    Sounds a bit like what maltego does, but automated..
    Cool.
  • 28
    Someone go build a clone so we can use it to find OP and ask him irl to open source
  • 3
    Open source this ish 😂😂😂
  • 20
    @ThoughtfulDev

    I did not open source it because in my opinion it violates people a bit too much.

    For most people it doesn't crawl much that isn't already publicly available on the net, but it does it within a few seconds of running a query (you're Mark/HHU/DC comics/Taylor Swift/Düsseldorf/Oneplus/Arch/Pineapple?/Handball?, but apparently there's a hardcore libertarian bitcoin startup CFO wine drinker from Boston who shares your name).

    But for some people it pulls together multiple online identities which they most likely would have preferred to keep separate.
  • 24
    @bittersweet I argue that it is more ethical to open source. The only advantage this gives is that you've put all the info in one place. Government agencies and even other motivated individuals can and probably do already have something like this.

    Therefore I argue it is best to release it so that people like myself may see how exposed our public identity is. In a sense it starts to serve the same purpose as those websites that put up the leaked password databases: you bring the leaked data to the people who are actually affected rather than only those who are dedicated enough to gather it all. With the knowledge of how exposed they are, a person can then go and take the relevant steps, but first you must know your exposure!!

    Please open source, this is wicked useful.
  • 8
    "The other reason was pr0n of course." XDDDDDDDDDDDDDD
  • 5
    @Mitch377 not just that, mate.
    It can also find Jesus, Michael Jacksepticeye and GOD
  • 7
    @Mitch377 In most cases no. It doesn't track anything a human can't find out, it just does it faster.

    Weak points of many people are reusing profile photos across websites so it can link up multiple nicknames, people who use plenty of hashtags (searching for combinations of hashtags/likes with usernames often leads to reddit and other social network profiles), uploading photos with location data to websites which do not strip that info, and sport clubs.

    The trick is to keep reinforcing data: Once you find a place of residence for example, run all previous queries again with that place so you can increase weights/certainties on links, and potentially uncover new stuff.
  • 7
    @bittersweet Marc haha yeah most of that info is on my FB alone. But I wonder where it got the info that I use a OnePlus?

    Btw: I need to get to know this bitcoin startup CFO wine drinker from Boston :D
  • 4
    @bittersweet but it's impressive. :) Maybe I shouldn't use the same profile pic everywhere haha
  • 18
    @ThoughtfulDev Your github profile pic is used on a chat app website which lists the ID code of your oneplus, you had an issue with your camera which you posted about on Kali Nethunter repo, and you took selfies on Instagram.
  • 6
    @bittersweet oh I get it :) you mean keybase? Your program seems like a good tool. Freaks me out but nice job haha
  • 2
    @enen Haha your username is way too generic, just tells me you might be Czech? Something about language use. There's a grand total of 3 exifless pngs of which one is marked as "meme". Quite pointless 😂
  • 13
    @bittersweet I have an idea: put it up as a service that people can only use on themselves, by connecting with one of their own accounts (be it Google, Facebook or whatever), and thus only gather information on themselves.

    All data collected can be encrypted so that it is only accessible to the user it belongs to, with an option to purge all collected data whenever the user decides.

    What do you say? :)
  • 2
    Just imagine the possibilities,
  • 3
    @enen I found it! You talked about some czech liquor here on devrant. It checks language use for foreign words to guess nationality, but that gives a lot of false positives.
  • 7
    Trolltrace is now a reality
  • 3
    A tool like this should not be easily accessible to everyone. Cyberbullying is already a major problem, with suicides as a consequence. If every idiot with hate in his heart can find everything about you... I'd prefer only a few powerful companies having these kinds of tools, and pray to god their only interest is money.
  • 3
    I would love to get my hands on it to see what info it brings up on me!
    Although I do agree it shouldn't be open sourced as it sounds way too invasive to release.
  • 1
    @-Sam- Do you know Glenn & Emma?
  • 7
    Please, for the love of all that is good and holy in F.O.S.S., publish the code. I beg of you, set up a donation page, whatever, I need to see this.
  • 20
    @Lasagna I do find the discussion interesting, and I'm a proponent of open source, especially in security.

    But the problem is that open sourcing it doesn't help to build defenses, it just hands over an easy tool for offense to any scriptkid.

    For a skilled person, the technology to do this is fairly unremarkable, you can do it manually.

    * Do plain google searches with combinations of data you have.
    * Scrape images from profiles.
    * Use images.google.com, tineye.com, yandex.ru, etc for reverse image search. Programmatically you can use the Tineye API (paid), or browser automation (I use hswebdriver)
    * Use http://exif.regex.info/exif.cgi to check image metadata (I use hsexif locally). Usually comes up empty if the images were hosted on major sites.
    * Use twitter, facebook, instagram, gather hashtags, connections and likes.
    * Facebook's intersection searches are the devil's work.

    Now put the above in a loop, and when you match something through 2 paths you increase the "truth" value...
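
    A minimal sketch of that "two paths agree, so raise the truth value" step, in Python for brevity rather than the Haskell the tool was written in. The thread does not say how the weights were actually computed; the noisy-OR rule, the `LinkGraph` class and the placeholder names below are all assumptions made up for illustration.

```python
from collections import defaultdict


def combine(confidences):
    """Noisy-OR: chance that at least one independent path is correct."""
    miss = 1.0
    for c in confidences:
        miss *= 1.0 - c  # probability that every path so far is wrong
    return 1.0 - miss


class LinkGraph:
    """Hypothetical store of links between two items, keyed by evidence path."""

    def __init__(self):
        # (item_a, item_b) -> {path_name: confidence in [0, 1]}
        self._evidence = defaultdict(dict)

    @staticmethod
    def _key(a, b):
        # Links are undirected, so store each pair in a canonical order.
        return tuple(sorted((a, b)))

    def add_evidence(self, a, b, path, confidence):
        # Re-reporting the same path overwrites it instead of stacking.
        self._evidence[self._key(a, b)][path] = confidence

    def truth(self, a, b):
        return combine(self._evidence.get(self._key(a, b), {}).values())


graph = LinkGraph()
graph.add_evidence("item_a", "item_b", "path_one", 0.6)
graph.add_evidence("item_b", "item_a", "path_two", 0.5)
```

    Storing evidence per path rather than bumping a counter means re-running the same query cannot inflate a link's certainty; only a genuinely different path can.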
  • 6
    You know there are open source tools like this; you just have to chain them together.

    Anthelion is Yahoo's open-sourced web scraper
    https://goo.gl/Vsv5RR

    Then there is Apache Tika, which extracts metadata such as language from a huge range of file formats, even with OCR
    https://tika.apache.org/

    And then there is Gaffer, the British GCHQ intelligence tool, which finds relationships and insights in that scraped data.
    https://github.com/gchq/Gaffer
  • 3
    Wow man, you know you could sell it for possibly millions? Market researchers would kill for this.
  • 16
    @620hun Money is one of the least interesting things in this existence.
  • 11
    So, nobody asked this, so I'll change the direction a bit:
    1. How much time did it take you guys to finish the software?
    2. Was it only the two of you?
    3. How hard did you work?
    4. Why Haskell?
    5. How much knowledge and experience did the project require?

    PS. Respect!
  • 8
    @Noob On and off for a few months. Haskell was the language we used professionally, working for a biotech laboratory on continuous flow reactor monitoring.

    The most annoying parts were using Neo4j (graph DB) with Haskell, since we only had experience with Postgres & InfluxDB for time series monitoring, and the fact that Haskell is not the friendliest language for concurrent web crawling. The most suitable language for a project like this would probably be Go.
  • 4
    @bittersweet I'm totally on your side. Handing this tool over to the world would turn the lives of some people to ruins. Just imagine some girl on insta having a stalker who gets this tool in his hands.

    P.S.: By now you're probably seeing my awkward emo phase.
  • 8
    @Mitch377 Just from your username and devRant content? Very little.

    There's a few tips I'd give to everyone:

    Decide which platforms you want to use in your own name. Be polite and professional in those places, consider everything you do there a public performance — as if it was an interview on national TV.

    For me that's LinkedIn, Stackoverflow, Facebook, Github, YouTube, my own website. I do post about controversial topics there, but not overly offensive, nothing I don't want to come up in a job interview.

    That's privacy I choose to give up.

    For every other website, choose a unique username AND avatar. For that reason I love the devRant avatars: They're unique to this place. If you add date of birth or location, don't be too specific.

    If you post photos to the internet, host them in a place which strips exif tags. Luckily, that's almost every site — as long as you don't host them on your own WordPress.

    Don't embed Facebook or Dropbox photos on forums, use something like imgur. If you take an RL picture, check whether any object in the picture is recognizable. And in screenshots, crop/mask away anything that's not necessary.
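
    For the exif tip above, a rough sketch of what "stripping" means for a JPEG, using only the Python standard library. It assumes a well-formed file with no padding bytes between segments and simply drops every APP1 segment (where Exif and XMP metadata live) before the image data starts. The function name is made up for illustration, and real photos are better handled with a maintained tool such as exiftool.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker that opens every JPEG
APP1 = 0xE1         # segment type that carries Exif (and XMP) metadata
SOS = 0xDA          # start-of-scan: compressed image data follows


def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG bytes with all APP1 segments removed."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker != APP1:
            out += jpeg[pos:end]  # keep every non-APP1 segment as-is
        pos = end
    raise ValueError("truncated JPEG: no SOS marker found")
```

    To use it, read a file in binary mode, pass the bytes through `strip_app1_segments`, and write the result to a new file before uploading.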
  • 5
    Open source it or it didn't happen
  • 1
    @CogInTheWheel it did happen... He posted info about me. Look at the comments
  • 2
    Reminds me of my datingsite crawler
  • 1
    Could you message me what you can find about me? (Private please - keybase.io/marens101)
  • 2
    @bittersweet I commend your attitude
  • 1
    I don't want to know how quickly your crawler could grab all my info just using my profile pictures
  • 1
    Technology is crazy, but creepy... I still somehow don't quite believe this
  • 2
    Every time this gets a post, I feel more and more like asking OP to run the app on my username and give vague hints on what he found
  • 1
    @RealKC I'm sorry for you that you have an Intel i3 laptop, but I think it's good enough for Terraria.
  • 1
    @bittersweet Oh shit. I haven't played Terraria in a few months.

    Did you find that women's clothes store too?
  • 1
    @hash-table I think I'm joining the fan club too. Crazy stuff, and here I thought what I was trying to build was somehow hard.

    I definitely don't think the author should open source this. There is no point in open sourcing everything solely for the sake of open source.

    Better to keep these tools private to freak out the boss at times. Plus to trace the origins of those Russian volleyball club orgies
  • 1
    @bittersweet please check me :D

    Your project is really impressive, how long did it take you to "complete" it?
  • 1
    @losdanielos Just from your devrant profile as a starter you could be multiple people. German or Polish?
  • 1
    @bittersweet Either open source the code or you'll be ousted from the kingdom of DevRant 😂😂😂
  • 2
    Maltego has something similar: you give it a name, an email, or any other info you have, and it creates a web of 'nodes' which are possible links to your target, such as alternate emails and social media accounts. As it links and builds, it gets more precise. For example, it pulls your Facebook using your name and email; that's linked to your Instagram, so it now has a username you use. It keeps searching for other links you've posted publicly, but now it also searches that username until it comes across, say, a pornhub account with a different email. Now it checks that email, and if it finds a link back to verified data, it saves the connection in a node.

    Ran it on myself and got nada... Ran it on a teacher, using their school email, and got everything from Facebook to their phone number to Snapchat. Then I spent the next day and a half helping her clean up her internet presence.

    I think the program is named Chlorine? Or Chloroform? I had it through my school before. I'll see if I can find it again. It's not free; I know the company AIS uses it. I also know a hotel I worked at purchased a copy to run background checks before they hired employees. They almost didn't hire me because I didn't exist XD
  • 1
    @irene I confused the company/product name