1

I have over 17 GB of data from a downloaded website; all of the content is .txt and .html.
I want to search inside all of these files.
What is the best tool for that? Is there a command or some software that can index the files so searching is fast?

Comments
  • 0
    Also, is there a way to search file contents, or at least file names, inside .rar archives without extracting them?
  • 3
    ripgrep is really fast
  • 1
    @awsmprog you cannot, _by definition_, look inside an archive without extracting it (at the very least into memory). Also: what exactly are you searching for?
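
The "at the very least into memory" point can be sketched in Python. This is only an illustration under a loud assumption: it uses a .zip archive, because Python's standard library can read .zip but not .rar. The function name `search_zip` and its parameters are made up for this example.

```python
import io
import zipfile


def search_zip(archive, needle, suffixes=(".txt", ".html")):
    """Yield (member_name, line_number, line) for each line containing needle.

    `archive` may be a path or an open binary file object. Assumes a .zip
    archive; the standard library has no .rar reader.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(suffixes):
                continue
            # The member is decompressed as a stream in memory;
            # nothing is written to disk.
            with zf.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                for lineno, line in enumerate(text, start=1):
                    if needle in line:
                        yield info.filename, lineno, line.rstrip("\n")
```

Every search still decompresses every matching member, so this saves disk space, not time.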
  • 0
    @tosensei I'm searching for a string inside the file contents of a .rar file that contains thousands of .txt and .html files.

    I've now extracted the archive.

    The search is slow, though ripgrep makes it faster. Is there a way to index the content of the .txt and .html files for much faster searches?
  • 0
    TextCrawler is good on Windows: https://digitalvolcano.co.uk/tcdown... (also performance-wise)

    With metalsmith js (https://metalsmith.io) or a similar tool you can create custom processing pipelines without too much effort

    The closest you'd get to "reading an archive's contents" would be to extract them to a temp folder and remove them after the search operation has completed

    VS Code also has a surprisingly performant and advanced search
  • 1
    @awsmprog to make it faster _than what_? Sorry, but without a baseline, that question makes no sense.

    Also: you know what building an index means? _Scanning the entire content_. That only makes sense if you're doing several searches. As for "is there a way to index it": yes. Feed it into $SearchToolOfYourChoice. Elasticsearch, a database with a text index, Windows Search: pick your poison.

    But if you're only searching for something once: just start your script and get on with your life while it's doing its job.
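
To make the trade-off above concrete, here is a toy inverted index in Python. It is a sketch, not what Elasticsearch or a database text index actually does internally: the names `build_index` and `lookup` are invented for this example, the index lives in memory (a real tool would persist it on disk), and it matches whole words rather than arbitrary substrings.

```python
import re
from collections import defaultdict
from pathlib import Path

TOKEN = re.compile(r"\w+")


def build_index(root, suffixes=(".txt", ".html")):
    """Scan every file once and map each lowercased word to the files containing it."""
    index = defaultdict(set)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for word in set(TOKEN.findall(text.lower())):
                index[word].add(str(path))
    return index


def lookup(index, query):
    """Return the files that contain every word of the query (AND semantics)."""
    words = TOKEN.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

`build_index` pays the full scan once; each `lookup` afterwards is a few dictionary reads, which is why an index only pays off across several searches.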
  • 2
    Are you looking to search only once or many times? If the former, you probably won't benefit from indexing.
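
For the search-once case, a plain streaming scan is usually enough. A minimal Python sketch, assuming the files are already extracted and are readable as UTF-8 (undecodable bytes are replaced rather than raising); the name `scan` is made up for this example, and a tool like ripgrep will do the same job much faster.

```python
from pathlib import Path


def scan(root, needle, suffixes=(".txt", ".html")):
    """Yield (path, line_number) for each line under root that contains needle."""
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Read line by line so a large file never has to fit in memory at once.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if needle in line:
                    yield str(path), lineno
```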
  • 0
    How about elasticsearch?
  • 0
    Open it in IntelliJ. Searching for file names or particular text is pretty fast, plus you get a GUI.
  • 0
    For searching inside a zip, I think WinRAR is good for searching file names, as it provides a search bar. For non-Windows platforms, I'm not sure.

    If they are uncompressed, then again, opening them in IntelliJ will work just fine.
  • 0
    Just use plain Notepad++.

    You can search entire folders. Lots of plugins too.
  • 1
    I'd give Sublime Text's "Find in Files" feature a go; you can use regex if you want.
  • 0
    Fd for Linux is a good tool
  • 0
    @awsmprog
    Only back up
  • 1
    rm -rf *
  • 0
    He is building a spam extractor that will look for email addresses.
  • 0
    @NoToJavaScript nope. I've downloaded an archive of over 17 GB of erotic stories and wanted to find some interesting ones.