That's exactly what grep does.
This might be faster with grep -F
In general, you probably won't beat grep for this sort of work; it's been around for a very long time and is well optimized (without hand-building an index, anyway). From the grep man page:

-F, --fixed-strings
       Interpret PATTERN as a list of fixed strings (instead of regular expressions),
       separated by newlines, any of which is to be matched.
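A minimal sketch of that newline-separated pattern list (the file names and contents here are made up); with -f the fixed strings come from a file, one per line:

```shell
# Made-up sample data.
printf '10457 blue\n10458 red\n10999 green\n' > parts.txt
# Newline-separated fixed strings, one per line.
printf '10457\n10999\n' > patterns.txt

# -f reads the pattern list from a file; with -F each line is a fixed string.
grep -Ff patterns.txt parts.txt
```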
Code: Select all
for file in filename1 filename2 filename3...; do
    grep -F "10457" "$file" >> matches.txt
done
cat matches.txt
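An alternative sketch (file names invented here): grep accepts multiple file arguments, so a single invocation avoids spawning one process per file; -h drops the "filename:" prefix so the output matches the per-file loop:

```shell
# Invented sample files.
printf 'foo\n10457 widget\n' > filename1
printf 'bar\nbaz\n' > filename2

# One grep over all files at once; -h suppresses the file-name prefix.
grep -hF "10457" filename1 filename2 > matches.txt
cat matches.txt
```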
It's probably limited by disk I/O speed; grep reads 32k at a time.
How do you do that? I tried a short search for a non-existent string in a small, 14,000-line text file and counted the instructions.
With valgrind. Specifically:
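The exact invocation isn't shown above; one plausible way to get an instruction count from valgrind is its callgrind tool (a sketch, guarded in case valgrind isn't installed):

```shell
printf 'line one\nline two\n' > small.txt

if command -v valgrind >/dev/null 2>&1; then
    # callgrind records instruction counts; the total ends up on the
    # "summary:" line of the output file. grep exits 1 on no match,
    # hence the || true.
    valgrind --tool=callgrind --callgrind-out-file=cg.out \
        grep -F "no-such-string" small.txt || true
    grep '^summary:' cg.out
fi
```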
That reminds me of the two suggestions many years ago for a searchable company phone book:

While grep and other regex-capable commands are the go-to for searching, it sounds like the OP has structured data that might be better put into a database and (repeatedly?) searched that way. Loading it into SQLite would be my suggestion in this case.
Um, you're almost perfectly describing a database table there. While grep will work for simple queries, the shell code will get really slow and fiddly if someone asks you how many blue 10457's were made in Q3 2017.
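That kind of query is one line of SQL once the data is in SQLite. A sketch (the table layout and sample rows are entirely invented; the `sqlite3` CLI is assumed to be installed):

```shell
# Invented sample data: one row per production batch (id,color,qty,date).
printf '10457,blue,5,2017-08-01\n10457,blue,2,2017-07-15\n10457,red,9,2017-08-20\n' > parts.csv

rm -f parts.db
sqlite3 parts.db <<'EOF'
CREATE TABLE parts (id TEXT, color TEXT, qty INTEGER, made TEXT);
.mode csv
.import parts.csv parts
-- "How many blue 10457's were made in Q3 2017?"
SELECT SUM(qty) FROM parts
WHERE id = '10457' AND color = 'blue'
  AND made BETWEEN '2017-07-01' AND '2017-09-30';
EOF
```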
Ahh, but Lucene doesn't produce nice graphs of the extracted data on a smartphone, if it can at all.
OK, for fun, I created a large file - 16GB and over 643 million lines.
Note that this was 100% disk-I/O limited (as you can see from the real time far exceeding user+sys).
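The measurement itself is just the shell's time keyword; with a stand-in file (tiny here, not 16GB), real far above user+sys in the output would indicate an I/O-bound run:

```shell
# Tiny stand-in for the 16GB / 643-million-line file.
seq 1 100000 > big.txt

# -c prints only the match count; compare real vs user+sys in time's output.
time grep -cF "10457" big.txt
```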
Ah sorry — I've got a bunch of experience in corpus linguistics, so to me, searching a file might include collocates and lemmatized searches. While the Linux text tools are pretty good, I often get the feeling here that many people think they're the be-all-and-end-all of text management. There's a lot more out there than just the builtins. Also, get a better smartphone …