
This is kind of a "big data with Amazon Web Services" question: consider a massive set of txt files (all with the same content format inside: [title;body;author]). I want to store them in AWS and be able to search for a substring across the whole set. What would be a good approach for doing so? I would also appreciate advice on how to store all this data in something other than txt files (we are talking about articles with a title, body, and author).

Thank you.

L RodMrez
  • I am not a big data expert, but you might be looking for EMR (Elastic Map Reduce) in AWS. Checkout its documentation [here](https://aws.amazon.com/emr/) – Rhythem Aggarwal Aug 01 '18 at 11:29
  • If it is text only, then why not store and search with AWS Elasticsearch https://aws.amazon.com/elasticsearch-service/ – Kush Vyas Aug 01 '18 at 11:37
  • How big is your data size (GB, TB, PB)? How fast are you expecting the results (sub-second, seconds, minutes)? How many concurrent queries (one, thousands)? What are your technical skill bases (SQL, languages, Hadoop)? – John Rotenstein Aug 01 '18 at 11:53
  • I would suggest at least converting said file into Parquet or ORC format, then Athena queries will be fast. Otherwise, stick the data in RDS or Redshift – OneCricketeer Aug 06 '18 at 12:37
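Whichever backend is chosen (Elasticsearch for full-text search, or Parquet on S3 queried via Athena), the txt files first need to be parsed into structured records. A minimal Python sketch, assuming one `title;body;author` record per line and that fields contain no embedded semicolons (both assumptions are illustrative, not from the question):

```python
import csv
import io

def parse_articles(text):
    """Parse lines of the form 'title;body;author' into dicts.

    Assumes one record per line with exactly three ';'-separated
    fields; real data may need quoting rules or a stricter schema.
    """
    reader = csv.reader(io.StringIO(text), delimiter=';')
    return [
        {"title": title, "body": body, "author": author}
        for title, body, author in reader
    ]

# Hypothetical sample input mimicking the described format
sample = "First Article;Some body text;Jane Doe\nSecond;More text;John"
records = parse_articles(sample)
# Each dict in `records` can then be bulk-indexed into Elasticsearch
# or written out as Parquet (e.g. with pyarrow) for Athena to query.
```

The same parsing step feeds either path: Elasticsearch indexes the `body` field for substring/full-text search, while the Parquet/Athena route stores the columns for SQL-style queries.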

0 Answers