I have a homework like this:
- Use python to read json files in 2 folders song_data and log_data.
- Use Python Kafka to publish a mixture of both song_data and log_data file types into a Kafka topic.
- Use Pyspark to consume data from the above Kafka topic.
- Use Stream processing to consume messages from song_data and create 2 dataframes, songs and artitst. and from log_data generate dataframe as users, time.
- Create songplays from dataframes of dimension tables.
I have a problems with read different file from 1 topic, 2 folder containt json file but 1 is song data and 1 is log. How can I get their own data from just 1 topics ?