
For a current research project, I am planning to read the JSON field "Text Main" within a predefined time range using Python/Pandas. However, the code yields the error TypeError: string indices must be integers for the line line = row["Text Main"].

I have already gone through pages addressing the same issue but have not found a solution yet. Is there any tweak that would make this work?

The JSON file has the following structure:

[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]
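The sample is a JSON array of records, so pandas can load it directly without the json module. This is a minimal sketch, assuming the file holds an array like the one above. The inline string `raw` is a hypothetical stand-in for the contents of Glassdoor_A.json. It also assumes month-first dates (MM/DD/YYYY), which the question does not confirm.

```python
import io

import pandas as pd

# Hypothetical inline stand-in for the contents of Glassdoor_A.json (same structure as the sample).
raw = '[{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}]'

# read_json parses an array of records directly; dtype=False and
# convert_dates=False keep every field as the original string.
df = pd.read_json(io.StringIO(raw), orient="records", dtype=False, convert_dates=False)

# Assumption: dates are month-first (MM/DD/YYYY); change the format if they are day-first.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
print(df.dtypes)
```

For the real file, pass the path instead of the buffer, e.g. pd.read_json("Glassdoor_A.json", orient="records", dtype=False, convert_dates=False).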

And the corresponding code section looks like this:

import string
import json
import csv

import pandas as pd
import datetime

import numpy as np


# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])


# Create an empty dictionary
d = dict()


# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

print(filtered_dates)


# Processing
for row in filtered_dates:
    line = row["Text Main"]
    # Remove the leading spaces and newline character
    line = line.strip()
Malte Susen
  • There is no "Main_Text" in the data shared. It would also be helpful if you shared a few more lines of your JSON data. – sammywemmy May 12 '20 at 08:51
  • That was a transcription error when editing the question; well spotted, and thanks for the hint (I have amended it accordingly). The error relates to `Text Main`. – Malte Susen May 12 '20 at 08:54
  • Does this answer your question? https://stackoverflow.com/questions/6077675/why-am-i-seeing-typeerror-string-indices-must-be-integers – Giuppox May 12 '20 at 08:59
  • No worries. I'd still suggest that you add more lines to the JSON sample supplied. Pandas can handle JSON itself, removing the need to use the json module. – sammywemmy May 12 '20 at 09:00
  • Thanks, I had in fact worked through the given thread before posting the question. `Text Main` is a list of strings, and converting it to a dictionary yields the same error. I have also tried appending a positional index like [0] to the command. This makes the code run but (little surprise) only counts single letters. – Malte Susen May 12 '20 at 09:10
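The symptoms in this thread come from how pandas iterates a DataFrame. The error from row["Text Main"] and the single letters from [0] both follow from the same fact: `for row in df` yields the column labels, not the rows. Each `row` is therefore a plain string. The sketch below uses a hypothetical toy frame that mirrors the question's columns.

```python
import pandas as pd

# Hypothetical toy frame with the same columns as the question's JSON sample.
df = pd.DataFrame(
    {"No": ["121"], "Stock Symbol": ["A"], "Date": ["05/11/2017"], "Text Main": ["  Sample text\n"]}
)

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [row for row in df]
print(labels)  # ['No', 'Stock Symbol', 'Date', 'Text Main']

# Each label is a str, so indexing it with a string raises TypeError...
first = labels[0]
try:
    first["Text Main"]
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# ...while integer indexing "works" but only returns one character of the label.
print(first[0])  # N
```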

1 Answer


If the requirement is to collect all the contents of the 'Text Main' column, this is what we can do:

line = list(filtered_dates['Text Main'])

We can then also apply strip:

line = [val.strip() for val in line]
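The same result can be written without an explicit list comprehension. The sketch below shows two options. The first strips the whole column in one vectorised step with `.str.strip()`. The second visits rows with `DataFrame.iterrows()`, which yields (index, row Series) pairs, so `row["Text Main"]` works as the question intended. The toy data is hypothetical, and the sketch assumes month-first dates.

```python
import pandas as pd

# Hypothetical toy data; only the 2012 row falls inside the question's date range.
df = pd.DataFrame(
    {
        "No": ["121", "122"],
        "Date": ["05/11/2012", "05/11/2017"],
        "Text Main": ["  Sample text\n", " Another text "],
    }
)
# Assumption: dates are month-first (MM/DD/YYYY).
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Same inclusive date filter as in the question, written with Series.between.
mask = df["Date"].between(pd.Timestamp("2009-01-01"), pd.Timestamp("2015-01-01"))
filtered_dates = df.loc[mask]

# Option 1: vectorised strip over the whole column.
lines = filtered_dates["Text Main"].str.strip().tolist()
print(lines)  # ['Sample text']

# Option 2: iterate over rows; iterrows() yields (index, row Series) pairs.
for _, row in filtered_dates.iterrows():
    line = row["Text Main"].strip()
    print(line)  # Sample text
```

The vectorised form is usually preferable for simple string cleaning. `iterrows()` is convenient when each row needs more involved per-record logic.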
Anshul