
I am working on a program that reads the content of a RESTful API from import.io. The connection works and data is returned, but it's a jumbled mess. I'm trying to clean it up so that it only returns ASINs.

I have tried using split() with a delimiter, without success.

import requests

stuff = requests.get('https://data.import.io/extractor***')
stuff.content

I get the content, but I want to extract only the ASINs.

[Screenshot of the results]

  • The image is hard to read, but you can expect a JSON response from a RESTful API. Figure out what format you're getting the data you want back and parse it out. https://docs.python.org/3/library/json.html – pinkwaffles Oct 08 '19 at 19:32
  • Weird that it's not returning JSON, but looking at the response I would try splitting it on newlines: data = stuff.text.splitlines(). Then iterate through that and split each entry on ",". – Voxum Oct 08 '19 at 19:35
  • What you are getting in the response is a `csv` file. There are libraries available that can help you process the content. – Thomas Rückert Oct 08 '19 at 19:35
  • You are clearly getting back a CSV. You are also clearly _asking for a CSV_ since you have `/csv/` in the request URL. Look at the import.io docs for a way to request json instead (just a guess, but maybe replace `/csv/` with `/json/`) and your life will be much easier. – msanford Oct 08 '19 at 19:44
  • Don't post images unless conveying something that can't be represented in text (like a rendering or drawing issue). Post sample text content and ideally working code that returns the content. – Mark Tolonen Oct 08 '19 at 20:26
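Following the suggestion to use a library rather than splitting on commas by hand, here is a minimal sketch with Python's standard csv module. It assumes the decoded response text has a header row containing a column named ASIN; the helper name extract_asins and the sample data are hypothetical.

```python
import csv
import io


def extract_asins(csv_text, column="ASIN"):
    """Return the non-empty values of one column from CSV text."""
    # DictReader uses the first row as field names, so each row is a dict.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row.get(column)]


# Hypothetical sample; the real import.io export may have other columns.
sample = "Title,ASIN,Price\r\nWidget,B000000001,9.99\r\nGadget,B000000002,19.99\r\n"
print(extract_asins(sample))  # ['B000000001', 'B000000002']
```

With the request from the question, this would be called as extract_asins(stuff.content.decode('utf-8-sig')). Using csv instead of split(',') means quoted fields that contain commas (common in product titles) are still split correctly.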

3 Answers


What format is the returned data in? RESTful APIs typically return JSON, so you will likely have luck parsing it as a JSON object.

https://realpython.com/python-requests/#content

stuff_dictionary = stuff.json()

That loads the returned content as a dictionary, which is much easier to work with.

EDIT:

Since I don't have the full URL to test, I can't give an exact answer. Given that the content type is CSV, loading it into a pandas DataFrame is pretty easy. With a quick Stack Overflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367

I tried the following in the Python interpreter and got a DataFrame back:

from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))

So you should be able to do the following:

from io import StringIO
import pandas as pd
# .content is bytes, but StringIO needs a str, so pass the decoded .text
df = pd.read_csv(StringIO(stuff.text))

If that doesn't work, consider dropping the first three bytes of your response, b'\xef\xbb\xbf' (a UTF-8 byte order mark). See Mark Tolonen's answer for how to decode the content correctly.
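Dropping those bytes by hand could look like the following minimal sketch. It checks for the mark with codecs.BOM_UTF8 instead of blindly slicing off three bytes, so a response without a BOM is left unchanged; the helper name strip_utf8_bom and the sample bytes are hypothetical.

```python
import codecs


def strip_utf8_bom(raw):
    """Remove a leading UTF-8 byte order mark (codecs.BOM_UTF8) if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical response bytes that start with a BOM.
raw = b"\xef\xbb\xbfTitle,ASIN\r\nWidget,B000000001\r\n"
text = strip_utf8_bom(raw).decode("utf-8")
print(text.splitlines()[0])  # Title,ASIN
```

The cleaned text can then be wrapped in StringIO and passed to pd.read_csv as above.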

After that, selecting the ASIN column (your second column) from your DataFrame should be easy:

asins = df.loc[:, 'ASIN']
asins_arr = asins.array
GLJ
  • I'm not OP, I'm just contributing to the discussion, hopefully constructively, and your updated answer is great! +1 – msanford Oct 09 '19 at 13:13

While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. The response will do that for you when you access .text:

response.text

Because decoding bytes to str requires an encoding scheme, requests will try to guess the encoding from the response's headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:

response.encoding = 'utf-8'

If the response turns out to be serialized JSON content, you could take the str you retrieved from .text and deserialize it with json.loads() to get a dictionary. However, a simpler way to accomplish this is to use .json():

response.json()

.json() returns the parsed JSON, usually a dictionary (or a list if the top level is a JSON array), so you can access values in the object by key.
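As a minimal sketch, assuming the endpoint can return JSON shaped as a list of records that each carry an ASIN key (the payload below is hypothetical, since the question only shows CSV output), extracting the ASINs is a single comprehension. json.loads() on the body text gives roughly what response.json() would return.

```python
import json

# Hypothetical JSON body; the real import.io JSON layout is not shown in the question.
body = '[{"Title": "Widget", "ASIN": "B000000001"}, {"Title": "Gadget", "ASIN": "B000000002"}]'

records = json.loads(body)  # a list here, because the top level is a JSON array
asins = [record["ASIN"] for record in records if "ASIN" in record]
print(asins)  # ['B000000001', 'B000000002']
```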

You can do a lot with status codes and message bodies, but if you need more information, such as metadata about the response itself, you'll need to look at the response's headers.

For more info: https://realpython.com/python-requests/

ParthS007

The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
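To see what the BOM does, here is a small comparison on hypothetical bytes shaped like the response in the question. Decoding with plain 'utf-8' leaves a '\ufeff' character glued to the first header, while 'utf-8-sig' strips it and decodes BOM-less input exactly like 'utf-8'.

```python
# Hypothetical payload: a UTF-8 BOM followed by CSV text.
raw = b"\xef\xbb\xbfTitle,ASIN\r\nWidget,B000000001\r\n"

plain = raw.decode("utf-8")      # the BOM survives as the character U+FEFF
clean = raw.decode("utf-8-sig")  # the BOM is removed during decoding

print(repr(plain[:6]))  # '\ufeffTitle'
print(repr(clean[:5]))  # 'Title'
```

With requests, stuff.content.decode('utf-8-sig') applies the same decoding to the real payload; setting stuff.encoding = 'utf-8-sig' before reading stuff.text should have the same effect.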

Mark Tolonen