I am parsing an image (PNG) with Amazon Textract and extracting the tables as CSV.
Here is an example of one such CSV file, opened with open(file_name, "r")
and read line by line:
['Table: Table_1\n',
'\n',
'Test Name ,Result ,Flag ,Reference Range ,Lab ,\n',
'HEPATIC FUNCTION PANEL PROTEIN, TOTAL ,6.1 ,,6.1-8.1 g/dL ,EN ,\n',
'ALBUMIN ,4.3 ,,3.6-5.1 g/dL ,EN ,\n',
'GLOBULIN ,1.8 ,LOW ,1.9-3.7 g/dL (calc) ,EN ,\n',
'ALBUMIN/GLOBULIN RATIO ,2.4 ,,1.0-2.5 (calc) ,EN ,\n',
'BILIRUBIN, TOTAL ,0.6 ,,0.2-1.2 mg/dL ,EN ,\n',
'BILIRUBIN, DIRECT ,0.2 ,,< OR = 0.2 mg/dL ,EN ,\n',
'BILIRUBIN, INDIRECT ,0.4 ,,0.2-1.2 mg/dL (calc) ,EN ,\n',
'ALKALINE PHOSPHATASE ,61 ,,40-115 U/L ,EN ,\n',
'AST ,27 ,,10-35 U/L ,EN ,\n',
'ALT ,19 ,,9-46 U/L ,EN ,\n',
'\n',
'\n',
'\n',
'\n',
'\n']
I tried reading it with pandas read_csv, but I get errors, because the file never comes in quite the same format: more or fewer spaces, and different leading lines before the column titles.
How can I reliably extract the table from CSVs like this?
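One possible approach, as a minimal sketch rather than a definitive solution: parse the lines with the standard csv module, skip everything before the header row, strip the padding spaces, and repair rows that have more cells than the header. In the sample above, names such as "BILIRUBIN, TOTAL" contain an unquoted comma, which is a likely cause of the read_csv errors. The function name parse_textract_table and the header_marker parameter are hypothetical, and the sketch assumes that the header row starts with a known column title and that any extra comma belongs to the first column.

```python
import csv


def parse_textract_table(lines, header_marker="Test Name"):
    """Return (header, rows) from the raw lines of one exported table.

    Assumptions (not guaranteed by the export format):
    - the header row is the first row whose first cell equals header_marker;
    - a row with more cells than the header has stray commas in its first column.
    """
    header, rows = None, []
    for cells in csv.reader(lines):
        cells = [c.strip() for c in cells]
        # Drop empty cells at the end, e.g. the one created by the trailing comma.
        while cells and cells[-1] == "":
            cells.pop()
        if not cells:
            continue  # blank line
        if header is None:
            # Skip leading lines such as "Table: Table_1" until the header shows up.
            if cells[0] == header_marker:
                header = cells
            continue
        extra = len(cells) - len(header)
        if extra > 0:
            # Re-join the pieces of the first column that were split on commas.
            cells = [", ".join(cells[: extra + 1])] + cells[extra + 1:]
        # Pad short rows so every row has as many cells as the header.
        cells += [""] * (len(header) - len(cells))
        rows.append(cells)
    return header, rows


sample = [
    "Table: Table_1\n",
    "\n",
    "Test Name ,Result ,Flag ,Reference Range ,Lab ,\n",
    "BILIRUBIN, TOTAL ,0.6 ,,0.2-1.2 mg/dL ,EN ,\n",
    "GLOBULIN ,1.8 ,LOW ,1.9-3.7 g/dL (calc) ,EN ,\n",
]
header, rows = parse_textract_table(sample)
```

The result can then be handed to pandas with pandas.DataFrame(rows, columns=header). If the first column title differs between files, header_marker would need to be adjusted, or the header detection replaced with another rule that fits the data.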