
I'm seeing the following warning when using pandas.read_sas() on XPT files (for example, an XPT file from NHANES):

```
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
  pd.read_sas('./dxx.xpt')
```

This single pandas.read_sas('./dxx.xpt') call emits the warning six times before returning the DataFrame. The DataFrame itself seems to be fine, so why is this warning appearing?

I've encountered this warning before in scenarios like the one posed in this related question (e.g. iteratively adding rows to a DataFrame), but I don't see any apparent connection between that scenario and this one.
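The iterative-add scenario can be reproduced without any file at all. This is a minimal sketch (synthetic data, not read_sas internals): adding many columns one at a time fragments the DataFrame's internal block structure, and pandas warns once the number of blocks passes its threshold, while the pd.concat approach the warning suggests avoids it.

```python
import warnings

import numpy as np
import pandas as pd

# Adding columns one at a time creates one internal block per column;
# pandas emits PerformanceWarning once the frame becomes fragmented.
df = pd.DataFrame(index=range(10))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for i in range(150):
        df[f"col{i}"] = np.arange(10)
fragmented = any(
    issubclass(w.category, pd.errors.PerformanceWarning) for w in caught
)

# The fix the warning suggests: build all columns first, then join once.
pieces = [pd.Series(np.arange(10), name=f"col{i}") for i in range(150)]
df2 = pd.concat(pieces, axis=1)

print(fragmented, df.shape == df2.shape)
```

Both frames end up with the same shape; only the one built column-by-column triggers the warning.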

I have a hunch that it might have something to do with missing data in the XPT file, but I'm not super familiar with these file formats and don't have any software that can open and view them directly to confirm. If missing data is in fact the reason for this message, then why doesn't this message appear when executing pandas.read_csv() on CSV files that have missing data?
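To test the missing-data hunch on the read_csv side, here is a quick check with synthetic data (not the NHANES file from the question): parse a small CSV containing gaps and record every warning raised.

```python
import io
import warnings

import pandas as pd

# A small CSV with missing values in several cells.
csv = io.StringIO("a,b,c\n1,,3\n,5,\n7,8,9\n")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_csv(csv)

perf = [
    w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
]
print(len(perf))                  # no fragmentation warning from read_csv
print(int(df.isna().sum().sum())) # the missing values are parsed as NaN
```

Missing values alone don't trigger the warning here, which suggests the cause lies in how the XPT reader assembles the frame rather than in the data itself.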

jglad
  • Per [this thread](https://github.com/twopirllc/pandas-ta/issues/340), I am using `from warnings import simplefilter; simplefilter(action="ignore", category=pd.errors.PerformanceWarning)` to suppress the warning. It's not really an answer to my question, but it's a solution to making my Python script cleaner. – jglad Jan 21 '23 at 19:04
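A scoped variant of the suppression in the comment above (a sketch, not the only way to do it): silence PerformanceWarning only around the read_sas call, so performance warnings elsewhere in the script still surface.

```python
import warnings

import pandas as pd

def read_xpt_quietly(path):
    # Suppress PerformanceWarning only inside this context, instead of
    # the script-wide simplefilter from the comment above.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
        return pd.read_sas(path)

# Usage (path from the question):
# df = read_xpt_quietly('./dxx.xpt')
```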

0 Answers