
I have a problem with the # symbol.

Some of the data in my columns contains the # symbol, for example:

'JRE#150' 
'July banner #150' 

When I insert data from the file all.csv into SQL Server, records containing this character are not inserted into the table correctly.

What do I mean?!

If I try to insert the value 'JRE#150', only the 'JRE' part is stored, and NULL is inserted into the other columns.

Here is what the process looks like and what I am doing:

  1. The first independent engine fetches the data from the API into a DataFrame.

    The following line is responsible for exporting this data to the .csv file:

     df.to_csv(r'C:\\...\all.csv',  encoding='utf-8', index=False)
    
  2. The second independent mechanism does this:

     df = pd.read_csv(r'C:\\...\all.csv', sep=',', comment='#', encoding='utf-8', low_memory=False)
    
     df.to_sql(table_name, engine, if_exists = 'replace', chunksize = None, index=False)
    

How can I insert data containing # into SQL Server, without replacing it with another symbol or deleting it?

What is the problem here and how can I fix it?

I will be grateful for the help.

Comment: I've never heard of this before. I just Googled it now and found this: https://stackoverflow.com/questions/32235696/pandas-to-sql-gives-unicode-decode-error – ASH Sep 15 '21 at 14:38

1 Answer


Remove the comment='#' parameter from pd.read_csv(...).

As per the Pandas read_csv documentation:

comment: str, optional

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in a,b,c being treated as the header.
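
Here is a minimal sketch of what happens, using an in-memory CSV and two illustrative column names ('code' and 'name' are placeholders, not the real columns in all.csv):

    import io

    import pandas as pd

    csv_data = "code,name\nJRE#150,July banner #150\n"

    # With comment='#', pandas drops everything from the first '#' to the end
    # of the line, so 'JRE#150' becomes 'JRE' and the remaining column becomes
    # NaN, which to_sql then writes to SQL Server as NULL.
    truncated = pd.read_csv(io.StringIO(csv_data), sep=',', comment='#')
    print(truncated)
    #   code  name
    # 0  JRE   NaN

    # Without comment='#', the '#' is treated as ordinary data and the full
    # values survive.
    fixed = pd.read_csv(io.StringIO(csv_data), sep=',')
    print(fixed)
    #       code              name
    # 0  JRE#150  July banner #150

The to_sql call itself does not need to change; once comment='#' is removed from read_csv, the complete values reach SQL Server.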
