I am trying to load a CSV (tab delimited) using spark csv - by scala.
what I observed is , if a column contains new line character LF (\n) spark considering it as end of the line even though we have double quotes on both sides of the column in the csv file.
any body encountered similar problem ?
but spark recognizing it as a end of line
after @ashraful suggestion i tried below query :
but the result remains the same here is the record i am trying to load
`but after loading to spark data frame i am trying check the record but rest of the columns are loaded as null after the LS character.
same row is being split into two
this is the data : note there is LF character end of the first line.
cuenta_id fecha_creacion saldo descripcion saldo_inicial fecha_ultima_actualizacion usuario_id saldo_anterior saldo_reserva tipo_cuenta pais_iso_3166 moneda_iso_4217 valor_comision_retiro retiro_en_proceso plan_id grupo_comision_id documento tipo_documento operacion_local estado_cuenta nombre telefonos fax direccion nombre_contacto email_contacto url ciudad ventas_mensuales comentarios modelo_pagos modo_prueba tiene_fondo_reserva porcentaje_fondo_reserva aceptar_tc_internacionales dias_fondo_reserva telefono_servicio_cliente email_soporte_cliente pagina_web_soporte_cliente nombre_comercial fuente_creacion procesar_sin_cvv2 productos_servicios usuario_modificacion_id sesion_modificacion_id saldo_congelado perfil_usuario_id grupo_perfil_cobranza_id limite_creacion_transacciones limite_procesamiento_transacciones codigo_mcc_id url_atencion_cliente ubicacion_regional direccion_linea2 direccion_linea3 direccion_codigo_postal grupo_perfil_cobranza_personalizado_id logo migrada_acreditacion origen_comercial
"500181" "2012-04-13 17:14:47.958" "0.00" "CTA POL [MARISOL GONDOLA ROMERO PE]" "0.00" "500106" "0.00" "0.00" "CHECK_ACCOUNT" "PE" "PEN" "0.00" "f" "1" "9" "3742396" "ID" "f" "ENABLED" "MARISOL GONDOLA ROMERO" "5073984473" "" "CLL SALISBURY CASA 226 URBANIZACION COLINAS DEL GOLF CORREGIMIENTO DE JOSE D ESPINAR" "ALEJANDRO JUAN COMPARADA" "pol.no.responder@gmail.com" "LIMA" "VALIDAR TIPO DE DOCUMENTO Y TIPO DE CUENTA
" "AGGREGATOR" "f" "f" "t" "5073984473" "pol.no.responder@gmail.com" "f" "2" "C64C7A54C942D496DCFCA2C24671FD46.admin-nodo1" "0" "MG-PRICING_302_157077" "MG-PRICING_303_285226" "5311" "f"
following is the output:
cala> sqlcontext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("delimiter", "\t").option("quote", "\"").option("charset","UTF-8").load("/home/goutham/Work/data/account.csv").show()
+--------------+--------------------+-----+--------------------+-------------+--------------------------+----------+--------------------+-------------+-------------+-------------+---------------+---------------------+-----------------+--------------------+-----------------+--------------------+--------------------+---------------+-------------+--------------------+----------+----+--------------------+--------------------+--------------------+----+------+----------------+--------------------+------------+-----------+-------------------+------------------------+--------------------------+------------------+-------------------------+---------------------+--------------------------+----------------+---------------+-----------------+-------------------+-----------------------+----------------------+---------------+-----------------+------------------------+-----------------------------+----------------------------------+-------------+--------------------+------------------+----------------+----------------+-----------------------+--------------------------------------+----+--------------------+----------------+
| cuenta_id| fecha_creacion|saldo| descripcion|saldo_inicial|fecha_ultima_actualizacion|usuario_id| saldo_anterior|saldo_reserva| tipo_cuenta|pais_iso_3166|moneda_iso_4217|valor_comision_retiro|retiro_en_proceso| plan_id|grupo_comision_id| documento| tipo_documento|operacion_local|estado_cuenta| nombre| telefonos| fax| direccion| nombre_contacto| email_contacto| url|ciudad|ventas_mensuales| comentarios|modelo_pagos|modo_prueba|tiene_fondo_reserva|porcentaje_fondo_reserva|aceptar_tc_internacionales|dias_fondo_reserva|telefono_servicio_cliente|email_soporte_cliente|pagina_web_soporte_cliente|nombre_comercial|fuente_creacion|procesar_sin_cvv2|productos_servicios|usuario_modificacion_id|sesion_modificacion_id|saldo_congelado|perfil_usuario_id|grupo_perfil_cobranza_id|limite_creacion_transacciones|limite_procesamiento_transacciones|codigo_mcc_id|url_atencion_cliente|ubicacion_regional|direccion_linea2|direccion_linea3|direccion_codigo_postal|grupo_perfil_cobranza_personalizado_id|logo|migrada_acreditacion|origen_comercial|
+--------------+--------------------+-----+--------------------+-------------+--------------------------+----------+--------------------+-------------+-------------+-------------+---------------+---------------------+-----------------+--------------------+-----------------+--------------------+--------------------+---------------+-------------+--------------------+----------+----+--------------------+--------------------+--------------------+----+------+----------------+--------------------+------------+-----------+-------------------+------------------------+--------------------------+------------------+-------------------------+---------------------+--------------------------+----------------+---------------+-----------------+-------------------+-----------------------+----------------------+---------------+-----------------+------------------------+-----------------------------+----------------------------------+-------------+--------------------+------------------+----------------+----------------+-----------------------+--------------------------------------+----+--------------------+----------------+
| 500181|2012-04-13 17:14:...| 0.00|CTA POL [MARISOL ...| 0.00| null| 500106| 0.00| 0.0|CHECK_ACCOUNT| PE| PEN| 0.0| f| 1| 9| 3742396| ID| f| ENABLED|MARISOL GONDOLA R...|5073984473|null|CLL SALISBURY CAS...|ALEJANDRO JUAN CO...|pol.no.responder@...|null| LIMA| null|VALIDAR TIPO DE D...| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null|null| null| null|
|" "AGGREGATOR"| f| f| null| t| null|5073984473|pol.no.responder@...| null| null| null| f| null| 2|C64C7A54C942D496D...| 0|MG-PRICING_302_15...|MG-PRICING_303_28...| null| null| 5311| null|null| null| null| null|null| null| f| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null| null|null| null| null|
+--------------+--------------------+-----+--------------------+-------------+--------------------------+----------+--------------------+-------------+-------------+-------------+---------------+---------------------+-----------------+--------------------+-----------------+--------------------+--------------------+---------------+-------------+--------------------+----------+----+--------------------+--------------------+--------------------+----+------+----------------+--------------------+------------+-----------+-------------------+------------------------+--------------------------+------------------+-------------------------+---------------------+--------------------------+----------------+---------------+-----------------+-------------------+-----------------------+----------------------+---------------+-----------------+------------------------+-----------------------------+----------------------------------+-------------+--------------------+------------------+----------------+----------------+-----------------------+--------------------------------------+----+--------------------+----------------+