Questions tagged [csv]

Comma-Separated Values or Character-Separated Values (CSV) is a common "flat file database" (or spreadsheet-style) format for storing tabular data in plain text, with fields separated by a special character (comma, tab, etc.). Rows are typically denoted by newline characters. Use this tag for any delimited file format, including tab-delimited (TSV).

CSV is a plain-text file format that stores tabular data, with the fields of each row separated by a delimiter. CSV (comma-separated values) files traditionally and most commonly use a comma as the delimiter (hence the name), but other characters can be used, such as semicolons, tabs, or pipe symbols (|).
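As a minimal sketch of handling non-comma delimiters, Python's standard `csv` module accepts a `delimiter` argument. The helper name `parse_delimited` and the sample strings are made up for illustration.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample data: the same two rows, tab-delimited and pipe-delimited.
tsv_text = "Time\tTemperature\n08:00\t70\n"
pipe_text = "Time|Temperature\n08:00|70\n"

tsv_rows = parse_delimited(tsv_text, delimiter="\t")
pipe_rows = parse_delimited(pipe_text, delimiter="|")
print(tsv_rows)
print(pipe_rows)
```

Both calls yield the same rows, since only the separator character differs between the two inputs.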

The MIME type for CSV files is text/csv.

Information is often stored in CSV format to make it easy to transfer tables of data between applications. Each row of a table is represented as a list of plain text (human-readable) values with a delimiter character between each discrete piece of data. Values may be enclosed in quotes; this is required when a value contains the delimiter character itself. The first row of data often contains the headers of the table's columns, which describe the meaning of the data in each column.

Example

Tabular format

Time   Temperature  Humidity  Description
08:00  70           35        Sunny and Clear
11:45  94           90        Hazy, Hot, and Humid
14:30  18                     Freezing
16:00  -200                   "Unliveable"

CSV format

Time,Temperature,Humidity,Description
08:00,70,35,Sunny and Clear
11:45,94,90,"Hazy, Hot, and Humid"
14:30,18,,Freezing
16:00,-200,,"""Unliveable"""

In this example, the first row of CSV data serves as the "header", which describes the corresponding data below it. A CSV file has no inherent way to indicate whether its first row is a header row. Each successive line of the CSV file should contain the same number of fields, in the same order, as the first line.
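One way to check that every row lines up with the first line is to compare field counts while reading. The following is only a sketch using Python's standard `csv` module; the function name `find_ragged_rows` and the sample text are hypothetical, and the first line is simply assumed to be the header.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for rows whose field count differs from the first row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)  # assumption: the first line is the header
    if header is None:
        return []
    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            problems.append((reader.line_num, len(row)))
    return problems


# Hypothetical sample: the third line is missing the placeholder comma for Humidity.
sample = (
    "Time,Temperature,Humidity,Description\n"
    "08:00,70,35,Sunny and Clear\n"
    "14:30,18,Freezing\n"
)
ragged = find_ragged_rows(sample)
print(ragged)
```

Python's `csv.Sniffer().has_header()` can also guess whether a header row is present, but it is a heuristic and its answer should be treated as a hint rather than a fact.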

Note:

  • Empty fields (fields with no available data, such as the third field in the last line) are still marked by their commas so that the fields that follow stay in the correct columns.
  • Since the comma is the delimiter for fields, the commas in the Description field of the second line must be quoted (to prevent them from being interpreted as field delimiters). Wrapping the entire field in double quotes (") is the default method for protecting the delimiter character inside a field.
  • Since the double quote is the quoting character, double quotes in the data, as in "Unliveable" in the last row, must also be protected. The default method is to wrap the entire field in double quotes and double up each double quote inside it.
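The quoting rules in the notes above can be exercised with Python's standard `csv` module, which applies them by default. This is an illustrative sketch rather than a reference implementation; the variable names are arbitrary, and the rows are the example table from this page.

```python
import csv
import io

# The example table, with the literal double quotes kept in the last Description.
rows = [
    ["Time", "Temperature", "Humidity", "Description"],
    ["08:00", "70", "35", "Sunny and Clear"],
    ["11:45", "94", "90", "Hazy, Hot, and Humid"],
    ["14:30", "18", "", "Freezing"],
    ["16:00", "-200", "", '"Unliveable"'],
]

# Write to an in-memory buffer; the default dialect quotes only fields that need it.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back restores the original values, including commas and quotes.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed == rows)
```

The writer leaves plain fields unquoted, wraps the field containing commas in double quotes, and both wraps and doubles the quotes around Unliveable, so the reader can recover every value unchanged.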

Questions tagged [csv] are expected to relate to programming in some way, for example, parsing/importing CSV files or creating them programmatically.

89606 questions
12
votes
2 answers

Wordcloud Python with generate_from_frequencies

I'm trying to create a wordcloud from csv file. The csv file, as an example, has the following structure: a,1 b,2 c,4 j,20 It has more rows, more or less 1800. The first column has string values (names) and the second column has their respective…
cmc_carlos
  • 123
  • 1
  • 1
  • 8
12
votes
3 answers

Add header to CSV without loading CSV

Is there a way to add a header row to a CSV without loading the CSV into memory in python? I have an 18GB CSV I want to add a header to, and all the methods I've seen require loading the CSV into memory, which is obviously unfeasible.
Josh Kidd
  • 816
  • 2
  • 14
  • 35
12
votes
1 answer

How to drop the index column while writing the DataFrame in a .csv file in Pandas?

My DataFrame contains two columns named 'a','b'. Now when I created a csv file of this DataFrame: df.to_csv('myData.csv') And when I opened this in an excel file, there is an extra column with indices that appears alongside the columns 'a' and…
aroma
  • 1,370
  • 1
  • 15
  • 30
12
votes
1 answer

Spark dataframe save in single file on hdfs location

I have a DataFrame and I want to save it in a single file at an HDFS location. I found the solution here: Write single CSV file using spark-csv df.coalesce(1) .write.format("com.databricks.spark.csv") .option("header", "true") …
shikha dubey
  • 139
  • 1
  • 1
  • 5
12
votes
5 answers

Creating a |N| x |M| matrix from a hash-table

Imagine I have a dictionary / hashtable of pairs of strings (keys) and their respective probabilities (values): import numpy as np import random import uuid # Creating the N vocabulary and M vocabulary max_word_len = 20 n_vocab_size =…
alvas
  • 115,346
  • 109
  • 446
  • 738
12
votes
1 answer

Exporting CSV data using SQLCMD.EXE

I'm trying to export data from SQL Server into CSV format. I have a bat task to do this that's run at regular intervals. Command is: SQLCMD.EXE -d [db details] -i c:\export.sql -o c:\export.csv -s"," -W The SQL file is just a SELECT * from a view.…
Tim Fountain
  • 33,093
  • 5
  • 41
  • 69
12
votes
6 answers

Efficient way to get the unique values from 2 or more columns in a Dataframe

Given a matrix from an SFrame: >>> from sframe import SFrame >>> sf =SFrame({'x':[1,1,2,5,7], 'y':[2,4,6,8,2], 'z':[2,5,8,6,2]}) >>> sf Columns: x int y int z int Rows: 5 Data: +---+---+---+ | x | y | z | +---+---+---+ | 1 | 2 |…
alvas
  • 115,346
  • 109
  • 446
  • 738
12
votes
5 answers

How to parse CSV file into an array in Android Studio

I'm wondering how to parse a CSV file and just store the contents into an array. My csv file looks something like this: 1,bulbasaur,1,7,69,64,1,1 2,ivysaur,2,10,130,142,2,1 I only want the names, so the second field. I want to store all of these…
Varun Vu
  • 305
  • 2
  • 6
  • 14
12
votes
2 answers

Pandas read_csv without knowing whether header is present

I have an input file with known columns, let's say two columns Name and Sex. Sometimes it has the header line Name,Sex, and sometimes it doesn't: 1.csv: Name,Sex John,M Leslie,F 2.csv: John,M Leslie,F Knowing the identity of the columns…
leekaiinthesky
  • 5,413
  • 4
  • 28
  • 39
12
votes
6 answers

reading and doing calculation from .dat file in python

I need to read a .dat file in Python which has 12 columns in total and millions of rows. I need to divide columns 2, 3, and 4 by column 1 for my calculation. So before I load that .dat file, do I need to delete all the other unwanted…
bhjghjh
  • 889
  • 3
  • 16
  • 42
12
votes
3 answers

what does 'rb' mean in csv files?

import csv with open('test.csv','rb') as file: rows = csv.reader(file, delimiter = ',', quotechar = '"') data = [data for data in rows] This was in Python: reading in a csv file and saving…
evtoh
  • 444
  • 3
  • 10
  • 21
12
votes
4 answers

Python pandas NameError: StringIO is not defined

I am unable to read data in Pandas: Input: import pandas as pd data = 'a,b,c\n1,2,3\n4,5,6' pd.read_csv(StringIO(data),skipinitialspace=True) Output: NameError:name 'StringIO' is not defined Please let me know why the error occurred and also let…
Abhishek
  • 515
  • 1
  • 5
  • 12
12
votes
4 answers

Pandas read_csv dtype specify all columns but one

I have a CSV file. Most of its values I want to read as strings, but I want to read one column as bool if a column with the given title exists. Because the CSV file has a lot of columns, I don't want to specify on each column the datatype directly…
elaspog
  • 1,635
  • 3
  • 21
  • 51
12
votes
1 answer

Write Spark dataframe as CSV with partitions

I'm trying to write a dataframe in spark to an HDFS location and I expect that if I'm adding the partitionBy notation Spark will create partition (similar to writing in Parquet format) folder in form of partition_column_name=partition_value ( i.e…
Lior Baber
  • 852
  • 3
  • 11
  • 25
12
votes
3 answers

JavaScript - Convert CSV to XLSX (Preferably Without Use of Library(s))

As the title says, I currently have a CSV file created from SharePoint list data and in order to display this information as a spreadsheet, I want to convert it to an Excel XLSX file. I prefer to do this without relying on a third-party library. …
LaLaLottie
  • 393
  • 1
  • 4
  • 17