Questions tagged [csv]

Comma-Separated Values or Character-Separated Values (CSV) is a common "flat file database" (or spreadsheet-style) format for storing tabular data in plain text, with fields separated by a special character (comma, tab, etc). Rows are typically denoted by newline characters. Use for any delimited file formats, including tab delimited (TSV)

CSV is a file format involving a plain text file with information separated by delimiters with the purpose of storing data in a table-structured format. CSV (comma separated values) files traditionally and most commonly use a comma delimiter (hence the name), but other characters can be used, such as semi-colons, tabs, pipe symbols (|), etc.

The MIME type for CSV files is text/csv.

Information is often stored in CSV format to make it easy to transfer tables of data between applications. Each row of a table is represented as a list of plain text (human-readable) values with a delimiter character between each discrete piece of data. Values may be enclosed in quotes, which is required if they contain the delimiter as a value. The first row of data often contains headers of table's columns, which describe the meaning of the data in each column.

Example

Tabular format

Time Temperature Humidity Description
08:00 70 35 Sunny and Clear
11:45 94 90 Hazy, Hot, and Humid
14:30 18 Freezing
16:00 -200 "Unliveable"

CSV format

Time,Temperature,Humidity,Description
08:00,70,35,Sunny and Clear
11:45,94,90,"Hazy, Hot, and Humid"
14:30,18,,Freezing
16:00,-200,,""Unliveable""

In this example, the first row of CSV data serves as the "header", which describes the corresponding data below it. There is no inherent way to describe within a CSV file whether the first row is a header row or not. Each successive line of the CSV file should neatly fit into the same field as the first line.

Note:

  • Empty fields (fields with no available data, such as the third field in the last line) are place-held with commas so that the fields that follow may be correctly placed.
  • Since the comma is the delimiter for fields, the commas in the Description field of the second line must be quoted (to prevent them from being interpreted as field delimiters). Wrapping the entire field in double quotes (") is the default method for protecting the delimiter character inside a field.
  • Since the double-quote is the delimiter quote character, double-quotes in the data, as in "Unliveable" on the fourth line, must also be protected. Doubling-up the double-quote is the default method for protecting the quote character inside a field.

Questions tagged are expected to relate to programming in some way, for example, parsing/importing CSV files or creating them programmatically.

Related links:

89606 questions
131
votes
5 answers

Are there any CSV readers/writer libraries in C#?

Are there any CSV readers/writer libraries in C#?
mrblah
  • 99,669
  • 140
  • 310
  • 420
130
votes
12 answers

SQL split values to multiple rows

I have table : id | name 1 | a,b,c 2 | b i want output like this : id | name 1 | a 1 | b 1 | c 2 | b
AFD
  • 1,341
  • 2
  • 9
  • 8
129
votes
18 answers

Convert specified column in a multi-line string into single comma-separated line

Let's say I have the following string: something1: +12.0 (some unnecessary trailing data (this must go)) something2: +15.5 (some more unnecessary trailing data) something4: +9.0 (some other unnecessary data) something1: +13.5 …
Alex Coplan
  • 13,211
  • 19
  • 77
  • 138
129
votes
4 answers

How to write header row with csv.DictWriter?

Assume I have a csv.DictReader object and I want to write it out as a CSV file. How can I do this? I know that I can write the rows of data like this: dr = csv.DictReader(open(f), delimiter='\t') # process my dr object # ... # write out…
user248237
129
votes
5 answers

Read specific columns with pandas or other python module

I have a csv file from this webpage. I want to read some of the columns in the downloaded file (the csv version can be downloaded in the upper right corner). Let's say I want 2 columns: 59 which in the header is star_name 60 which in the header is…
129
votes
9 answers

Filter string data based on its string length

I like to filter out data whose string length is not equal to 10. If I try to filter out any row whose column A's or B's string length is not equal to 10, I tried this. df=pd.read_csv('filex.csv') df.A=df.A.apply(lambda x: x if len(x)== 10 else…
notilas
  • 2,323
  • 4
  • 23
  • 36
129
votes
2 answers

Way to read first few lines for pandas dataframe

Is there a built-in way to use read_csv to read only the first n lines of a file without knowing the length of the lines ahead of time? I have a large file that takes a long time to read, and occasionally only want to use the first, say, 20 lines to…
beardc
  • 20,283
  • 17
  • 76
  • 94
128
votes
7 answers

Subset of rows containing NA (missing) values in a chosen column of a data frame

We have a data frame from a CSV file. The data frame DF has columns that contain observed values and a column (VaR2) that contains the date at which a measurement has been taken. If the date was not recorded, the CSV file contains the value NA, for…
John
  • 1,401
  • 2
  • 11
  • 12
128
votes
5 answers

How to use the CSV MIME-type?

In a web application I am working on, the user can click on a link to a CSV file. There is no header set for the mime-type, so the browser just renders it as text. I would like for this file to be sent as a .csv file, so the user can directly open…
theman_on_vista
128
votes
4 answers

Why does csvwriter.writerow() put a comma after each character?

This code opens the URL and appends the /names at the end and opens the page and prints the string to test1.csv: import urllib2 import re import csv url = ("http://www.example.com") bios = [u'/name1', u'/name2', u'/name3'] csvwriter =…
Zeynel
  • 13,145
  • 31
  • 100
  • 145
127
votes
4 answers

pandas read_csv and filter columns with usecols

I have a csv file which isn't coming in correctly with pandas.read_csv when I filter the columns with usecols and use multiple indexes. import pandas as pd csv = r"""dummy,date,loc,x bar,20090101,a,1 bar,20090102,a,3 bar,20090103,a,5 …
chip
  • 2,262
  • 2
  • 20
  • 24
126
votes
13 answers

Load CSV file with PySpark

I'm new to Spark and I'm trying to read CSV data from a file with Spark. Here's what I am doing : sc.textFile('file.csv') .map(lambda line: (line.split(',')[0], line.split(',')[1])) .collect() I would expect this call to give me a list of…
Kernael
  • 3,270
  • 4
  • 22
  • 42
125
votes
9 answers

CSV new-line character seen in unquoted field error

the following code worked until today when I imported from a Windows machine and got this error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? import csv class CSV: def __init__(self,…
GrantU
  • 6,325
  • 16
  • 59
  • 89
125
votes
9 answers

How to read one single line of csv data in Python?

There is a lot of examples of reading csv data using python, like this one: import csv with open('some.csv', newline='') as f: reader = csv.reader(f) for row in reader: print(row) I only want to read one line of data and enter it into…
andrebruton
  • 2,268
  • 4
  • 29
  • 36
124
votes
5 answers

How should I escape commas and speech marks in CSV files so they work in Excel?

I'm generating a CSV file (delimited by commas rather than tabs). My users will most likely open the CSV file in Excel by double clicking it. My data may contain commas and speech marks, so I'm escaping those as follows. Reference, Title,…
centralscru
  • 6,580
  • 3
  • 32
  • 43