Questions tagged [csv]

Comma-Separated Values or Character-Separated Values (CSV) is a common "flat file database" (or spreadsheet-style) format for storing tabular data in plain text, with fields separated by a special character (comma, tab, etc.). Rows are typically denoted by newline characters. Use this tag for any delimited file format, including tab-delimited (TSV).

CSV is a plain-text file format that stores tabular data, with the fields of each record separated by a delimiter. CSV (comma-separated values) files traditionally and most commonly use a comma as the delimiter (hence the name), but other characters can be used, such as semicolons, tabs, or pipe symbols (|).
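
As a minimal sketch of how the delimiter choice plays out in practice, the snippet below parses the same small table written with commas, tabs, and pipes using Python's standard `csv` module. The helper name `parse_delimited` and the sample data are illustrative assumptions, not part of the tag description.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of strings).

    Hypothetical helper for illustration only.
    """
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# The same two-row table, written with three different delimiters.
comma_rows = parse_delimited("a,b,c\n1,2,3\n")
tab_rows = parse_delimited("a\tb\tc\n1\t2\t3\n", delimiter="\t")
pipe_rows = parse_delimited("a|b|c\n1|2|3\n", delimiter="|")
```

All three calls should produce the same rows; only the `delimiter` argument changes.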

The MIME type for CSV files is text/csv.

Data is often stored in CSV format to make it easy to transfer tables between applications. Each row of a table is represented as a list of plain-text (human-readable) values with a delimiter character between each discrete piece of data. Values may be enclosed in quotes, which is required if a value contains the delimiter. The first row often contains the headers of the table's columns, which describe the meaning of the data in each column.
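
One way to see the quoting rule at work is to let a CSV library decide when quotes are needed. The sketch below uses Python's standard `csv.writer` with its default settings (minimal quoting); the sample rows and the in-memory buffer are assumptions chosen for illustration.

```python
import csv
import io

# Sample rows (illustrative): a header row followed by two records.
rows = [
    ["Time", "Temperature", "Humidity", "Description"],
    ["11:45", 94, 90, "Hazy, Hot, and Humid"],  # value contains the delimiter
    ["16:00", -200, None, '"Unliveable"'],      # None is written as an empty field
]

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

With the default dialect, only fields that contain the delimiter, the quote character, or a line break are wrapped in quotes; all other fields are written bare.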

Example

Tabular format

Time   Temperature  Humidity  Description
08:00  70           35        Sunny and Clear
11:45  94           90        Hazy, Hot, and Humid
14:30  18                     Freezing
16:00  -200                   "Unliveable"

CSV format

Time,Temperature,Humidity,Description
08:00,70,35,Sunny and Clear
11:45,94,90,"Hazy, Hot, and Humid"
14:30,18,,Freezing
16:00,-200,,"""Unliveable"""

In this example, the first row of CSV data serves as the "header", which describes the corresponding data below it. A CSV file has no inherent way to indicate whether its first row is a header row. Each subsequent line should contain the same fields, in the same order, as the first line.
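
Because the file itself cannot say whether its first row is a header, the reading code has to decide. The following sketch, using Python's standard `csv` module, shows the three usual choices; the sample text and variable names are assumptions for illustration.

```python
import csv
import io

CSV_TEXT = "Time,Temperature\n08:00,70\n11:45,94\n"

# Choice 1: treat the first row as a header.
# Each later row becomes a mapping keyed by the header names.
with_header = list(csv.DictReader(io.StringIO(CSV_TEXT, newline="")))

# Choice 2: treat every row as data.
# The "header" is then just another list of strings.
as_data = list(csv.reader(io.StringIO(CSV_TEXT, newline="")))

# Choice 3: the file has no header row, so supply the column names explicitly.
HEADERLESS_TEXT = "08:00,70\n11:45,94\n"
named = list(
    csv.DictReader(
        io.StringIO(HEADERLESS_TEXT, newline=""),
        fieldnames=["Time", "Temperature"],
    )
)
```

Note that every value comes back as a string; converting `"70"` to a number is left to the caller.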

Note:

  • Empty fields (fields with no available data, such as the third field in the last two lines) are held in place with commas so that the fields that follow are positioned correctly.
  • Since the comma is the field delimiter, the commas in the Description field of the second data row must be quoted (to prevent them from being interpreted as field delimiters). Wrapping the entire field in double quotes (") is the default method for protecting the delimiter character inside a field.
  • Since the double quote is the quote character, double quotes in the data, as in "Unliveable" in the fourth data row, must also be protected. Doubling each double quote, inside a field that is itself wrapped in double quotes, is the default method for protecting the quote character inside a field.
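
The three notes above can be checked by parsing the example with a CSV library. This is a minimal sketch using Python's standard `csv` module with its default dialect; it assumes the last record's Description is written as `"""Unliveable"""` (the value wrapped in quotes, with each inner quote doubled).

```python
import csv
import io

CSV_TEXT = '''Time,Temperature,Humidity,Description
08:00,70,35,Sunny and Clear
11:45,94,90,"Hazy, Hot, and Humid"
14:30,18,,Freezing
16:00,-200,,"""Unliveable"""
'''

rows = list(csv.reader(io.StringIO(CSV_TEXT, newline="")))
header, *records = rows

# Empty fields come back as empty strings, keeping later fields in position.
empty_humidity = records[2][2]

# Quoted commas stay inside the field instead of splitting it.
quoted_commas = records[1][3]

# Doubled quotes inside a quoted field collapse back to single quotes.
quoted_quotes = records[3][3]

# Every line has the same number of fields as the header.
consistent = all(len(row) == len(header) for row in records)
```

Writing this parsing by hand with `str.split(",")` would break on the second record, which is why a dedicated CSV parser is usually the safer choice.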

Questions tagged [csv] are expected to relate to programming in some way, for example, parsing or importing CSV files or creating them programmatically.

89606 questions
13
votes
4 answers

How to read a specific line number in a csv with pandas

I have a huge dataset and I am trying to read it line by line. For now, I am reading the dataset using pandas: df = pd.read_csv("mydata.csv", sep =',', nrows = 1) This function allows me to read only the first line, but how can I read the second,…
Guido Muscioni
  • 1,203
  • 3
  • 15
  • 37
13
votes
1 answer

WordListCorpusReader is not iterable

So, I am new to using Python and NLTK. I have a file called reviews.csv which consists of comments extracted from Amazon. I have tokenized the contents of this CSV file and written it to a file called csvfile.csv. Here's the code: from…
Aarushi Aiyyar
  • 369
  • 1
  • 5
  • 11
13
votes
1 answer

read_csv() parsing error message, how to interpret?

I am in the middle of parsing in a large amount of csv data. The data is rather "dirty" in that I have inconsistent delimiters, spurious characters and format issues that cause problems for read_csv(). My problem here, however, is not the dirtiness…
Angelo
  • 2,936
  • 5
  • 29
  • 44
13
votes
1 answer

How to import a CSV file into a BigQuery table without any column names or schema?

I'm currently writing a Java utility to import a few CSV files from GCS into BigQuery. I can easily achieve this with bq load, but I wanted to do it using a Dataflow job. So I'm using Dataflow's Pipeline and ParDo transformer (returns TableRow to apply…
Vijin Paulraj
  • 4,469
  • 5
  • 39
  • 54
13
votes
1 answer

pandas read_csv not recognizing \t in tab delimited file

I'm trying to read in the following tab separated data into pandas: test.txt: col_a\tcol_b\tcol_c\tcol_d 4\t3\t2\t1 4\t3\t2\t1 I import test.txt as follows: pd.read_csv('test.txt',sep='\t') The resulting dataframe has 1 column. The \t is…
Omar
  • 329
  • 1
  • 3
  • 10
13
votes
2 answers

NameError: name 'csv' is not defined

I am new to Python, and I want to write a CSV file that lists the roots of my equation. I am working in Sage. My code is: with open('out.csv', 'w') as f: c = csv.writer(f) c.writerows(root) The error I am getting is " NameError: name…
Nicole
  • 161
  • 1
  • 1
  • 4
13
votes
7 answers

Program for working with large CSV Files

Are there any good programs for dealing with reading large CSV files? Some of the datafiles I deal with are in the 1 GB range. They have too many lines for Excel to even deal with. Using Access can be a little slow, as you have to actually import…
Kibbee
  • 65,369
  • 27
  • 142
  • 182
13
votes
0 answers

Is there a Python csv file writer that can match data.table's fwrite speed?

I want to match R's data.table::fwrite csv file writing speed in Python. Let's check some timings. First…
cryo111
  • 4,444
  • 1
  • 15
  • 37
13
votes
4 answers

Pandas is faster to load CSV than SQL

It seems that loading data from a CSV is faster than from SQL (PostgreSQL) with Pandas. (I have an SSD.) Here is my test code: import pandas as pd import numpy as np start = time.time() df = pd.read_csv('foo.csv') df *= 3 duration = time.time() -…
Haelle
  • 335
  • 1
  • 3
  • 10
13
votes
5 answers

Python pandas load csv ANSI Format as UTF-8

I want to load a CSV file with pandas in Jupyter Notebooks which contains characters like ä,ö,ü,ß. When I open the CSV file with Notepad++ here is one example row which causes trouble in ANSI…
MBUser
  • 415
  • 2
  • 6
  • 15
13
votes
7 answers

How to check if .xls and .csv files are empty

Question 1: How can I check if an entire .xls or .csv file is empty? This is the code I am using: try: if os.stat(fullpath).st_size > 0: readfile(fullpath) else: print "empty file" except OSError: print "No file" An empty…
bob marti
  • 1,523
  • 3
  • 11
  • 27
13
votes
3 answers

How to plot CSV data using matplotlib and pandas in python

I have Python code in which I read a CSV file using pandas and store date and time in one column, Datetime. Now I want to plot Sensor Value on the y-axis and datetime on the x-axis. How can I achieve this? My code is below: import pandas as pd import…
rushan
  • 211
  • 2
  • 6
  • 16
13
votes
4 answers

Error tokenizing data. C error: out of memory pandas python, large file csv

I have a large CSV file of 3.5 GB and I want to read it using pandas. This is my code: import pandas as pd tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False) df = pd.concat(tp,…
Amal Kostali Targhi
  • 907
  • 3
  • 11
  • 22
13
votes
3 answers

Reading CSV file some missing columns

I am trying to read a CSV file into my VB.NET application using the following code: While Not EOF(1) Input(1, dummy) Input(1, phone_number) Input(1, username) Input(1, product_name) Input(1, wholesale_cost) Input(1,…
charlie
  • 415
  • 4
  • 35
  • 83
13
votes
4 answers

Simple CSV to XML Conversion - Python

I am looking for a way to automate the conversion of CSV to XML. Here is an example of a CSV file, containing a list of movies: Here is the file in XML format: War,…
L Marfell
  • 339
  • 2
  • 4
  • 15