
I am planning to ask the users of my web application to submit bulk data in the following format.

BatchID:1
TotalRecords:200

Cust:CustID~FirstName~LastName~Age~Dob~City~State~Country
1~John~Abraham~35~10/10/1974~New York~NY~US

Order:OrderID~CustID~Qty~Amount~DOS
1~1~10~100.00~1/1/2012
2~1~100~1000.00~1/1/2012
1~1~10~100.00~1/1/2012

OrderDet:OrderDetID~OrderID~Reg1~Remarks
1~1~12393A~Testing order
2~1~23123B~tesitng order 1

Shown above is just one set of records. One flat file will have up to 200 records.

Do you think this is the right way of doing it? We need to allow batch upload so that users do not have to enter records one at a time on the website.

If you can think of any other format, that would be of great help.
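
Purely to illustrate, this is roughly how I imagine the file would be parsed on the server (a rough sketch in Python, just for illustration; the section names and layout come from the sample above):

# Rough sketch only: parse the proposed layout - a header block, then one
# section per entity, where the first line of a section is
# "Name:col1~col2~..." and every following non-blank line is one record.
def parse_batch(text):
    header = {}        # e.g. {"BatchID": "1", "TotalRecords": "200"}
    sections = {}      # e.g. {"Cust": [...], "Order": [...], "OrderDet": [...]}
    cols = name = None

    for line in text.splitlines():
        line = line.strip()
        if not line:
            name = None                      # a blank line closes the section
        elif name is None:
            label, _, rest = line.partition(":")
            if "~" in rest:                  # section header, e.g. Cust:CustID~...
                name, cols = label, rest.split("~")
                sections[name] = []
            else:                            # batch header, e.g. BatchID:1
                header[label] = rest
        else:
            # NOTE: a literal ~ inside a field (e.g. in Remarks) breaks this
            sections[name].append(dict(zip(cols, line.split("~"))))

    return header, sections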

acadia
  • I deal with CSV. It works (but I wouldn't recommend it for new data-exchange format in *most* cases). However, *please use "real" CSV*. Also using a "combined stream" approach, as per above, might require a little bit more work because most [standard] CSV parsers only know how to deal with "one table" worth of data per stream (read: file). –  Jun 10 '12 at 03:56

4 Answers


I strongly recommend you use XML for a new project. It seems that you have some hierarchy to your data (e.g. Customers have Orders). That strongly supports using XML since you can reflect those relationships naturally.
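
For example, here's a rough sketch of what one batch could look like when built as nested XML with Python's standard ElementTree (the element and attribute names are made up from your sample, not a prescribed schema):

import xml.etree.ElementTree as ET

# Sketch only: element/attribute names are made up from the sample in the
# question, not a prescribed schema.
batch = ET.Element("Batch", BatchID="1")

cust = ET.SubElement(batch, "Customer", CustID="1")
ET.SubElement(cust, "FirstName").text = "John"
ET.SubElement(cust, "LastName").text = "Abraham"
ET.SubElement(cust, "Dob").text = "1974-10-10"      # ISO dates avoid locale ambiguity

order = ET.SubElement(cust, "Order", OrderID="1")   # orders nest inside their customer
ET.SubElement(order, "Qty").text = "10"
ET.SubElement(order, "Amount").text = "100.00"

det = ET.SubElement(order, "OrderDet", OrderDetID="1")
ET.SubElement(det, "Reg1").text = "12393A"
ET.SubElement(det, "Remarks").text = "Testing order ~ no escaping needed"

print(ET.tostring(batch, encoding="unicode"))

Because each Order nests inside its Customer, you don't have to repeat CustID on every order row, and a field containing ~ needs no special treatment.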

If for some reason your users cannot do that, CSV is still very widely used, and there's much better tool support for it than for ~-delimited files (for example, one can save to CSV from Excel).

I can tell you from experience that someone's order description will have a ~ character in it one day. I once worked at a company that used pipe (|) as a delimiter. Worked great until someone thought to name his company Acme ||.

Eric J.
  • XML sounds much better, although it is not the only option. If it *is* CSV, then ... stick with *real* CSV (and separate streams). –  Jun 10 '12 at 03:51

XML with a defined schema would be a nice choice.

A couple of problems, offhand, with your format:

  • What happens if the remarks contain a ~, e.g. Test~ing Remarks? How would you escape that?
  • Are the users of your application all in the same locale? Will they all expect to enter dates as Month/Day/Year? Or will it be Day/Month/Year? Are these UTC dates?
  • Do you need to quickly determine if the user entered "invalid" data (e.g. missing required fields?)

There are well-known XML tools to help with validating what the user is uploading. It's worth considering...
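
For instance, a sketch of what that validation step could look like, assuming the third-party lxml package and a batch.xsd schema you would define for your format (both file names are placeholders):

from lxml import etree

# Sketch: validating an uploaded batch against an XSD before processing it.
# Assumes the third-party lxml package; "batch.xsd" and "upload.xml" are
# placeholder names for a schema and an upload you would define yourself.
schema = etree.XMLSchema(etree.parse("batch.xsd"))
doc = etree.parse("upload.xml")

if schema.validate(doc):
    print("valid - safe to process")
else:
    # error_log lists every violation with a line number, so the whole
    # upload can be rejected with a precise message to the user
    for err in schema.error_log:
        print(err.line, err.message)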

I'm not a huge fan of CSV, but at least there are well-known tools to work with it. However, as I recall there is some variation among CSV formats. For example, how do you escape the comma? By putting the string field in double quotes, e.g. "My first, second, third"? By using \,? etc.
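
For what it's worth, here is roughly how the most common convention (fields quoted with double quotes, embedded quotes doubled) plays out in Python's standard csv module; just a sketch to show the behavior:

import csv, io

# Sketch: how the common quoting convention (fields quoted with ", embedded
# quotes doubled) behaves in Python's standard csv module.
buf = io.StringIO()
csv.writer(buf).writerow([1, 'My first, second, third', 'He said "hi"'])
print(buf.getvalue())
# -> 1,"My first, second, third","He said ""hi"""

row = next(csv.reader(io.StringIO(buf.getvalue())))
print(row)
# -> ['1', 'My first, second, third', 'He said "hi"']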

jglouie
  • CSV is 100% standardized (not that every implementation follows the standard perfectly). Fields that contain a comma must be quoted with ", and if there's a quote in a field that was quoted, it must be double-quoted "", e.g. "This, the ""example""" – Eric J. Jun 10 '12 at 03:53
  • @EricJ. The problem is the hypothetical format is *not* CSV *and* it combines streams ;-) –  Jun 10 '12 at 03:55
  • @jglouie Keep reading: http://tools.ietf.org/html/rfc4180 Also, a proper CSV-esque format *will* allow escaping of the separator, with the (IMHO) acceptable form being `one,"two-a,two-b",three,`. –  Jun 10 '12 at 03:59
  • Yeah, I saw that right after I posted :). I suppose the standard exists now, but existing modules before the RFC was written may have incomplete support for it, e.g. Python's CSV support (see the comment at the top: http://docs.python.org/library/csv.html) – jglouie Jun 10 '12 at 04:02
  • @jglouie It's not a "standard", but it is well-followed by many tools. (But not by all, unfortunately. No point inventing a new format.) –  Jun 10 '12 at 04:04
  • @jglouie Having worked with almost all major data companies, there is a de facto standard. I have never had issues interchanging data between 20+ companies. Guess they all follow RFC 4180, though I never had to look up any RFC to produce and consume compatible data. – Eric J. Jun 10 '12 at 18:20

In general:

  • If both sender and receiver use the same technology (for example, bulk import to Microsoft SQL Server) use whatever technology-specific format provides optimal performance.

  • If you want maximum flexibility, then XML is your best choice. Creating a matching XML Schema, and validating your exports and imports against it, isn't a bad idea either.

  • Otherwise, "whatever works". CSV is a popular favorite, provided your data lends itself to a simple, homogeneous table format.

All things being equal, I'd probably vote "XML".

IMHO .. PSM

paulsm4

I notice nobody has mentioned JSON. If an actual CSV is inapplicable, use JSON. It's tidier and shorter than XML. And if you expect your users to understand your custom format, they'll certainly be able to deal with JSON without an XML-Schema-aware editor.
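
A sketch of what one batch from the question could look like, using Python's standard json module (the key names simply mirror the sample; nothing here is a required layout):

import json

# Sketch only: key names mirror the sample in the question.
batch = {
    "BatchID": 1,
    "Customers": [{
        "CustID": 1, "FirstName": "John", "LastName": "Abraham",
        "Dob": "1974-10-10",
        "Orders": [{
            "OrderID": 1, "Qty": 10, "Amount": "100.00",
            "OrderDets": [
                {"OrderDetID": 1, "Reg1": "12393A", "Remarks": "Testing order"}
            ]
        }]
    }]
}

print(json.dumps(batch, indent=2))        # what the user would upload
restored = json.loads(json.dumps(batch))  # any JSON library reads it back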

zmbq