
I need to write some scripts to carry out some tasks on my server (running Ubuntu Server 8.04 LTS). The tasks are to be run periodically, so I will be running the scripts as cron jobs.

I have divided the tasks into "group A" and "group B" because (in my mind at least) they are a bit different.

Task Group A

  1. Import data from a file and possibly reformat it. By reformatting, I mean things like sanitizing the data, possibly normalizing it, and/or running calculations on 'columns' of the data.

  2. Import the munged data into a database. For now, I am using MySQL for the vast majority of imports, although some files will be imported into an SQLite database.

Note: The files will be mostly text files, although some of the files are in a binary format (my own proprietary format, written by a C++ application I developed).

Task Group B

  1. Extract data from the database
  2. Perform calculations on the data and either insert or update tables in the database.

My coding experience is primarily as a C/C++ developer, although I have also been using PHP for the last 2 years or so (plus a few other languages that are not relevant to this question). I come from a Windows background, so I am still finding my feet in the Linux environment.

My question is this: I need to write scripts to perform the tasks I described above. Although I suppose I could write a few C++ applications to be called from shell scripts, I think it may be better to write them in a scripting language, but this may be a flawed assumption. My thinking is that it would be easier to modify things in a script, with no need to rebuild for every change in functionality. Additionally, data munging in C++ tends to take more lines of code than in "natural" scripting languages such as Perl, Python, etc.

Assuming that the majority of people here agree that scripting is the way to go, herein lies my dilemma: which scripting language should I use to perform the tasks above, given my background?

My gut instinct tells me that Perl (shudder) would be the most obvious choice for performing all of the above tasks. BUT (and that is a big BUT), the mere mention of Perl makes my toes curl, as I had a very, very bad experience with it a while back. I bought the Perl Camel book and 'Data Munging with Perl' many years ago, but could still not 'grok' it; it just felt too alien. The syntax seems quite unnatural to me, no matter how many times I have tried to learn it, so if possible I would really like to give it a miss. As for PHP (which I already know), I am not sure it is a good candidate for scripting on the CLI (I have not seen many examples of how to do this, so I may be wrong).

The last thing I must mention is that IF I have to learn a new language in order to do this, I cannot afford (time constraint) to spend more than a day, in learning the key commands/features required in order to do this (I can always learn the details of the language later, once I have actually deployed the scripts).

So, which scripting language would you recommend (PHP, Python, Perl, [insert your favorite here]) - and most importantly WHY? Or, should I just stick to writing little C++ applications that I call in a shell script?

Lastly, if you have suggested a scripting language, can you please show with a FEW lines (Perl mongers - I'm looking in your direction [nothing too cryptic!]) how I can use the language you suggested to do what I am trying to do i.e.

  • load a CSV file into some kind of data structure where you can access data columns easily for data manipulation
  • dump the columnar data into a MySQL table
  • load data from a MySQL table into a data structure that allows columns/rows to be accessed in the scripting language

Hopefully, the snippets will allow me to quickly spot the languages that will pose the steepest learning curve for me, as well as those that are simple, elegant and efficient (hopefully those two criteria [elegance and shallow learning curve] are not orthogonal, though I suspect they might be).

Edited by tmthydvnprt; asked by morpheous
  • "I cannot afford (time constraint) to spend more than a day, in learning the key commands/features required in order to do this" == Act in Haste, Repent at Leisure. Always a good policy. – S.Lott May 14 '10 at 11:44
  • Yes. Such is life. One sometimes has to make difficult choices. There is a gulf of difference between theoretically "clean" solutions and pragmatic ones (i.e. what happens in the vast majority of cases in the REAL world) - note however, that I stated that I can always learn the details of the language later, once I have actually deployed the scripts. – morpheous May 14 '10 at 12:07
  • Unrelated but - If I remember correctly, walking on two legs seemed extremely unnatural for me. I had a very very bad experience when I fell down - flat on my face. You see, I was used to crawling on all fours. But, I realized that I could travel greater distances and move faster if I walked/ran instead of crawling. Now, my toes curl with joy! - A Perl enthusiast! :) – Susheel Javadi May 14 '10 at 13:04
  • "Show me a few lines how I can use the language [...] to do what I want to do"? You haven't even clearly stated what you're trying to do. Your description of what kind of data and what kind of processing you're dealing with is extremely vague. That's very hard to encode in a sample program. – tsee May 14 '10 at 15:56
  • @morpheous: Who's talking about 'theoretically "clean" solutions'? Learning a language in a day is going to end badly. There are no 'theoretically "clean" solutions' to learning a language. It takes time and effort and sound engineering judgement. Not theory. Pragmatic hard work. Learning the details "later", after you've made a host of mistakes, won't happen. You'll be moved to a project where you won't do as much damage and be replaced by someone who does things pragmatically (i.e., learning the solution architecture and language). – S.Lott May 14 '10 at 17:18
  • IME, **Programming Perl** is a pretty hefty read if you just want to learn some basic Perl and get some work done ASAP. **Learning Perl** is a better resource. An experienced programmer can work through it in a couple of days and be productive. Don't get me wrong, **PP** is a great book, but it works better if you already have some Perl under your belt. – daotoad May 14 '10 at 17:59
  • Sorry that "Data Munging with Perl" gave you a hard time :-/ – Dave Cross Mar 10 '12 at 11:40

3 Answers


Well, I was you a few years back. I didn't like Perl at all and would rewrite any scripts my peers wrote in Perl in Python instead, because I could not stand Perl. Long story short, let's just say I am fairly conversant with Perl now. I would recommend a book called "Impatient Perl", which explains the really important stuff quite nicely and which converted me to Perl. :) Another tip is to install the Perl documentation on your computer; this was really important for me, since it gives easy and quick access to sample code, etc.

Teaser Script for Task A - to read a file, format it and then write to the database.

use strict;
use warnings;
use autodie;  # open() and friends die on failure instead of returning false
use Text::CSV_XS ();
use DBI ();

my $csv = Text::CSV_XS->new({binary => 1}) 
  or die 'Cannot use CSV: ' . Text::CSV_XS->error_diag;

{
    my $database_handle = DBI->connect(
        'dbi:SQLite:dbname=some_database_file.sqlite', undef, undef, {
            RaiseError => 1,
            AutoCommit => 1,
        },
    );
    $database_handle->do(
        q{CREATE TABLE something_table_or_other ('foo' CHAR(10), 'bar' CHAR(10), 'baz' CHAR(10), 'quux' CHAR(10), 'blah' CHAR(10))}
    );

    my $statement_handle = $database_handle->prepare(
        q{INSERT INTO something_table_or_other ('foo', 'bar', 'baz', 'quux', 'blah') VALUES (?, ?, ?, ?, ?)}
    );

    {
        open my $file_handle, '<:encoding(utf8)', 'data.csv';
        while (my $columns_aref = $csv->getline($file_handle)) {
            my @columns = @{ $columns_aref };

            # sanitize the columns - maybe substitute commas, numbers, etc.
            for (@columns) {
                s{,}{}g;  # removes every comma (without /g only the first one goes)
            }

            # insert columns into database now, using placeholders
            $statement_handle->execute(@columns);
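        }
        # getline() returns undef on a parse error as well as at end of file
        $csv->eof or $csv->error_diag();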
    }
}

Note: given your current distaste for Perl, I would recommend you do the above "tasks" in whatever programming language you are comfortable with. The above is only an attempt to show you that Perl might not be so cryptic after all. You get to be cryptic when you don't want to repeat yourself! :)

Edited by daxim; answered by Susheel Javadi
  • And similar to what the Python poster recommends, if the data is CSV or binary or fixed-length, there are libraries (Text::CSV_XS, Parse::Binary, Parse::FixedLength) and functions (pack, unpack) to easily deal with those also. Oh, and also similar to what that poster recommends, let me just add, "Perl excels at this" (as does almost any scripting language). – runrig May 14 '10 at 15:07
  • Replaced it with exemplary modern code that actually works. – daxim May 14 '10 at 16:28
  • Thanks, daxim! I was planning to do that tonight. :) – tsee May 14 '10 at 16:41

import data from a file and possibly reformat it

Python excels at this. Be sure to read up on the csv module so you don't waste time inventing it yourself.
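A minimal sketch of the "load a CSV into something with easy column access" step using the standard csv module. The column names (`name`, `price`), the sample data and the helper names are made up for illustration; dropping thousands separators and min-max normalizing a column are just examples of the kind of munging described in the question.

```python
import csv
import io

# Hypothetical sample input; note the quoted field containing a comma.
SAMPLE = '''name,price
widget,"1,250.00"
gadget,250.00
doohickey,750.00
'''

def load_columns(lines):
    """Read CSV rows into a dict mapping each header name to a list of values."""
    reader = csv.DictReader(lines)
    columns = {name: [] for name in reader.fieldnames}  # reads the header row
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name])
    return columns

def sanitize_number(text):
    """Drop thousands separators and surrounding whitespace, then convert to float."""
    return float(text.replace(",", "").strip())

def normalize(values):
    """Min-max scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

columns = load_columns(io.StringIO(SAMPLE))
prices = [sanitize_number(p) for p in columns["price"]]
columns["price_norm"] = normalize(prices)  # a new derived column
print(columns["name"])        # → ['widget', 'gadget', 'doohickey']
print(columns["price_norm"])  # → [1.0, 0.0, 0.5]
```

For a real file you would pass an open file object instead of the StringIO, e.g. `with open("data.csv", newline="") as f: columns = load_columns(f)`.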

For binary data, you may have to use the struct module. [If you wrote the C++ program that produces the binary data, consider rewriting that program to stop using binary data. Your life will be simpler in the long run. Disk storage is cheaper than your time; highly compressed binary formats are more cost than value.]
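For fixed-size binary records, struct does the unpacking. The record layout below (a 32-bit int, a double and an 8-byte NUL-padded label, little-endian with no padding) is purely hypothetical; you would replace the format string with whatever your C++ program actually writes.

```python
import struct

# Hypothetical record layout (little-endian, no padding):
#   int32 id, float64 value, 8-byte NUL-padded ASCII label
RECORD = struct.Struct("<id8s")

def read_records(blob):
    """Yield (id, value, label) tuples from a buffer of fixed-size records."""
    if len(blob) % RECORD.size:
        raise ValueError("buffer is not a whole number of records")
    for rec_id, value, raw_label in RECORD.iter_unpack(blob):
        yield rec_id, value, raw_label.rstrip(b"\0").decode("ascii")

# Stand-in for the bytes a C++ program might have written to disk.
blob = RECORD.pack(1, 3.5, b"alpha") + RECORD.pack(2, -0.25, b"beta")
records = list(read_records(blob))
print(records)  # → [(1, 3.5, 'alpha'), (2, -0.25, 'beta')]
```

If the C++ program writes whole structs, the compiler may insert padding (e.g. 4 bytes after the int so the double is 8-byte aligned). Starting the format with `@` instead of `<` makes struct use native size and alignment, which is meant to match the C compiler's layout on the same machine.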

Import the munged data into a database. Extract data from the database. Perform calculations on the data and either insert or update tables in the database.

Use the MySQLdb module for MySQL. SQLite support is built into Python (the sqlite3 module in the standard library).
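A sketch of the two database steps (dump rows into a table, then load them back with per-column access) using the standard library's sqlite3 module, so it runs without a database server. The table and column names are hypothetical. MySQLdb follows the same DB-API pattern, but you go through `conn.cursor()` and its placeholder is `%s` rather than `?`.

```python
import sqlite3

# Hypothetical, already-munged rows (e.g. the output of a CSV cleanup step).
rows = [("widget", 1250.0), ("gadget", 250.0), ("doohickey", 750.0)]

conn = sqlite3.connect(":memory:")  # use a file path in practice
conn.execute("CREATE TABLE prices (name TEXT PRIMARY KEY, price REAL)")

# Bulk insert with placeholders; "with conn" commits on success and
# rolls back if an exception is raised.
with conn:
    conn.executemany("INSERT INTO prices (name, price) VALUES (?, ?)", rows)

# Read back: sqlite3.Row lets each row be indexed by column name.
conn.row_factory = sqlite3.Row
fetched = conn.execute("SELECT name, price FROM prices ORDER BY price").fetchall()
by_column = {key: [row[key] for row in fetched] for key in ("name", "price")}
print(by_column)
# → {'name': ['gadget', 'doohickey', 'widget'], 'price': [250.0, 750.0, 1250.0]}
conn.close()
```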

Often, you'll want to use object-relational mapping rather than write your own SQL. Look at SQLObject and SQLAlchemy for this.

Also, before doing too much of this, buy a good book on data warehousing. Your two "task groups" sound like you're starting down the data warehousing road. It's easy to get this all fouled up through poor database design. Learn what a "Star Schema" is before you do anything else.
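To make the star-schema idea a little more concrete, here is a toy version with made-up tables: two dimension tables, a fact table keyed to them, and a summary table that a "Task Group B" job refreshes by aggregating the facts and inserting or replacing one row per day. It uses SQLite's `INSERT OR REPLACE`; on MySQL the closest equivalent is `INSERT ... ON DUPLICATE KEY UPDATE`. Treat it as an illustration of the shape, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT);
    -- Fact table: one row per measured event, keyed to the dimensions.
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    -- Summary table maintained by the periodic "Task Group B" job.
    CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total REAL);

    -- Made-up sample data.
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_date    VALUES (1, '2010-05-13'), (2, '2010-05-14');
    INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5);
""")

def refresh_daily_totals(conn):
    """Recompute per-day totals, inserting new days and replacing existing ones."""
    totals = conn.execute("""
        SELECT d.day, SUM(f.amount)
        FROM fact_sales AS f JOIN dim_date AS d ON f.date_id = d.date_id
        GROUP BY d.day
    """).fetchall()
    with conn:  # commit the whole refresh as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO daily_totals (day, total) VALUES (?, ?)", totals)

refresh_daily_totals(conn)
conn.execute("INSERT INTO fact_sales VALUES (2, 2, 2.5)")  # new data arrives
refresh_daily_totals(conn)  # the 2010-05-14 row is replaced, not duplicated
print(conn.execute("SELECT day, total FROM daily_totals ORDER BY day").fetchall())
# → [('2010-05-13', 15.0), ('2010-05-14', 10.0)]
```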

Answered by S.Lott
  • @s.lott: " sound like you're starting down the data warehousing road". I like it when people are able to "reverse engineer" scenarios like this. Indeed you are right, it is a kind of "poor man's data warehouse" I am building – morpheous May 14 '10 at 11:58
  • Perl also excels at this. If, however, you don't care for Perl, then go ahead and use Python. Or Ruby. – runrig May 14 '10 at 22:19

I'd go with Python or Ruby. You will most likely find them much faster/easier to pick up than Perl, and they are still very powerful/efficient languages in their own right for "data munging". You should be able to pick up either of them in a day or less, not counting looking up random library functions every so often.

To pick up Python quickly: http://diveintopython3.ep.io/

I personally can't recommend a Ruby tutorial myself, but I'm sure others can chime in with good options.

If you want to play around with either, http://www.trypython.org and http://www.tryruby.org each host online interactive-shell versions of the interpreters for their respective languages.

Edited by Bill the Lizard; answered by Amber
  • I suspected Python would feature, at the least. I am not too averse to it, as I have played with it in the past. Early days yet. I'll wait and see if a consensus opinion emerges. Thanks for your valued input – morpheous May 14 '10 at 12:00
  • Could those who downvoted this please have the courtesy to comment on why? – Amber May 14 '10 at 19:30
  • Probably for unsupported opinions. s/Go with/I would go with/, s/You will most likely/I found Python/, and delete "coming from a C/C++ background" completely (although if you come/came from a C/C++ background, and really feel that this somehow was a factor in the time that it took to pick up Python vs. Perl, then feel free to edit as appropriate). – runrig May 14 '10 at 22:18
  • runrig: I don't really see my answer as having much more in the way of "unsupported opinions" than other answers which were not downvoted, and I've done my best to give links to resources that would help the OP decide for themselves. I'd personally assume that any answer on this site for this kind of question is going to be an opinion, given that there's almost never a single answer to the question of "what language should I write this in?" – Amber May 15 '10 at 03:53
  • Also, I think the "you will most likely" comment is appropriate *given that* the OP already stated they had trouble wrapping their head around Perl, but also is experienced with C/C++ and thus doesn't have trouble with programming in general, just Perl specifically. – Amber May 15 '10 at 03:55