Scientific Problems for Python Coding Dojos

Question

We are organizing a Coding Dojo on scientific applications in the Brazilian Python community. The main goals are to improve our skills in NumPy (and some other scientific libraries), to improve our use of TDD in this kind of application, and to better understand the limitations of these APIs.

I'm looking for problems that fit these goals (mainly using NumPy). Any suggestions?

Update 1:

It's a randori coding dojo.

We don't have a preference for a specific area (most of us work in different areas), and since this is our first "scientific dojo" we don't know exactly what kind of problem works best for one.

Anyway, the problems must be small. We will probably need to explain the theory behind each problem, so they also can't be complex (except on special occasions). An example: implement a multivariate normal function.
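To give a sense of the size such a problem could have, here is a minimal sketch of the multivariate normal example written test-first. The function name `multivariate_normal_pdf` and the test are illustrative assumptions, not something agreed on in this thread; the check uses the known value 1/(2π) for a 2D standard normal at its mean.

```python
import numpy as np


def multivariate_normal_pdf(x, mean, cov):
    """Density of N(mean, cov) at a single point x (hypothetical dojo solution)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = mean.shape[0]
    diff = x - mean
    # Solve cov @ y = diff rather than forming the explicit inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm


def test_standard_normal_at_origin():
    # For a 2D standard normal the density at the mean is 1 / (2 * pi).
    value = multivariate_normal_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
    assert np.isclose(value, 1.0 / (2.0 * np.pi))
```

In a randori session the test would be written first and the function grown one failing test at a time.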

Summary for the future generation:

Principal component analysis (PCA) for projecting a set of data onto a 2D plane.
Implementing a part-of-speech tagger using the Viterbi algorithm.
Picture color quantization using a mixture of Gaussians and the EM algorithm (using scikit?)
Simulating a stochastic partial differential equation.
Implementing a multivariate normal function.
... What else? ...
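As a rough idea of how small the core of the Viterbi item can be, here is a sketch of the decoding step for a hidden Markov model in NumPy. It is not a full part-of-speech tagger: states and observations are assumed to be already encoded as integer indices, and the function name and argument layout are illustrative assumptions.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for an HMM, computed in log space.

    obs      -- sequence of observation indices
    start_p  -- start_p[i] is the probability of starting in state i
    trans_p  -- trans_p[i][j] is the probability of moving from state i to j
    emit_p   -- emit_p[i][o] is the probability of state i emitting o
    """
    n_states = len(start_p)
    n_steps = len(obs)
    with np.errstate(divide="ignore"):  # log(0) -> -inf is acceptable here
        log_start = np.log(start_p)
        log_trans = np.log(trans_p)
        log_emit = np.log(emit_p)

    score = np.empty((n_steps, n_states))
    back = np.zeros((n_steps, n_states), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]

    # Walk the back-pointers from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

A dojo could start from a two-state toy model with hand-computed answers as tests before moving on to real tag sets.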

Just a small question. What should the problems look like? Do you want to present them, or should the audience solve them by themselves? Basically, how big should a problem be? – ahelm, Jul 15 '11 at 08:57
The problems should be small. It's a randori dojo, so everybody will take turns coding the solution. An example: implement a [multivariate normal function](http://en.wikipedia.org/wiki/Multivariate_normal_distribution) – renatopp, Jul 15 '11 at 11:06

score 3 · Answer 1 · answered Jul 15 '11 at 11:56

Software Carpentry, a set of educational materials for scientific computing, is mostly in Python and has a number of well thought out example problems.

Jonathan Dursi


score 3 · Answer 2 · answered Jul 15 '11 at 18:15

You should take a look at this MIT lecture. Back in the day I learned some new things from it and also learned how to work with Python. It has some simple examples of different things and presents the basic ideas of computation.

My point of view is that you should implement some examples from the SciPy cookbook and also some NumPy examples. Doing scientific work without NumPy/SciPy would be impossible. Also, reimplementing methods that NumPy already provides, like the multivariate normal distribution, is a waste of time and inefficient. I would say, use some calculation like Newton iteration or something similar, which is easy to program and looks good in Python.

There is also a small book that would be perfect for your course. It's about using Python for science and covers NumPy/SciPy, Matplotlib and other examples that are important for scientists. The things presented there are useful, but I couldn't find it via Google. I will search for it in my small library, but it may take some time (it's somewhere in there, I know it).
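For the Newton iteration idea, something like the following sketch is about the size I have in mind. The function name, the tolerance and the iteration limit are arbitrary choices for illustration, and the test uses the square root of 2 as a known answer.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Find a root of f with Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x = x - step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def test_sqrt_two():
    # The positive root of x**2 - 2 is sqrt(2).
    root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
    assert abs(root - 2.0 ** 0.5) < 1e-10
```

Natural follow-up tests for a dojo are a zero derivative and a starting point that does not converge.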

Hope this helps you.

WOW, the SciPy cookbook is an amazing base! I'm waiting for your book tip =] – renatopp, Jul 20 '11 at 20:46
I haven't found the book so far. Maybe you want to take a look inside [Python Scripting for Computational Science](http://www.amazon.com/reader/3642093159/ref=rdr_sb_li_hist_1&state=01111#reader_3642093159) to get some ideas about what kind of problems you can present to the audience. It's a book that deals with different topics. I haven't read all of it, but the parts I did read helped improve my knowledge ;) – ahelm, Jul 26 '11 at 22:07

score 2 · Answer 3 · answered Jul 15 '11 at 10:13

You don't mention what code kata resources you're using and why they're not suitable.

Many code kata postings are just fine for this kind of thing.

To create a new code kata for scientists, you need to brainstorm the kinds of things that are common data processing tasks. You need a bunch of user stories from which you can derive a good code kata.

Working with actuaries, for example, I spend a lot of time reading raw source data, filtering, cleansing, organizing and summarizing. Often this is a single, short Python application that uses CSV, a few if-statements, a few dictionaries and a final print-loop.
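As a sketch of what such a kata could look like (the column names `region` and `claim` and the sample rows are made up for illustration, not real actuarial data), the whole thing fits in a few lines around the `csv` module and a dictionary:

```python
import csv
import io
from collections import defaultdict

# Made-up sample input, only for illustration.
SAMPLE = """region,claim
North,100.5
south,20
North,n/a
South,
 north ,4.5
"""


def summarize(csv_text, key_field="region", value_field="claim"):
    """Total the numeric values per key, skipping rows that fail to parse."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(value_field) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # cleansing: drop rows without a usable number
        key = (row.get(key_field) or "").strip().lower()  # normalize the key
        totals[key] += value
    return dict(totals)


def test_summarize_skips_bad_rows():
    assert summarize(SAMPLE) == {"north": 105.0, "south": 20.0}


if __name__ == "__main__":
    # The final print-loop.
    for region, total in sorted(summarize(SAMPLE).items()):
        print(region, total)
```

Each cleansing rule (bad numbers, stray whitespace, inconsistent case) makes a natural separate test.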

Often, I can bang one of these out in an hour or two, depending on the complexity and the number of tests I have to write to be sure anything good will happen.

I'll update the post with some more information. We don't have a preference for a specific area (most of us work in different areas), and since this is our first "scientific dojo" we don't know exactly what kind of problem works best for a dojo. I'll take your suggestion of problems in filtering, cleansing, organizing or summarizing data. Do you have some easy algorithms in these domains in mind? – renatopp, Jul 15 '11 at 11:27

fulmicoton · Answer 4 · 2011-07-16T00:48:59.423

1

How long do you need these to be?

PCA for projecting a set of data onto a 2D plane.
Implementing a part-of-speech tagger using the Viterbi algorithm.
Picture color quantization using a mixture of Gaussians and the EM algorithm.
Simulating a stochastic partial differential equation.
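To show how short the first item can be, here is one possible sketch of a PCA projection in NumPy. It goes through the SVD of the centered data, which is one common way to do PCA but not the only one, and the function name is an illustrative assumption.

```python
import numpy as np


def pca_project(data, n_components=2):
    """Project the rows of data onto their first principal components."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes,
    # ordered by decreasing singular value (that is, decreasing variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def test_points_on_a_line_have_one_component():
    # Points on a straight line in 3D have no variance along the second axis.
    t = np.linspace(0.0, 1.0, 10)
    data = np.outer(t, [1.0, 2.0, 3.0])
    projected = pca_project(data)
    assert projected.shape == (10, 2)
    assert np.allclose(projected[:, 1], 0.0, atol=1e-9)
```

The degenerate cases (points on a line, points already in a plane) give tests whose expected result can be explained without much theory.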

edited Jul 16 '11 at 00:48

answered Jul 15 '11 at 10:32


The problems must be small. We will probably need to explain the theory behind each problem, so they also can't be complex (except on special occasions). I especially liked the Viterbi algorithm, and we can try mixture models with python-scikit. I noted all your suggestions =] – renatopp Jul 15 '11 at 11:17
