
Please consider:

dalist={{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 
       {2.88`, 2.04`, 4.64`,0.56`, 4.92`, 2.06`, 3.46`, 2.68`, 2.72`,0.820},   
       {"Laura1", "Laura1", "Laura1", "Laura1", "Laura1", 
       "Laura1", "Laura1", "Laura1", "Laura1","Laura1"}, 
       {"RIGHT", 0, 1, 15.1`, 0.36`, 505, 20.059375`,15.178125`, ".", "."}}


The actual dataset has about 6,000 rows and 147 columns, but the sample above reflects its content. I would like to compute some basic statistics, such as the mean. My attempt:

Table[Mean@dalist[[colNO]], {colNO, 1, 4}]


How could I create a function that would:

  • Skip non-numerical values, and

  • Count the number of non-numerical values found in each list?

I have not succeeded in finding the right pattern mechanism yet.

Nayuki
500

3 Answers


First observation: you could use Mean /@ dalist if you wanted to average across rows. You don't need a Table function here.

Try using Cases, e.g. Mean /@ (Cases[#,_?NumericQ] & /@ dalist)

If you want to be tricky and eliminate rows from your data that have no numeric elements (e.g. your third row), try the following. It first picks only the rows that have some numeric elements, and then takes only the numeric elements from those rows.

Mean /@ (Cases[#,_?NumericQ] & /@ (Cases[dalist, {___,_?NumericQ,___}]))

To count the non-numeric elements, you would use a similar approach:

Length /@ (Cases[#,Except[_?NumericQ]] & /@ dalist)

This answer has the caveat that I typed it out without the benefit of a Mathematica installation to actually check my syntax. Some typos could remain.

Verbeia
  • Thank You very much ! It is perfect. And without a .nb ! – 500 Aug 17 '11 at 23:17
  • Instead of `Cases` you could also use `Select[#, NumericQ] &`. A test is natural for `Select`, whereas `Cases` uses patterns, which in this case have to be converted to tests using `PatternTest (?)` – Sjoerd C. de Vries Aug 17 '11 at 23:17
  • @Sjoerd, for some reason obscure to me, your code works with Select[#, NumericQ] &@dalist[[2]] for example. This solution seems optimal in my case, thanks. – 500 Aug 17 '11 at 23:31
  • +1 @Sjoerd - I just trust myself with `Cases` more than `Select` when typing blind :) – Verbeia Aug 17 '11 at 23:57
  • Alternatively, to count non-numeric terms, `Count[#, Except[_?NumericQ]] & /@ dalist` – 681234 Aug 18 '11 at 07:59
  • (Modified) from Mma 7 help on the (now-obsolete?) DataManipulation package. To get the mean of all rows **which do not contain** non-numeric data: `Mean /@ Select[dalist, VectorQ[#, NumericQ[#] &] &]`. The original entry uses `Select[mylist, VectorQ[#, NumberQ[N[#]] &] &]`. I can't see any advantage of NumberQ[N[#]] over NumericQ[#] but perhaps there is one? – 681234 Aug 18 '11 at 15:07
  • @TomD, I think this is an independent answer? OP hasn't accepted any yet so you may as well put it up as an answer. – Verbeia Aug 18 '11 at 23:07

Here is a variation of Verbeia's answer that you may consider.

Assuming that this is a rectangular array (all rows are the same length), then setting d to the row length (which can be found with Dimensions):

d = 10;

{d - Length@#, Mean@#} &@Select[#, NumericQ] & /@ dalist
(* Out: *) {{0, 11/2}, {0, 2.678}, {10, Mean[{}]}, {3, 79.5282}}

That is, pairs of {number_of_non-numeric, average}.

Mean[{}] appears where there are no numeric values to average. This could be removed from the list with DeleteCases but the results would no longer align with the rows of dalist. I think it would be better to use something like: /. Mean[{}] -> "NO AVERAGE" if needed.
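
As a rough sketch of that idea, the count and the guarded mean can be wrapped in one helper so that the result stays aligned with the rows. The name summarize and the small sample list are my own illustrations, not part of the answer above, and the "NO AVERAGE" marker is just the placeholder suggested there:

```mathematica
(* summarize is a hypothetical helper name; sample is made-up illustration data *)
summarize[row_List] := With[{nums = Select[row, NumericQ]},
  {Length[row] - Length[nums],                    (* number of non-numeric entries *)
   If[nums === {}, "NO AVERAGE", Mean[nums]]}]    (* avoid an unevaluated Mean[{}] *)

sample = {{1, 2, 3}, {"a", "b"}, {"x", 4.0, "."}};
summarize /@ sample
```

Because the row length is taken from each row, this version does not need d and should also work if the rows are not all the same length. Applied to dalist, it should give the same pairs as above, with "NO AVERAGE" in place of Mean[{}].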

Mr.Wizard

The key to answering your question is the NumberQ function: "NumberQ[expr] gives True if expr is a number, and False otherwise."

To compute the mean of only numeric elements in each list:

Map[Function[lst, Mean[Select[lst, NumberQ]]], dalist]

To count the number of non-numeric elements in each list:

Map[Function[lst, Length[Select[lst, Function[x, !NumberQ[x]]]]], dalist]
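
One point worth checking against your data: NumberQ only accepts explicit numbers, while NumericQ (used in the other answers) also accepts exact numeric quantities such as Pi or Sqrt[2]. A small illustration on a made-up list, which is my own and not taken from the question:

```mathematica
(* Made-up test values: integers, reals and rationals are explicit numbers;
   Pi and Sqrt[2] are numeric quantities but not explicit numbers *)
test = {2, 2.5, 1/3, Pi, Sqrt[2], "."};

NumberQ /@ test
(* {True, True, True, False, False, False} *)

NumericQ /@ test
(* {True, True, True, True, True, False} *)
```

For a dataset that holds only machine numbers and strings, as in the sample shown, the two tests should select the same elements.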
Nayuki