
I have a non-uniform array 'A'.

A = [1,3,2,4,..., 12002, 13242, ...]

I want to explore how many elements from the array 'A' have values above certain threshold values.

For example, there are 1000 elements with values larger than 1200, so I want to plot that count against the threshold 1200. Likewise, there are 1500 elements with values larger than 110 (this includes the 1000 elements whose values are larger than 1200).

This is a rather large data set, so I would prefer not to discard any information.

Then, I want to plot the number of elements 'N' above a value A vs. Log (A), i.e.

**'Log N(> A)' vs. 'Log (A)'**.

I tried binning the data, but without success. I haven't done much statistics in Python, so I was wondering if there is a good way to plot this data?

Thanks in advance.

Victor

1 Answer


Let me take another crack at what we have:

A = [1, 3, 2, 4, ..., 12002, 13242, ...]

# This is a List of 12,000 zeros.
num_above = [0]*(12000)

# Notice how we can re-write this for-loop!
for i in A:
    # Strict '<' so that num_above[key] counts the elements strictly above key.
    num_above = [val+1 if key < i else val for key,val in enumerate(num_above)]

I believe this is what you want. The final list num_above will be such that, for example, num_above[5] equals the number of elements in A that are above 5.

Explanation:

That last line is where all the magic happens. The loop goes through the elements of A (each one is i), and for each of them adds one to every element of num_above whose index is less than i.

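The enumerate() call returns an iterator of (index, value) tuples for the list it is given. In the loop it is applied to num_above, so key is the index (the threshold) and val is the current count. As an illustration, enumerate(A) would yield (0,1) -> (1,3) -> (2,2) -> (3,4) -> ...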

Also, the num_above = [expr for item in some_list] form is known as a list comprehension, and is a really powerful tool in Python.
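To make the explanation concrete, here is a minimal sketch that runs the same loop on a tiny list. The sample values are made up for illustration (they are not the real data), and the length of num_above is taken from max(A) rather than fixed at 12000, which is my assumption for the small example.

```python
# Small, made-up sample standing in for the real array A.
A = [1, 3, 2, 4, 7, 5]

# One counter per integer threshold 0 .. max(A).
num_above = [0] * (max(A) + 1)

for i in A:
    # Add one to every counter whose index (the threshold) is strictly below i.
    num_above = [val + 1 if key < i else val for key, val in enumerate(num_above)]

# num_above[k] is now the number of elements of A that are strictly above k.
print(num_above)  # [6, 5, 4, 3, 2, 1, 1, 0]
```

For instance, num_above[4] is 2 because only 7 and 5 are above 4.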

Improvements: I see you already modified your question to include these changes, but I think they were important.

  1. I removed the numpy dependency. When possible, removing dependencies reduces the complexity of projects, especially larger projects.
  2. I also removed the original list A. This could be replaced with something that was basically like A = range(12000).
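Since the goal is Log N(> A) vs. Log (A), and the loop above rebuilds a 12,000-element list once per data point, it may be worth sketching a cheaper alternative that still needs only the standard library: sort the data once and use bisect to count the elements above each threshold. This is a different technique from the list comprehension above, not a rewrite of it. The sample values, the helper name count_above, and the choice of using each distinct data value as a threshold are all assumptions for illustration, and the values are assumed to be positive so that the logarithm is defined.

```python
import bisect
import math

# Made-up sample data; the real A would be the full data set (assumed positive).
A = [1, 3, 2, 4, 110, 1200, 12002, 13242]

sorted_A = sorted(A)
n = len(sorted_A)

def count_above(threshold):
    """Return how many elements of A are strictly greater than threshold."""
    # bisect_right gives the number of elements <= threshold in the sorted list.
    return n - bisect.bisect_right(sorted_A, threshold)

# Use every distinct value as a threshold, except the maximum:
# nothing lies above it, so N would be 0 and its logarithm undefined.
thresholds = sorted(set(A))[:-1]

log_a = [math.log10(t) for t in thresholds]
log_n = [math.log10(count_above(t)) for t in thresholds]

for x, y in zip(log_a, log_n):
    print(f"log A = {x:.3f}, log N(>A) = {y:.3f}")
```

The two lists log_a and log_n can then be handed to whatever plotting library you prefer. Because every distinct value is used as a threshold, no binning is needed and no information is thrown away.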
john_science
  • Thanks for the help! I read the question one more time, and it was a little bit confusing. I have edited the questions again. Thanks! – Victor Jun 28 '13 at 20:31
  • "I removed the numpy dependency. When possible, removing dependencies reduces the complexity of projects, especially larger projects." I disagree - this is _screaming_ out for a `numpy` array: `count = (A > thresh).sum()`. – ali_m Jun 28 '13 at 23:34