Identify binary state of data set (frequency on/off)

Question

I have a large data set with values in the range [-3, 3], and I'm using a hard threshold at 0 as the boundary.

The data has a binary value of 1 when it's oscillating between -3 and 3 at a 56 kHz frequency. This means the data changes from -3 to 3 and back every N samples, where N is typically < 20.

The data has a binary value of 0 when it stays constantly at 3 (this typically lasts 400+ samples).

I can't seem to group the data into these binary categories while also tracking how many samples wide each group is.

Example data:

1.84    |
2.96    |
2.8     |
3.12    |
.       |  I want this to be grouped as a 0
.       |
3.11    |_____
-3.42   |
-2.45   |
-1.49   |
3.12    |
2.99    |  I want this to be grouped as a 1
1.97    |
-1.11   |
-2.33   |
.       |  
.       |  Keeps going for N cycles

The individual cycles within the logic HIGH state are typically short (<20 samples).

The code I have so far:

state = "X"
for i in range(0, len(data['input'])):    
    currentBinaryState = inputBinaryState(data['input'][i])  # Returns "LOW" or "HIGH" depending on which side of 0 the sample lies

    if(currentBinaryState != previousBinaryState):

        # A cycle is very unlikely to last more than 250 samples
        if y > 250 and currentBinaryState == "LOW": # Been low for a long time
            if state == "_high":
                groupedData['input'].append( ("HIGH", x) )
                x = 0

            state = "_low"

        else:
            # Is on carrier wave (logic 1)
            if state == "_low":
                # Just finished low
                groupedData['input'].append( ("LOW", x) )
                x = 0

            state = "_high"


        y = 0

Obviously, the result isn't what I expect: the LOW groups are far too short.

[('HIGH', 600), ('LOW', 8), ('HIGH', 1168), ('LOW', 9), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 9), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 8), ('HIGH', 1168), ('LOW', 9)]

I understand I could have asked this on the Signal Processing SE, but I deemed this problem to be more programming oriented. I hope I've explained the problem sufficiently; if there are any questions, just ask. Thanks.

Here is a link to the actual sample data:

https://drive.google.com/folderview?id=0ByJDNIfaTeEfemVjSU9hNkNpQ3c&usp=sharing

Visually, it is very clear where the boundaries of the data lie.

Update 1

I've updated my code to be more legible, as single-letter variables aren't helping with my sanity.

previousBinaryState = "X"
x = 0
sinceLastChange = 0
previousGroup = inputBinaryState(data['input'][0])
lengthAssert = 0
for i in range(0, len(data['input'])):    
    currentBinaryState = inputBinaryState(data['input'][i]);

    if(currentBinaryState != previousBinaryState): # Changed from -3 -> +3 or +3 -> -3 

        #print sinceLastChange

        if sinceLastChange > 250 and previousGroup == "HIGH" and currentBinaryState == "LOW": # Finished LOW group
            groupedData['input'].append( ("LOW", x) )
            lengthAssert += x
            x = 0
            previousGroup = "LOW"

        elif sinceLastChange > 20 and previousGroup == "LOW": # Finished HIGH group
            groupedData['input'].append( ("HIGH", x) )
            lengthAssert += x
            x = 0
            previousGroup = "HIGH"

        sinceLastChange = 0

    else:
        sinceLastChange += 1

    previousBinaryState = currentBinaryState
    x += 1

Printing sinceLastChange at each change of state gives, for the sample data:

8
7
8
7
7
596   <- Clearly a LOW group
7
8
7
8
7
7
8
7
8
7
7
8
7
8
7
7
8
7
8
.
.
.

The problem is that the groups last longer than they should:

[('HIGH', 600), ('LOW', 1176), ('HIGH', 1177), ('LOW', 1176), ('HIGH', 1176), ('LOW', 1177), ('HIGH', 1176), ('LOW', 1176)]

There are only 8 groups made but the plot clearly shows a lot more. The groups appear to be twice the size of what they should be.

Currently this question is rather hard to answer. Could you give us some sample data? Even better if you can give sample data marked up with which portions you would consider 1 and 0. The boundaries here are going to be very difficult. For example, if the data is sampled at 1 MHz and you have 400 samples at value +3, followed by 20 at -3, 20 at +3, 20 at -3, then 400 at +3, does the `1` value start at sample 380 or 400? And does it end at sample 440 or 460? Or does it not matter? Also, do you have a minimum threshold for the number of samples to be considered a zero? - J Richard Snape, Aug 16 '16 at 11:20
@JRichardSnape Thanks for the comment. The exact point of the boundary between 0 and 1 isn't important. It wouldn't matter if the boundary started at 380 or 400 as long as the 0 grouping encapsulates all the complete cycles (-3->+3->-3) in the sample data. The minimum threshold for which I consider the data to be in the 0 group is somewhat arbitrary. I've chosen 250 samples at -3 to be sufficient. I have attached a link to the data file and an image that shows a plot of the data. Visually, it is very clear where the boundaries lie. I hope that answers your questions. - Roy, Aug 16 '16 at 12:17

score 0 · Accepted Answer · answered Aug 16 '16 at 14:37

I've finally found a solution. I spent far too long getting my head around what appears to be a fairly simple problem, but it works now.

It won't pick up the last group in the data set but that's fine.

previousBinaryState = "X"  # Sentinel so the first sample always counts as a change
x = 0                      # Samples accumulated in the current group
sinceLastChange = 0        # Samples seen since the last change of state
previousGroup = inputBinaryState(data['input'][0])
lengthAssert = 0           # Running total of grouped samples, for sanity checking
for i in range(0, len(data['input'])):
    currentBinaryState = inputBinaryState(data['input'][i])

    if currentBinaryState != previousBinaryState:  # Changed from -3 -> +3 or +3 -> -3

        #print sinceLastChange

        if sinceLastChange > 250 and previousGroup == "HIGH" and currentBinaryState == "LOW":  # Finished LOW group
            groupedData['input'].append(("LOW", x))
            lengthAssert += x
            x = 0
            previousGroup = "LOW"

        sinceLastChange = 0

    else:
        # No change of state: close the HIGH group once the signal has held
        # steady for more than 20 samples
        if sinceLastChange > 20 and previousGroup == "LOW":
            groupedData['input'].append(("HIGH", x))
            lengthAssert += x
            x = 0
            previousGroup = "HIGH"
            sinceLastChange = 0

        sinceLastChange += 1

    previousBinaryState = currentBinaryState
    x += 1

Here 20 is the maximum number of samples expected between changes of state while the signal is oscillating, and 250 is the minimum number of unchanged samples needed before a run is treated as a constant group.
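To see how thresholds like these act on the data, it can help to look at the run lengths directly. The following is only a minimal sketch, not the code used above: `run_lengths` and `classify_run` are hypothetical helper names, the labels "CONSTANT" and "CARRIER" are my own (deliberately not the HIGH/LOW labels above), and the samples are assumed to be a plain sequence of floats thresholded at 0.

```python
from itertools import groupby

def run_lengths(samples, threshold=0.0):
    """Lengths of consecutive runs of samples on the same side of `threshold`."""
    return [sum(1 for _ in run)
            for _, run in groupby(samples, key=lambda v: v > threshold)]

def classify_run(length, min_constant=250):
    """A run longer than `min_constant` samples is treated as a constant level;
    anything shorter is assumed to be a half-cycle of the carrier."""
    return "CONSTANT" if length > min_constant else "CARRIER"

# Synthetic data (an assumption, not the real capture):
# 300 samples held at +3, then three short half-cycles.
samples = [3.0] * 300 + [-3.0] * 8 + [3.0] * 7 + [-3.0] * 8
lengths = run_lengths(samples)
print(lengths)                             # [300, 8, 7, 8]
print([classify_run(n) for n in lengths])  # ['CONSTANT', 'CARRIER', 'CARRIER', 'CARRIER']
```

With the real data, the short runs of 7 and 8 samples would fall on the carrier side of the 250-sample threshold and the long runs of several hundred samples on the constant side.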

[('HIGH', 25), ('LOW', 575), ('HIGH', 602), ('LOW', 574), ('HIGH', 602), ('LOW', 575), ('HIGH', 601), ('LOW', 575), ('HIGH', 602), ('LOW', 574), ('HIGH', 602), ('LOW', 575), ('HIGH', 601), ('LOW', 575), ('HIGH', 602), ('LOW', 574)]

When comparing that to the graph and the actual data, it appears to be correct.
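If the final group does matter, one alternative is to work on run lengths (the number of consecutive samples on one side of the threshold) and flush whatever is left after the loop. This is a sketch of a different technique from the loop above, under my own assumptions: `group_runs` is a hypothetical name, the labels "CONSTANT" and "CARRIER" are illustrative, and 250 is reused as the minimum length of a constant run.

```python
def group_runs(lengths, min_constant=250):
    """Merge run lengths into (label, width) groups.

    Runs longer than `min_constant` become their own "CONSTANT" group;
    consecutive shorter runs are summed into one "CARRIER" group.
    """
    groups = []
    carrier_width = 0
    for length in lengths:
        if length > min_constant:
            if carrier_width:
                # A long run ends the carrier burst that preceded it.
                groups.append(("CARRIER", carrier_width))
                carrier_width = 0
            groups.append(("CONSTANT", length))
        else:
            carrier_width += length
    if carrier_width:
        # Flush the trailing carrier burst so the last group is not lost.
        groups.append(("CARRIER", carrier_width))
    return groups

print(group_runs([596, 7, 8, 7, 8, 7, 600, 8, 7]))
# [('CONSTANT', 596), ('CARRIER', 37), ('CONSTANT', 600), ('CARRIER', 15)]
```

Note that the half-cycle adjacent to a constant run merges into that run when both sit at the same level, so the boundaries are only accurate to within one half-cycle. That matches the tolerance described in the comments, where the exact boundary position is not important.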
