2

I have some data, and am trying to keep track of various issues with paperwork for work. The data looks something like this:

ID      |     Paperwork 1    |    Paperwork 2    |    Paperwork 3
1       |     No Signature   |     No Signature  |      Missing Date
2       |     Completed      |      No Signature |     Missing Signature
3       |      Missing Date  |     Completed     |       Completed
4       |      None          |    Completed      |       Missing Signature

Over N categories of paperwork, and M different ID's or rows, and X different possible responses in each field, I am trying to determine what response for which ID and which category of paperwork is where.

I've thought of two different table designs.

Two Column Design:

ID    |    Paperwork
1     |      1
2     |      4
3     |      10
4     |     257

With the corresponding code base

0 = No Paperwork 1
1 = No Paperwork 2
2 = No Paperwork 3
4 = Paperwork 1 Submitted Successfully
8 = Paperwork 2 Submitted Successfully
16 = Paperwork 3 Submitted Successfully
32 = Paperwork 1 Missing Date
64 = Paperwork 2 Missing Date
128 = Paperwork 3 Missing Date
256 = Paperwork 1 Missing Signature
512 = Paperwork 2 Missing Signature
1024 = Paperwork 3 Missing Signature

So that we know simply by looking at whatever number is in the paperwork column, what the issues are with the different paperwork for the period. In the two column design, the errors would then be:

Two Column Design:

ID    |    Paperwork
1     |     No Paperwork 2
2     |     Paperwork 1 Submitted Successfully, No Paperwork 2
3     |     Paperwork 2 Submitted Successfully, No Paperwork 3
4     |     Paperwork 1 Missing Signature, No Paperwork 2

The issues with this design are: 1) It is easy to get the numbers to put in the paperwork column - someone needs to look over all of the paperwork from that given period, and note the various issues or lack of issues with each piece of paperwork. Once those issues are found, a number is assigned to each issue, and then those numbers are summed up to find an overall number which encapsulates all of the issues in that given period. However, I don't know how to make the text outputs from those specific numbers, and I don't want to brute force a list of all the potential text outputs because there are way too many different possibilities. It can be narrowed somewhat, as you cannot have a completed piece of paperwork that's missing a signature, but that sounds like way too many possibilities So my question is: If I get the number 257, how can I tell that 257 is really 256+1? Do I need to, or am I making this unnecessarily complicated?

Alternatively, another way I have thought up was to include all errors in a more binary variable. So instead of the error codes listed above, it would look like:

Many Columns

0 = No Paperwork 
1 = Paperwork Completed
2 = There is an error with Paperwork 1
3 = There is an error with Paperwork 2
4 = There is an error with Paperwork 3

So the table would then be:

ID      |     Paperwork 1    |    Paperwork 2    |    Paperwork 3
1       |     2              |     3             |      4
2       |     1              |     3             |      4  
3       |     2              |     1             |      1
4       |     0              |     1             |      4

And then when I am trying to run a report on the various problems with the paperwork, all I need to do is have a lookup function that counts how many instances of the number 2 there are in paperwork 1 column, and find the corresponding ID with it. I would then give this information to my co-worker and boss, and then they would in turn know that there is an issue with this piece of paperwork for this person; but they would not know specifically what the issue is, and would need to be relying on firsthand knowledge of the books themselves.

The problem with this design is that it implicitly requires people to check with my co-worker who administers the books to find out what the issue is with the paperwork, and the whole point of my involvement in the exercise is to help her with her work and to use the power of computers to help her with her task. So the second design doesn't seemingly use the full potential of computer technology to help with tasks.

So beyond the question posed before, I wholly welcome any questions, points, or directions in this matter. If you have an alternative table design, I welcome that too.

I apologize in advance if this post has violated any stackoverflow policies about not being direct, I tried to make it as direct as I could.

user3097236
  • 37
  • 1
  • 8
  • What is your question? Why is this tagged "math"? – Gordon Linoff May 17 '15 at 15:48
  • The question is highlighted in bold. It's math because I know that it has to do with numbers, powers of 2, combining powers of 2 together to form a greater number, and dissecting numbers to find their constituent parts. So for instance, I'm trying to find a scheme which shows that: 257 = 256 + 1 . While 257 also = 200+57, I don't care about that fact for this exercise. I'm trying to use the powers of computation to easily aggregate and dis-aggregate data. – user3097236 May 17 '15 at 17:01
  • So I'm wondering if I can do this in vba. I've used vba, but never in Microsoft excel - basically to find what the parts of 17 are, I'd find the highest power of 2 that's less than 17 (16), make a note of it, subtract 16 from 17, then repeat. The next iteration of the loop would bring out 1. So I could reasonably see a string generated which with an input of 17, would output 16,1. – user3097236 May 17 '15 at 19:05

1 Answers1

0

To answer your question about "disaggregating" 257 (i.e. basically finding its binary representation), you'll probably want to use bitwise operations: http://www.excely.com/excel-vba/bit-operations.shtml. However, your second idea seems much better, as it allows you to see the data more explicitly. Ideally, though, your process will likely be even better and easier to maintain if you move to something more advanced than Excel, such as using a python script and a csv file (or even a database, but that may require more technical knowledge and maintenance).

arghbleargh
  • 3,090
  • 21
  • 13