The issue is not just which data type will hold each value in the least space, but also what the most efficient way is to access memory at the bit level.
With my limited knowledge, I would try setting up a bit array made of ints (from my understanding, ints are the most efficient unit for accessing memory in a bit array; I may be mistaken, but the same principles apply if there is a better type), and then use bitwise operators to write and read the bits.
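As a minimal sketch of that idea, the helpers below treat an array of `unsigned int` as a sequence of 2-bit fields and read or write any field by index using shifts and masks. The names `set2`, `get2` and `FIELDS_PER_WORD` are hypothetical. Unlike the shift-in loop further down, this version stores field 0 in the lowest bits of each int, so any position can be accessed directly.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of 2-bit fields in one unsigned int (16 when int is 32 bits). */
#define FIELDS_PER_WORD (sizeof(unsigned int) * CHAR_BIT / 2)

/* Write a 2-bit value (0..3) into field idx of the packed array words. */
static void set2(unsigned int *words, size_t idx, unsigned int value)
{
    size_t word = idx / FIELDS_PER_WORD;                             /* which int */
    unsigned int shift = (unsigned int)(idx % FIELDS_PER_WORD) * 2u; /* which bit pair */

    assert(value <= 3u);
    words[word] &= ~(3u << shift);   /* clear the old pair */
    words[word] |= value << shift;   /* store the new pair */
}

/* Read the 2-bit value stored in field idx. */
static unsigned int get2(const unsigned int *words, size_t idx)
{
    size_t word = idx / FIELDS_PER_WORD;
    unsigned int shift = (unsigned int)(idx % FIELDS_PER_WORD) * 2u;

    return (words[word] >> shift) & 3u;
}
```

Clearing the pair before OR-ing in the new value means a field can be overwritten, not only filled once.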
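Here is some partial code that should give you an idea of how to proceed with 2-bit codes and a large array of ints.
Assuming a pointer `a` set to the start of a large array of ints:
```c
unsigned int *a, dna[LARGE_NUMBER];  /* LARGE_NUMBER: placeholder for whatever capacity you need */
a = dna;   /* a walks through dna[] one int at a time */
*a = 0;    /* start with an empty first int */
```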
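Rather than guessing a large number, one option (my assumption, not something the code above requires) is to compute how many ints a given number of bases needs and allocate them on the heap with `calloc`, since a very large local array can overflow the stack. `words_needed`, `alloc_packed` and `BASES_PER_WORD` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bases that fit in one unsigned int at 2 bits each (16 when int is 32 bits). */
#define BASES_PER_WORD (sizeof(unsigned int) * CHAR_BIT / 2)

/* Number of unsigned ints needed for n_bases bases, rounded up. */
static size_t words_needed(size_t n_bases)
{
    assert(n_bases <= SIZE_MAX - BASES_PER_WORD);  /* guard the rounding below */
    return (n_bases + BASES_PER_WORD - 1) / BASES_PER_WORD;
}

/* Allocate a zero-filled buffer for n_bases bases; the caller frees it.
   Returns NULL if the allocation fails. */
static unsigned int *alloc_packed(size_t n_bases)
{
    size_t n = words_needed(n_bases);

    return calloc(n > 0 ? n : 1, sizeof(unsigned int));
}
```

The pointer from `alloc_packed` can then take the place of `dna`, with `a` starting at it exactly as before.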
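Setting up the bit patterns:
For A (binary 11):
```c
unsigned int da = 0;
da = ~da;      /* all ones:  111...111 */
da = da << 2;  /*            111...100 */
da = ~da;      /* binary 11: 000...011 */
```
For G (binary 10):
```c
unsigned int dg = 0;
dg = ~dg;      /* 111...111 */
dg = dg << 1;  /* 111...110 */
dg = ~dg;      /* 000...001 */
dg = dg << 1;  /* binary 10: 000...010 */
```
And so on for T and C. (These steps simply produce the constants 3 and 2, so `da = 3;` and `dg = 2;` would do the same.)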
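To confirm what those steps produce, the sketch below repeats them and checks them against named constants, then maps characters to codes. The codes for T (01) and C (00) are hypothetical, since only A = 11 and G = 10 are fixed above; `base_code`, `check_recipes` and the `CODE_*` names are likewise made up for illustration.

```c
#include <assert.h>

/* 2-bit codes. A = 11 and G = 10 follow the answer;
   T = 01 and C = 00 are hypothetical choices. */
enum { CODE_C = 0, CODE_T = 1, CODE_G = 2, CODE_A = 3 };

/* The ~ / << recipe for A. */
static unsigned int build_da(void)
{
    unsigned int da = 0;
    da = ~da;      /* 111...111 */
    da = da << 2;  /* 111...100 */
    da = ~da;      /* 000...011 */
    return da;
}

/* The ~ / << recipe for G. */
static unsigned int build_dg(void)
{
    unsigned int dg = 0;
    dg = ~dg;      /* 111...111 */
    dg = dg << 1;  /* 111...110 */
    dg = ~dg;      /* 000...001 */
    dg = dg << 1;  /* 000...010 */
    return dg;
}

/* Sanity check: the recipes really give the A and G codes. */
static void check_recipes(void)
{
    assert(build_da() == CODE_A);
    assert(build_dg() == CODE_G);
}

/* Map a character to its 2-bit code, or -1 if it is not a base. */
static int base_code(int ch)
{
    switch (ch) {
    case 'a': case 'A': return CODE_A;
    case 'g': case 'G': return CODE_G;
    case 't': case 'T': return CODE_T;
    case 'c': case 'C': return CODE_C;
    default:            return -1;
    }
}
```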
For the loop:
```c
int b;                                /* int, not char, so EOF can be detected */
unsigned int code;
size_t i = sizeof(unsigned int) * 8;  /* bits left in the current int (bytes into bits);
                                         computed once, before the loop */

while ((b = getchar()) != EOF) {
    if (b == 'a' || b == 'A')
        code = da;
    else if (b == 't' || b == 'T')
        code = dt;
    /* ... likewise for 'c'/'C' and 'g'/'G' ... */
    else
        continue;                     /* not a base (e.g. a newline): skip it or report an error */

    if (i == 0) {                     /* current int is full */
        *++a = 0;                     /* advance to the next int (no bounds check on dna[] here) */
        i = sizeof(unsigned int) * 8;
    }
    *a = (*a << 2) | code;            /* shift first, then OR, so the oldest base is never pushed out */
    i -= 2;                           /* track how much unused space is left in the int */
}
```
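Putting the pieces together, here is a self-contained sketch of the same shift-then-OR packing loop. To keep it testable it reads from a string instead of `getchar()`, skips non-base characters instead of reporting an error, and uses the hypothetical T = 01 and C = 00 codes; `pack_bases`, `base_code` and `BITS_PER_WORD` are illustrative names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned int) * CHAR_BIT)

/* Hypothetical table: A = 11 and G = 10 as above, T = 01 and C = 00 assumed. */
static int base_code(int ch)
{
    switch (ch) {
    case 'a': case 'A': return 3;
    case 'g': case 'G': return 2;
    case 't': case 'T': return 1;
    case 'c': case 'C': return 0;
    default:            return -1;
    }
}

/* Pack the bases in seq into out (room for cap ints) with the shift-then-OR loop.
   Characters that are not bases (e.g. newlines) are skipped.
   Returns the number of bases stored. */
static size_t pack_bases(const char *seq, unsigned int *out, size_t cap)
{
    unsigned int *a = out;
    size_t bits_left = BITS_PER_WORD;   /* unused bits in the current int */
    size_t n = 0;

    assert(cap > 0);
    *a = 0;
    for (; *seq != '\0'; seq++) {
        int code = base_code((unsigned char)*seq);

        if (code < 0)
            continue;
        if (bits_left == 0) {           /* current int is full: move on */
            a++;
            assert((size_t)(a - out) < cap);  /* stop before writing past out[cap - 1] */
            *a = 0;
            bits_left = BITS_PER_WORD;
        }
        *a = (*a << 2) | (unsigned int)code;
        bits_left -= 2;
        n++;
    }
    return n;
}
```

For example, `pack_bases("AG\nT", buf, 4)` stores 3 bases and leaves binary 111001 (57) in `buf[0]`: the first base ends up in the highest occupied bits, and the last int is only partly filled.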
This stores 32 bits in each int, i.e. 16 letters per int when int is 32 bits wide. For limits on array size, see "The maximum size of an array in C".
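Reading the letters back requires the total count, because the last int is only partly filled. Given the shift-in layout of the loop above (earlier bases in higher bits, a partial last int holding its bases in the low bits), random access by index might look like the sketch below; `get_base`, `code_to_base` and the T/C letters in its table are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BASES_PER_WORD (sizeof(unsigned int) * CHAR_BIT / 2)

/* Code of base idx in a buffer filled by the shift-in loop, where count is the
   total number of bases stored. Earlier bases sit in higher bits, and a partly
   filled last int holds its bases in the low bits. */
static unsigned int get_base(const unsigned int *words, size_t count, size_t idx)
{
    size_t word, pos, in_word;

    assert(idx < count);
    word = idx / BASES_PER_WORD;
    pos = idx % BASES_PER_WORD;
    in_word = count - word * BASES_PER_WORD;   /* bases held by this int... */
    if (in_word > BASES_PER_WORD)
        in_word = BASES_PER_WORD;              /* ...capped at a full int */
    return (words[word] >> ((in_word - 1 - pos) * 2)) & 3u;
}

/* Code back to letter, with the hypothetical T = 01 and C = 00. */
static char code_to_base(unsigned int code)
{
    static const char letters[4] = { 'C', 'T', 'G', 'A' };  /* 00, 01, 10, 11 */

    return letters[code & 3u];
}
```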
I am speaking only from a novice C perspective. I would think that machine language would do a better job of what you are specifically asking for, though I'm certain there are high-level solutions out there. I know that FORTRAN is well regarded in the sciences, but as I understand it, that is due to its computational speed rather than its storage efficiency (though I'm sure it's not lacking there); there is an interesting read here: http://arstechnica.com/science/2014/05/scientific-computings-future-can-any-coding-language-top-a-1950s-behemoth/. I would also look into compression, though sadly I have not learned much about it myself.
A source I turned to when I was looking into bit-arrays:
http://www.mathcs.emory.edu/~cheung/Courses/255/Syllabus/1-C-intro/bit-array.html