If you take the interval between each timestamp and the previous one and express it in microseconds (i.e. as integers), the distribution of values per bit depth in your sample file is as follows: 52.285% of the values are either 0 or 1, there are only a handful of other values below 64 (2-6 bits), 27.59% of the values are 7-12 bits, there is a fairly even distribution of around 2.1% per bit up to 20 bits, and only 3% of the values are above 20 bits, with a maximum of 25 bits.
Looking at the data, it's also obvious that there are many sequences of up to 6 consecutive zeros.
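A minimal sketch of how such a bit-depth histogram can be computed (it assumes, like the encoder further down, that timestamps.txt contains one decimal seconds value per line):

#include <stdint.h>
#include <iostream>
#include <fstream>
using namespace std;

int main() {
    ifstream ifile ("timestamps.txt");
    if (! ifile.is_open()) return 1;
    long double seconds;
    uint64_t prev = 0, total = 0, count[65] = {0};
    bool first = true;
    while (ifile >> seconds) {
        uint64_t timestamp = seconds * 1000000;
        if (! first) {
            uint64_t interval = timestamp - prev;
            uint8_t bits = 0;                  // bit depth; 0 for an interval of 0
            while (interval >> bits) ++bits;
            ++count[bits]; ++total;
        }
        first = false; prev = timestamp;
    }
    for (int b = 0; b <= 64; b++)
        if (count[b]) cout << b << "-bit: " << 100.0 * count[b] / total << "%\n";
    return 0;
}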
These observations gave me the idea of using a variable bit-size per value, something like this:
00 0xxxxx                              a run of zeros (xxxxx = number of consecutive zeros)
00 1xxxxx                              a run of ones (xxxxx = number of consecutive ones)
01 xxxxxx xxxxxxxx                     2-14 bit values
10 xxxxxx xxxxxxxx xxxxxxxx            15-22 bit values
11 xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx   23-30 bit values
A quick test showed that this resulted in a compression rate of 13.78 bits per timestamp, which isn't quite the 10 bits you were aiming for, but not a bad start for a simple scheme.
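A quick test of this kind doesn't require writing any bits; it's enough to count the bytes the scheme would produce. A rough sketch, counting only the interval bytes (estimate_bytes is a hypothetical helper, not part of the final encoder):

#include <stdint.h>
#include <vector>
using namespace std;

// counts the bytes the first scheme would produce: runs of up to 31
// equal 0/1 intervals cost one byte, larger intervals cost 2-4 bytes
uint64_t estimate_bytes(const vector<uint32_t>& intervals) {
    uint64_t bytes = 0;
    size_t i = 0;
    while (i < intervals.size()) {
        if (intervals[i] < 2) {                // run of equal 0s or 1s
            size_t j = i;
            while (j < intervals.size() && intervals[j] == intervals[i]
                   && j - i < 31) j++;         // xxxxx holds a count up to 31
            bytes += 1; i = j;
        } else if (intervals[i] < (1 << 14)) { bytes += 2; i++; }
        else   if (intervals[i] < (1 << 22)) { bytes += 3; i++; }
        else                                 { bytes += 4; i++; }
    }
    return bytes;
}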
After analysing the sample data some more, I observed that there are a lot of short sequences of consecutive 0's and 1's, like 0 1 0, so I replaced the 1-byte scheme with this one:
00xxxxxx
  00     = identifies a one-byte value
  xxxxxx = index in the sequence table
The table of sequences:
index ~ seq    index ~ seq    index ~ seq    index ~ seq    index ~ seq    index ~ seq
  0     0        2     00       6     000     14     0000    30     00000   62     000000
  1     1        3     01       7     001     15     0001    31     00001   63     000001
                 4     10       8     010     16     0010    32     00010
                 5     11      ...    ...     ...    ...     ...    ...
                               11     101     27     1101    59     11101
                               12     110     28     1110    60     11110
                               13     111     29     1111    61     11111
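The table is nothing more than an enumeration of all bit sequences of length 1 to 6: a sequence of len bits with value seq gets index seq + 2^len - 2, which is exactly what the encoder below computes. A sketch of the mapping in both directions (seq_to_index and index_to_seq are illustrative helpers; the decoder's write_sequence below uses the same unpacking loop):

#include <stdint.h>

uint8_t seq_to_index(uint8_t len, uint8_t seq) {
    return seq + (1 << len) - 2;          // e.g. len=3, seq=0b101 -> 5 + 8 - 2 = 11
}

void index_to_seq(uint8_t index, uint8_t& len, uint8_t& seq) {
    uint8_t offset = 1;
    len = 1;
    while (index >= (offset <<= 1)) {     // strip the group offsets 2, 4, 8, ...
        index -= offset;
        ++len;
    }
    seq = index;                          // e.g. index 11 -> len=3, seq=0b101
}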
For the example file with 451,210 timestamps, this brings down the encoded file size to 676,418 bytes, or 11.99 bits per timestamp.
Testing the above method revealed that there were 98,578 single zeros and 31,271 single ones between larger intervals. So I tried using 1 bit of each larger interval to store whether it was followed by a zero, which decreased the encoded size to 592,315 bytes. And when I used 2 bits to store whether larger intervals were followed by 0, 1 or 00 (the most common sequence), the encoded size decreased to 564,034 bytes, or 10.0004 bits per timestamp.
I then changed to storing the single 0's and 1's with the following large interval instead of the preceding one (purely for reasons of code simplicity) and found that this resulted in a file size of 563,884 bytes, or 9.997722 bits per timestamp!
So the complete method is:
Store the first timestamp (8 bytes), then store the intervals as either:
00 iiiiii                              a sequence of up to 5 (or 6) zeros and/or ones
01 XXxxxx xxxxxxxx                     2-12 bit values (2 ~ 4,095)
10 XXxxxx xxxxxxxx xxxxxxxx            13-20 bit values (4,096 ~ 1,048,575)
11 XXxxxx xxxxxxxx xxxxxxxx xxxxxxxx   21-28 bit values (1,048,576 ~ 268,435,455)

iiiiii = index in the sequence table (see above)
XX     = the interval is preceded by a single zero (XX=1), a single one (XX=2) or two zeros (XX=3); XX=0 means no preceding sequence
xxx... = the 12, 20 or 28-bit interval value
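For example, an interval of 0 followed by an interval of 100 microseconds encodes as the two bytes 01010000 01100100 (0x50 0x64): the leading 01 selects the two-byte form, XX=1 records the preceding single zero, and the remaining 12 bits hold the value 100.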
Example of an encoder:
#include <stdint.h>
#include <iostream>
#include <fstream>
using namespace std;

void write_timestamp(ofstream& ofile, uint64_t timestamp) { // big-endian
    uint8_t bytes[8];
    for (int i = 7; i >= 0; i--, timestamp >>= 8) bytes[i] = timestamp;
    ofile.write((char*) bytes, 8);
}

int main() {
    ifstream ifile ("timestamps.txt");
    if (! ifile.is_open()) return 1;
    ofstream ofile ("output.bin", ios::trunc | ios::binary);
    if (! ofile.is_open()) return 2;
    long double seconds;
    uint64_t timestamp;
    // the first timestamp is stored in full (8 bytes)
    if (ifile >> seconds) {
        timestamp = seconds * 1000000;
        write_timestamp(ofile, timestamp);
    }
    while (! ifile.eof()) {
        uint8_t bytesize = 0, len = 0, seq = 0, bytes[4];
        uint32_t interval;
        // collect 0/1 intervals until a larger interval arrives
        // or the sequence reaches its maximum length
        while (bytesize == 0 && ifile >> seconds) {
            interval = seconds * 1000000 - timestamp;
            timestamp += interval;
            if (interval < 2) {
                seq <<= 1; seq |= interval;
                if ((++len == 5 && seq > 0) || len == 6) bytesize = 1;
            } else {
                // find the size class: a 12, 20 or 28-bit value
                // (intervals are assumed to fit in 28 bits)
                while (interval >> (++bytesize * 8 + 4));
                for (uint8_t i = 0; i <= bytesize; i++) {
                    bytes[i] = interval >> (bytesize - i) * 8;
                }
                bytes[0] |= (bytesize++ << 6); // size prefix in the top 2 bits
            }
        }
        if (len) {
            if (bytesize > 1 && (len == 1 || (len == 2 && seq == 0))) {
                // a single 0, a single 1 or two 0s fit in the XX bits
                bytes[0] |= (2 * len + seq - 1) << 4;
            } else {
                // otherwise write a one-byte sequence:
                // index = seq + 2^len - 2 (see the sequence table)
                seq += (1 << len) - 2;
                ofile.write((char*) &seq, 1);
            }
        }
        if (bytesize > 1) ofile.write((char*) bytes, bytesize);
    }
    ifile.close();
    ofile.close();
    return 0;
}
Example of a decoder:
#include <stdint.h>
#include <iostream>
#include <fstream>
using namespace std;

uint64_t read_timestamp(ifstream& ifile) { // big-endian
    uint64_t timestamp = 0;
    uint8_t byte;
    for (uint8_t i = 0; i < 8; i++) {
        ifile.read((char*) &byte, 1);
        if (ifile.fail()) return 0;
        timestamp <<= 8; timestamp |= byte;
    }
    return timestamp;
}

uint8_t read_interval(ifstream& ifile, uint8_t *bytes) {
    uint8_t bytesize = 1;
    ifile.read((char*) bytes, 1);
    if (ifile.fail()) return 0;
    bytesize += bytes[0] >> 6; // size prefix in the top 2 bits
    for (uint8_t i = 1; i < bytesize; i++) {
        ifile.read((char*) bytes + i, 1);
        if (ifile.fail()) return 0;
    }
    return bytesize;
}

void write_seconds(ofstream& ofile, uint64_t timestamp) {
    long double seconds = (long double) timestamp / 1000000;
    ofile << seconds << "\n";
}

// expands a sequence-table index into its 0/1 intervals, writes the
// resulting timestamps and returns the total time advance
uint8_t write_sequence(ofstream& ofile, uint8_t seq, uint64_t timestamp) {
    uint8_t interval = 0, len = 1, offset = 1;
    while (seq >= (offset <<= 1)) { // undo index = seq + 2^len - 2
        seq -= offset;
        ++len;
    }
    while (len--) {
        interval += (seq >> len) & 1;
        write_seconds(ofile, timestamp + interval);
    }
    return interval;
}

int main() {
    ifstream ifile ("timestamps.bin", ios::binary);
    if (! ifile.is_open()) return 1;
    ofstream ofile ("output.txt", ios::trunc);
    if (! ofile.is_open()) return 2;
    ofile.precision(6); ofile << std::fixed;
    uint64_t timestamp = read_timestamp(ifile);
    if (timestamp) write_seconds(ofile, timestamp);
    while (! ifile.eof()) {
        uint8_t bytes[4], seq = 0, bytesize = read_interval(ifile, bytes);
        uint32_t interval;
        if (bytesize == 1) { // a one-byte sequence of 0/1 intervals
            timestamp += write_sequence(ofile, bytes[0], timestamp);
        }
        else if (bytesize > 1) { // a larger interval
            seq = (bytes[0] >> 4) & 3; // the XX bits
            if (seq) timestamp += write_sequence(ofile, seq - 1, timestamp);
            interval = bytes[0] & 15;
            for (uint8_t i = 1; i < bytesize; i++) {
                interval <<= 8; interval += bytes[i];
            }
            timestamp += interval;
            write_seconds(ofile, timestamp);
        }
    }
    ifile.close();
    ofile.close();
    return 0;
}
Because of the long double output bug in the MinGW/gcc 4.8.1 compiler I'm using, I had to use this workaround (it shouldn't be necessary with other compilers):
void write_seconds(ofstream& ofile, uint64_t timestamp) {
    long double seconds = (long double) timestamp / 1000000;
    // print the leading "1" separately so the remainder fits in a plain
    // double (assumes timestamps between 1,000,000,000 and 1,999,999,999 seconds)
    ofile << "1" << (double) (seconds - 1000000000) << "\n";
}
Note to future readers: this method is based on analysis of an example data file; it will not give the same compression rate if your data is different.