
Along the lines of *How to get this PRNG to generate numbers within the range?*, this is how far I have gotten (`parts` is an array of 32 two-character "symbols"):

```javascript
const parts = `mi
ma
mo
ne
nu
di
da
do
be
bu
ti
te
ta
to
tu
ki
ke
ka
ko
ku
si
sa
so
ze
zu
fi
fa
fo
ve
vu
xe
xu`.trim().split(/\n+/)

// Quadratic-residue permutation: x*x mod o for x <= o/2, mirrored (o - v) above that.
// It is known to be a bijection on [0, o) when o is a prime with o % 4n === 3n;
// the modulus used in fetchLarge has o % 4n === 1n, so uniqueness is not guaranteed.
const fetch = (x, o) => {
  if (x >= o) {
    return x
  } else {
    const v = (x * x) % o
    return (x <= (o / 2n)) ? v : o - v
  }
}

const fetchLarge = (x) => fetch(x, 41223334444555556666667777777888888889999999997n)

// the last number can be anything.
const buildLarge = (x, o) => fetchLarge((fetchLarge(x) + o) % (32n ** 31n) ^ 2030507011013017019023n)

// Split n into base-32 digits, least significant first (no zero padding).
const createArray = (n, mod = 32n) => {
  if (!n) return [0];
  let arr = [];
  while (n) {
    arr.push(Number(n % mod));
    n /= mod;
  }
  return arr;
}

const write = (i) => {
  const x = buildLarge(i, 272261127249452727280272961627319532734291n)
  return createArray(x).map(x => parts[x]).join('')
}

let i = 1n
while (i < 10000n) {
  console.log(write(i))
  i++ // the loop must advance the counter itself
}
```

I am generating results along the lines of:

kitekutefaxunetotuzezumatotamabukidimasoxumoxudofasositinu,6038940986212279582529645303138677298679151
sokiketufikefotekakidotetotesamizununetefokixefitetisovene,5431347628569519336817719657935192515363318
xudamituzesimixuxemixudakedatetutununekobuzexesozuxedinenu,5713969289948157459645315228321450728816863
dazenenemovudadikukatatakibekaxexemovubedivusidatafisasine,5082175912370834928186684152014555456835302
xufotidosokabunudomimibefisimakusimokedamomazexekofomokane,4925740069222414438181195472381794794580863
sodozekadakuzemaxetexukuzumisikitazufitizexekatetotuxusone,5182433137814021540565892366585827483507958
kikokasatudatidatufikizesadimatakakatudisibumofotuzutaze,1019165422643074024784461594259815846823503
dakikinetofonexesimavufafisaxefosafisikofotasanekovetevu,1279315636939618596561544978621478602915302
kinunebebuzukokemidatekobusofokikozukobedodakesisikunuki,659622269329577207976266617866288582888591
sozesifamoxebusitotesisasizekudasomitatavudidizukadimate,480714979099063166920265752208932468511478
xumakikofakumixefotisikunumovudafasofikimozenudafosidaka,749508057657951412178315361964670398839871
dazedokutituzufakebutifokekusobuzutemanesadafadatetitamo,103886097260879003150138325027254855900902
xukemizukozefaxetudizukedimotevubesitekitavukakevutisibe,376136321524704717800574424940622855799327
dozexedivenudifabuvedavebukeketozukumasimakuvetuketomafaxe,42948292938975784099596927092482269526555367
mimasatukidisodifikekutovumazefikefonemofimotesonusazexuxe,43196343143305047528500657761292227037320224
zedafimasobukudizedozefoketuzekisadotufikudadokisakedofoxe,43000150124549846140482724444846720574088407
kisafimosotuvuvuzuzukodibevutemidazusisamokososikomofavuma,2692423943832809210699522830552769656612527
soxutokonebusidaketesomoxemibesonubudibekunumatifokokanemo,2942202721014541374299446744441542204274678
xusikematetemititafafakuxusinekefoketonebetokudonesomosama,2312137916289687577537008913213461971911327

How can I make every string exactly 31 "symbols" long (each symbol is 2 characters here, so 62 characters in total), like this:

xusikematetemititafafakuxusinekexusikematetemititafafakuxusine

That is: what should the 3 large BigInt constants in the algorithm above be? And what should they be so that the distribution appears random? I noticed that large numbers close to the boundary produced much more random-looking results than smaller numbers. Also, I can't just prefix the BigInt `x` with 0's, because that would result in `mamamamamama...`. Finally, the same symbol may appear at most twice in a row. I assume the only real way to enforce that is to skip results that don't fit the constraint (unless there is some math magic that can tell whether one of these 32 "symbols" appears more than twice in a row).

Regarding the last part, these are valid results:

mamavumamavumama...
nanavumamavuvuma...

These are not valid:

mamamavumamavuma...
mavuvuvumamavuma...

because each contains the same symbol three times in a row (`mamama` and `vuvuvu`).
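If results are filtered by skipping, the check has to respect symbol boundaries. A naive `/(..)\1\1/` can match across two symbols. Below is a minimal sketch of such a filter. `toSymbols`, `hasLongRun` and `hasTripleRegex` are hypothetical helper names, not from the question:

```javascript
// Split a string into its 2-character symbols.
const toSymbols = (str) => str.match(/.{2}/g) ?? [];

// True if some symbol occurs more than `maxRepeat` times in a row.
const hasLongRun = (str, maxRepeat = 2) => {
  const symbols = toSymbols(str);
  let run = 1;
  for (let i = 1; i < symbols.length; i++) {
    run = symbols[i] === symbols[i - 1] ? run + 1 : 1;
    if (run > maxRepeat) return true;
  }
  return false;
};

// Single-regex form for maxRepeat = 2. `^(?:..)*?` consumes an even number of
// characters, so the captured pair always starts on a symbol boundary.
const hasTripleRegex = (str) => /^(?:..)*?(..)\1\1/.test(str);

console.log(hasLongRun('mamavumamavumama')); // false
console.log(hasLongRun('mamamavumamavuma')); // true
console.log(hasLongRun('mavuvuvumamavuma')); // true
// An unanchored /(..)\1\1/ also matches "ikikik" across symbol boundaries here:
console.log(/(..)\1\1/.test('mikikike'), hasTripleRegex('mikikike')); // true false
```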

To summarize:

  1. How can I make all strings 62 characters long without padding with zeroes? That means the values must fall within some range of BigInts, and I'm not sure how to determine it.
  2. How can I make the distribution appear thoroughly random? That is, the whole string should appear to change completely from one value to the next, as the examples show, instead of only the tail end changing slowly.
  3. How can I ensure that no symbol appears more than twice in a row? This part can be solved by skipping results in the pseudo-random sequence, unless there is some magic to accomplish it that I'm not fathoming :) For example, maybe there is some trick involving multiples of similar 5-bit chunks, I don't know. But there's no need to get fancy; skipping the results that match a regex is fine too.
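For point 1, without padding the string length is the number of base-32 digits of `x`, because `createArray` stops at the most significant nonzero digit. So exactly 31 symbols means `32n ** 30n <= x < 32n ** 31n`. A sketch of those bounds follows. `minForLength`, `maxForLength` and `digitCount` are hypothetical helper names:

```javascript
const BASE = 32n;

// Smallest and largest values whose base-32 form has exactly `len` digits.
const minForLength = (len) => BASE ** BigInt(len - 1);
const maxForLength = (len) => BASE ** BigInt(len) - 1n;

// Number of base-32 digits of a non-negative BigInt (0 counts as one digit),
// i.e. the number of symbols createArray would produce.
const digitCount = (n) => {
  let count = 1;
  while (n >= BASE) {
    n /= BASE;
    count++;
  }
  return count;
};

const lo = minForLength(31);
const hi = maxForLength(31);
console.log(lo.toString(), hi.toString());
console.log(digitCount(lo), digitCount(hi), digitCount(hi + 1n)); // 31 31 32
```

Restricting values to `[lo, hi]` does give 62-character strings. However, it excludes every value whose most significant digit would be 0, that is, every string whose last symbol is `mi`. That is the trade-off behind the zero-padding suggestion in the comments.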
Lance
  • Why would you need bigints for this at all? Just use any of quite a few PRNGs, and use that to select 31 "random" elements from your domain array, yielding the same 62 character string given the same seed? – Mike 'Pomax' Kamermans Jan 04 '22 at 22:39
  • PRNGs don't guarantee _uniqueness_ of each "pseudo random" number across the entire set generated from an incrementing sequence. That's why I need this fancy function. There can be no duplicates. – Lance Jan 04 '22 at 22:40
  • "*How to make it so all strings are 62 characters in length, without padding with zeroes?*" - actually, zero-padding *is* the correct solution for that. Your PRNG should have a uniform distribution, and it will be just as likely to generate `10000…` as it will generate `…00001`. – Bergi Jan 04 '22 at 22:44
  • @Bergi shouldn't it instead just be a number within a certain range/size? Then you wouldn't need padding; I just don't know how to work out mathematically what that range is. – Lance Jan 04 '22 at 22:46
  • @LancePollard A number, in the range 0 to 32**31 (31 5-bit symbols), zero-padded so that they all have the same 31-symbol size. – Bergi Jan 04 '22 at 22:48
  • I guess you could do that, and then skip everything from 00000... up to the point where no more than 2 identical "symbols" appear in a row. But if you try that with my 3 large number inputs/seeds, the first huge batch of numbers will all start with mamama, which I want to avoid. – Lance Jan 04 '22 at 22:50
  • Use base 31 instead, and zero-padding. With 32 symbols, for the current digit, pick only from those symbols that haven't been used in the previous digit. In this way you can guarantee that no symbol will appear twice in a row. Though you could end up with sequences like `mavumavumavumavumavumavu...` – Ouroborus Jan 04 '22 at 22:55
  • @Ouroborus I'm not quite sure how that would work. It is also important (though I probably should have stated) that the string `mamavu...` be reversible back to the bigint. – Lance Jan 04 '22 at 22:57
  • Well, you would never get a string like `mamavu...` since you'd never have two of the same symbol in a row using my strategy. It is reversible though.
  • @Ouroborus can you show some code in an answer, please? – Lance Jan 04 '22 at 23:27
  • @Lance If you need to prevent repeating symbols, fix that in your generator. Either way, a sequence beginning with the `0` symbol is a valid sequence and must be supported by zero-padding your integer. (Tbh I don't understand the complex generator function or why it returns a bigint instead of an array of symbols). – Bergi Jan 05 '22 at 02:52
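Bergi's zero-padding suggestion can be sketched as follows. The sketch assumes the same `parts` order and the question's least-significant-digit-first layout. `encodeFixed` and `decodeFixed` are names introduced here. It is reversible and always yields 62 characters. It does not enforce the no-triple rule, though: `encodeFixed(0n)` is `mi` repeated 31 times.

```javascript
const parts = 'mimamonenudidadobebutitetatotukikekakokusisasozezufifafovevuxexu'.match(/.{2}/g);
const BASE = BigInt(parts.length); // 32n
const LENGTH = 31;

// Encode 0 <= value < 32**31 as exactly LENGTH symbols, least significant first
// (same digit order as createArray), padding the high end with parts[0].
const encodeFixed = (value) => {
  if (value < 0n || value >= BASE ** BigInt(LENGTH)) throw new RangeError('value out of range');
  const out = [];
  for (let i = 0; i < LENGTH; i++) {
    out.push(parts[Number(value % BASE)]);
    value /= BASE;
  }
  return out.join('');
};

// Inverse of encodeFixed: read symbols from most to least significant.
const decodeFixed = (str) => {
  const symbols = str.match(/.{2}/g);
  let value = 0n;
  for (let i = symbols.length - 1; i >= 0; i--) {
    const digit = parts.indexOf(symbols[i]);
    if (digit < 0) throw new Error(`unknown symbol ${symbols[i]}`);
    value = value * BASE + BigInt(digit);
  }
  return value;
};

const x = 6038940986212279582529645303138677298679151n;
console.log(encodeFixed(x).length); // 62
console.log(decodeFixed(encodeFixed(x)) === x); // true
```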


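Here we use base 32 (hard-coded, but it could be `parts.length`) for the 2 (`maxRepeat`) least significant digits. The remaining digits use base 31 (hard-coded, but it could be `parts.length - 1`). This gives this scheme the largest range of values for the fixed length. The first `maxRepeat` symbols are unconstrained, and every later symbol only has to avoid one value: the symbol `maxRepeat` positions earlier.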

All values from `0n` to `getMax()`, inclusive, can be encoded in 31 (`minLength`) symbols.
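For scale, here is a quick count (a worked check, not part of the original answer). The mixed-radix scheme has 32² · 31²⁹ values. That equals the number of 31-symbol strings in which no symbol matches the one two positions earlier: 32 choices for each of the first two symbols, then 31 for each later one. Since `decode` inverts `encode`, `encode` should reach exactly those strings. Compared with all 32³¹ strings:

$$
N = \mathrm{getMax}() + 1 = 32^{2}\cdot 31^{29},
\qquad
\frac{N}{32^{31}} = \left(\frac{31}{32}\right)^{29} \approx 0.40
$$

So roughly 40% of all 31-symbol strings are reachable.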

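The trick for preventing runs longer than `maxRepeat` is to compare the ith digit with the symbol already chosen at position i - `maxRepeat`. If the digit is greater than or equal to that symbol, it is incremented by one. As a result, symbol i can never equal symbol i - `maxRepeat`, so no run can exceed `maxRepeat` symbols. This produces only valid encodings (ones that follow the rules). However, not every symbol sequence that follows the rules can be produced. For example, the sequence `mimami` would never be generated (its first and third symbols are equal) and can't be decoded.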
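This property can be checked exhaustively on a small case. The sketch below is a hypothetical generalization of the answer's `encode`/`decode` to an alphabet of `k` symbols. It works on digit arrays instead of strings, and `makeCodec` is a name introduced here. It then checks every value for k = 4, maxRepeat = 2, length = 6:

```javascript
// Same mixed-radix idea as the answer, for an alphabet of size k.
const makeCodec = (k, maxRepeat, length) => {
  const K = BigInt(k);
  const radixAt = (i) => (i < maxRepeat ? K : K - 1n);
  const capacity = K ** BigInt(maxRepeat) * (K - 1n) ** BigInt(length - maxRepeat);

  const encode = (value) => {
    const digits = [];
    for (let i = 0; i < length; i++) {
      digits.push(value % radixAt(i));
      value /= radixAt(i);
    }
    // Bump a digit past the symbol maxRepeat positions back, so the two never match.
    const symbols = [];
    digits.forEach((v, i) => {
      symbols.push(i < maxRepeat || v < symbols[i - maxRepeat] ? v : v + 1n);
    });
    return symbols;
  };

  const decode = (symbols) => {
    const digits = symbols.map((v, i) =>
      i < maxRepeat || v < symbols[i - maxRepeat] ? v : v - 1n);
    let value = 0n;
    for (let i = length - 1; i >= 0; i--) value = value * radixAt(i) + digits[i];
    return value;
  };

  return { encode, decode, capacity };
};

// Exhaustive check: every value yields a distinct in-range sequence with no
// symbol equal to the one two places back, and decodes to itself.
const { encode, decode, capacity } = makeCodec(4, 2, 6);
const seen = new Set();
let ok = true;
for (let v = 0n; v < capacity; v++) {
  const s = encode(v);
  const key = s.join(',');
  const gapRepeat = s.some((x, i) => i >= 2 && x === s[i - 2]);
  if (seen.has(key) || gapRepeat || decode(s) !== v || s.some((x) => x >= 4n)) ok = false;
  seen.add(key);
}
console.log(Number(capacity), seen.size, ok); // 1296 1296 true
```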

```javascript
const split = new RegExp(`.{2}`, 'g');
const parts = 'mimamonenudidadobebutitetatotukikekakokusisasozezufifafovevuxexu'.match(split);
const partsMap = Object.fromEntries(parts.map((v,i) => ([v,BigInt(i)])));

const encode = (value, maxRepeat = 2, minLength = 31) => {
  value = BigInt(value);
  const digits = [];
  // convert the value to digits, least significant first
  // the `maxRepeat` least significant digits use base 32, the rest use base 31
  while(value > 0) {
    const radix = digits.length < maxRepeat ? 32n : 31n;
    digits.push(value % radix);
    value /= radix;
  }
  // add 0 padding
  while(digits.length < minLength) {
    digits.push(0n);
  }
  // adjust digits to prevent sequences longer than `maxRepeat`
  const symbols = []
  digits.forEach((v,i) => {
    symbols.push((i < maxRepeat || v < symbols[i-maxRepeat]) ? v : v+1n);
  });
  // map to symbols and return string
  const str = symbols.map(v => parts[v]).join('');
  return str;
};

const decode = (str, maxRepeat = 2) => {
  // split string into array of symbols
  const symbols = str.match(split);
  // convert symbols to digits
  const digits = symbols.map(v => partsMap[v]).map((v,i,a) => {
    if(i < maxRepeat || v < a[i-maxRepeat]) return v;
    return v-1n;
  });
  // compute the threshold where we transition from base 31 to base 32
  const threshold = digits.length - maxRepeat;
  // convert digits to BigInt
  const results = digits.reverse().reduce(
    (s,v,i) => (s * (i >= threshold ? 32n : 31n) + v)
  , 0n);
  return results;
};

// compute the maximum value that can be encoded using `minLength` number of symbols
const getMax = (maxRepeat = 2, minLength = 31) => 32n ** BigInt(maxRepeat) * 31n ** BigInt(minLength - maxRepeat) - 1n;

// Node and browser consoles print BigInt directly, but Stack Overflow's snippet
// console doesn't support them yet, so we use `.toString()`.
console.log('limit:', getMax().toString());
console.log(encode(getMax()));

const n1 = 6038940986212279582529645303138677298679151n;
console.log(encode(n1)); // 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi'
const n2 = 0n;
console.log(encode(n2)); // 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima'

const s1 = 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi';
console.log(decode(s1).toString()); // 6038940986212279582529645303138677298679151
const s2 = 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima';
console.log(decode(s2).toString()); // 0

console.log(decode(encode(0n)) == 0n);
console.log(decode(encode(6038940986212279582529645303138677298679151n)) == 6038940986212279582529645303138677298679151n);
```
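Point 2 of the question (random-looking output) is not covered above. One common approach, which is not taken from the question or the answer, is to pass the counter through a bijection on [0, N) before encoding. The sketch below builds such a bijection from steps that are each invertible on fixed-width integers: adding a constant, multiplying by an odd constant mod 2^bits, and `x ^= x >> s`. It then uses cycle walking to restrict the result to [0, N), with N = `getMax() + 1`. `makeMixer` and its constants are hypothetical, chosen arbitrarily, and neither tuned nor cryptographically strong.

```javascript
// Hypothetical mixer: a bijection on [0, limit) for any limit <= 2**bits.
const makeMixer = (bits, limit) => {
  const mask = (1n << bits) - 1n;
  if (limit > mask + 1n) throw new RangeError('limit exceeds 2**bits');
  const shift = bits / 2n + 1n;
  // Arbitrary constants; the multipliers are forced odd so they stay invertible mod 2**bits.
  const add = 0x2545f4914f6cdd1dn & mask;
  const m1 = (0x9e3779b97f4a7c15n & mask) | 1n;
  const m2 = (0xbf58476d1ce4e5b9n & mask) | 1n;

  // One round: every step is a bijection on [0, 2**bits).
  const round = (x) => {
    x = (x + add) & mask;
    x = (x * m1) & mask;
    x ^= x >> shift;
    x = (x * m2) & mask;
    x ^= x >> shift;
    return x;
  };

  return (x) => {
    if (x < 0n || x >= limit) throw new RangeError('input out of range');
    // Cycle walking: re-apply the round until the result lands back inside [0, limit).
    let y = round(x);
    while (y >= limit) y = round(y);
    return y;
  };
};

// N = getMax() + 1 from the answer; 32n ** 31n === 2n ** 155n, so 155 bits suffice.
const N = 32n ** 2n * 31n ** 29n;
const mix = makeMixer(155n, N);
for (let counter = 0n; counter < 3n; counter++) {
  console.log(mix(counter).toString());
}
```

Each counter below N then maps to a distinct valid string via `encode(mix(counter))`. Recovering the counter from a string would need an inverse mixer, which is not shown here. It would use modular inverses of the multipliers, undo each xorshift, and walk the cycle with the inverse round.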
Ouroborus