
I need to load millions of keys into a Redis cluster, each with a string value of about 1 kB. I know of two options. The first is DEBUG POPULATE, which generates a given number of keys but does not let me set the value size.

127.0.0.1:6379> DEBUG POPULATE 1000000
OK

The second is a shell loop that calls redis-cli, but I don't know how to generate 1 kB of data for each value:

for i in `seq 1000000`; 
do 
    redis-cli SET key$i val$i ; 
done

I am new to this. How can I do this? I would really appreciate any help.


I tried a solution based on Mark Setchell's answer:

#!/bin/bash

# Generate around 32kB (+ around 33% base64 overhead) of random characters
stuff=$(head -c 32000 /dev/urandom | base64)

# Set 100 keys to 1kB strings, e.g. SET key32 A87H34..PHNQZ
for ((i=0;i<100;i++)) ; do
   echo SET key$i ${stuff:RANDOM:1024}
done | redis-cli -p 6371 -c --pipe

Running the code above produces the following errors:

sh fake_data_test.sh 
All data transferred. Waiting for the last reply...
MOVED 13252 172.20.0.33:6379
MOVED 9189 172.20.0.32:6379
ERR syntax error
ERR syntax error
MOVED 13120 172.20.0.33:6379
MOVED 9057 172.20.0.32:6379
ERR syntax error
ERR syntax error
...
ERR syntax error
Last reply received from server.
errors: 100, replies: 100

I then wondered whether it was a value formatting issue, so I put the value in double quotes: echo SET key$i "${stuff:RANDOM:1024}"

sh fake_data_test.sh 
All data transferred. Waiting for the last reply...
MOVED 13252 172.20.0.33:6379
ERR unknown command `kpshETtdvDBpL1BYimJl3FkpuJMom/heyj02qJwUGUCQvSZODHXHwNGodfVyIR6sWSv8agjlGMtl`, with args beginning with: 
...
ERR unknown command `UmBAaiwqgB25mSDhsK7qrveXhJV0cJCBRaz`, with args beginning with: 
MOVED 9189 172.20.0.32:6379
ERR unknown command ERR unknown command `gRolxGVLUVbnU5I/ykaXPCA+0Nev`, with args beginning with: 
Last reply received from server.
errors: 1397, replies: 1428

Calling redis-cli once per key, without --pipe, works:

for ((i=0;i<100;i++)) ; do
   redis-cli -p 6371 -c SET key$i "${stuff:RANDOM:1024}"
done
# All output OK

I don't know whether I'm using --pipe incorrectly.

Note: the OS is CentOS 7. The Redis cluster is created via docker-compose, and the image is redis:4.0.11-alpine.

moluzhui

1 Answer


Updated Answer

If you are doing this in order to just generate test data, there's another much faster way. You could:

  • flush your Redis, i.e. FLUSHALL,
  • populate Redis with the data as per my original answer
  • make a backup with SAVE
  • get the config directory with CONFIG GET DIR
  • stop Redis
  • turn off "autoupdate"
  • move the backup file in the config dir to replace the current file
  • restart Redis

So, essentially, you empty Redis and set it up how you want it (per my original answer) and back it up. Then, before each test, just replace the main database with the backup file and restart.
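
The save and restore steps above could be scripted roughly as follows. This is only a sketch for a single standalone instance: it assumes `redis-cli` reaches the server with its default host and port, that the script can read and write the server's data directory, and that append-only persistence is off so the RDB file is what Redis loads on restart. The function names are made up for illustration.

```shell
#!/bin/bash
# Sketch of the save/restore cycle for ONE standalone Redis instance.
# ASSUMPTION: redis-cli connects with default host/port, and this script
# can read and write the server's data directory.

# Print the path of the RDB file the server writes on SAVE.
# When piped, CONFIG GET prints the option name and then its value,
# so the last line is the value.
redis_dump_path() {
  local dir file
  dir=$(redis-cli CONFIG GET dir | tail -n 1)
  file=$(redis-cli CONFIG GET dbfilename | tail -n 1)
  printf '%s/%s\n' "$dir" "$file"
}

# Snapshot the current data set and copy the dump file to "$1".
save_snapshot() {
  redis-cli SAVE > /dev/null && cp "$(redis_dump_path)" "$1"
}

# Stop the server without rewriting the dump, then put snapshot "$1" in place.
# Restarting is left to the caller (systemd, docker-compose, ...).
restore_snapshot() {
  local target
  target=$(redis_dump_path) || return 1
  redis-cli SHUTDOWN NOSAVE
  cp "$1" "$target"
}
```

A typical cycle would be `save_snapshot /tmp/testdata.rdb` once after populating, then `restore_snapshot /tmp/testdata.rdb` followed by a server restart before each test. In a cluster, every node has its own dump file, so this would have to be repeated per node.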

Original Answer

There are probably better ways, but (before my morning coffee) here's a method...

First, generate 40kB of random data near the start of your script, base64-encode it, and strip the line breaks that base64 inserts into its output:

stuff=$(head -c 40000 /dev/urandom | base64 | tr -d '\n')

Now, inside your loop, go to a random offset of 0..32767 in the text and take the following 1024 bytes:

val=${stuff:RANDOM:1024}

In case you wonder, I am trying to avoid the expensive creation of processes inside your big loop. The line val=${...} is a bash parameter expansion, handled internally by the shell, so it doesn't create a new process.

Note that if you take a million random samples starting at offsets 0..32767, there will inevitably be repetitions. You could reduce this by taking multiple smaller chunks from different offsets and appending them together. Or you could generate guaranteed-unique values by prefixing each value with a sequential number, making the strings slightly longer than 1024 bytes.
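
The "multiple smaller chunks" idea might look something like the sketch below. The helper name `make_value` and the 4 x 256 split are illustrative choices, not anything prescribed above. The result goes into the global `val` rather than being printed, so no subshell is forked per key.

```shell
#!/bin/bash
# Random pool: about 42,000 base64 characters with the line breaks removed.
stuff=$(head -c 32000 /dev/urandom | base64 | tr -d '\n')

# Build a value from "$1" slices of "$2" characters each, every slice taken
# at an independent random offset (RANDOM yields 0..32767).
# The result is stored in the global variable $val.
make_value() {
  local chunks=$1 chunk=$2 n
  val=""
  for ((n = 0; n < chunks; n++)); do
    val+=${stuff:RANDOM:chunk}
  done
}

make_value 4 256   # $val now holds 4 * 256 = 1024 characters
```

With four independent offsets, two values only match if all four offsets coincide, which is far less likely than a single offset repeating. It still does not guarantee uniqueness.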


As an aside, I think you'd be better off pipelining some of this, or using Python or some bulk-loading method, to speed it up.

This code does 100,000 insertions of 1024 byte strings in around 49 seconds for example:

#!/bin/bash

# Generate around 32kB (+ around 33% base64 overhead) of random characters,
# removing the newlines base64 adds so every value stays on one line
stuff=$(head -c 32000 /dev/urandom | base64 | tr -d '\n')

# Set 100,000 keys to 1kB strings, e.g. SET key32 A87H34..PHNQZ
for ((i=0;i<100000;i++)) ; do
   echo SET key$i ${stuff:RANDOM:1024}
done | redis-cli --pipe

If you want to ensure the values are unique, and don't mind making each value just over 1024 bytes, replace the line in the loop with:

echo SET key$i "${i}-${stuff:RANDOM:1024}"

If you require exactly 1024 unique bytes, you can use the following at a 10% time penalty:

# Generate value: 8 digits of sequence number, a dash and 1015 random characters
printf -v val "%08d-%s" $i ${stuff:RANDOM:1015}
echo SET key$i $val
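
Another option, sketched here under assumptions, is to feed `redis-cli --pipe` the Redis protocol (RESP) instead of plain-text commands. Every argument is length-prefixed, so a value containing spaces or newlines cannot be split into several words or commands. The helper name `resp_cmd` is made up for this example, and it assumes ASCII arguments, so that the character count equals the byte count.

```shell
#!/bin/bash
# Emit one command in RESP: an array header (*<argc>) followed by each
# argument as a bulk string ($<length>, then the bytes), all CRLF-terminated.
# ASSUMPTION: arguments are ASCII, so ${#arg} is also the length in bytes.
resp_cmd() {
  local arg
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    printf '$%d\r\n%s\r\n' "${#arg}" "$arg"
  done
}

# Example loop (commented out so that sourcing this file has no side effects):
# stuff=$(head -c 32000 /dev/urandom | base64 | tr -d '\n')
# for ((i = 0; i < 100000; i++)); do
#   resp_cmd SET "key$i" "${stuff:RANDOM:1024}"
# done | redis-cli --pipe
```

Because `resp_cmd` is called directly in the loop rather than inside `$(...)`, it still avoids creating a process per key.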
Mark Setchell
  • Thank you for your help. I got `ERR syntax error` using `echo SET key$i ${stuff:RANDOM:1024}`, so I added double quotes, `"${stuff:RANDOM:1024}"`, but then got errors such as `ERR unknown command '+Np+P+wa9i/ eLBnSQz3gn8pyau8afxkWlVZhO8tjoyWYlUQmbZ42/NfEzHdNWNFz2GkO5XBv6TLy', with args beginning with:` and `ERR unknown command 'gY49PMKbETT2ls5KGuDLhmQYcZQSq971Pt/6+c3L', with args beginning with:`. – moluzhui Nov 29 '22 at 12:31
  • I'm also running the Redis cluster via `docker-compose`, so I need the `-p` and `-c` parameters. After replacing the for loop with the brute-force command `redis-cli -p 6371 -c SET key$i "${stuff:RANDOM:1024}"`, the execution was successful. Is the `--pipe` method incompatible with my particular environment? – moluzhui Nov 29 '22 at 12:32
  • I don't know of any reason it shouldn't work. It would help if you clicked `edit` under your question and added in the exact code you ran, the exact way you ran it and the exact error messages. – Mark Setchell Nov 29 '22 at 13:17
  • I added the extra code at the end of my question. Looking forward to your help – moluzhui Nov 30 '22 at 04:46
  • You seem to be running it with `sh fake_data_test.sh` which is not a good way to run a `bash` script. Try `chmod +x fake_data_test.sh` then execute with `./fake_data_test.sh` Alternatively, run with `bash fake_data_test.sh` – Mark Setchell Nov 30 '22 at 07:22
  • I tried both `./fake_data_test.sh` and `bash fake_data_test.sh`, and I also tried starting a Redis container to use with `--pipe`, but **the error remains the same**. I checked some of the data and found that the value length is not the expected 1024; for example, `get key6` returns `5YebvK6X0`. I think the generated random string contains some special characters that truncate the data, which leads to errors like `ERR unknown command "UmBAaiwqgB25mSDhsK7qrveXhJV0cJCBRaz", with args beginning with:` – moluzhui Dec 01 '22 at 07:27
  • The `base64` command at the end should ensure the string only contains A..Z, a..z, 0..9, `+` and `/` so it should never contain anything *"special"*. – Mark Setchell Dec 01 '22 at 08:01
  • I printed the string with `echo $stuff` and found that, in addition to the characters you mentioned, it also contains spaces and `\n`. After I replaced both with the character `s` via the `tr` command, it executed successfully. – moluzhui Dec 02 '22 at 07:27
  • Brilliant, well done! Thanks for sharing back. Good luck with your project. – Mark Setchell Dec 02 '22 at 07:54
  • Sorry, I still get errors such as `MOVED 9057 172.20.0.32:6379` when I try to insert into the Redis cluster. It seems that the [`-c` parameter is not supported in pipe mode](https://github.com/redis/redis/issues/6098). For simulating a single-node test it is perfect, though. Thank you for your help. – moluzhui Dec 02 '22 at 09:02
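
Since `--pipe` does not follow `MOVED` redirections, one possible workaround is to work out each key's slot in the script and pipe every node only the keys it owns. The sketch below computes the slot the way Redis Cluster does (CRC16/XMODEM of the key, modulo 16384). The slot ranges are an assumption (three masters with the default even split), the first node address is hypothetical, and the other two are taken from the `MOVED` replies in the question. Check `CLUSTER NODES` for the real layout. It also assumes ASCII keys without `{hash tags}` and node addresses that are reachable from where the script runs.

```shell
#!/bin/bash
# ASSUMPTION: three masters with the default even slot split. The first
# address is hypothetical; the other two match the MOVED replies above.
NODE_ADDRS=("172.20.0.31 6379" "172.20.0.32 6379" "172.20.0.33 6379")
NODE_LAST_SLOT=(5460 10922 16383)

# Set $slot to the cluster slot of key "$1": CRC16 (XMODEM) modulo 16384.
# ASSUMPTION: ASCII keys without {hash tags}.
keyslot() {
  local key=$1 crc=0 i j byte
  for ((i = 0; i < ${#key}; i++)); do
    printf -v byte '%d' "'${key:i:1}"      # character code of the i-th char
    crc=$(( crc ^ (byte << 8) ))
    for ((j = 0; j < 8; j++)); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  slot=$(( crc % 16384 ))
}

# Set $node_index to the entry of NODE_ADDRS that owns slot "$1".
node_for_slot() {
  local n
  for n in "${!NODE_LAST_SLOT[@]}"; do
    if (( $1 <= NODE_LAST_SLOT[n] )); then node_index=$n; return 0; fi
  done
  return 1
}

# Write SET commands for key0..key(count-1) into one file per node,
# then pipe each file to the node that owns those keys.
load_cluster() {
  local count=$1 stuff tmp i n host port
  stuff=$(head -c 32000 /dev/urandom | base64 | tr -d '\n')
  tmp=$(mktemp -d) || return 1
  for ((i = 0; i < count; i++)); do
    keyslot "key$i"
    node_for_slot "$slot"
    echo "SET key$i ${stuff:RANDOM:1024}" >> "$tmp/node$node_index.txt"
  done
  for n in "${!NODE_ADDRS[@]}"; do
    [ -s "$tmp/node$n.txt" ] || continue
    read -r host port <<< "${NODE_ADDRS[n]}"
    redis-cli -h "$host" -p "$port" --pipe < "$tmp/node$n.txt"
  done
  rm -rf "$tmp"
}
```

Calling `load_cluster 100000` would then load the keys node by node. The CRC loop in pure bash is slow for millions of keys, so a cluster-aware client library in a language such as Python may be the more practical route at that scale.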