I am developing with the DuckDB C API. As recommended, I use data chunks and vectors to extract values from the database. In DuckDB, strings are stored as VARCHAR, but I can't extract a VARCHAR value and turn it into a C string.

Information

DuckDB version: 0.8.1

API: C

When I try to extract a value of type VARCHAR, I use duckdb_vector_get_data to get the internal data pointer (a void*). But this pointer actually points at something in DuckDB's VARCHAR representation (duckdb_string_t in the C API), which looks like a structure with a uint32_t length and a char inlined[12]. The full string seems to have been compressed into that 12-byte char array, so I can't get the real string just by casting the pointer to char* or char**. Here's my code:

// open and connect to a database beforehand
// use duckdb_query to execute SQL statements
if (duckdb_query(con, "CREATE TABLE strings(i VARCHAR(255));", NULL) == DuckDBError) {
    fprintf(stderr, "Failed to query database1\n");
    goto cleanup;
}
if (duckdb_query(con, "INSERT INTO strings VALUES ('aaaaa'), ('abcdrfghikjlmn'), ('bbbbbbbbbbbb');", NULL) == DuckDBError) {
    fprintf(stderr, "Failed to query database2\n");
    goto cleanup;
}
if (duckdb_query(con, "SELECT * FROM strings", &result) == DuckDBError) {
    fprintf(stderr, "Failed to query database3\n");
    goto cleanup;
}
// use data chunk and vector to extract data
duckdb_data_chunk res_chunk = duckdb_result_get_chunk(result,0);
duckdb_vector vec = duckdb_data_chunk_get_vector(res_chunk,0);
duckdb_logical_type type_l = duckdb_vector_get_column_type(vec);
duckdb_type type = duckdb_get_type_id(type_l);
printf("type id: %d\n", type);
void* pdata = duckdb_vector_get_data(vec);
// then I cast the void* to char* to inspect the raw bytes
char* data = (char*)pdata;
for (idx_t i = 0; i < 48; i++) {
    printf("%d ", data[i]);
    // break the line every 16 bytes; the reason is explained below
    if ((i + 1) % 16 == 0) printf("\n");
}

Here's the output after running it:

type id: 17
5 0 0 0 97 97 97 97 97 0 0 0 0 0 0 0
14 0 0 0 97 98 99 100 -112 -52 115 0 0 0 0 0
12 0 0 0 98 98 98 98 98 98 98 98 98 98 98 98

Type id 17 is definitely DUCKDB_TYPE_VARCHAR. In the raw data, every 16 bytes seem to form a group: the first four bytes hold the length of the string and the last 12 hold the data. But when the string is longer than 12 characters, it seems to be compressed into about six values. This looks a lot like the duckdb_string_t type in the DuckDB C API reference.
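The grouping described above can be checked with a small standalone sketch. It does not call DuckDB at all: the bytes are copied from the dump, the helper name `group_length` is made up for illustration, and a little-endian machine is assumed (which the `5 0 0 0` pattern suggests).

```c
#include <stdint.h>
#include <string.h>

/* Bytes copied from the dump above, one 16-byte group per row. */
static const signed char dump[3][16] = {
    {5, 0, 0, 0, 97, 97, 97, 97, 97, 0, 0, 0, 0, 0, 0, 0},
    {14, 0, 0, 0, 97, 98, 99, 100, -112, -52, 115, 0, 0, 0, 0, 0},
    {12, 0, 0, 0, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98},
};

/* Hypothetical helper: read the first four bytes of a group as a uint32_t.
   memcpy sidesteps alignment and aliasing issues; little-endian is assumed. */
static uint32_t group_length(const signed char *group) {
    uint32_t len;
    memcpy(&len, group, sizeof len);
    return len;
}

/* Hypothetical helper: nonzero if the string would fit in the 12 trailing bytes. */
static int group_fits_inline(const signed char *group) {
    return group_length(group) <= 12;
}
```

Under those assumptions, the three rows report lengths 5, 14 and 12, and only the second row is too long to fit in the 12 bytes that follow the length.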

So far I haven't found any function in DuckDB's documentation that returns the real string for a VARCHAR value. Am I overlooking or misunderstanding something when extracting the VARCHAR? Or is there a function that converts a VARCHAR to a string? I hope someone who has successfully extracted VARCHAR values through data chunks and vectors can help.

yiyang
  • Don't post pictures of text, especially as you have posted the same text as text. – Jabberwocky Aug 28 '23 at 16:44
  • Instead of getting data by chunk, could you get by row and column as shown here? https://duckdb.org/docs/api/c/query#duckdb_value – theSparky Aug 28 '23 at 17:11
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Aug 29 '23 at 01:04

1 Answer

The part you've missed is what the duckdb_string_t struct actually is - it's a way to save allocations for every string used.

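For short strings (<= 12 bytes at the moment), we avoid the extra allocation and embed the value inside the struct. You can check for this with the duckdb_string_is_inlined function, if that's available in the DuckDB version you're using. If the string isn't inlined, the 12 bytes after the length are not compressed data: they hold a 4-byte prefix of the string followed by a char* that points to the actual string, allocated elsewhere. In both cases the bytes are not NUL-terminated, so read exactly as many bytes as the length field says.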
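As a rough illustration of the two cases, here is a minimal sketch that decodes such a value by hand. The type `sketch_string_t` and both helper functions are hypothetical stand-ins written for this example, not the definitions from duckdb.h; the layout simply mirrors the 16-byte structure discussed in this thread (on a typical 64-bit platform), so check it against the header of your DuckDB version before relying on it.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_INLINE_MAX 12

/* Hypothetical local mirror of the 16-byte string layout (NOT from duckdb.h).
   Both variants start with the same length field. */
typedef union {
    struct {
        uint32_t length;
        char prefix[4];    /* first four bytes of the string */
        const char *ptr;   /* full string, allocated elsewhere */
    } pointer;
    struct {
        uint32_t length;
        char inlined[SKETCH_INLINE_MAX]; /* whole string when length <= 12 */
    } inlined;
} sketch_string_t;

/* Returns the start of the string bytes; they are not NUL-terminated. */
static const char *sketch_string_data(const sketch_string_t *s) {
    return s->inlined.length <= SKETCH_INLINE_MAX ? s->inlined.inlined
                                                  : s->pointer.ptr;
}

/* Copies the value into buf as a NUL-terminated C string, truncating if
   buf is too small. Returns the number of bytes copied, excluding the NUL. */
static size_t sketch_string_copy(const sketch_string_t *s, char *buf, size_t cap) {
    if (cap == 0) return 0;
    size_t n = s->inlined.length;
    if (n > cap - 1) n = cap - 1;
    memcpy(buf, sketch_string_data(s), n);
    buf[n] = '\0';
    return n;
}
```

With a header that defines duckdb_string_t, the pointer returned by duckdb_vector_get_data would presumably be cast to duckdb_string_t * and each row decoded along the same lines, using duckdb_string_is_inlined instead of the manual length check where it is available.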

Hope that helps!

Mause
  • Thanks for the answer. I've fixed it! I had indeed misunderstood the `duckdb_string_t` structure. It's actually my first time asking a question in the community. Thanks a lot again! – yiyang Aug 30 '23 at 13:07