10

I have a program that read urls in a file and does a gethostbyname() on each URL host. This call is quite consuming. I want to cache them.

Is there a very simple map-base code snippet in C out there that I could use to do the caching? (I just don't want to reinvent the wheel).

It has to have the following points :

  • Open-source with a permissive license (think BSD or public domain).
  • Very simple : ideally less than 100 LOC
  • Keys are char* and values void*. No need to copy them.
  • No real need to implement remove(), but contains() is either needed or put() should replace the value.

PS: I tagged it homework, since it could be. I'm just being very lazy and do want to avoid all the common pitfalls I could encounter while reimplementing.

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
Steve Schnepp
  • 4,620
  • 5
  • 39
  • 54

8 Answers8

10

Here's a very simple and naive one

  • Fixed bucket size
  • No delete operation
  • inserts replaces the key and value, and can optionally free them

:

#include <string.h>
#include <stdlib.h>

#define NR_BUCKETS 1024

struct StrHashNode {
    char *key;
    void *value;
    struct StrHashNode *next;

};

struct StrHashTable {
    struct StrHashNode *buckets[NR_BUCKETS];
    void (*free_key)(char *);
    void (*free_value)(void*);
    unsigned int (*hash)(const char *key);
    int (*cmp)(const char *first,const char *second);
};

void *get(struct StrHashTable *table,const char *key)
{
    unsigned int bucket = table->hash(key)%NR_BUCKETS;
    struct StrHashNode *node;
    node = table->buckets[bucket];
    while(node) {
        if(table->cmp(key,node->key) == 0)
            return node->value;
        node = node->next;
    }
    return NULL;
}
int insert(struct StrHashTable *table,char *key,void *value)
{
    unsigned int bucket = table->hash(key)%NR_BUCKETS;
    struct StrHashNode **tmp;
    struct StrHashNode *node ;

    tmp = &table->buckets[bucket];
    while(*tmp) {
        if(table->cmp(key,(*tmp)->key) == 0)
            break;
        tmp = &(*tmp)->next;
    }
    if(*tmp) {
        if(table->free_key != NULL)
            table->free_key((*tmp)->key);
        if(table->free_value != NULL)
            table->free_value((*tmp)->value);
        node = *tmp;
    } else {
        node = malloc(sizeof *node);
        if(node == NULL)
            return -1;
        node->next = NULL;
        *tmp = node;
    }
    node->key = key;
    node->value = value;

    return 0;
}

unsigned int foo_strhash(const char *str)
{
    unsigned int hash = 0;
    for(; *str; str++)
        hash = 31*hash + *str;
    return hash;
}

#include <stdio.h>
int main(int argc,char *argv[])
{
    struct StrHashTable tbl = {{0},NULL,NULL,foo_strhash,strcmp};

    insert(&tbl,"Test","TestValue");
    insert(&tbl,"Test2","TestValue2");
    puts(get(&tbl,"Test"));
    insert(&tbl,"Test","TestValueReplaced");
    puts(get(&tbl,"Test"));

    return 0;
}
nos
  • 223,662
  • 58
  • 417
  • 506
  • +1: Exactly what I was searching for. I edited the code a little to cope with correct const-ness (key & value). Now my apps starts in less than a sec, instead of the 2 minutes @ 100% cpu :-) – Steve Schnepp Aug 06 '09 at 08:50
6

Christoper Clark's hashtable implementation is very straightforward. It is more than 100 lines, but not by much.

Clark's code seems to have made its way into Google's Conccurrency Library as a parallelization example.

Jan
  • 2,178
  • 3
  • 14
  • 26
Sinan Ünür
  • 116,958
  • 15
  • 196
  • 339
4

std::map in C++ is a red-black tree under the hood; what about using an existing red-black tree implementation in C? The one I linked is more like 700 LOC, but it's pretty well commented and looks sane from the cursory glance I took at it. You can probably find others; this one was the first hit on Google for "C red-black tree".

If you're not picky about performance you could also use an unbalanced binary tree or a min-heap or something like that. With a balanced binary tree, you're guaranteed O(log n) lookup; with an unbalanced tree the worst case for lookup is O(n) (for the pathological case where nodes are inserted in-order, so you end up with one really long branch that acts like a linked-list), but (if my rusty memory is correct) the average case is still O(log n).

Meredith L. Patterson
  • 4,853
  • 29
  • 30
2

You can try using following implemntation

clib

Avinash
  • 12,851
  • 32
  • 116
  • 186
1

memcached?

Not a code snippet, but a high performance distributed caching engine.

Justin Niessner
  • 242,243
  • 40
  • 408
  • 536
1

Not lazy, deeply sensible to avoid writing this stuff.

How's this library never used it myself but it seems to claim to do what you ask for.

djna
  • 54,992
  • 14
  • 74
  • 117
  • The library seems interesting, but the last update to the website was 2005. It would be OK for a few code lines, but a little too old for a full blown library. – Steve Schnepp Aug 06 '09 at 07:04
  • Well, fundamental algorithms well implemented should not become dated. I would have no concern about using 4 year old libraries of this kind - assuming that they actually worked in the first place. If you ahve the code, then maintenance should not be too much of an issue. – djna Aug 06 '09 at 08:01
1

Dave Hanson's C Interfaces and Implementations includes a nice hash table, as well as many other useful modules. The hash table clocks in at 150 lines, but that's including memory management, a higher-order mapping function, and conversion to array. The software is free, and the book is worth buying.

Norman Ramsey
  • 198,648
  • 61
  • 360
  • 533
0

Found an implementation here : c file and h file that's fairly close to what you asked. W3C license

zebrabox
  • 5,694
  • 1
  • 28
  • 32