This will probably sound dumb but I am trying to make an efficient program (time/memory wise)
Efficient program to do what? You never really say what you're trying to do.
studying hash tables I have seen that it's basically an array of linked lists
That's a common implementation, but it doesn't speak to why you'd use a hash table in the first place.
You use a hash table when you're searching for a record based on a non-numeric key (i.e., a string). You feed that key into a hashing function that spits out an integer value, and you use that value to index into the table. So if f("foo")
spits out 3
, that's the table index you use to store your data with the key "foo"
.
No practical hashing function is perfect and different strings can result in the same hash value, known as a collision. Using a linked list is one way to resolve collisions, other ways are to compute a secondary index into the table or just add 1 to the returned index.
Computing a hash from the key is fast relative to a linear or binary search, giving a time complexity of O(1) compared to a linear search time of O(n) and a binary search time of O(log2 n). The tradeoff is that your table isn't ordered in any way - a linear traverse will appear randomly ordered.
Edit
From the comment:
i need to save a text as a bunch of lines, so i know where each line is (line 1 will be the first, duh) and it did seem unnecessary to me to use hash table but the problem is I don't know how many lines it will have so if i make an array of 50 maybe it's not enough, I am wondering if it is better to use a list/hash table/ some other structure or just an array of char arrays (added in post too)
If you just need to store a sequence of strings, you can dynamically allocate an array and then extend that array as necessary. Assuming all the lines are of a known fixed length, you can do something like this (reads the file into memory, dumps the contents to standard output):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LENGTH 80
size_t read_arr( FILE *in, char (**arr)[MAX_LENGTH+1] )
{
size_t size = 1;
size_t read = 0;
*arr = malloc( sizeof **arr * size );
if ( !*arr )
{
return 0;
}
char buf[MAX_LENGTH+1];
while( fgets( buf, sizeof buf, in ) )
{
if ( read == size )
{
char (*tmp)[MAX_LENGTH+1] = realloc( *arr, sizeof **arr * (size * 2) );
if ( tmp )
{
*arr = tmp;
size *= 2;
}
else
{
fprintf( stderr, "Unable to extend array past %zu entries, returning what we have so far.\n", read );
return read;
}
}
strcpy( (*arr)[read++], buf );
}
return read;
}
int main( int argc, char **argv )
{
if ( argc < 2 )
{
fprintf( stderr, "USAGE: %s <file name>\n", argv[0] );
return EXIT_FAILURE;
}
FILE *in = fopen( argv[1], "r" );
if ( !in )
{
fprintf( stderr, "File %s not found\n", argv[0] );
return EXIT_FAILURE;
}
char (*arr)[MAX_LENGTH+1];
size_t n = read_arr( in, &arr );
for ( size_t i = 0; i < n; i++ )
printf( "%s", arr[i] );
free( arr );
return EXIT_SUCCESS;
}
realloc
is a relatively expensive operation, so you don't want to do it for every line in the file. Doubling the array each time minimizes the number of calls, although the tradeoff is the possibility of internal fragmentation (needing 256 rows to store 129 lines, for example). But on average that shouldn't be a problem.
could you tell me what is char (**arr)[MAX_LENGTH+1]
, I've never seen this structure; is it a 2d array?
Yeah, I guess I should explain that.
T (*a)[N];
declares a
as a pointer to an N-element array of T
. An expression of type T [M][N]
will "decay" to type T (*)[N]
(not T **
).
I want to dynamically allocate enough space to store M objects of type T [N]
. So we start with the common idiom
P *p = malloc( sizeof *p * M );
sizeof *p
is equivalent to sizeof (P)
, so we're allocating enough space to store M objects of type P
. Now we replace the type P
with the array type T [N]
, which gives us
T (*p)[N] = malloc( sizeof *p * M );
In this case, sizeof *p
is equivalent to sizeof (T [N])
, so we're allocating enough space to store M N-element arrays of T
.
Since a[i]
is defined as *(a + i)
, the following is true:
(*p)[i] == (*(p + 0))[i] == (p[0])[i] == p[0][i]
So we can index into p
like any other 2D array.
So in the main
function above, I declare arr
as a pointer to a MAX_LENGTH+1
array of char
. Since I want read_arr
to update the value stored in arr
itself (the address to the allocated memory), I need to pass a pointer to arr
. Remember, if you want a function to update one of its parameters, you must pass a pointer to that parameter1, even if that parameter is already a pointer type. If the type of arr
is char (*)[MAX_LENGTH+1]
, then the type of &arr
is char (**)[MAX_LENGTH+1]
, or "pointer to pointer to MAX_LENGTH+1
-element array of char
".
And again, this assumes that all the lines in the file are close to the same length, and that they're all less than some known maximum length. If you have a file where the lines have wildly different lengths, or 99% have length 20 and one or two have length 200, then you'll want to do something else.
- Arrays are weird, but we're not dealing with an array type in this case, we're dealing with a pointer type.