Questions tagged [filehash]

The filehash package implements a simple key-value style database where character string keys are associated with data values that are stored on the disk. A simple interface is provided for inserting, retrieving, and deleting data from the database. Utilities are provided that allow filehash databases to be treated much like the environments and lists already used in R. These utilities permit interactive and exploratory analysis on large datasets.

Working with large datasets in R can be cumbersome because of the need to keep objects in physical memory. While many might generally see that as a feature of the system, the need to keep whole objects in memory creates challenges for those who want to work interactively with large datasets. Here we take a simple definition of “large dataset” to be any dataset that cannot be loaded into R as a single R object because of memory limitations. For example, a very large data frame might be too large for all of the columns and rows to be loaded at once. In such a situation, one might load only a subset of the rows or columns, if that is possible.

The filehash package provides a full read-write implementation of a key-value database for R. The package does not depend on any external packages (beyond those provided in a standard R installation) or software systems and is written entirely in R, making it readily usable on most platforms. The filehash package represents a database as an instance of an S4 class and operates directly on the S4 object via various methods.

Text adapted from: Peng, Roger, "INTERACTING WITH DATA USING THE FILEHASH PACKAGE FOR R" (June 2006). Johns Hopkins University, Dept. of Biostatistics Working Papers. Working Paper 108. http://biostats.bepress.com/jhubiostat/paper108 & http://cran.r-project.org/web/packages/filehash/vignettes/filehash.pdf
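The insert, retrieve, and delete pattern described above can be illustrated with a rough Python analogue. This is not filehash itself and does not use its API: it is a minimal sketch that swaps in the standard-library `shelve` module, which also maps string keys to values stored on disk. The directory, the file name `mydb`, and the keys `a` and `b` are hypothetical choices made only for the example.

```python
import os
import shelve
import tempfile

# Hypothetical location for the on-disk store; any writable path works.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "mydb")

# Open (and create) the store. Keys are strings; values are pickled to disk.
with shelve.open(db_path) as db:
    db["a"] = list(range(5))            # insert a value under key "a"
    db["b"] = {"name": "example"}       # insert a second value
    fetched = db["a"]                   # retrieve a value by key
    has_b = "b" in db                   # check whether a key exists
    del db["b"]                         # delete a key and its value
    remaining_keys = sorted(db.keys())  # list the keys that are left

# Reopen the store to confirm that the data persisted on disk.
with shelve.open(db_path) as db:
    persisted = db["a"]
```

Only the values that are actually fetched are read back into memory, which is the property that makes a disk-backed key-value store useful when the full dataset does not fit in memory at once.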

21 questions
1 vote, 1 answer

Combining MD5 Analysis with Filename in single Output

I am struggling to combine the output from two commands into a single CSV / TXT file. The first command is to recursively search a folder and create an MD5 number for each document. This is then exported to a CSV file that includes the full…
user9524014
0 votes, 0 answers

Is there a Python library equivalent to filehash in R?

I have been using R's filehash library to solve the "out of memory" problem by storing large datasets in hash files and then loading/updating a file when I use it. Given that most systems now use SSDs, I found this solution is…
0 votes, 0 answers

Fixed file hash with sha512

I am using this code to hash files; however, the hashes change each time I run the script. How can I get a fixed hash? Maybe there is a random seed; I don't want one for this. I just have a list of files in a folder and I need their unique and…
acnguy2
0 votes, 1 answer

Generating hashcodes for specific filetypes only with PowerShell

I'm a complete beginner to PowerShell and scripting, and have successfully been using Out-GridView to display some properties of the files I have in my directories using the following: dir D:\Folder1\$type -Recurse | Select…
0 votes, 2 answers

File md5 hash changes when chunking it (for netty transfer)

Question at the bottom. I'm using Netty to transfer a file to another server. I limit my file chunks to 1024*64 bytes (64 KB) because of the WebSocket protocol. The following method is a local example of what will happen to the file: public static…
Felix Gaebler
-1 votes, 1 answer

How to remove a part of a string with batch script?

I want to remove a part of this code: set hash=certutil -hashfile %%A MD5. I mean that I need to remove "MD5 hash of cmd.exe:" and "CertUtil: -hashfile command completed successfully." from the output of this code. My full code (it is an antivirus but…
Sepehr Movasat