I have data in the form of JSON and CSV, and sometimes need to provide it to external analysts. They're used to working with SPSS .sav files, though, so I would like to give them a utility function that lets them load the data and work with it in the same way they would if it had been a .sav file.
I'm familiar with Python, and I also have access to Stata to try something out (though I have never opened it).
An example of the data to be used for this is the following:
import pandas as pd
import numpy as np
variable_value_labels = {
    "col_a": {
        1: "first thing",
        2: "something second",
        3: "meaning of three",
    },
    "col_b": {
        1: "No",
        2: "Yes",
    },
}
column_names_to_labels = {
    "col_a": "this is a column here",
    "col_b": "and another",
    "col_c": "this is a column without variable value labels",
}
N = 10
df = pd.DataFrame(
    {
        "col_a": np.random.choice(list(variable_value_labels["col_a"].keys()), N),
        "col_b": np.random.choice(list(variable_value_labels["col_b"].keys()), N),
        "col_c": np.random.rand(N),
    }
)
The raw data for the above is:
column names to labels json:
{"col_a": "this is a column here", "col_b": "and another", "col_c": "this is a column without variable value labels"}
variable value labels json:
{"col_a": {"1": "first thing", "2": "something second", "3": "meaning of three"}, "col_b": {"1": "No", "2": "Yes"}}
dataframe csv:
col_a,col_b,col_c
1,1,0.8360787635373775
2,1,0.3373961604172684
1,2,0.6481718720511972
2,1,0.36824153984054797
2,2,0.9571551589530464
3,2,0.14035078041264515
1,1,0.8700872583584364
3,1,0.4736080452737105
1,2,0.8009107519796442
1,2,0.5204774795512048
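One detail worth noting about the raw files: JSON object keys are always strings, so the value labels come back keyed by "1", "2", ... while the CSV column holds the integers 1, 2, .... Any loader probably has to convert the keys back before the labels can be matched to the data. A minimal sketch of that step, where `load_value_labels` is a hypothetical helper name and the JSON string is the one shown above:

```python
import json

# Raw JSON exactly as shown above; note the string keys "1", "2", "3".
raw_value_labels = (
    '{"col_a": {"1": "first thing", "2": "something second", "3": "meaning of three"}, '
    '"col_b": {"1": "No", "2": "Yes"}}'
)


def load_value_labels(text):
    """Parse the value-label JSON and turn its string codes back into ints.

    Hypothetical helper; assumes every code in the file is an integer.
    """
    parsed = json.loads(text)
    return {
        column: {int(code): label for code, label in mapping.items()}
        for column, mapping in parsed.items()
    }


variable_value_labels = load_value_labels(raw_value_labels)
print(variable_value_labels["col_b"])  # {1: 'No', 2: 'Yes'}
```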
What I would like is a function or process along these lines:
function(dataframe_path, variable_value_labels_path, column_names_to_labels_path):
    return <sav file with above info>
If it's possible to pass it a path to a directory containing these files instead, that would also be good.
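To make the request concrete, here is a rough sketch of the shape such a function might take. It assumes the third-party `pyreadstat` package is an acceptable dependency and uses its `write_sav` function; the function names `resolve_inputs` and `write_sav_from_files`, the extra `sav_path` argument, and the three file names inside the directory are all hypothetical and would need adjusting. It is not a tested solution.

```python
import json
from pathlib import Path


def resolve_inputs(directory):
    """Return (csv, value-labels json, column-labels json) paths in a directory.

    The three file names are assumptions made for this sketch.
    """
    directory = Path(directory)
    return (
        directory / "data.csv",
        directory / "variable_value_labels.json",
        directory / "column_names_to_labels.json",
    )


def write_sav_from_files(
    dataframe_path,
    variable_value_labels_path,
    column_names_to_labels_path,
    sav_path,
):
    """Combine the CSV and the two JSON files into one .sav file."""
    # Third-party imports are kept local so resolve_inputs works without them.
    import pandas as pd
    import pyreadstat

    df = pd.read_csv(dataframe_path)

    # JSON keys are strings; convert the codes back to ints to match the CSV.
    raw_labels = json.loads(Path(variable_value_labels_path).read_text())
    value_labels = {
        column: {int(code): label for code, label in mapping.items()}
        for column, mapping in raw_labels.items()
    }

    column_labels = json.loads(Path(column_names_to_labels_path).read_text())

    pyreadstat.write_sav(
        df,
        str(sav_path),
        # One label per column, in the dataframe's column order;
        # fall back to the column name when no label is given.
        column_labels=[column_labels.get(name, name) for name in df.columns],
        variable_value_labels=value_labels,
    )
    return Path(sav_path)
```

With something like this, the directory variant would be `write_sav_from_files(*resolve_inputs("some_dir"), "out.sav")`, and the analysts could open `out.sav` in their usual tool.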