I know my title is confusing, since the tokenize command is meant to be applied to a string.

I have many folders that contain large numbers of separate, badly named Excel files (most of them scraped from a website). Selecting them manually is inconvenient, so I rely on Stata's extended macro function local : dir to read them.

My code looks as follows:

foreach file of local filelist {
    import excel "`file'", clear
    sxpose, clear 
    save "`file'.dta", replace
}

This code generates many new dta files, so the directory fills up with them. I would prefer to create a single new data file from the first xlsx file and then append the others to it inside the foreach loop. So essentially, there is an if-else inside the loop.

We need an index of the macro filelist just created, so that we can write something like:

token `filelist'  // filelist is created in the former code

if "`i'" == `1' {
   import excel "`file'",clear
}
else {
   append using `i',clear
}

I know my code is inefficient and error-prone, and the syntax of the expression token `filelist' is incorrect too (given that filelist is not a string). However, I would still like to understand the basic structure behind my pseudocode.

How could I correct my code and make it work?

A more efficient approach would also be very welcome.

zlqs1985
  • I haven't tried correcting your code, which wouldn't work. But for others I'll underline that `token` is a typo for `tokenize`. More importantly, it is quite unclear why the filelist is stated to be "not string". – Nick Cox Jan 19 '16 at 11:16

1 Answer

Various techniques spring to mind, none of which entails tokenizing.

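local count = 1
foreach file of local filelist {
    import excel "`file'", clear
    sxpose, clear

    if `count' == 1 {
        * first file: start the combined dataset
        save alldata, replace
    }
    else {
        * later files: add what has been accumulated, then save the total again
        append using alldata
        save alldata, replace
    }

    local ++count
}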


local allothers "*"
foreach file of local filelist {
    import excel "`file'", clear
    sxpose, clear

    `firstonly'   save alldata, replace
    `allothers'   append using alldata
    * save again after each append so that alldata keeps growing
    `allothers'   save alldata, replace

    local firstonly "*"
    local allothers
}

In the second block, the point is that lines prefixed by * are treated as comments, so any command that * precedes is ignored ("commented out"). The first time round the loop, the append and the save that follows it are commented out, while the first save is preceded by an undefined local macro, which evaluates to an empty string, so it is not ignored.

After the first time round the loop, the commenting out is removed from the append and the save that follows it, and placed on the first save instead.

I don't think either of these approaches is more efficient than what you have in mind (works faster, uses less memory, is shorter, or whatever "efficient" means for you). The code clearly does presuppose that you have set up the file list correctly.
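For completeness, here is a minimal sketch of the whole pipeline, including one way the file list might be set up. It assumes the .xlsx files sit in the current working directory and that sxpose (a user-written command) is installed; the file name alldata is carried over from the blocks above. It also swaps in a slightly different technique: the combined file starts out empty (saved with the emptyok option), so every pass through the loop does the same thing and no counter or flag is needed.

```stata
* Assumption: the Excel files are in the current working directory
local filelist : dir "." files "*.xlsx"

* Start with an empty combined dataset
clear
save alldata, emptyok replace

foreach file of local filelist {
    import excel "`file'", clear
    sxpose, clear

    * add everything accumulated so far, then write the total back
    append using alldata
    save alldata, replace
}
```

Whether this is preferable to the counter or the comment trick is largely a matter of taste; it rereads and rewrites alldata on every pass just as they do.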

Nick Cox
  • Thank you Nick, both methods work and the latter one is pretty smart – zlqs1985 Jan 19 '16 at 16:13
  • A further issue: how can I "safely" bypass an unexpected error in the foreach loop? Say the file names in the directory are in Chinese characters. Sometimes the file names collected (using the extended macro function) and those imported afterwards are inconsistent (like 春天璟墅 vs 春天瓃墅, of which 璟 is a rare character), and I want to deal with them manually later on. But the loop gives me the error message "file 32_春天璟墅.xlsx not found" and just quits. Merely adding `cap` before `import excel` didn't work. So how can I bypass a whole block of code in a foreach loop? Thank you – zlqs1985 Jan 21 '16 at 04:34
  • I can't reproduce examples with Chinese characters; others may be able to help there. `capture` is what springs to mind, but "didn't work" is no guidance at all to what mistake you made in implementing it. Best to ask a new question with explicit code and a reproducible example. – Nick Cox Jan 21 '16 at 09:39
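
One possible reading of the `capture` suggestion, offered only as a sketch: wrap the whole body of the loop in a `capture noisily { }` block, then test `_rc` and move on to the next file if anything inside failed. The macro name first and the wording of the message are illustrative choices, and this has not been tried on file names containing Chinese characters.

```stata
local first = 1
foreach file of local filelist {
    * any error inside the braces ends the block but not the loop
    capture noisily {
        import excel "`file'", clear
        sxpose, clear
        if !`first' append using alldata
        save alldata, replace
    }
    if _rc {
        * report the problem file so it can be handled by hand later
        display as error "skipped `file' (return code " _rc ")"
        continue
    }
    local first = 0
}
```

Because the flag is only switched off after a pass succeeds, a failure on the very first file still leaves the next readable file to start alldata.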