I am using the Node.js quickstart for the Google Drive API (v3).

In particular, I am concentrating on the following function, where I have set pageSize to 1 for testing and am calling my own function, read(file.name):

/**
 * Lists the names and IDs of files (pageSize is set to 1 for testing).
 * @param {google.auth.OAuth2} auth An authorized OAuth2 client.
 */
function listFiles(auth) {
  const drive = google.drive({version: 'v3', auth});
  drive.files.list({
    pageSize: 1,   // only find the last modified file in dev folder
    fields: 'nextPageToken, files(id, name)',
  }, (err, res) => {
    if (err) return console.log('The API returned an error: ' + err);
    const files = res.data.files;
    if (files.length) {
      console.log('Files:');
      files.forEach((file) => {
        console.log(`${file.name} (${file.id})`);
        read(file.name);   // my function here  
      });
    } else {
      console.log('No files found.');
    }
  });
}

// custom code - function to read and output file contents 
function read(fileName) {
  const readableStream = fs.createReadStream(fileName, 'utf8');

  readableStream.on('error', function (error) {
      console.log(`error: ${error.message}`);
  })

  readableStream.on('data', (chunk) => {
      console.log(chunk);
  })
}

This code reads a file from my local folder, which is synced with Google Drive and which I use for development. I have found that pageSize: 1 returns the file most recently modified in this local folder, so my process has been:

  • Edit .js code file
  • Make minor edit on testfiles (first txt then gdoc) to ensure it is last modified
  • Run the code

I am testing a text file against a gdoc file. The filenames are atest.txt and 31832_226114__0001-00028.gdoc respectively. The outputs are as follows:

PS C:\Users\david\Google Drive\Technical-local\gDriveDev> node . gdocToTextDownload.js
Files:
atest.txt (1bm1E4s4ET6HVTrJUj4TmNGaxqJJRcnCC)
atest.txt this is a test file!!


PS C:\Users\david\Google Drive\Technical-local\gDriveDev> node . gdocToTextDownload.js
Files:
31832_226114__0001-00028 (1oi_hE0TTfsKG9lr8Wl7ahGNvMvXJoFj70LssGNFFjOg)
error: ENOENT: no such file or directory, open 'C:\Users\david\Google Drive\Technical-local\gDriveDev\31832_226114__0001-00028'

My question is: Why does the script read the text file but not the gdoc?

At this point I must hard-code the .gdoc extension onto the file name in the function call to produce the same kind of output as in the text file example, e.g.

read('31832_226114__0001-00028.gdoc');

Which is obviously not what I want to do.

I am aiming to produce a script that will download a large number of gdocs that have been created from .jpg files.

------------------------- code completed below ------------------------

/**
 * Lists the names and IDs of pageSize number of files (using query to define folder of files)
 * @param {google.auth.OAuth2} auth An authorized OAuth2 client.
 */
function listFiles(auth) {
  const drive = google.drive({version: 'v3', auth});

  drive.files.list({
    corpora: 'user',  
    pageSize: 100,
    // files in a parent folder that have not been trashed 
    // get ID from Drive > Folder by looking at the URL after /folders/ 
    q: `'11Sejh6XG-2WzycpcC-MaEmDQJc78LCFg' in parents and trashed=false`,    
    fields: 'nextPageToken, files(id, name)',
  }, (err, res) => {
    if (err) return console.log('The API returned an error: ' + err);
    const files = res.data.files;
    if (files.length) {

      // Download each file by its ID; the name is only used for the local path.
      files.forEach((file) => {
        downloadFile(drive, file.id, file.name);
      });

    } else {
      console.log('No files found.');
    }
  });
}

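/**
 * Exports a Google Doc as plain text and writes it to a local file.
 * @param {object} drive An authorized Drive API client.
 * @param {string} fileId ID of the file to export.
 * @param {string} fileName Name to use for the local copy.
 */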
function downloadFile(drive, fileId, fileName) {
 
  // Make sure you have a valid path and permissions. Use UNIX file path notation.
  const filePath = `/test/test1/${fileName}`;

  const dest = fs.createWriteStream(filePath);
  let progress = 0;

  drive.files.export(
    { fileId, mimeType: 'text/plain' },
    { responseType: 'stream' }
  ).then(res => {
    res.data
      .on('end', () => {
        console.log('  Done downloading');

      })  
      .on('error', err => {
        console.error('Error downloading file:', err.message);
      })  
      .on('data', d => {
        progress += d.length;
        if (process.stdout.isTTY) {
          process.stdout.clearLine();
          process.stdout.cursorTo(0);
          process.stdout.write(`Downloading ${fileName} ${progress} bytes`);
        }   
      })  
      .pipe(dest);
  }); 
}
Dave
  • In the files response, check whether `fullFileExtension` and `fileExtension` are present; see https://developers.google.com/drive/api/v3/reference/files#resource. You might also add the extension manually: for example, detect whether it is missing (match on the last `.`) and concatenate the filename with `.gdoc`. – traynor Dec 29 '21 at 11:41
  • Thanks @traynor, good point. But I think I'm going in the wrong direction with my attempts to get successful downloads. I am having trouble with the examples at https://developers.google.com/drive/api/v3/manage-downloads: the code doesn't work for me, and the best I can get, after making changes, is a 403 error. With this post I think I was getting side-tracked and going the long way around, just trying to understand the Google environment. – Dave Dec 29 '21 at 19:12

2 Answers

My question is: Why does the script read the text file but not the gdoc?

This is because you're trying to download a Google Workspace document. Only files with binary content can be downloaded using the drive.files.get method; for Google Workspace documents you need to use drive.files.export, as described in the Drive API documentation.

From your code, I see you're only listing the files. You will need to identify the type of each file you want to download: use the mimeType field to decide between the export method and get. For example, the MIME type of a Google Doc is application/vnd.google-apps.document, while a .docx file (binary) is application/vnd.openxmlformats-officedocument.wordprocessingml.document.
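
As a rough sketch of that check, you could branch on the mimeType returned by drive.files.list. The helper name pickDownloadMethod is hypothetical (it is not part of the Drive client), and the rule is a simplifying assumption: folders are skipped and every other application/vnd.google-apps.* type is treated as exportable, which glosses over types such as shortcuts or forms.

```javascript
// Hypothetical helper: decide how to fetch a file based on its mimeType.
// Assumption: all Google Workspace types except folders go through export.
const WORKSPACE_PREFIX = 'application/vnd.google-apps.';
const FOLDER_TYPE = 'application/vnd.google-apps.folder';

function pickDownloadMethod(mimeType) {
  // Folders have no content to download or export.
  if (mimeType === FOLDER_TYPE) return 'skip';
  // Docs, Sheets, Slides etc. must be converted with drive.files.export.
  if (typeof mimeType === 'string' && mimeType.startsWith(WORKSPACE_PREFIX)) {
    return 'export';
  }
  // Everything else is binary content, fetched with drive.files.get and alt=media.
  return 'get';
}
```

For this to work, the list call has to request the field, for example fields: 'nextPageToken, files(id, name, mimeType)'.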

Check the following working example:

const fs = require("fs");

const getFile = async (drive, fileId, name) => {
    const res = await drive.files.get({ fileId, alt: "media" }, { responseType: "stream" });

    return new Promise((resolve, reject) => {
        const filePath = `/tmp/${name}`;
        console.log(`writing to ${filePath}`);
        const dest = fs.createWriteStream(filePath);
        let progress = 0;
        res.data
            .on("end", () => {
                console.log(" Done downloading file.");
                resolve(filePath);
            })
            .on("error", (err) => {
                console.error(" Error downloading file.");
                reject(err);
            })
            .on("data", (d) => {
                progress += d.length;
                console.log(` Downloaded ${progress} bytes`);
            })
            .pipe(dest);
    });
};

const fileKind = "drive#file";
let filesCounter = 0;
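// Assumes googleClient is an already-authorized googleapis instance and that
// this code runs where top-level await is available (e.g. an ES module).
const drive = googleClient.drive({ version: "v3" });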
const files = await drive.files.list();

// Only files with binary content can be downloaded. Use export for Docs Editors files.
// Read more at https://developers.google.com/drive/api/v3/reference/files/get
// In this example, every .docx file is downloaded to a temp folder.
const onlyFiles = files.data.files.filter(
    (file) =>
        file.kind === fileKind &&
        file.mimeType === "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
);
const numberOfFilesToDownload = onlyFiles.length;
console.log(` About to download ${numberOfFilesToDownload} files`);
for (const file of onlyFiles) {
    filesCounter++;
    console.log(` Downloading file ${file.name}, ${filesCounter} of ${numberOfFilesToDownload}`);
    await getFile(drive, file.id, file.name);
}


Ruben Restrepo
  • I would like to mark this as the correct answer because your code makes sense to me. But this still does not answer the original question of why there is different behavior depending on the file extension. I got around this by supplying the ID for 'drive.files.export' and only using the filename (don't worry about extension) for the file path. I will include my code under the original question. – Dave Jan 14 '22 at 20:37
  • @Dave I hope my answer was helpful. As I mentioned, drive.files.export is what you need when dealing with a Google Workspace document (like a document with a gdoc extension), as opposed to a file with binary content (e.g. a text file or .docx). If you use drive.files.get on a Workspace document it won't work, since you need to export it first. That would explain the different behavior (it's expected). – Ruben Restrepo Jan 17 '22 at 17:41
  • Yes, I hear you Ruben. But the original post does not contain any drive.files.get code; it was a simple test script to read a list of filenames. As for your comment about export: I have completed this and shown the code. So thank you for your comments. – Dave Jan 18 '22 at 19:46

The answer (as I see it) is that the Node.js script above runs on Windows and reads from the local file system, where a file can only be opened by its full name, extension included. The gdoc extension, on the other hand, is added by the Google Drive desktop sync client. And here is the important distinction: the .gdoc entry in the sync folder on the hard drive is only a reference to a document that is stored in the cloud on Google Drive, which is why the Drive API returns the name without the extension. It is therefore not an extension in the usual sense, where the extension identifies a file type that a local application can access, read, and write. So my test function read(fileName) above cannot read the .gdoc in the same way as the .txt file.
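
To illustrate the mismatch, here is a minimal sketch that maps a name returned by the Drive API to the matching entry in a local sync folder. The helper findLocalEntry is hypothetical, and the extension list is an assumption about what the desktop client appends. It only locates the reference file; it does not give access to the document's content.

```javascript
// Assumed list of extensions the desktop sync client appends to reference files.
const POINTER_EXTENSIONS = ['.gdoc', '.gsheet', '.gslides'];

// Hypothetical helper: find the local entry that corresponds to a Drive API name.
// localNames is a plain array of file names, e.g. from fs.readdirSync(dir).
function findLocalEntry(apiName, localNames) {
  // Ordinary files (such as atest.txt) match exactly.
  if (localNames.includes(apiName)) return apiName;
  // Google Docs show up locally with an extra extension the API does not report.
  for (const ext of POINTER_EXTENSIONS) {
    const candidate = apiName + ext;
    if (localNames.includes(candidate)) return candidate;
  }
  return null; // nothing in the folder corresponds to this name
}
```

In the test script from the question, this could be called as findLocalEntry(file.name, fs.readdirSync('.')).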

Therefore the correct way to access files on Google Drive from a local application is to use the file's ID. The filename is just a convenient way of labelling the IDs so that the user can meaningfully compare the downloaded copy of the file with the original on Google Drive.
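
Because the ID is the real key, one option is to carry it into the local file name as well, since Drive allows several files in one folder to share a name. The sketch below is only an illustration: localPathFor is a hypothetical helper, the .txt suffix assumes a plain-text export, and the set of replaced characters is an assumption about what Windows rejects in file names.

```javascript
const path = require('path');

// Hypothetical helper: build a local path that stays unique per Drive file ID.
function localPathFor(file, dir) {
  // Replace characters that are not allowed in Windows file names (assumed set).
  const safeName = file.name.replace(/[<>:"\/\\|?*]/g, '_');
  // POSIX-style join, matching the UNIX path notation used for the download folder.
  return path.posix.join(dir, `${safeName}_${file.id}.txt`);
}
```

Two files with the same name but different IDs then end up in different local files instead of overwriting each other.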

(Refer to the original question.) The code under the "code completed below" divider consists of two functions that I added to Google's Node.js quickstart: one replaces function listFiles(auth), and the other adds function downloadFile(drive, fileId, fileName).

The complete script has been used to download multiple files (more than 50 at a time) to my hard drive. It is a useful piece of code in an OCR setup in which a gscript converts .JPG images of historic Electoral Rolls into readable text. These gdocs are messy (still containing the original image and colored fonts of various formats). Downloading them as text files cleans them up: images are removed and the fonts are reduced to plain upper/lower case text. So it's more than just a downloader; it's a filter as well.

I hope this is of some use to someone.

Dave