
The files in the Google domain that I administer have gotten into a bad state; there are thousands of files residing in the root directory. I want to identify these files and move them to a folder underneath "My Drive".

When I use the API to list the parents for one of these orphaned files, the result is an empty array. To determine if a file is orphaned, I can iterate over all the files in my domain, and request the list of parents for each. If the list is empty, I know that the file is orphaned.

But this is hideously slow.

Is there any way to use the Drive API to search for files that have no parents?

The "parents" field for the q parameter doesn't seem to be useful for this, as it's only possible to specify that the parents list contains some ID.

Update:

I'm trying to find a quick way to locate items that are truly at the root of the document hierarchy. That is, they are siblings of "My Drive", not children of "My Drive".

FishesCycle
  • 3
    This sounds like a bug, we should not allow files not to have any parents. – Burcu Dogan Sep 04 '13 at 09:00
  • 3
    The drive UI explicitly lets you move files into this situation, but advises against it. It would be great to be able to query for such files. – 0E322070 Jan 21 '14 at 14:59
  • Did you ever find a solution? – casolorz Jan 24 '18 at 18:15
  • @Peter Alfvin Unfortunately, in the current stage, the files without the parent folders cannot be directly retrieved using Drive API, yet. So how about these 2 workarounds? 1. Retrieve all files, and retrieve files without the parent folders from the retrieved all files. 2.Retrieve all folders, and retrieve files which are not included in all folders. These can be achieved using [the files.list method](https://developers.google.com/drive/api/v3/reference/files/list). If this is not the method you want, I apologize. By the way, can I ask you about the language you want to use? – Tanaike Apr 10 '19 at 00:19
  • @Tanaike Unfortunately, the `list` method does not allow you to retrieve `parents` information. You have to use `get` in order to obtain parents information, so the first workaround will not work. The second method will not because files can have parents which you don't have access to, so you can't derive the non-parent class of files by enumerating all the folders you do have access to. Regarding language, I happen to be using Javascript in the context where I wanted to use this, but I'm not using the Drive library. I'm just making REST calls. – Peter Alfvin Apr 13 '19 at 23:28
  • @Peter Alfvin Thank you for replying. About ``the list method does not allow you to retrieve parents information.``, although I'm not sure whether I could correctly understand about your current situation, when the files.list method is used with the fields of ``files(id,parents)``, the files without the parents don't have the property of ``parents``. By this, I confirm whether the file has the parents. Can I ask you about the situation of ``the list method does not allow you to retrieve parents information.``? I would like to correctly understand about your issue. – Tanaike Apr 14 '19 at 01:06
  • @Tanaike When I specify `parents` in the `fields` parameter in a `list` call, I get a 400 return with the error message `"Invalid field selection parents"`. – Peter Alfvin Apr 15 '19 at 12:33
  • @Peter Alfvin Thank you for replying. Unfortunately, I couldn't work out the original request from the error message ``"Invalid field selection parents"``. I apologize for my poor skill. Can you provide detailed information for replicating your issue? I would like to confirm it and think about a solution. – Tanaike Apr 15 '19 at 22:32
  • @BurcuDogan Files and folders shared with you appear as having no parents. That's by design. – rustyx Sep 04 '20 at 19:29
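
A minimal sketch of the check discussed in the comments above, assuming the Drive v3 `files.list` response shape in which a file without parents simply lacks the `parents` key. The helper name `find_orphans` and the sample data are hypothetical. `ownedByMe` is included so that items shared with you, which also appear parentless, can be excluded.

```python
def find_orphans(files, owned_only=True):
    """Return the ids of files that have no 'parents' entry.

    `files` is a list of dicts shaped like the items of a Drive v3
    files.list response requested with
    fields="files(id,parents,ownedByMe)" (an assumption of this sketch).
    """
    orphans = []
    for f in files:
        if f.get("parents"):
            continue  # has at least one parent
        if owned_only and not f.get("ownedByMe", False):
            continue  # probably shared with the user, not truly orphaned
        orphans.append(f["id"])
    return orphans


# Hypothetical sample data, for illustration only
sample = [
    {"id": "a", "parents": ["folder1"], "ownedByMe": True},
    {"id": "b", "ownedByMe": True},
    {"id": "c", "ownedByMe": False},
]
print(find_orphans(sample))
```

Passing `owned_only=False` keeps the shared items in the result, which may be what a domain administrator wants when auditing rather than cleaning up.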

6 Answers

6

In Java:

List<File> result = new ArrayList<File>();
Files.List request = drive.files().list();
request.setQ("'root'" + " in parents");

FileList files = null;
files = request.execute();

for (com.google.api.services.drive.model.File element : files.getItems()) {
    System.out.println(element.getTitle());
}

Here 'root' stands for the root folder, so the query matches any file or folder located directly in the root.

Jasper Duizendstra
  • 1
    This finds files and folders in "My Drive", which is not actually the root, though, confusingly, the "My Drive" folder has the property isRoot = true. I'm trying to find a way to quickly locate items in the actual root of the document hierarchy, i.e. siblings of "My Drive". I've updated my question to reflect this. – FishesCycle Dec 26 '12 at 23:26
  • @Jasper can you give me the link to library you used cause i can't find any execute() function – Prakhar Aug 14 '14 at 11:20
  • You can obtain the request using `com.google.api.services.drive.Drive`, which can be created using `com.google.api.services.drive.Drive.Builder.Builder(HttpTransport, JsonFactory, HttpRequestInitializer)`. – Bruno Medeiros Oct 04 '16 at 14:59
1

Brute force, but simple, and it works:

    do {
        try {
            FileList files = request.execute();

            for (File f : files.getItems()) {
                // getParents() may be null when the field is absent
                if (f.getParents() == null || f.getParents().isEmpty()) {
                    System.out.println("Orphan found:\t" + f.getTitle());
                    orphans.add(f);
                }
            }

            request.setPageToken(files.getNextPageToken());
        } catch (IOException e) {
            System.out.println("An error occurred: " + e);
            request.setPageToken(null);
        }
    } while (request.getPageToken() != null
            && request.getPageToken().length() > 0);
Feiteira
0

The documentation recommends the following query: `is:unorganized owner:me`.

Petr Kozelka
0

The premise is:

  • List all files.
  • If a file has no 'parents' field, it means it's an orphan file.
  • So, the script deletes them.

Before you start, you need:

Ready-to-copy-paste demo:

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']

def callback(request_id, response, exception):
    if exception:
        print("Exception:", exception)

def main():
    """
   Description:
   Shows basic usage of the Drive v3 API to delete orphan files.
   """

    """ --- CHECK CREDENTIALS --- """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    """ --- OPEN CONNECTION --- """
    service = build('drive', 'v3', credentials=creds)

    page_token = None  # None means "start from the first page"
    orphans = []
    page_size = 100
    batch_counter = 0

    print("LISTING ORPHAN FILES")
    print("-----------------------------")
    while True:
        # List one page of files, requesting only the fields we need
        r = service.files().list(pageToken=page_token,
                                 pageSize=page_size,
                                 fields="nextPageToken, files(id, parents)"
                                 ).execute()
        page_token = r.get('nextPageToken')
        files = r.get('files', [])

        # Filter orphans
        # NOTE: (If the file has no 'parents' field, it means it's orphan)
        for file in files:
            if file.get('parents'):
                print("File with a parent found.")
            else:
                print("Orphan file found.")
                orphans.append(file['id'])

        # Exit condition
        if page_token is None:
            break

    print("DELETING ORPHAN FILES")
    print("-----------------------------")
    batch_size = 100
    while len(orphans) > 0:
        # Take at most batch_size ids, so the final (smaller) batch does
        # not index past the end of the list
        current = orphans[:batch_size]
        del orphans[:batch_size]
        batch = service.new_batch_http_request(callback=callback)
        for file_id in current:
            print("File with id {0} queued for deletion.".format(file_id))
            batch.add(service.files().delete(fileId=file_id))
        batch.execute()
        batch_counter += 1
        print("BATCH {0} DELETED - {1} FILES DELETED".format(batch_counter,
                                                             len(current)))


if __name__ == '__main__':
    main()

This method won't delete files in the root directory, as they have the 'root' value for the field 'parents'. If not all of your orphan files are listed, it means they are being automatically deleted by Google. This process might take up to 24 hours.
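
Since the question asks to move the orphaned files rather than delete them, here is a minimal sketch of that variant. It assumes a Drive v3 client like the `service` object built in the script above and uses the `addParents` parameter of `files.update`. The function name `move_orphans` and the `folder_id` argument (the id of a destination folder under "My Drive") are hypothetical.

```python
def move_orphans(service, orphan_ids, folder_id):
    """Attach each orphaned file to `folder_id` instead of deleting it.

    `service` is assumed to be a Drive v3 client; `folder_id` is a
    hypothetical id of an existing folder under "My Drive".
    Returns the ids of the files that were updated.
    """
    moved = []
    for file_id in orphan_ids:
        updated = service.files().update(
            fileId=file_id,
            addParents=folder_id,  # an orphan has no old parent to remove
            fields="id, parents",
        ).execute()
        moved.append(updated["id"])
    return moved
```

It could be called as `move_orphans(service, orphans, folder_id)` in place of the deletion loop. The calls are issued one at a time for clarity; batching them as the deletion loop does should also be possible.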

Adrian Lopez
0

Adrian Lopez, thanks for your script. It really saved me a lot of manual work. Below are the steps that I followed to run it:

  1. Created the folder c:\temp\pythonscript\

  2. Created an OAuth 2.0 Client ID at https://console.cloud.google.com/apis/credentials and downloaded the credentials file to c:\temp\pythonscript\

  3. Renamed the downloaded client_secret_#######-#############.apps.googleusercontent.com.json to credentials.json

  4. Copied Adrian Lopez's Python script and saved it as c:\temp\pythonscript\deleteGoogleDriveOrphanFiles.py

  5. Installed Python 3.8 from the "Microsoft Store" on Windows 10

  6. Opened the Command Prompt and entered: cd c:\temp\pythonscript\

  7. Ran pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

  8. Ran python deleteGoogleDriveOrphanFiles.py and followed the steps on the screen to create the c:\temp\pythonscript\token.pickle file and start deleting the orphan files. This step can take quite a while.

  9. Verified the storage usage at https://one.google.com/u/1/storage

  10. Reran step 8 as necessary.

-1

Try to use this in your query:

'root' in parents
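
For reference, a minimal Python sketch of running this query with paging, assuming a Drive v3 client object named `service` (the function name `list_children_of_root` is hypothetical). As the comments on the first answer point out, this query returns the children of "My Drive", so it does not find files that have no parents at all.

```python
def list_children_of_root(service, page_size=100):
    """Yield (id, name) for items located directly under "My Drive".

    `service` is assumed to be a Drive v3 client object.
    """
    page_token = None
    while True:
        r = service.files().list(
            q="'root' in parents",
            pageSize=page_size,
            pageToken=page_token,
            fields="nextPageToken, files(id, name)",
        ).execute()
        for f in r.get("files", []):
            yield f["id"], f["name"]
        page_token = r.get("nextPageToken")
        if page_token is None:
            break  # no more pages
```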