[Fixed]-How to upload folder on Google Cloud Storage using Python API

17πŸ‘

This works for me. Copy all content from a local directory to a specific bucket-name/full-path (recursive) in google cloud storage:

import glob
from google.cloud import storage

def upload_local_directory_to_gcs(local_path, bucket, gcs_path):
    assert os.path.isdir(local_path)
    for local_file in glob.glob(local_path + '/**'):
        if not os.path.isfile(local_file):
           upload_local_directory_to_gcs(local_file, bucket, gcs_path + "/" + os.path.basename(local_file))
        else:
           remote_path = os.path.join(gcs_path, local_file[1 + len(local_path):])
           blob = bucket.blob(remote_path)
           blob.upload_from_filename(local_file)


upload_local_directory_to_gcs(local_path, bucket, BUCKET_FOLDER_DIR)
πŸ‘€Maor88

10πŸ‘

A version without a recursive function, and it works with β€˜top level files’ (unlike the top answer):

import glob
import os 
from google.cloud import storage

GCS_CLIENT = storage.Client()
def upload_from_directory(directory_path: str, dest_bucket_name: str, dest_blob_name: str):
    rel_paths = glob.glob(directory_path + '/**', recursive=True)
    bucket = GCS_CLIENT.get_bucket(dest_bucket_name)
    for local_file in rel_paths:
        remote_path = f'{dest_blob_name}/{"/".join(local_file.split(os.sep)[1:])}'
        if os.path.isfile(local_file):
            blob = bucket.blob(remote_path)
            blob.upload_from_filename(local_file)
πŸ‘€Frederik Bode

4πŸ‘

A folder is a cataloging structure containing references to files and directories. The library will not accept a folder as an argument.

As far as I understand, your use case is to make an upload to GCS preserving a local folder structure. To accomplish that you can use the os python module and make a recursive function (e.g process_folder) that will take path as an argument. This logic can be used for the function:

  1. Use os.listdir() method to get a list of objects within the source path (will return both files and folders).
  2. Iterate over a list from step 1 to separate files from folders via os.path.isdir() method.
  3. Iterate over files and upload them with adjusted path (e.g. path+ β€œ/β€œ + file_name).
  4. Iterate over folders making a recursive call (e.g. process_folder(path+folder_name)).

It’ll be necessary to work with two paths:

  1. Real system path (e.g. β€œ/Users/User/…/upload_folder/folder_name”) used with os module.
  2. Virtual path for GCS file uploads (e.g. β€œupload”+”/β€œ + folder_name + ”/β€œ + file_name).

Don’t forget to implement exponential backoff referenced at [1] to deal with 500 errors. You can use a Drive SDK example at [2] as a reference.

[1] – https://developers.google.com/storage/docs/json_api/v1/how-tos/upload#exp-backoff
[2] – https://developers.google.com/drive/web/handle-errors

πŸ‘€Nikita Uchaev

1πŸ‘

I assume the sheer filename = "D:\foldername" is not enough info about the source code. Neither am I sure that this is even possible.. via the web interface you can also just upload files or create folders where you then upload the files.

You could save the folders name, then create it (I’ve never used the google-app-engine, but I guess that should be possible) and then upload the contents to the new folder

0πŸ‘

Refer –
https://hackersandslackers.com/manage-files-in-google-cloud-storage-with-python/

from os import listdir
from os.path import isfile, join

...

def upload_files(bucketName):
    """Upload files to GCP bucket."""
    files = [f for f in listdir(localFolder) if isfile(join(localFolder, f))]
    for file in files:
        localFile = localFolder + file
        blob = bucket.blob(bucketFolder + file)
        blob.upload_from_filename(localFile)
    return f'Uploaded {files} to "{bucketName}" bucket.'
πŸ‘€Ganesh Kharad

0πŸ‘

The solution can also be used for windows systems. Simply provide the folder name to upload the destination bucket name.Additionally, it can handle any level of subdirectories in a folder.

import os
from google.cloud import storage
storage_client = storage.Client()
def upload_files(bucketName, folderName):
"""Upload files to GCP bucket."""
bucket = storage_client.get_bucket(bucketName)
for path, subdirs, files in os.walk(folderName):
    for name in files:
        path_local = os.path.join(path, name)
        blob_path = path_local.replace('\\','/')
        blob = bucket.blob(blob_path)
        blob.upload_from_filename(path_local)

0πŸ‘

I just came across the gcsfs library which seems to be also about better interfaces

You could copy an entire directory into a gcs location like this:


def upload_to_gcs(src_dir: str, gcs_dst: str):
    fs = gcsfs.GCSFileSystem()
    fs.put(src_dir, gcs_dst, recursive=True)
πŸ‘€Tsvi Sabo

0πŸ‘

Here is my recursive implementation . we need to create a file named gdrive_utils.py and write the following.


from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from apiclient.http import MediaFileUpload, MediaIoBaseDownload
import pickle
import glob
import os


# The following scopes are required for access to google drive.
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly',
          'https://www.googleapis.com/auth/drive.metadata',
          'https://www.googleapis.com/auth/drive',
          'https://www.googleapis.com/auth/drive.file',
          'https://www.googleapis.com/auth/drive.appdata']


def get_gdrive_service():
    """
    Tries to authenticate using a token. If token expires or not present creates one.
    :return: Returns authenticated service object
    :rtype: object
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'keys/client-secret.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # return Google Drive API service
    return build('drive', 'v3', credentials=creds)


def createRemoteFolder(drive_service, folderName, parent_id):
    # Create a folder on Drive, returns the newely created folders ID
    body = {
        'name': folderName,
        'mimeType': "application/vnd.google-apps.folder",
        'parents': [parent_id]
    }

    root_folder = drive_service.files().create(body = body, supportsAllDrives=True, fields='id').execute()
    return root_folder['id']


def upload_file(drive_service, file_location, parent_id):
    # Create a folder on Drive, returns the newely created folders ID
    body = {
        'name': os.path.split(file_location)[1],
        'parents': [parent_id]
    }

    media = MediaFileUpload(file_location,
                            resumable=True)

    file_details = drive_service.files().create(body = body,
                                                media_body=media,
                                                supportsAllDrives=True,
                                                fields='id').execute()
    return file_details['id']


def upload_file_recursively(g_drive_service, root, folder_id):

    files_list = glob.glob(root)
    if files_list:
        for file_contents in files_list:
            if os.path.isdir(file_contents):
                # create new _folder
                new_folder_id = createRemoteFolder(g_drive_service, os.path.split(file_contents)[1],
                                                   folder_id)
                upload_file_recursively(g_drive_service, os.path.join(file_contents, '*'), new_folder_id)
            else:
                # upload to given folder id
                upload_file(g_drive_service, file_contents, folder_id)

After that use the following

import os

from gdrive_utils import createRemoteFolder, upload_file_recursively, get_gdrive_service

g_drive_service = get_gdrive_service()
FOLDER_ID_FOR_UPLOAD = "<replace with folder id where you want upload>"
main_folder_id = createRemoteFolder(g_drive_service, '<name_of_main_folder>', FOLDER_ID_FOR_UPLOAD)

And finally use this

upload_file_recursively(g_drive_service, os.path.join("<your_path_>", '*'), main_folder_id)

0πŸ‘

Another option is to use gsutil, the command-line tool for interacting with Google Cloud:

gsutil cp -r ./my/local/directory gs://my_gcp_bucket/foo/bar

The -r flag tells gsutil to copy recursively. Link gsutil to documentation.

Invoking gsutil in Python can be done like this:

import subprocess

subprocess.check_call('gsutil cp -r ./my/local/directory gs://my_gcp_bucket/foo/bar')
πŸ‘€Toldry

Leave a comment