How to cache the data pulled from BigQuery in Django?

There are many ways to cache data in Django. Here I'll show you the one the Django documentation describes as the fastest and most efficient: Memcached.

  1. Download and install Memcached: if you are on Linux, download it from https://memcached.org/downloads.
    Run ./configure && make && sudo make install to build and install it.
    After installing Memcached, open a shell and start it with the following command: memcached -l 127.0.0.1:11211
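
Before wiring Django to the server, it can help to confirm that Memcached is actually listening. The following is a minimal sketch, not part of the original answer: it sends the `version` command of the Memcached text protocol over a plain socket. The function names `parse_version_reply` and `memcached_version` are made up for illustration, and the host and port are assumed to match the start command above.

```python
import socket


def parse_version_reply(reply: bytes):
    """Return the version from a 'VERSION x.y.z' reply, or None if malformed."""
    text = reply.decode("ascii", errors="replace").strip()
    if text.startswith("VERSION "):
        return text[len("VERSION "):]
    return None


def memcached_version(host="127.0.0.1", port=11211, timeout=2.0):
    """Ask a Memcached server for its version; return None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The text protocol answers "version\r\n" with "VERSION <x.y.z>\r\n".
            conn.sendall(b"version\r\n")
            return parse_version_reply(conn.recv(1024))
    except OSError:
        # Covers refused connections and timeouts alike.
        return None
```

Calling `memcached_version()` should return the server's version string while the server from step 1 is running, and `None` otherwise.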

  2. Install the Python bindings: after installing Memcached, you need a Python client library for it. You can use pymemcache: pip install pymemcache.

  3. Add the Memcached settings to your project:

    CACHES = {
      'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
      }
    }
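
Since BigQuery data changes over time, you will probably want to control how long a cached result lives. As a sketch of one possible variant of the settings above: `TIMEOUT` and `KEY_PREFIX` are standard optional keys of a `CACHES` entry, but the values chosen here (15 minutes and the prefix `bigquery_demo`) are only assumptions for illustration.

```python
# settings.py (sketch): the same backend with two optional keys added.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
        # Seconds before an entry expires; Django's default is 300.
        "TIMEOUT": 60 * 15,
        # Hypothetical prefix that keeps this project's keys apart from
        # other applications sharing the same Memcached server.
        "KEY_PREFIX": "bigquery_demo",
    }
}
```

A longer timeout means fewer BigQuery queries but staler data, so pick it according to how often the underlying tables change.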
    
  4. Cache levels: Django provides the following levels of caching, listed here from the finest to the coarsest granularity:

  • Low-level cache API: Provides the highest granularity. Allows you to cache specific queries or calculations.
  • Template cache: Allows you to cache template fragments.
  • Per-view cache: Provides caching for individual views.
  • Per-site cache: The highest-level cache. It caches your entire site.
  5. Using the low-level cache API in your example:

views.py

import pandas as pd
from django.core.cache import cache
from django.shortcuts import render
from google.cloud import bigquery

def index(request):
    # First, try to retrieve the cached data, if it exists
    data_json = cache.get('cached_data_json')

    # If the data has not been cached yet, generate and cache it
    if data_json is None:
        client = bigquery.Client()
        dataset_id = "project-id.dataset-id"
        tables = client.list_tables(dataset_id)
        tables_list = [table.table_id for table in tables]

        data_list = []
        for table_id in tables_list:
            query_string = f"""
                SELECT *
                FROM `project-id.dataset-id.{table_id}`
            """

            query_job = (
                client.query(query_string)
                .result()
            )

            # Collect the rows of every table, not only the last one
            records = [dict(row) for row in query_job]
            data_list.extend(records)

        df = pd.DataFrame(data_list)
        ...
        # Data manipulation syntax here using pandas dataframe
        ...
        data_json = df.to_json(orient="records")
        # Cache the data
        cache.set('cached_data_json', data_json)

    # data_json now holds either the cached value or the freshly computed one
    context = {'data_json': data_json}

    return render(request, 'template_name', context)

Here is how it works: if cached_data_json is present in the cache, the view uses it; otherwise it runs the queries again, rebuilds the JSON and stores it in the cache under the key cached_data_json.
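
To see the same get-or-compute logic in isolation, without Django, Memcached or BigQuery, here is a small self-contained sketch. `TTLCache` is a made-up in-memory stand-in that only mimics the `get`/`set` shape of Django's low-level cache API, and `expensive_query` is a hypothetical placeholder for the BigQuery and pandas work done in the view.

```python
import time


class TTLCache:
    """In-memory stand-in mimicking the get/set shape of Django's cache API."""

    def __init__(self, clock=time.monotonic):
        self._store = {}  # key -> (value, expiry time or None)
        self._clock = clock

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._store[key]  # the entry has expired
            return default
        return value

    def set(self, key, value, timeout=300):
        expires_at = None if timeout is None else self._clock() + timeout
        self._store[key] = (value, expires_at)


def get_or_compute(cache, key, compute, timeout=300):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def expensive_query():
    # Hypothetical placeholder for the BigQuery + pandas work in the view.
    calls.append(1)
    return '[{"id": 1}]'


demo_cache = TTLCache()
first = get_or_compute(demo_cache, "cached_data_json", expensive_query)
second = get_or_compute(demo_cache, "cached_data_json", expensive_query)
```

The second call is served from the cache, so `expensive_query` runs only once. Once the timeout has elapsed the entry is dropped and the next request pays the full cost again, which is the trade-off to keep in mind when choosing a timeout.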

OPTIONAL: You can use django-memcache-status to monitor Memcached.
