Sorting a dictionary in Python

Many people think dictionaries cannot be sorted because, unlike lists, they have no `.sort()` method. However, the built-in function `sorted()` can sort a dictionary's items by key or by value. Note that `sorted()` does not sort the dictionary in place. It returns a new list, here a list of `(key, value)` tuples. Check the official Python documentation for more details.

#! /usr/bin/python

a = {'one': 1, 'three': 3, 'four': 4, 'two': 2}
sorted(a.items(), key=lambda x: x[1])

# output [('one', 1), ('two', 2), ('three', 3), ('four', 4)]
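The example above sorts by value only. The sketch below, using the same dictionary, shows sorting by key, sorting in descending order with `operator.itemgetter` (equivalent to the lambda above), and turning the result back into a dictionary. That last step assumes Python 3.7 or later, where dictionaries are guaranteed to preserve insertion order.

```python
from operator import itemgetter

a = {'one': 1, 'three': 3, 'four': 4, 'two': 2}

# Sort by key: the tuples are compared by their first element, the key
by_key = sorted(a.items())
print(by_key)  # [('four', 4), ('one', 1), ('three', 3), ('two', 2)]

# Sort by value, largest first; itemgetter(1) does the same as lambda x: x[1]
by_value_desc = sorted(a.items(), key=itemgetter(1), reverse=True)
print(by_value_desc)  # [('four', 4), ('three', 3), ('two', 2), ('one', 1)]

# Only the keys, ordered by their values
keys_by_value = sorted(a, key=a.get)
print(keys_by_value)  # ['one', 'two', 'three', 'four']

# Back to a dict; insertion order is preserved from Python 3.7 onwards
ordered = dict(by_value_desc)
print(ordered)  # {'four': 4, 'three': 3, 'two': 2, 'one': 1}
```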

Counting hashable objects

How can we count the occurrences of each letter in a string? As soon as this question comes up, many of us start thinking about loops, but Python has a better solution: `collections.Counter`.

#! /usr/bin/python

import collections
the_string = 'sfsdfsdfsdfdsfsdfsdfsdfsdf'
results = collections.Counter(the_string)

# output
# Counter({'s': 9, 'f': 9, 'd': 8})

Check the official Python documentation for more details.
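`Counter` is not limited to characters: it counts any iterable of hashable objects. A small sketch with a made-up sentence, counting words and using `most_common()` to get the most frequent ones. Looking up an element that was never seen returns 0 instead of raising `KeyError`.

```python
from collections import Counter

words = 'the cat and the hat and the bat'.split()
word_counts = Counter(words)

print(word_counts.most_common(2))  # [('the', 3), ('and', 2)]
print(word_counts['cat'])          # 1
print(word_counts['dog'])          # 0, missing elements count as zero
```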

Fetching huge datasets using the iterator protocol

Sometimes you need to retrieve a huge dataset and loop over it to perform some operation. Fetching all of it into memory at once can exhaust memory and lock up the server process. To avoid such problems, you can use Python's iterator protocol. Read the following to get a better understanding of iterators.

When you loop over an iterable, Python first calls its `__iter__` method to obtain an iterator, and then repeatedly calls the iterator's `__next__` method until it raises `StopIteration`. So, to handle a huge dataset, you can fetch it in small batches from inside `__next__`: fetch a small portion of the dataset first, and when the end of that portion is reached, fetch the next one. This way you can process the entire dataset without ever holding all of it in memory.
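To make the protocol concrete, here is roughly what a `for` loop over a list does, spelled out by hand with the built-ins `iter()` and `next()`, which call `__iter__` and `__next__` respectively.

```python
numbers = [10, 20, 30]
seen = []

# Roughly what `for n in numbers: seen.append(n)` does under the hood
iterator = iter(numbers)        # calls numbers.__iter__()
while True:
    try:
        n = next(iterator)      # calls iterator.__next__()
    except StopIteration:       # the iterator signals that it is exhausted
        break
    seen.append(n)

print(seen)  # [10, 20, 30]
```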

The advantage of the iterator protocol is that the caller can write the iteration as a single ordinary loop and does not have to worry about fetching results in small chunks; the iterator takes care of that.

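#! /usr/bin/python


class QueryIterator(object):
    """Iterate over a large result set, fetching it in small batches.

    Assumption: `query` is a callable taking (offset, limit) and returning
    a list of rows. Adapt populate_data() to your own database API.
    """

    def __init__(self, query=None, batch_size=100):
        self.query = query
        self.batch_size = batch_size
        self.offset = 0
        self.results = iter([])  # current batch; empty until the first fetch

    def __iter__(self):
        return self

    def __next__(self):
        try:
            # Return the next entry of the current batch
            return next(self.results)
        except StopIteration:
            # Current batch exhausted: fetch the next batch and return its
            # first entry. If the new batch is empty, next() raises
            # StopIteration again, which ends the caller's loop.
            self.populate_data()
            return next(self.results)

    next = __next__  # Python 2 name for __next__

    def populate_data(self):
        """Execute the query for the next batch and store it in self.results."""
        batch = self.query(self.offset, self.batch_size) if self.query else []
        self.offset += len(batch)
        self.results = iter(batch)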
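In modern Python the same batching is often written more compactly as a generator function, which implements the iterator protocol for you. The sketch below is an alternative to the class above, not a replacement for it; `fetch_batch(offset, limit)` is a hypothetical callable standing in for your real query, and here it simply slices an in-memory list so the example runs on its own.

```python
def iterate_in_batches(fetch_batch, batch_size=100):
    """Yield rows one at a time, fetching them batch_size at a time.

    fetch_batch(offset, limit) is a hypothetical callable returning a list.
    """
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            return  # no more rows: the generator stops the caller's loop
        for row in batch:
            yield row
        offset += len(batch)


# Hypothetical stand-in for a database query over 10 rows
rows = list(range(10))
calls = []


def fake_fetch(offset, limit):
    calls.append((offset, limit))
    return rows[offset:offset + limit]


total = sum(iterate_in_batches(fake_fetch, batch_size=4))
print(total)  # 45
print(calls)  # [(0, 4), (4, 4), (8, 4), (10, 4)]
```

The caller still sees one plain loop (here hidden inside `sum()`), while at most `batch_size` rows are held in memory at a time. The final call returns an empty batch, which is how the generator knows to stop.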

Serializing Objects of depth `N`

Many programmers have probably faced the problem of serializing nested objects, that is, objects of depth `N`. I faced this situation myself, and here is the solution I found. There may be better implementations, but I think this might be helpful for other novices like me.

#! /usr/bin/python
import json


class Reference(object):
    """Marker base class: attributes of this type are serialized recursively."""
    pass


class Engineering(Reference):
    subject = 'ECE'
    college = 'Jyothi Engineering College'

    _serializable_fields = ['subject', 'college']


class Programmer(Reference):
    designation = 'Software Engineer'
    company = 'Scintilla Technologies'

    _serializable_fields = ['designation', 'company']


class Individual(object):
    name = 'Akhil Lawrence'
    age = 23
    job = Programmer()
    education = Engineering()

    _serializable_fields = ['name', 'age', 'job', 'education']


def Serialize(obj, excludes=()):
    """Return a dict of obj's _serializable_fields, recursing into References."""
    response = dict()
    for attr_name in obj._serializable_fields:
        if attr_name in excludes:
            continue
        attr = getattr(obj, attr_name)
        if attr is None:
            # Skip unset fields; falsy values such as 0 or '' are kept
            continue
        if isinstance(attr, Reference):
            # Nested object: serialize it recursively, to any depth
            response[attr_name] = Serialize(attr)
        else:
            response[attr_name] = attr
    return response


def Serializer(objs, excludes=()):
    """Serialize every object in objs, skipping the fields in excludes."""
    return [Serialize(obj, excludes) for obj in objs]


if __name__ == '__main__':
    objs = [Individual(), Individual()]
    excludes = []
    print(json.dumps(Serializer(objs, excludes), indent=4))
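For comparison, the standard `json` module can also handle nested objects through the `default` parameter of `json.dumps()`: it calls that function for any object it cannot encode natively and encodes whatever the function returns, recursing as needed. The sketch below is an alternative technique, not the approach above. It assumes the data lives in instance attributes set in `__init__` (so `vars(obj)` can see them, unlike the class attributes used above), and it serializes every instance attribute rather than a `_serializable_fields` whitelist.

```python
import json


class Engineering(object):
    def __init__(self, subject, college):
        self.subject = subject
        self.college = college


class Individual(object):
    def __init__(self, name, education):
        self.name = name
        self.education = education


def to_serializable(obj):
    # json.dumps calls this for objects it cannot encode natively;
    # returning the instance's attribute dict lets json recurse into it
    return vars(obj)


person = Individual('Akhil Lawrence', Engineering('ECE', 'Jyothi Engineering College'))
encoded = json.dumps(person, default=to_serializable, sort_keys=True)
print(encoded)
# {"education": {"college": "Jyothi Engineering College", "subject": "ECE"}, "name": "Akhil Lawrence"}
```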