[Answer]-CSV Parsing for Book


First, hopefully you’re already using the csv module rather than trying to parse it manually.

Second, it’s not entirely clear from your question, but it sounds like you’re trying to build up a simple tree structure from the data as you read it.

So, something like this?

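import collections
import csv

with open('book.csv', newline='') as book:
    # defaultdict needs a callable, so wrap the inner defaultdict in a lambda
    chapters = collections.defaultdict(lambda: collections.defaultdict(list))
    reader = csv.reader(book)
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        chapters[chapter_name][section_name].append(lesson_name)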

Of course that’s assuming you want an “associative tree”—a dict of dicts. A more normal linear tree, like a list of lists, or an implicit tree in the form of “parent pointers”, is even simpler.
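To see what the dict-of-dicts shape looks like in practice, here is a minimal sketch that runs on in-memory data instead of a file. The sample rows, the header names, and the `build_tree` helper are all made up for illustration; I'm assuming your file has exactly three columns in chapter, section, lesson order.

```python
import collections
import csv
import io

# Hypothetical sample data; the real book.csv is assumed to have the
# columns chapter, section, lesson in that order.
SAMPLE = """chapter,section,lesson
Basics,Syntax,Variables
Basics,Syntax,Loops
Basics,Types,Strings
Advanced,Syntax,Decorators
"""

def build_tree(lines):
    """Return {chapter: {section: [lesson, ...]}} from an iterable of CSV lines."""
    tree = collections.defaultdict(lambda: collections.defaultdict(list))
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        tree[chapter_name][section_name].append(lesson_name)
    return tree

tree = build_tree(io.StringIO(SAMPLE))

# Walk the tree: chapters at the top, then sections, then lesson lists.
for chapter_name, sections in tree.items():
    for section_name, lesson_names in sections.items():
        print(chapter_name, section_name, lesson_names)
```

Note that a section name only has to be unique within its chapter here: "Syntax" under "Basics" and "Syntax" under "Advanced" are separate entries.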

For example, let’s say you have classes defined like this:

class Chapter(object):
    def __init__(self, name):
        self.name = name

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name

class Lesson(object):
    def __init__(self, section, name):
        self.section = section
        self.name = name

And you want a dict for each, mapping names to objects. So:

with open('book.csv', newline='') as book:
    chapters, sections, lessons = {}, {}, {}
    reader = csv.reader(book)
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        # setdefault() returns the existing object for a name seen before;
        # otherwise it stores and returns the newly built one.
        chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
        section = sections.setdefault(section_name, Section(chapter, section_name))
        lesson = lessons.setdefault(lesson_name, Lesson(section, lesson_name))

Now, you can pick a random lesson, and print its chapter and section:

import random

# random.choice() needs a sequence; in Python 3, dict.values() is only a view
lesson = random.choice(list(lessons.values()))
print('Chapter {}, Section {}: Lesson {}'.format(lesson.section.chapter.name,
                                                 lesson.section.name, lesson.name))

One last thing to keep in mind: In this example, the parent references don’t cause any circular references, because the parents don’t have references to their children. But what if you need that?

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}

class Section(object):
    def __init__(self, chapter, name):
        self.chapter = chapter
        self.name = name
        self.lessons = {}

# ...

chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
section = sections.setdefault(section_name, Section(chapter, section_name))
chapter.sections[section_name] = section  # register the child on its parent

So far, so good… but what happens when you’re done with all those objects? They now form reference cycles (each chapter refers to its sections, and each section refers back to its chapter), which complicates garbage collection. The problem isn’t insurmountable, but it does mean that objects won’t get collected as quickly in most implementations. In CPython, for example, an object is normally freed as soon as its reference count drops to zero; objects in a cycle never reach zero on their own, so nothing is freed until the next pass of the cycle detector. The solution is to use a weakref for the parent pointer (or a collection of weakrefs to the children).
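Here is a rough sketch of the weakref-for-the-parent variant, using the standard library's `weakref.ref`. The `_chapter` attribute and the `chapter` property are my own naming, not anything from your code, and having `Section.__init__` register itself on the parent is just one way to wire it up.

```python
import weakref

class Chapter:
    def __init__(self, name):
        self.name = name
        self.sections = {}  # strong references down to the children

class Section:
    def __init__(self, chapter, name):
        # Weak reference back up to the parent, so no cycle is created.
        self._chapter = weakref.ref(chapter)
        self.name = name
        self.lessons = {}
        chapter.sections[name] = self  # register on the parent

    @property
    def chapter(self):
        # Calling the weakref returns the Chapter, or None once the
        # Chapter has been collected.
        return self._chapter()

chapter = Chapter('Basics')
section = Section(chapter, 'Syntax')
print(section.chapter.name)

# Dropping the last strong reference to the parent lets it be collected
# without waiting for the cycle detector; after that, section.chapter
# evaluates to None.
del chapter
```

The tradeoff is that something else has to keep the parents alive for as long as you need them (the `chapters` dict from the earlier example would do), and any code that follows a parent pointer has to be ready for `None`.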

👤abarnert
