First, hopefully you’re already using the csv module rather than trying to parse it manually.
Second, it’s not entirely clear from your question, but it sounds like you’re trying to build up a simple tree structure from the data as you read it.
So, something like this?
    import collections
    import csv

    with open('book.csv') as book:
        # defaultdict takes a factory callable, hence the lambda
        chapters = collections.defaultdict(lambda: collections.defaultdict(list))
        book.readline()  # to skip the headers
        for chapter_name, section_name, lesson_name in csv.reader(book):
            chapters[chapter_name][section_name].append(lesson_name)
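To make the resulting shape concrete, here is a small self-contained sketch that runs the same loop over an in-memory CSV instead of a file. The sample rows, the header names, and the `build_tree` helper are all made up for illustration; only the three-column layout comes from the code above.

```python
import collections
import csv
import io

# Hypothetical sample data standing in for book.csv.
SAMPLE = """chapter,section,lesson
Basics,Syntax,Variables
Basics,Syntax,Loops
Basics,Types,Strings
Advanced,Classes,Inheritance
"""

def build_tree(csv_file):
    """Return {chapter: {section: [lessons]}} from a 3-column CSV with a header row."""
    tree = collections.defaultdict(lambda: collections.defaultdict(list))
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for chapter_name, section_name, lesson_name in reader:
        tree[chapter_name][section_name].append(lesson_name)
    return tree

tree = build_tree(io.StringIO(SAMPLE))
for chapter_name, sections in tree.items():
    for section_name, lessons in sections.items():
        print(chapter_name, section_name, lessons)
```

Each chapter maps to its sections, and each section maps to the list of its lessons in the order they were read.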
Of course that’s assuming you want an “associative tree”: a dict of dicts. A more normal linear tree, like a list of lists, or an implicit tree in the form of “parent pointers”, is even simpler.
For example, let’s say you have classes defined like this:
    class Chapter(object):
        def __init__(self, name):
            self.name = name

    class Section(object):
        def __init__(self, chapter, name):
            self.chapter = chapter
            self.name = name

    class Lesson(object):
        def __init__(self, section, name):
            self.section = section
            self.name = name
And you want a dict for each, mapping names to objects. So:
    import csv

    with open('book.csv') as book:
        chapters, sections, lessons = {}, {}, {}
        book.readline()  # to skip the headers
        for chapter_name, section_name, lesson_name in csv.reader(book):
            # setdefault returns the existing object if the name was already seen
            chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
            section = sections.setdefault(section_name, Section(chapter, section_name))
            lesson = lessons.setdefault(lesson_name, Lesson(section, lesson_name))
Now, you can pick a random lesson, and print its chapter and section:
    import random

    # list() is needed on Python 3, where dict.values() is a view, not a sequence
    lesson = random.choice(list(lessons.values()))
    print('Chapter {}, Section {}: Lesson {}'.format(
        lesson.section.chapter.name, lesson.section.name, lesson.name))
One last thing to keep in mind: In this example, the parent references don’t cause any circular references, because the parents don’t have references to their children. But what if you need that?
    class Chapter(object):
        def __init__(self, name):
            self.name = name
            self.sections = {}

    class Section(object):
        def __init__(self, chapter, name):
            self.chapter = chapter
            self.name = name
            self.lessons = {}

    # ...

            chapter = chapters.setdefault(chapter_name, Chapter(chapter_name))
            section = sections.setdefault(section_name, Section(chapter, section_name))
            # register the child on its parent, which creates the cycle
            chapter.sections[section_name] = section
So far, so good… but what happens when you’re done with all those objects? They have circular references, which can cause problems for garbage collection. Not insurmountable problems, but it does mean that objects won’t get collected as quickly in most implementations. For example, in CPython, things normally get collected as soon as the last reference goes away, but if you have circular references, the reference counts never drop to zero, so nothing gets collected until the next pass of the cycle detector. The solution to this is to use a weakref for the parent pointer (or a collection of weakrefs to the children).
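As a rough sketch of the weak parent pointer idea, something like the following might work. It assumes the `Chapter`/`Section` layout from above; the private `_chapter` attribute, the `chapter` property, and having `Section` register itself with its parent are illustrative choices, not the only way to do it.

```python
import weakref

class Chapter(object):
    def __init__(self, name):
        self.name = name
        self.sections = {}  # strong references down to the children

class Section(object):
    def __init__(self, chapter, name):
        # Weak reference back up to the parent, so no strong cycle is formed.
        self._chapter = weakref.ref(chapter)
        self.name = name
        chapter.sections[name] = self

    @property
    def chapter(self):
        # Calling the weakref returns the parent, or None once it is gone.
        return self._chapter()

chapter = Chapter('Basics')
section = Section(chapter, 'Syntax')
print(section.chapter.name)

# Drop the only strong reference to the chapter. In CPython this frees it
# right away, since nothing else holds it strongly.
del chapter
print(section.chapter)
```

The trade-off is that a section no longer keeps its chapter alive, so code that follows the parent pointer has to be prepared to get `None` back if the chapter has already been dropped.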