Parsing XML into a dictionary of lists Python/Django

From what I gather from the expected results, you just want the information from nodes that are strictly BaseCategory, right? The XML that was provided in the edit has two of those.

You should see the XML as a tree of nodes. In the example, you have something like:

                     FormInstance  # this is the root
                      /         \
                     /           \
             BaseCategory       BaseCategory
             (name:Sales)    (name:Information)
                                    \
                                     \
                                  MainCategory
                                (name:Address 3)
                                        \
                                         \
                                      Subcategory
                                  (name:Street Number 2)

But you only need the information in the BaseCategory elements, right?
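To make the tree view concrete, here is a minimal sketch that walks the document and prints one line per container element. The SAMPLE_XML string is an assumption: it is a hypothetical document reconstructed from the diagram above and the field names used in this answer, not the asker's actual file, and describe is a helper name invented for illustration.

```python
from xml.etree import ElementTree

# Hypothetical sample, reconstructed from the diagram (an assumption)
SAMPLE_XML = """
<FormInstance>
  <BaseCategory>
    <Name>Sales</Name>
    <base_id>1</base_id>
    <position>10</position>
    <order_by_type>true</order_by_type>
    <order_by_asc>true</order_by_asc>
  </BaseCategory>
  <BaseCategory>
    <Name>Information</Name>
    <base_id>2</base_id>
    <position>20</position>
    <order_by_type>true</order_by_type>
    <order_by_asc>true</order_by_asc>
    <MainCategory>
      <Name>Address 3</Name>
      <Subcategory>
        <Name>Street Number 2</Name>
      </Subcategory>
    </MainCategory>
  </BaseCategory>
</FormInstance>
"""


def describe(element, depth=0):
    """Return one indented line per container element: its tag and Name."""
    name = element.findtext("Name")  # text of a direct <Name> child, or None
    label = element.tag if name is None else "%s (name:%s)" % (element.tag, name)
    lines = ["  " * depth + label]
    for child in element:
        # Leaf fields (Name, base_id...) have no children, so skip them
        if len(child):
            lines.extend(describe(child, depth + 1))
    return lines


root = ElementTree.fromstring(SAMPLE_XML.strip())
tree_lines = describe(root)
print("\n".join(tree_lines))
```

With that sample, the output lists FormInstance first, then the two BaseCategory nodes one level in, with MainCategory and Subcategory nested under the second one, which matches the diagram.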

You could just position yourself at the root (which is what ElementTree.fromstring returns anyway), iterate over its BaseCategory nodes, get the items you need from each of those nodes, and put them in your list of dictionaries.

Something like:

import pprint
from xml.etree import ElementTree

with open("sample_xml.xml", 'r') as f:
    data = f.read()
    xml_data = ElementTree.fromstring(data)

base_categories = xml_data.findall("./BaseCategory")
print("Found %s base_categories." % len(base_categories))
list_dict = []
for base_category in base_categories:
    list_dict.append({
        "name": base_category.find("Name").text,
        "id": int(base_category.find("base_id").text),
        "position": int(base_category.find("position").text),
        # The XML stores booleans as text, so compare against "true"
        "order_by_type": base_category.find("order_by_type").text.lower() == "true",
        "order_by_asc": base_category.find("order_by_asc").text.lower() == "true",
    })

print("list_dict=%s" % (pprint.pformat(list_dict)))

Which outputs:

Found 2 base_categories.
list_dict=[{'id': 1,
  'name': 'Sales',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 10},
 {'id': 2,
  'name': 'Information',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 20}]

The idea is that a BaseCategory element can be seen as a self-contained record (like a dict, if that helps you picture it) that contains the following fields:

  • A string with the name in Name
  • A numeric id in base_id
  • A numeric position
  • A boolean order_by_type
  • A boolean order_by_asc
  • Another object MainCategory with its own fields…

So every time you position yourself in one of these BaseCategory nodes, you just gather the interesting fields that it has and put them in dictionaries.
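One way to express that "one node, one record" idea is a small function that turns a single BaseCategory element into a dict. This is only a sketch: base_category_to_dict and text_to_bool are hypothetical helper names, the inline XML is an assumed sample rather than the asker's file, and it assumes base_id and position are always present (int() would raise a TypeError otherwise).

```python
from xml.etree import ElementTree


def text_to_bool(text):
    """Treat the text 'true' (any case) as True; anything else, or missing, as False."""
    return text is not None and text.strip().lower() == "true"


def base_category_to_dict(node):
    """Collect the fields of one BaseCategory element into a dict."""
    return {
        "name": node.findtext("Name"),
        "id": int(node.findtext("base_id")),
        "position": int(node.findtext("position")),
        "order_by_type": text_to_bool(node.findtext("order_by_type")),
        "order_by_asc": text_to_bool(node.findtext("order_by_asc")),
    }


# Assumed sample with a single BaseCategory, for illustration only
SAMPLE = (
    "<FormInstance><BaseCategory>"
    "<Name>Sales</Name><base_id>1</base_id><position>10</position>"
    "<order_by_type>true</order_by_type><order_by_asc>false</order_by_asc>"
    "</BaseCategory></FormInstance>"
)

root = ElementTree.fromstring(SAMPLE)
records = [base_category_to_dict(node) for node in root.findall("./BaseCategory")]
print(records)
```

Keeping the conversion in one function means every field of a record is read from the same node, so values from different BaseCategory elements cannot get mixed up.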

When you do:

base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_position = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')

You are treating those elements (base_id, position…) almost as if they were independent of each other, which is not how they are organized in your XML: each one belongs to a specific BaseCategory.
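To see why that matters, consider a hypothetical document (an assumption made up for this sketch, not the asker's data) in which the first BaseCategory has no position element. The separate lists then have different lengths, and pairing them by index attaches a value to the wrong record:

```python
from xml.etree import ElementTree

# Assumed sample: the first BaseCategory lacks a <position> element
SAMPLE = """
<FormInstance>
  <BaseCategory><Name>Sales</Name></BaseCategory>
  <BaseCategory><Name>Information</Name><position>20</position></BaseCategory>
</FormInstance>
"""

root = ElementTree.fromstring(SAMPLE.strip())

# Independent lists: the link between a name and its position is lost
names = [e.text for e in root.findall("./BaseCategory/Name")]
positions = [e.text for e in root.findall("./BaseCategory/position")]
mispaired = (names[0], positions[0])  # pairs 'Sales' with Information's position

# Per-node lookup: each value stays attached to its own BaseCategory
per_node = [
    (node.findtext("Name"), node.findtext("position"))
    for node in root.findall("./BaseCategory")
]

print(names, positions, mispaired, per_node)
```

Reading the fields node by node returns None for the missing position instead of silently borrowing the next record's value.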

However, if you are absolutely certain that all those lists (base_cats, base_cats_id, base_position…) contain the same number of items, you can still rebuild your dictionaries by index. Use the length of any one of them (the example below uses len(base_cats), but len(base_cats_id) or len(base_position) would work just as well, since all the lists have the same length) to step through all the lists together:

base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_position = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')

list_dict = []
for i in range(len(base_cats)):
    list_dict.append({
        "name": base_cats[i].text,
        "id": int(base_cats_id[i].text),
        "position": int(base_position[i].text),
        "order_by_type": base_order_by_type[i].text.lower() == "true",
        "order_by_asc": base_order_by_asc[i].text.lower() == "true",
    })
print("list_dict=%s" % (pprint.pformat(list_dict)))
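If you go with the parallel-list approach, it may be worth checking the "same number of items" assumption explicitly instead of trusting it. The sketch below is one possible way to do that, using zip rather than indexes; columns_to_records is a hypothetical function name and the inline XML is an assumed sample.

```python
from xml.etree import ElementTree

FIELDS = ["Name", "base_id", "position", "order_by_type", "order_by_asc"]


def columns_to_records(root):
    """Build one dict per BaseCategory from parallel findall() lists."""
    columns = [root.findall("./BaseCategory/" + field) for field in FIELDS]
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        # A missing element somewhere would shift values between records
        raise ValueError("field lists differ in length: %s" % sorted(lengths))
    records = []
    # zip walks all five lists in step, one BaseCategory per iteration
    for name, base_id, position, by_type, by_asc in zip(*columns):
        records.append({
            "name": name.text,
            "id": int(base_id.text),
            "position": int(position.text),
            "order_by_type": by_type.text.lower() == "true",
            "order_by_asc": by_asc.text.lower() == "true",
        })
    return records


# Assumed sample with one complete BaseCategory, for illustration only
SAMPLE = (
    "<FormInstance><BaseCategory>"
    "<Name>Sales</Name><base_id>1</base_id><position>10</position>"
    "<order_by_type>true</order_by_type><order_by_asc>true</order_by_asc>"
    "</BaseCategory></FormInstance>"
)

zipped_records = columns_to_records(ElementTree.fromstring(SAMPLE))
print(zipped_records)
```

Note that equal lengths are necessary but not sufficient: two records each missing a different field could still produce lists of matching length, which is another reason to prefer iterating over the BaseCategory nodes themselves.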
👤Savir
