So, from what I gather from the expected results, it looks like you just want the information from the nodes that are strictly BaseCategory, right? The XML provided in the edit has two of those.
You should see the XML as a tree of nodes. In the example, you have something like:
            FormInstance    # this is the root
             /        \
            /          \
   BaseCategory      BaseCategory
   (name:Sales)   (name:Information)
                          \
                           \
                       MainCategory
                     (name:Address 3)
                             \
                              \
                          Subcategory
                     (name:Street Number 2)
But you only need the information in the BaseCategory elements, right?
You could just position yourself at the root (which… well… is what xml.fromstring does anyway), iterate over its BaseCategory nodes, get the items you need from each of those nodes and put them in your list of dictionaries.
Something like:
import pprint
from xml.etree import ElementTree
# Read the whole XML document into a string.
with open("sample_xml.xml", 'r') as f:
    data = f.read()
# fromstring() returns the root element (FormInstance).
xml_data = ElementTree.fromstring(data)
# Select only the BaseCategory elements that are direct children of the root.
base_categories = xml_data.findall("./BaseCategory")
print("Found %s base_categories." % len(base_categories))
list_dict = []
for base_category in base_categories:
    # find() only looks at the direct children of this BaseCategory.
    list_dict.append({
        "name": base_category.find("Name").text,
        "id": int(base_category.find("base_id").text),
        "position": int(base_category.find("position").text),
        "order_by_type": base_category.find("order_by_type").text.lower() == "true",
        "order_by_asc": base_category.find("order_by_asc").text.lower() == "true",
    })
print("list_dict=%s" % (pprint.pformat(list_dict)))
Which outputs:
Found 2 base_categories.
list_dict=[{'id': 1,
  'name': 'Sales',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 10},
 {'id': 2,
  'name': 'Information',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 20}]
The idea is that a BaseCategory item can be seen as a self-contained record (like a dict, if that helps you see it) that can contain the following attributes:
- A string with the name in Name
- A numeric id in base_id
- A numeric position
- A boolean order_by_type
- A boolean order_by_asc
- Another object, MainCategory, with its own fields…
So every time you position yourself in one of these BaseCategory nodes, you just gather the interesting fields that it has and put them in dictionaries.
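To make the "one node, one record" idea concrete, here is a minimal sketch that wraps the per-node gathering in a helper. The inline SAMPLE document and the names to_bool and base_category_to_dict are made up for illustration; I'm only assuming your real file has the same element names as in the question.

```python
from xml.etree import ElementTree

# Hypothetical sample, assumed to mirror the layout of the real file.
SAMPLE = """
<FormInstance>
  <BaseCategory>
    <Name>Sales</Name>
    <base_id>1</base_id>
    <position>10</position>
    <order_by_type>true</order_by_type>
    <order_by_asc>true</order_by_asc>
  </BaseCategory>
  <BaseCategory>
    <Name>Information</Name>
    <base_id>2</base_id>
    <position>20</position>
    <order_by_type>true</order_by_type>
    <order_by_asc>true</order_by_asc>
    <MainCategory>
      <Name>Address 3</Name>
    </MainCategory>
  </BaseCategory>
</FormInstance>
"""


def to_bool(text):
    """Interpret the text 'true' (any case) as True, anything else as False."""
    return text.strip().lower() == "true"


def base_category_to_dict(node):
    """Gather the direct child fields of a single BaseCategory element."""
    # findtext() matches direct children only, so the nested
    # MainCategory/Name is never picked up by mistake.
    return {
        "name": node.findtext("Name"),
        "id": int(node.findtext("base_id")),
        "position": int(node.findtext("position")),
        "order_by_type": to_bool(node.findtext("order_by_type")),
        "order_by_asc": to_bool(node.findtext("order_by_asc")),
    }


root = ElementTree.fromstring(SAMPLE)
records = [base_category_to_dict(node) for node in root.findall("./BaseCategory")]
print(records)
```

Keeping the conversion in one function means that each dictionary is always built from a single node, so fields from different BaseCategory elements cannot get mixed up.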
When you do:
base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_position = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')
You are treating those elements (base_id, position…) almost as independent elements, which is not exactly what you have in your XML.
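A small sketch of why that matters, using a made-up document in which the first BaseCategory has no position element (I'm not saying your file looks like this, it is only an assumption to show the failure mode). The separate lists silently fall out of step, while a per-node lookup keeps each value attached to its own category.

```python
from xml.etree import ElementTree

# Hypothetical XML: the first BaseCategory lacks a <position> child.
SAMPLE = """
<FormInstance>
  <BaseCategory><Name>Sales</Name></BaseCategory>
  <BaseCategory><Name>Information</Name><position>20</position></BaseCategory>
</FormInstance>
"""

root = ElementTree.fromstring(SAMPLE)

# Independent lists: nothing ties a position back to its BaseCategory.
names = root.findall("./BaseCategory/Name")
positions = root.findall("./BaseCategory/position")
print(len(names), len(positions))        # 2 1
print(names[0].text, positions[0].text)  # Sales 20  (wrong pairing)

# Per-node lookup: a missing field shows up as None for that node only.
per_node = [
    (node.findtext("Name"), node.findtext("position"))
    for node in root.findall("./BaseCategory")
]
print(per_node)  # [('Sales', None), ('Information', '20')]
```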
However, if you are absolutely certain that all those lists (base_cats, base_cats_id, base_position…) contain the same number of items, you can still rebuild your dictionaries by using the length of one of them to iterate through all the lists in the same step. The example below uses len(base_cats), but it could have been len(base_cats_id), len(base_position)… since all those lists have the same length:
base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_position = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')
list_dict = []
# Index i picks the i-th entry of every list, so they must all be the same length.
for i in range(len(base_cats)):
    list_dict.append({
        "name": base_cats[i].text,
        "id": int(base_cats_id[i].text),
        "position": int(base_position[i].text),
        "order_by_type": base_order_by_type[i].text.lower() == "true",
        "order_by_asc": base_order_by_asc[i].text.lower() == "true",
    })
print("list_dict=%s" % (pprint.pformat(list_dict)))
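If you go the parallel-lists route, it may be worth checking the "same length" assumption explicitly instead of trusting it. Below is a rough sketch using zip() plus a length check. The function name rebuild_from_lists and the inline SAMPLE are hypothetical, and I only use three of the fields to keep it short.

```python
from xml.etree import ElementTree

# Hypothetical sample, assumed to mirror the layout of the real file.
SAMPLE = """
<FormInstance>
  <BaseCategory>
    <Name>Sales</Name><base_id>1</base_id><position>10</position>
  </BaseCategory>
  <BaseCategory>
    <Name>Information</Name><base_id>2</base_id><position>20</position>
  </BaseCategory>
</FormInstance>
"""


def rebuild_from_lists(root):
    """Pair up the parallel field lists, refusing to guess if they differ in length."""
    names = root.findall("./BaseCategory/Name")
    ids = root.findall("./BaseCategory/base_id")
    positions = root.findall("./BaseCategory/position")
    if not (len(names) == len(ids) == len(positions)):
        raise ValueError("field lists differ in length; pairing by index is unsafe")
    # zip() walks the three lists in lockstep, like indexing with the same i.
    return [
        {"name": name.text, "id": int(id_.text), "position": int(position.text)}
        for name, id_, position in zip(names, ids, positions)
    ]


list_dict = rebuild_from_lists(ElementTree.fromstring(SAMPLE))
print(list_dict)
```

Note that the length check only catches the case where a field is missing somewhere; iterating over the BaseCategory nodes themselves is still the more robust option.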