Hi there, let’s say we have an XML file like this:
songs.xml
<?xml version="1.0"?>
<genre catalogue="Pop">
<song title="Song 1">
<artist>Artist 1</artist>
<year>2000</year>
<album>Album 1</album>
</song>
<song title="Song 2">
<artist>Artist 2</artist>
<year>2010</year>
<album>Album 2</album>
</song>
<song title="Song 3">
<artist>Artist 3</artist>
<year>2015</year>
<album>Album 3</album>
</song>
</genre>
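Before converting anything, it can help to see how ElementTree exposes a file like this. The sketch below parses a shortened, indented copy of the XML from a string and collects each element's tag, attributes, and text. The variable names `XML` and `summary` are made up for illustration. Container elements such as genre and song carry only whitespace as text, while the leaf elements carry the actual values.

```python
import xml.etree.ElementTree as ET

# Same structure as songs.xml, shortened to one song and inlined as a string
# (illustrative only; the real script reads the file from disk).
XML = """<genre catalogue="Pop">
  <song title="Song 1">
    <artist>Artist 1</artist>
    <year>2000</year>
    <album>Album 1</album>
  </song>
</genre>"""

root = ET.fromstring(XML)

# root.iter() walks the tree in document order, starting with the root itself.
summary = [(elem.tag, elem.attrib, (elem.text or "").strip()) for elem in root.iter()]

for tag, attrib, text in summary:
    print(tag, attrib, repr(text))
```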
All we need to do is provide the name of the XML file (without the .xml extension) in the Python script below:
main.py
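import pandas as pd
import xml.etree.ElementTree as ET

filename = 'songs'
tree = ET.parse(filename + '.xml')
root = tree.getroot()

# Create one empty list (a future DataFrame column) per tag name.
element_dict = {}
for elem in root.iter():
    element_dict[elem.tag] = []

# Container elements (whitespace-only text) contribute their attributes;
# all other elements contribute their text.
for elem in root.iter():
    # strip() also covers indented XML, where the text is '\n' plus spaces.
    if elem.text is not None and not elem.text.strip():
        element_dict[elem.tag].append(elem.attrib)
    else:
        element_dict[elem.tag].append(elem.text)


def make_list(dict_list, placeholder):
    """Pad every list in dict_list (in place) to the length of the longest one."""
    lmax = 0
    for lname in dict_list.keys():
        lmax = max(lmax, len(dict_list[lname]))
    for lname in dict_list.keys():
        ll = len(dict_list[lname])
        if ll < lmax:
            dict_list[lname] += [placeholder] * (lmax - ll)
    return dict_list


# pd.DataFrame needs all columns to have the same length, hence the padding.
ans = make_list(element_dict, -1)
df = pd.DataFrame(ans)
print(df)
df.to_csv(filename + ".csv")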
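The script above builds one column per tag, which lines up nicely when every song has the same child elements. If a song were missing a child, the values in that column would shift up and no longer match their rows. As an alternative (not the approach used in main.py), here is a minimal sketch that builds one row per song instead. The function name `songs_to_rows` is hypothetical, and the XML is inlined, with the second song deliberately missing its album, so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

XML = """<genre catalogue="Pop">
<song title="Song 1"><artist>Artist 1</artist><year>2000</year><album>Album 1</album></song>
<song title="Song 2"><artist>Artist 2</artist><year>2010</year></song>
</genre>"""


def songs_to_rows(root):
    """Return one dict per <song>, merging its attributes and child texts."""
    rows = []
    for song in root.findall("song"):
        row = {"genre": root.get("catalogue")}
        row.update(song.attrib)   # e.g. {'title': 'Song 1'}
        for child in song:        # artist, year, album, ...
            row[child.tag] = child.text
        rows.append(row)
    return rows


rows = songs_to_rows(ET.fromstring(XML))
for row in rows:
    print(row)
```

A list of dictionaries like this can be passed straight to pd.DataFrame(rows); pandas fills keys that are missing from a row with NaN, so no manual padding is needed.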
The make_list function above takes a dictionary of lists like this as input:
{'genre': [{'catalogue': 'Pop'}], 'song': [{'title': 'Song 1'}, {'title': 'Song 2'}, {'title': 'Song 3'}], 'artist': ['Artist 1', 'Artist 2', 'Artist 3'], 'year': ['2000', '2010', '2015'], 'album': ['Album 1', 'Album 2', 'Album 3']}
It returns the dictionary with every shorter list padded to the length of the longest one, using the placeholder (-1 here), like this:
{'genre': [{'catalogue': 'Pop'}, -1, -1], 'song': [{'title': 'Song 1'}, {'title': 'Song 2'}, {'title': 'Song 3'}], 'artist': ['Artist 1', 'Artist 2', 'Artist 3'], 'year': ['2000', '2010', '2015'], 'album': ['Album 1', 'Album 2', 'Album 3']}
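One detail worth knowing: make_list pads the lists in place, so the dictionary you pass in is modified too. If the original should stay untouched, a non-mutating variant could look like the sketch below. The name `pad_columns` is hypothetical and not part of main.py.

```python
def pad_columns(columns, placeholder):
    """Return a new dict in which every list is padded to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        # values + [...] builds a new list, so the input lists are not modified.
        name: values + [placeholder] * (longest - len(values))
        for name, values in columns.items()
    }


columns = {"genre": [{"catalogue": "Pop"}], "year": ["2000", "2010", "2015"]}
padded = pad_columns(columns, -1)
print(padded)
```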
output of main.py
Run main.py from the folder that contains songs.xml:
python3 main.py
output
A CSV file (songs.csv) will be generated in the folder you run the script from (here, the same folder as main.py), and you will see this output:
                  genre                 song    artist  year    album
0  {'catalogue': 'Pop'}  {'title': 'Song 1'}  Artist 1  2000  Album 1
1                    -1  {'title': 'Song 2'}  Artist 2  2010  Album 2
2                    -1  {'title': 'Song 3'}  Artist 3  2015  Album 3
So next time, however complex your XML file is, you can use this approach as a starting point to get the data out of it and use it in Artificial Intelligence applications.