I was wondering whether there is an easy way to parse a YAML document consisting of a list of items as a Python generator using PyYAML.
For example, given the file

    # foobar.yaml
    ---
    - foo: ["bar", "baz", "bah"]
      something_else: blah
    - bar: yet_another_thing

I'd like to be able to do

    for item in yaml.load_as_generator(open('foobar.yaml')):  # does not exist
        print(str(item))

I know there is yaml.load_all, which can achieve similar functionality, but then I need to treat each record as its own document. The reason I'm asking is that I have some really big files that I'd like to convert to YAML and then parse with a low memory footprint.
I took a look at the PyYAML Events API, but it scared me =)
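For reference, here is a minimal sketch of the load_all route, assuming the records were rewritten so that each one is its own `---`-separated document. The sample text and the helper name `iter_documents` are made up for illustration; `yaml.safe_load_all` is the safe-loader variant of `yaml.load_all` in PyYAML and returns a generator.

```python
import io

import yaml  # PyYAML

# Hypothetical sample: the same records as foobar.yaml, but each record is
# written as a separate YAML document instead of a list element.
MULTI_DOC = """\
---
foo: ["bar", "baz", "bah"]
something_else: blah
---
bar: yet_another_thing
"""


def iter_documents(stream):
    """Yield one Python object per YAML document found in the stream."""
    # safe_load_all returns a generator, so documents are constructed
    # one at a time as the caller iterates.
    yield from yaml.safe_load_all(stream)


records = list(iter_documents(io.StringIO(MULTI_DOC)))
for item in records:
    print(item)
```

The drawback is the one already stated: the file has to be laid out as many small documents rather than as one top-level list.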
I can understand that the Events API scares you, and it would not bring you much. First of all, you would need to keep track of depth (because you have your top-level complex sequence items, as well as "bar", "baz", etc.). And, having cut the low-level sequence event elements correctly, you would have to feed them to the composer to create nodes (and eventually Python objects), which is not trivial either.
But since YAML uses indentation, even for scalars spanning multiple lines, you can use a simple line-based parser that recognises where each sequence element starts and feed those to the normal load() function one at a time:
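
    #!/usr/bin/env python

    import ruamel.yaml

    # Newer ruamel.yaml releases dropped the module-level load() function,
    # so a YAML() instance is used here; typ='safe' yields plain dicts and lists.
    yaml = ruamel.yaml.YAML(typ='safe')

    def list_elements(fp, depth=0):
        buffer = None
        in_header = True  # skip everything up to the '---' document start
        list_element_match = ' ' * depth + '- '
        for line in fp:
            if line.startswith('---'):
                in_header = False
                continue
            if in_header:
                continue
            if line.startswith(list_element_match):
                if buffer is None:
                    # first element: start collecting
                    buffer = line
                    continue
                # a new element starts: parse the one collected so far
                yield yaml.load(buffer)[0]
                buffer = line
                continue
            buffer += line
        if buffer:
            # parse the last element
            yield yaml.load(buffer)[0]

    with open("foobar.yaml") as fp:
        for element in list_elements(fp):
            print(str(element))
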
Resulting in:

    {'something_else': 'blah', 'foo': ['bar', 'baz', 'bah']}
    {'bar': 'yet_another_thing'}

I used the enhanced version of PyYAML, ruamel.yaml, here (of which I am the author), but PyYAML should work in the same way.
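The splitting step can also be tried on its own, without any YAML library, to check that only dashes at the chosen indentation start a new chunk. This is a rough sketch rather than a full solution: the helper name `split_list_elements` and the extra `nested` key in the sample are made up for illustration, and it assumes every element starts with `- ` at exactly `depth` spaces of indentation.

```python
import io

# Hypothetical sample: foobar.yaml with an extra nested list added, to show
# that indented dashes do not start a new top-level element.
SAMPLE = """\
# foobar.yaml
---
- foo: ["bar", "baz", "bah"]
  something_else: blah
- bar: yet_another_thing
  nested:
    - not a top-level element
"""


def split_list_elements(lines, depth=0):
    """Yield the raw text of each sequence element at the given indent depth."""
    marker = ' ' * depth + '- '
    chunk = []
    for line in lines:
        if line.startswith(marker) and chunk:
            # a new element begins: emit the one collected so far
            yield ''.join(chunk)
            chunk = []
        if line.startswith(marker) or chunk:
            # lines before the first element (comments, '---') are skipped
            chunk.append(line)
    if chunk:
        yield ''.join(chunk)


chunks = list(split_list_elements(io.StringIO(SAMPLE)))
for text in chunks:
    print(repr(text))
```

Each chunk is itself a small one-element YAML list, which is why the loader result is indexed with `[0]` in the answer's code.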