
Pyyaml Interprets String As Timestamp

It looks as though PyYAML interprets the string 10:01 as a duration in seconds:

>>> import yaml
>>> yaml.load('time: 10:01')
{'time': 601}

The official documentation does not

Solution 1:

Put it in quotes:

>>> import yaml
>>> yaml.safe_load('time: "10:01"')
{'time': '10:01'}

This tells YAML that it is a literal string, and inhibits attempts to treat it as a numeric value.
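As a quick check of the quoting rule, the sketch below compares the plain and quoted forms and round-trips the string through a dump. It assumes PyYAML is installed; the variable names are only illustrative.

```python
import yaml

# Unquoted: the YAML 1.1 resolver reads 10:01 as a base-60 integer.
plain = yaml.safe_load('time: 10:01')

# Double or single quotes both mark the scalar as a string.
double_quoted = yaml.safe_load('time: "10:01"')
single_quoted = yaml.safe_load("time: '10:01'")

# Dump a Python string and load it back: the value should survive
# the round trip as a string rather than turn into an integer.
dumped = yaml.safe_dump({'time': '10:01'})
round_tripped = yaml.safe_load(dumped)

print(plain, double_quoted, single_quoted, round_tripped)
```

If the values are produced by your own code, dumping them with PyYAML rather than writing the YAML by hand avoids having to remember the quotes.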

Solution 2:

Since you are using a parser for YAML 1.1, you should expect what is indicated in the specification (example 2.19) to be implemented:

sexagesimal: 3:25:45

The sexagesimals are further explained here:

Using “:” allows expressing integers in base 60, which is convenient for time and angle values.
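To make the base-60 reading concrete, here is a minimal sketch of the arithmetic in plain Python. The helper name sexagesimal_to_int is made up for illustration and is not part of PyYAML.

```python
def sexagesimal_to_int(text):
    """Convert a colon-separated base-60 string such as '3:25:45' to an int."""
    sign = -1 if text.startswith('-') else 1
    value = 0
    # Each colon-separated group is one base-60 "digit", most significant first.
    for part in text.lstrip('+-').split(':'):
        value = value * 60 + int(part)
    return sign * value


print(sexagesimal_to_int('3:25:45'))  # 3*3600 + 25*60 + 45 = 12345
print(sexagesimal_to_int('10:01'))    # 10*60 + 1 = 601
```

This is why the 10:01 from the question comes back as 601.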

Not every detail that is implemented in PyYAML is in the documentation that you refer to; you should only see that as an introduction.


You are not the only one that found this interpretation confusing, and in YAML 1.2 sexagesimals were dropped from the specification. Although that specification has been out for about eight years, the changes have never been implemented in PyYAML.

The easiest way to solve this is to upgrade to ruamel.yaml (disclaimer: I am the author of that package). You then get the YAML 1.2 behaviour (unless you explicitly specify that you want YAML 1.1), which interprets 10:01 as a string:

from ruamel import yaml

import warnings
warnings.simplefilter('ignore', yaml.error.UnsafeLoaderWarning)

data = yaml.load("time: 10:01")
print(data)

which gives:

{'time': '10:01'}

The warnings.simplefilter() call is only necessary because you use .load() instead of .safe_load(). The former is unsafe and can lead to a wiped disk, or worse, when used on uncontrolled YAML input. There is seldom a reason not to use .safe_load().

Solution 3:

If you wish to monkeypatch the PyYAML library so it does not have this behavior (since there is no neat way to do this), for a resolver of your choice, the code below works. The problem is that the regex used for int includes an alternative that matches colon-separated values, so strings like 30:00 or 40:11:11:11:11 are treated as integers.

import re

import yaml

def partition_list(somelist, predicate):
    """Split somelist into (items matching predicate, all other items)."""
    truelist = []
    falselist = []
    for item in somelist:
        if predicate(item):
            truelist.append(item)
        else:
            falselist.append(item)
    return truelist, falselist

@classmethod
def init_implicit_resolvers(cls):
    """
    creates own copy of yaml_implicit_resolvers from superclass
    code taken from add_implicit_resolvers; this should be refactored elsewhere
    """
    if 'yaml_implicit_resolvers' not in cls.__dict__:
        implicit_resolvers = {}
        for key in cls.yaml_implicit_resolvers:
            implicit_resolvers[key] = cls.yaml_implicit_resolvers[key][:]
        cls.yaml_implicit_resolvers = implicit_resolvers

@classmethod
def remove_implicit_resolver(cls, tag, verbose=False):
    # Work on this class's own copy so the parent resolver stays untouched.
    cls.init_implicit_resolvers()
    removed = {}
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vremoved, v2 = partition_list(v, lambda x: x[0] == tag)
        if vremoved:
            cls.yaml_implicit_resolvers[key] = v2
            removed[key] = vremoved
    return removed

@classmethod
def _monkeypatch_fix_int_no_timestamp(cls):
    # The colon-separated alternative inside PyYAML's int regex.
    bad = '|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vcopy = v[:]
        n = 0
        for k in range(len(v)):
            if v[k][0] == 'tag:yaml.org,2002:int' and bad in v[k][1].pattern:
                n += 1
                p = v[k][1]
                # Recompile the int pattern without that alternative.
                p2 = re.compile(p.pattern.replace(bad, ''), p.flags)
                vcopy[k] = (v[k][0], p2)
        if n > 0:
            cls.yaml_implicit_resolvers[key] = vcopy

yaml.resolver.Resolver.init_implicit_resolvers = init_implicit_resolvers
yaml.resolver.Resolver.remove_implicit_resolver = remove_implicit_resolver
yaml.resolver.Resolver._monkeypatch_fix_int_no_timestamp = _monkeypatch_fix_int_no_timestamp

Then if you do this:

class MyResolver(yaml.resolver.Resolver):
    pass

t1 = MyResolver.remove_implicit_resolver('tag:yaml.org,2002:timestamp')
MyResolver._monkeypatch_fix_int_no_timestamp()

class MyLoader(yaml.SafeLoader, MyResolver):
    pass

text = '''
a: 3
b: 30:00
c: 30z
d: 40:11:11:11
'''
print(yaml.safe_load(text))
print(yaml.load(text, Loader=MyLoader))

then it prints

{'a': 3, 'b': 1800, 'c': '30z', 'd': 8680271}
{'a': 3, 'b': '30:00', 'c': '30z', 'd': '40:11:11:11'}

showing that the default yaml behavior has been left unchanged but your private loader class handles these strings sanely.
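To see which strings the removed alternative is responsible for, the pattern fragment from the patch can be tried on its own with the standard re module. This is only a sketch: the names SEXAGESIMAL_INT and looks_sexagesimal are made up for illustration, and the fragment is tested in isolation, not as part of PyYAML's full int regex.

```python
import re

# The colon-separated alternative from the patch, without its leading '|'.
SEXAGESIMAL_INT = re.compile(r'[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+')


def looks_sexagesimal(value):
    """Return True if the whole string matches the colon-separated pattern."""
    return SEXAGESIMAL_INT.fullmatch(value) is not None


for sample in ['3', '30:00', '30z', '40:11:11:11', '10:01', '09:30']:
    print(sample, looks_sexagesimal(sample))
```

Only the colon-separated samples that start with a non-zero digit match, which is why plain integers such as 3 still resolve as integers after the patch.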
