
Parse Date String To Datetime With Timezone

I have a string: r = 'Thu Dec 17 08:56:41 CST 2020'. Here CST represents China Standard Time ('Asia/Shanghai'). I wanted to parse it to datetime...I am doing something like from datep

Solution 1:

Using dateutil.parser, you can parse your date string directly.

Note that CST is an ambiguous abbreviation (it is also used for US Central Standard Time, UTC-6, and Cuba Standard Time, UTC-5), so you need to tell the parser which zone you mean. You do this through the tzinfos parameter of parse(), either by writing the mapping inline in the call or by defining a dictionary that maps abbreviations to timezones and passing it. In this dict, you can either specify the offset in seconds, e.g.

timezone_info = {
        "CDT": -5 * 3600,
        "CEST": 2 * 3600,
        "CST": 8 * 3600
}

parser.parse(r, tzinfos=timezone_info)

or (using gettz) directly specify a timezone:

timezone_info = {
        "CDT": gettz("America/Chicago"),
        "CEST": gettz("Europe/Berlin"),
        "CST": gettz("Asia/Shanghai")
}

parser.parse(r, tzinfos=timezone_info)

See also the dateutil.parser documentation and the answers to this SO question.

Be aware that the latter approach can be tricky for locations with daylight saving time! The offset then comes from the zone's rules for the parsed date, not from the abbreviation in the string: gettz("America/Chicago") yields UTC-6 or UTC-5 depending on the date, because Chicago switches between Central Standard Time and Central Daylight Time. A string labelled CDT but dated in winter would therefore get UTC-6, so depending on your input data the second example can give the wrong result. China currently observes China Standard Time (UTC+8) all year, so for your use case it makes no difference. China did, however, observe daylight saving time from 1986 to 1991, so it may matter if your dates reach back that far.
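A quick way to see this pitfall is to map "CDT" to gettz("America/Chicago") and parse one winter and one summer timestamp. This is a minimal sketch; the two sample strings are made up for illustration.

```python
from datetime import timedelta

from dateutil import parser
from dateutil.tz import gettz

# Map the abbreviation to a full zone: the offset then comes from the zone
# rules for the parsed date, not from the abbreviation in the string.
tzinfos = {"CDT": gettz("America/Chicago")}

# Hypothetical inputs: both claim CDT (UTC-5), but only July is in DST.
winter = parser.parse("Thu Dec 17 08:56:41 CDT 2020", tzinfos=tzinfos)
summer = parser.parse("Fri Jul 17 08:56:41 CDT 2020", tzinfos=tzinfos)

print(winter.utcoffset() == timedelta(hours=-6))  # True: standard time rules apply in December
print(summer.utcoffset() == timedelta(hours=-5))  # True: daylight time in July
```

The December result is UTC-6 even though the string says CDT, which is exactly the mismatch described above.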

Overall:

from dateutil import parser
from dateutil.tz import gettz

timezone_info = {"CST": gettz("Asia/Shanghai")}

r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, tzinfos=timezone_info)

print(d)
print(d.strftime('%Y-%m-%d %H:%M:%S %Z%z'))

gets you

2020-12-17 08:56:41+08:00
2020-12-17 08:56:41 CST+0800
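Besides a dictionary, dateutil's tzinfos parameter also accepts a callable: it receives the abbreviation and the numeric offset found in the string (either may be None) and returns a tzinfo, an offset in seconds, or None. A minimal sketch, where the function name resolve_tz and its CST policy are hypothetical:

```python
from dateutil import parser
from dateutil.tz import gettz


def resolve_tz(tzname, tzoffset):
    """Hypothetical resolver: decide what each abbreviation means.

    dateutil passes the abbreviation from the string (e.g. "CST") and the
    numeric offset if the string contained one, otherwise None.
    """
    if tzname == "CST":
        # Our assumption for this data set: CST means China Standard Time.
        return gettz("Asia/Shanghai")
    # Returning None leaves the parsed datetime naive.
    return None


d = parser.parse("Thu Dec 17 08:56:41 CST 2020", tzinfos=resolve_tz)
print(d)  # 2020-12-17 08:56:41+08:00
```

A callable is handy when the mapping depends on more than a fixed table, for example on the data source the string came from.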

EDIT: Printing the full timezone name (e.g. Asia/Shanghai) instead of the abbreviation is a little more complicated with this approach, because dateutil.tz.gettz() returns a tzfile object with no public attribute that holds just the name. However, you can extract it from the protected _filename attribute using split():

print(d.strftime('%Y-%m-%d %H:%M:%S') + " in " + "/".join(d.tzinfo._filename.split('/')[-2:]))

yields

2020-12-17 08:56:41 in Asia/Shanghai

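This of course only works if you used gettz() to set the timezone in the first place. It also relies on the tzfile's file path, so taking the last two components only gives the zone name for two-part names such as Asia/Shanghai; America/Argentina/Buenos_Aires, for example, would come out as Argentina/Buenos_Aires.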
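If you are on Python 3.9 or later, one way to avoid the protected _filename attribute is to put a zoneinfo.ZoneInfo object into tzinfos instead, since ZoneInfo exposes the IANA name through its public key attribute. This is a sketch under that assumption, and it also assumes the system (or the tzdata package) provides the time zone database.

```python
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

from dateutil import parser

# Any tzinfo object is accepted as a tzinfos value, including ZoneInfo.
timezone_info = {"CST": ZoneInfo("Asia/Shanghai")}

r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, tzinfos=timezone_info)

# ZoneInfo.key is the name the object was constructed with.
print(d.strftime('%Y-%m-%d %H:%M:%S') + " in " + d.tzinfo.key)
# 2020-12-17 08:56:41 in Asia/Shanghai
```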

EDIT 2: If you know that all your dates are in CST anyway, you can also ignore the timezone when parsing. This gives you naive (or unaware) datetimes, to which you can later attach a named timezone. With dateutil, use replace() with a zone from gettz() as shown above. With the pytz module, pass the naive datetime to the localize() method of a pytz.timezone() object instead: replace() does not look up the offset in effect on the given date, and for Asia/Shanghai it attaches the historical Local Mean Time offset (+08:06) instead of +08:00.

from dateutil import parser
from dateutil.tz import gettz
import pytz

r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, ignoretz=True)

d_dateutil = d.replace(tzinfo=gettz('Asia/Shanghai'))
# pytz zones must be attached with localize(), not replace()
d_pytz = pytz.timezone('Asia/Shanghai').localize(d)

Note that the class of tzinfo differs depending on which module you use to add the timezone information. The pytz object offers a more direct way of getting the human-readable name, its public zone attribute:

print(type(d_dateutil.tzinfo))
print("/".join(d_dateutil.tzinfo._filename.split('/')[-2:]))

print(type(d_pytz.tzinfo))
print(d_pytz.tzinfo.zone)

produces

<class 'dateutil.tz.tz.tzfile'>
Asia/Shanghai
<class 'pytz.tzfile.Asia/Shanghai'>
Asia/Shanghai
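The difference between replace() and localize() for pytz zones can be checked directly. A small sketch using the same sample date:

```python
from datetime import datetime, timedelta

import pytz

shanghai = pytz.timezone('Asia/Shanghai')
naive = datetime(2020, 12, 17, 8, 56, 41)

# replace() attaches the pytz zone's default entry, which is its earliest
# (Local Mean Time) offset, not the offset in force on this date.
wrong = naive.replace(tzinfo=shanghai)

# localize() looks up the offset that applies to this particular date.
right = shanghai.localize(naive)

print(wrong.utcoffset() == timedelta(hours=8))  # False
print(right.utcoffset() == timedelta(hours=8))  # True
```

Zones created with dateutil's gettz() (or zoneinfo) do not have this problem, which is why replace() is fine for d_dateutil.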

Solution 2:

from datetime import datetime
import pytz

# The datetime string you have
r = "Thu Dec 17 08:56:41 CST 2020"

# The time-zone string you want to use
offset_string = 'Asia/Shanghai'

# convert the time zone name into a numeric UTC offset such as +0800
#    strftime('%z') formats the offset with its sign, e.g. +0800 or -0600,
#    which also works for negative and non-whole-hour offsets
#    note: datetime.now() gives today's offset, which may differ from the
#    offset on the parsed date in zones with daylight saving time
offset_num_repr = datetime.now(pytz.timezone(offset_string)).strftime('%z')
print('Numeric representation of the offset: ', offset_num_repr)

# replace the CST abbreviation with the numeric offset computed above
updated_datetime = r.replace('CST', offset_num_repr)
print('\t    Modified datetime string: ', updated_datetime)

# Now parse the modified string into an aware datetime object (%z reads +0800)
parsed = datetime.strptime(updated_datetime, "%a %b %d %H:%M:%S %z %Y")
print('\tFinal parsed datetime object: ', parsed)

Should produce:

Numeric representation of the offset:  +0800
            Modified datetime string:  Thu Dec 17 08:56:41 +0800 2020
        Final parsed datetime object:  2020-12-17 08:56:41+08:00
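Because this approach takes the offset from datetime.now(), it can pick the wrong offset for dates on the other side of a daylight saving change. A sketch that avoids this, still using only strptime() and pytz, strips the abbreviation, parses a naive datetime and lets localize() choose the offset for that date. The helper name parse_with_zone is hypothetical, and it assumes the abbreviation appears once, surrounded by spaces.

```python
from datetime import datetime

import pytz


def parse_with_zone(text, abbreviation, zone_name):
    """Hypothetical helper for 'Thu Dec 17 08:56:41 CST 2020'-style strings.

    The abbreviation is dropped and replaced by the named IANA zone, whose
    offset is looked up for the parsed date rather than for today.
    """
    # Remove the abbreviation together with one of its surrounding spaces.
    without_abbrev = text.replace(f" {abbreviation} ", " ", 1)
    naive = datetime.strptime(without_abbrev, "%a %b %d %H:%M:%S %Y")
    # localize() (not replace()) attaches the correct pytz offset.
    return pytz.timezone(zone_name).localize(naive)


print(parse_with_zone("Thu Dec 17 08:56:41 CST 2020", "CST", "Asia/Shanghai"))
# 2020-12-17 08:56:41+08:00
print(parse_with_zone("Fri Jul 17 08:56:41 CDT 2020", "CDT", "America/Chicago"))
# 2020-07-17 08:56:41-05:00
```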
