Pandas Merging/Join Dataframes
Solution 1:
Well, I'm going to go out on a ledge and show you how to merge with suffixes:
# First import our libraries
>>> import pandas as pd
>>> import numpy as np
# Then create our dataframes
>>> df_A = pd.DataFrame(np.random.rand(3,2),columns=['times','spots'])
>>> df_B = pd.DataFrame(np.random.rand(3,2),columns=['times','spots'])
# Set default values
>>> df_A['times'] = [1,2,3]
>>> df_B['times'] = [1,2,3]
>>> df_A['spots'] = [44,55,66]
>>> df_B['spots'] = [77,88,99]
# Here is what both dataframes contain
>>> df_A
times spots
0 1 44
1 2 55
2 3 66
>>> df_B
times spots
0 1 77
1 2 88
2 3 99
# Now the merge -- note: this does not affect the first dataframe in place.
# It will create a new dataframe. You can overwrite the
# first if you set the result to df_A instead of df_merged.
# Note the use of the keyword, suffixes. In the event that the same column names exist
# in both dataframes (that aren't being merged on) Pandas will need to differentiate
# between them. By default same column names will result in a '_x' will be appended to
# the left dataframe column name, and a '_y' to the right dataframe column name
# [order is set by the first two arguments in the merge function].
# The suffixes keyword allows the user to override this behaviour with their
# own version of '_x' and '_y'.
>>> df_merged = pd.merge(df_A,df_B,how='inner',on=['times'],suffixes=['_A','_B'])
>>> df_merged
times spots_A spots_B
0 1 44 77
1 2 55 88
2 3 66 99
Now it seems from your question that you didn't want to modify the first dataframes column name for spots. This can be achieved in the same fashion, except instead of using suffixes=['_A','_B']
use suffixes=['','_B']
. This essentially sets the left dataframe column suffix to nothing, so it stays the same:
>>> df_merged = pd.merge(df_A,df_B,how='inner',on=['times'],suffixes=['','_B'])
>>> df_merged
times spots spots_B
0 1 44 77
1 2 55 88
2 3 66 99
Voila! I hope this helped. If I misunderstood, and you were actually looking for the interpolated values between A and B, let me know and I will edit this answer.
* Edit 1 *
Taking into consideration your last comment, here is what I believe you are trying to accomplish. Below I will show you how to extend the merging with suffixes and then fill NaNs in spots_B with interpolated values using the 'time' interpolation method
# Start by creating out datetimes to set for the times column
>>> times_A = []
>>> times_B = []
>>> for i in range(1,4):
... times_A.append(datetime.datetime(year=2011,month=5,day=i))
...
>>> for i in range(1,6,2):
... times_B.append(datetime.datetime(year=2011,month=5,day=i))
...
# times_A: May 1st, 2011 - May 3rd, 2011
>>> times_A
[datetime.datetime(2011, 5, 1, 0, 0), datetime.datetime(2011, 5, 2, 0, 0), datetime.datetime(2011, 5, 3, 0, 0)]
# times_B: May 1st 2011, May 3rd 2011, May 5th 2011
>>> times_B
[datetime.datetime(2011, 5, 1, 0, 0), datetime.datetime(2011, 5, 3, 0, 0), datetime.datetime(2011, 5, 5, 0, 0)]
# So now times_B is missing May 2nd, and has an extra time, May 5th.
>>> df_A['times'] = times_A
>>> df_B['times'] = times_B
>>> df_A['spots'] = [44,55,66]
>>> df_B['spots'] = [44,66,88]
>>> df_A
times spots
0 2011-05-01 00:00:00 44
1 2011-05-02 00:00:00 55
2 2011-05-03 00:00:00 66
>>> df_B
times spots
0 2011-05-01 00:00:00 44
1 2011-05-03 00:00:00 66
2 2011-05-05 00:00:00 88
# Now it appears you only care about the times in df_A - so
# left merge df_A with df_B (include all times from df_A and
# try to merge with df_B or NaN). Below the date May 5th was dropped.
>>> df_merged = pd.merge(df_A,df_B,how='left',on=['times'],suffixes=['','_B'])
>>> df_merged
times spots spots_B
0 2011-05-01 00:00:00 44 44
1 2011-05-02 00:00:00 55 NaN
2 2011-05-03 00:00:00 66 66
# Here is the important part:
# Since it appears that your data is going to be a time series
# you will need to set your dataframe index to be the times column.
>>> df_merged = df_merged.set_index(['times'])
>>> df_merged
spots spots_B
times
2011-05-01 44 44
2011-05-02 55 NaN
2011-05-03 66 66
# With the times as index we can use the appropriate
# interpolation method for best results
>>> df_merged['spots_B'] = df_merged['spots_B'].interpolate(method='time')
>>> df_merged
spots spots_B
times
2011-05-01 44 44
2011-05-02 55 55
2011-05-03 66 66
Post a Comment for "Pandas Merging/Join Dataframes"