Transforming A Cassandra OrderedMapSerializedKey To A Python Dictionary
Solution 1:
I think the most general solution is to store the Cassandra OrderedMapSerializedKey structure as a dict in your dataframe column; you can then pass that value or column to whatever you want. It is general because you may not know the actual keys in the Cassandra rows (different keys may have been inserted into different rows).
Here is the solution I've tested. You only have to improve the pandas_factory function:
EDIT:
In the previous solution I replaced only the first (0th) row of the Cassandra dataset (rows is a list of tuples, where every tuple is a row in Cassandra).
    import pandas as pd
    from cassandra.util import OrderedMapSerializedKey

    def pandas_factory(colnames, rows):
        # Convert each tuple in 'rows' into a list (tuple elements cannot be replaced)
        rows = [list(i) for i in rows]
        # Convert only 'OrderedMapSerializedKey' elements into dict
        for idx_row, i_row in enumerate(rows):
            for idx_value, i_value in enumerate(i_row):
                if isinstance(i_value, OrderedMapSerializedKey):
                    rows[idx_row][idx_value] = dict(i_value)
        return pd.DataFrame(rows, columns=colnames)
You may need to add an automatic check that there is at least one value before / after the Cassandra map field, or manually adapt the script above accordingly.
Nice day!
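The conversion step in pandas_factory can be tried without a cluster or pandas. The following is only a minimal sketch: convert_rows is a hypothetical helper name, and types.MappingProxyType is used as a stand-in for OrderedMapSerializedKey (it is not the driver class, just another non-dict mapping).

```python
from types import MappingProxyType


def convert_rows(rows, map_type=MappingProxyType):
    """Return rows as lists, replacing every map_type value with a plain dict.

    map_type is a stand-in here; with the real driver you would pass
    OrderedMapSerializedKey instead (assumption, not tested against a cluster).
    """
    return [
        [dict(value) if isinstance(value, map_type) else value for value in row]
        for row in rows
    ]


# Rows shaped like the driver's output: a list of tuples, one tuple per row.
rows = [
    (1, MappingProxyType({"a": 10, "b": 20}), "x"),
    (2, MappingProxyType({"c": 30}), "y"),
]

converted = convert_rows(rows)
print(converted)
```

Because the check runs on every value of every row, it does not matter which column holds the map or whether a row has other values around it.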
Solution 2:
The following strategy aims to separate the data-conversion stage from the pandas ingestion stage.
To obtain a list of dictionaries from a Cassandra request, you have to use a specific row_factory:
    from cassandra.query import (
        dict_factory,
        SimpleStatement
    )
    from cassandra.cluster import (
        Cluster,
        ExecutionProfile,
        EXEC_PROFILE_DEFAULT
    )

    profile = ExecutionProfile(
        row_factory=dict_factory
    )
    hosts = ["127.0.0.1"]
    port = 9042
    cluster = Cluster(
        hosts,
        port=port,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile}
    )
Then get the data using that cluster:
    src_keyspace = "your_keyspace"
    src_tbl = "your_table"
    N_ROWS = 100

    with cluster.connect(src_keyspace) as cass_session:
        res = cass_session.execute(
            SimpleStatement("SELECT * FROM {} LIMIT {}".format(src_tbl, N_ROWS))
        )
Then, convert the remaining OrderedMapSerializedKey values to dict:
    from cassandra.util import OrderedMapSerializedKey

    rows_as_dict = [
        {key: (val if not isinstance(val, OrderedMapSerializedKey)
               else dict(val)) for key, val in row.items()}
        for row in res.current_rows
    ]
Then simply use pandas.DataFrame.from_dict to load the result into a dataframe.
Solution 3:
It can be cast with Python's built-in dict() function.
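One caveat worth illustrating: dict() only converts the top level, so a map whose values are themselves maps would keep the inner mapping objects. Below is a rough sketch of a recursive variant. The helper name to_plain_dict is hypothetical, types.MappingProxyType again stands in for OrderedMapSerializedKey, and it assumes the driver's map type implements the standard collections.abc.Mapping interface.

```python
from collections.abc import Mapping
from types import MappingProxyType


def to_plain_dict(value):
    """Recursively turn Mapping values into plain dicts; leave other values as-is."""
    if isinstance(value, Mapping):
        return {key: to_plain_dict(val) for key, val in value.items()}
    if isinstance(value, list):
        return [to_plain_dict(item) for item in value]
    return value


# Stand-in for a nested map column (not real driver output).
nested = MappingProxyType({
    "outer": MappingProxyType({"inner": 1}),
    "tags": [MappingProxyType({"k": "v"})],
})

shallow = dict(nested)        # inner values are still mapping objects
deep = to_plain_dict(nested)  # every level is a plain dict
print(deep)
```

For flat maps the plain dict() cast is enough; the recursive form is only needed when map values can contain further maps.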