
Transforming A Cassandra OrderedMapSerializedKey To A Python Dictionary

I have a column in Cassandra that holds a map of lists. When I query it with the Python driver, it is returned as an OrderedMapSerializedKey structure.

Solution 1:

I think the most general solution is to store the Cassandra OrderedMapSerializedKey structure as a dict in your dataframe column; you can then pass that value or column on to whatever needs it. It is the most general because you may not know the actual keys in the Cassandra rows (different rows may have been inserted with different keys).

Here is the solution I've tested; you only have to adapt the pandas_factory function:


EDIT:

In my previous solution I replaced only the first (0th) row of the Cassandra dataset (rows is a list of tuples, where every tuple is one row in Cassandra).

import pandas as pd
from cassandra.util import OrderedMapSerializedKey

def pandas_factory(colnames, rows):

    # Convert each tuple in 'rows' into a list (elements of tuples cannot be replaced)
    rows = [list(i) for i in rows]

    # Convert only the 'OrderedMapSerializedKey' elements into dict
    for idx_row, i_row in enumerate(rows):

        for idx_value, i_value in enumerate(i_row):

            if isinstance(i_value, OrderedMapSerializedKey):

                rows[idx_row][idx_value] = dict(i_value)

    return pd.DataFrame(rows, columns=colnames)

You may need to add an automatic check for whether there is at least one value before or after the Cassandra map field, or adjust the script above manually.
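To try the conversion step without a running cluster, here is a minimal sketch of the same idea on hand-made rows. FakeOrderedMap and convert_rows are hypothetical names introduced only for this illustration; FakeOrderedMap is a stand-in for cassandra.util.OrderedMapSerializedKey, on the assumption that the real class also behaves like a read-only mapping.

```python
from collections.abc import Mapping


class FakeOrderedMap(Mapping):
    """Hypothetical stand-in for cassandra.util.OrderedMapSerializedKey."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self):
        return (k for k, _ in self._pairs)

    def __len__(self):
        return len(self._pairs)


def convert_rows(rows, map_type=FakeOrderedMap):
    """Return the rows as lists, with every map_type value replaced by a dict."""
    return [
        [dict(value) if isinstance(value, map_type) else value for value in row]
        for row in rows
    ]


# Two sample rows shaped like driver output: (id, map of lists)
rows = [
    (1, FakeOrderedMap([("a", [1, 2]), ("b", [3])])),
    (2, FakeOrderedMap([("c", [])])),
]
converted = convert_rows(rows)
```

With the real driver you would pass OrderedMapSerializedKey as map_type. Non-map values are left untouched, so the map column can sit anywhere in the row.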

Nice day!

Solution 2:

The following strategy separates the data conversion stage from the pandas ingestion stage.

To obtain a list of dictionaries from a Cassandra request, you have to use a specific row_factory:

from cassandra.query import (
    dict_factory,
    SimpleStatement
    )

from cassandra.cluster import (
    Cluster,
    ExecutionProfile,
    EXEC_PROFILE_DEFAULT
    )

profile = ExecutionProfile(
    row_factory=dict_factory
    )

hosts = ["127.0.0.1"]
port = 9042

cluster = Cluster(
    hosts,
    port=port,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile}
    )

Then get the data using that cluster:

src_keyspace = "your_keyspace"
src_tbl = "your_table"
N_ROWS = 100

with cluster.connect(src_keyspace) as cass_session:

    res = cass_session.execute(
        SimpleStatement("SELECT * FROM {} LIMIT {}".format(src_tbl,
                                                           N_ROWS))
        )

Then, convert the remaining OrderedMapSerializedKey values to dict:

    # Requires: from cassandra.util import OrderedMapSerializedKey
    rows_as_dict = [
        {key: (val if not isinstance(val, OrderedMapSerializedKey)
               else dict(val)) for key, val in row.items()}
        for row in res.current_rows
        ]

Then simply pass rows_as_dict to pandas.DataFrame.from_dict.

Solution 3:

It can be cast with Python's built-in dict() function.
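A small sketch of what that cast does, using types.MappingProxyType as a stand-in for the Cassandra map (an assumption made only so the snippet runs without the driver; like OrderedMapSerializedKey, it is a mapping that is not a dict). The variable names are made up for the example.

```python
from types import MappingProxyType

# Stand-in for a map-of-lists value returned by the driver (assumption)
cassandra_map = MappingProxyType({"x": [1, 2], "y": [3]})

# dict() copies the key/value pairs into a plain dictionary
plain = dict(cassandra_map)

# The copy is shallow: the list values are the same objects as in the source map
shares_lists = plain["x"] is cassandra_map["x"]

# Copy the lists as well if you intend to modify them independently
independent = {key: list(value) for key, value in cassandra_map.items()}
```

The cast only converts the outer map. For a map of lists that is usually enough, since the values are already ordinary Python lists.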
