
Spark 2.2 Thrift Server Error on DataFrame: NumberFormatException When Querying a Hive Table

I have Hortonworks HDP 2.6.3 running Spark2 (v2.2). My test case is very simple: create a Hive table with some random values, with Hive on port 10000, then turn on the Spark Thrift server at 100

Solution 1:

Even though you created the Hive table with a specific data type, the underlying data you inserted is stored in string format.

So when Spark tries to read the data, it consults the metastore for the column types. The column is declared as int in the Hive metastore but stored as a string in the file, so Spark throws a cast exception.
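A minimal sketch of that failure mode (plain Java, no Spark needed): Spark's read path ultimately performs a strict numeric parse, which is exactly what `Integer.parseInt` does, and a non-numeric value stored in the file produces the same `NumberFormatException` seen in the title. The value `"42abc"` here is a made-up example of a malformed row, not taken from the original question.

```java
// Strict parsing, as Spark applies it when the metastore declares INT
// but the underlying file stores plain text.
public class StrictCastDemo {
    public static void main(String[] args) {
        String fileValue = "42";               // digits stored as text parse fine
        System.out.println(Integer.parseInt(fileValue)); // 42

        String badValue = "42abc";             // hypothetical malformed value
        try {
            Integer.parseInt(badValue);        // throws NumberFormatException
        } catch (NumberFormatException e) {
            System.out.println("Strict cast failed: " + e.getMessage());
        }
    }
}
```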

Solutions

Create the table with string columns; reading it from Spark will then work:

CREATE TABLE test1 (id STRING, `desc` STRING);

If you want the data types preserved, specify a file format such as ORC or Parquet when creating the table, and then insert the data. You will then be able to read the table from Spark without exceptions:

CREATE TABLE test1 (id INT, `desc` VARCHAR(40)) STORED AS ORC;

Now the question is: why is Hive able to read it? Hive's cast behavior is more forgiving than Spark's: when a value cannot be cast, Hive returns NULL, while Spark throws an exception.
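The contrast can be sketched in plain Java (these are illustrative helpers, not actual Hive or Spark APIs; the assumption that Hive's CAST yields NULL on bad input reflects its documented lenient-cast semantics):

```java
// Lenient vs. strict cast behavior.
// Hive-style: a failed cast yields NULL (modeled here as a null Integer).
// Spark-style: a failed cast throws NumberFormatException.
public class CastBehavior {
    // Hypothetical helper mimicking Hive's CAST(... AS INT): null on failure.
    static Integer lenientCast(String s) {
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(lenientCast("123"));  // 123
        System.out.println(lenientCast("oops")); // null, like Hive's NULL
        // Integer.parseInt("oops") would throw -- the Spark-side failure.
    }
}
```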

Solution 2:

I finally solved this after hours of searching the internet (which, in the end, didn't help directly).

Since I am still new to Scala/Hive, I can't provide a thorough explanation. But I solved it by adding the external Hive JDBC jars provided by AWS and changing the driver option to "com.amazon.hive.jdbc41.HS2Driver".

Go to the following link to download the drivers and to see sample code:

http://awssupportdatasvcs.com/bootstrap-actions/Simba/latest/
