Pandas Fails With Correct Data Type While Reading A Sas File
Solution 1:
SAS represents all numbers as 64-bit (8-byte) floating point values, but you can save disk space by telling it to store fewer than 8 bytes per value. The dataset you posted does this for CYL and WGT.
When SAS reads the dataset back from disk, it sets the missing least significant bytes to binary zeros. Apparently read_sas does not handle this: instead of filling the missing bytes with binary zeros it does something else, hence the seemingly random data.
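To make the truncation concrete, here is a rough sketch in plain Python. It is not SAS or pandas code. The helper names store_truncated and restore are made up for illustration, the stored length of 3 bytes is an assumption (the actual length used for CYL is not stated here), and the bytes are handled most significant first.

```python
import struct

def store_truncated(value: float, length: int) -> bytes:
    """Keep only the `length` most significant bytes of the 8-byte double."""
    return struct.pack(">d", value)[:length]

def restore(stored: bytes, filler: bytes = b"\x00") -> float:
    """Rebuild an 8-byte double, filling the missing low-order bytes with `filler`."""
    missing = 8 - len(stored)
    padded = stored + (filler * missing)[:missing]
    return struct.unpack(">d", padded)[0]

stored = store_truncated(8.0, 3)   # assumed length of 3 bytes
print(restore(stored))             # zero fill gives back 8.0
print(restore(stored, b"\x01"))    # any non-zero filler gives a slightly different number
```

With zero fill the value comes back exactly. With any other filler the low-order bits of the mantissa are polluted, and the result is close to the original but not equal to it.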
The first value of CYL is 8, which in IEEE floating point is the hex code
40 20 00 00 00 00 00 00
The value you displayed, 8.000046, would be this value instead:
40 20 00 06 07 80 FD C1
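You can check both hex codes yourself with the standard struct module. This is only a small sketch: decode_be_double is a hypothetical helper name, and it assumes the bytes are written most significant first, as they are above.

```python
import struct

def decode_be_double(hex_bytes: str) -> float:
    """Decode 8 space-separated hex bytes as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_bytes))[0]

clean = decode_be_double("40 20 00 00 00 00 00 00")
dirty = decode_be_double("40 20 00 06 07 80 FD C1")

print(clean)   # 8.0
print(dirty)   # slightly above 8, roughly 8.000046

# The two patterns share their leading bytes and differ only in the low-order ones.
print(bytes.fromhex("40 20 00 06 07 80 FD C1")[:3] == bytes.fromhex("40 20 00"))
```

The leading bytes 40 20 00 are identical in both patterns. Only the low-order bytes differ, which is consistent with the missing bytes being filled with something other than zeros.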
Solution 2:
Finally solved the issue. This definitely looks like a pandas bug. I used the sas7bdat library directly instead, installing it with:
pip install sas7bdat
Then I ran the following code:
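from sas7bdat import SAS7BDAT

# file_path is the directory containing the file; set it to your own location
file_path = ""
file_name = file_path + "cars.sas7bdat"

# Read the whole file into a pandas DataFrame
with SAS7BDAT(file_name) as reader:
    my_df = reader.to_data_frame()

# Show the first five rows
print(my_df.head())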
Running this prints the first rows of the dataset with the correct data types and values.
Hopefully the pandas developers will find a fix for the bug described above.