
Using pd.read_sql() to extract large data (>5 million records) from an Oracle database makes the SQL execution very slow

I initially tried using pd.read_sql(). Then I tried SQLAlchemy query objects, but none of these methods helped: the SQL executes for a long time and never finishes.

Solution 1:

Yet another possibility is to adjust the array size without needing to create the oraaccess.xml file suggested by Chris. This may not work with the rest of your code as is, but it should give you an idea of how to proceed if you wish to try this approach!

import cx_Oracle
import pandas
import sqlalchemy

class Connection(cx_Oracle.Connection):
    def __init__(self):
        super(Connection, self).__init__("user/pw@dsn")

    def cursor(self):
        # Fetch 5000 rows per round trip instead of the default 100
        c = super(Connection, self).cursor()
        c.arraysize = 5000
        return c

engine = sqlalchemy.create_engine("oracle://", creator=Connection)
pandas.read_sql(sql, engine)
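Independently of the arraysize tuning above, materializing more than 5 million rows in a single DataFrame is also memory-heavy; pandas can stream the result in chunks via the chunksize parameter of read_sql. A minimal sketch, using an in-memory SQLite database as a stand-in for Oracle (the table name and chunk size are placeholders; the chunking mechanics are the same against any engine):

```python
import pandas as pd
import sqlalchemy

# Stand-in database: 10 rows in a table named "big_table"
engine = sqlalchemy.create_engine("sqlite://")
pd.DataFrame({"id": range(10)}).to_sql("big_table", engine, index=False)

# With chunksize set, read_sql yields DataFrames of up to that many
# rows instead of one giant DataFrame, keeping memory bounded.
total = 0
for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=4):
    total += len(chunk)  # process each chunk here

print(total)  # → 10
```

Against the real Oracle source you would use a much larger chunksize (e.g. 100000) and process or write out each chunk before fetching the next.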

Solution 2:

Here's another alternative to experiment with.

Set a prefetch size by using the external configuration available to Oracle Call Interface programs like cx_Oracle. This overrides internal settings used by OCI programs. Create an oraaccess.xml file:

<?xml version="1.0"?>
<oraaccess xmlns="http://xmlns.oracle.com/oci/oraaccess"
           xmlns:oci="http://xmlns.oracle.com/oci/oraaccess"
           schemaLocation="http://xmlns.oracle.com/oci/oraaccess
                           http://xmlns.oracle.com/oci/oraaccess.xsd">
  <default_parameters>
    <prefetch>
      <rows>1000</rows>
    </prefetch>
  </default_parameters>
</oraaccess>

If you use tnsnames.ora or sqlnet.ora for cx_Oracle, then put the oraaccess.xml file in the same directory. Otherwise, create a new directory and set the environment variable TNS_ADMIN to that directory name.
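One way to point the client at that directory is to set TNS_ADMIN from Python before the first connection is made, since the Oracle client libraries read the environment at initialization. A small sketch (the directory path is a hypothetical example):

```python
import os

# Assumption: oraaccess.xml was saved to /opt/oracle/config (example path).
# TNS_ADMIN must be set before cx_Oracle makes its first connection,
# otherwise the Oracle client will not pick up oraaccess.xml.
os.environ["TNS_ADMIN"] = "/opt/oracle/config"

# ...now import/connect with cx_Oracle as usual...
```

Setting the variable in the shell that launches the script (export TNS_ADMIN=/opt/oracle/config) works equally well.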

cx_Oracle must be using Oracle Client 12c or later libraries.

Experiment with different prefetch sizes.

See OCI Client-Side Deployment Parameters Using oraaccess.xml.
