Connect Spark to Oracle using JDBC and pyspark

1. Drivers To use the Java Database Connectivity (JDBC) API we’ll need to provide our application with the correct drivers. The drivers can be found on this Oracle repository. You’ll need to know the version of the database you want to connect to. Try using this SQL statement: SELECT * FROM V$VERSION To provide the downloaded drivers to your application, add this when evoking it: --driver-class-path oracle/ojdbc8.jar 2. Reading using JDBC To fill a Spark dataFrame directly from a database using JDBC we will need a dataFrameReader object: