1. Ordinary file output method
Method 1: give the type of the output data source and the target path:

df.write.format("json").save(path)
df.write.format("csv").save(path)
df.write.format("parquet").save(path)
Method 2: directly call the writer method that corresponds to the data source type:

df.write.json(path)
df.write.csv(path)
df.write.parquet(path)
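Both styles go through the same DataFrameWriter and are interchangeable. A minimal runnable sketch, assuming a small in-memory DataFrame and made-up output paths:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("write_demo").getOrCreate()
# A tiny DataFrame just for illustration
df = spark.createDataFrame([("zhangsan", 20), ("lisi", 30)], ["name", "age"])

# Method 1: generic writer with the format given as a string
df.write.format("json").save("/tmp/out_json")
# Method 2: shortcut method named after the format
df.write.parquet("/tmp/out_parquet")

spark.stop()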
The save mode controls what happens when data already exists at the target:

append: append mode; new data is appended to the existing data
overwrite: overwrite mode; the existing data is replaced by the latest data
error / errorifexists: an error is raised if the target already exists (this is the default mode)
ignore: ignore mode; if data already exists, nothing is written
Code writing template:
df.write.mode(saveMode="append").format("csv").save(path)
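A short sketch of how the modes behave on repeated writes to the same path (the path and the DataFrame df are placeholders):

# The first write creates the target directory
df.write.mode("overwrite").format("csv").save("/tmp/people_csv")
# append keeps the existing files and adds new ones
df.write.mode("append").format("csv").save("/tmp/people_csv")
# The default mode (errorifexists) would now raise an AnalysisException
# df.write.format("csv").save("/tmp/people_csv")
# ignore leaves the existing data untouched and writes nothing
df.write.mode("ignore").format("csv").save("/tmp/people_csv")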
Code demonstration of ordinary file output:

import os
from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Configure the environment
    os.environ['JAVA_HOME'] = 'C:/Program Files/Java/jdk1.8.0_241'
    # Configure the Hadoop path (the directory that was unpacked earlier)
    os.environ['HADOOP_HOME'] = 'D:/hadoop-3.3.1'
    # Configure the path to the Python interpreter in the base environment
    os.environ['PYSPARK_PYTHON'] = 'C:/ProgramData/Miniconda3/'
    # Configure the path to the Python interpreter used by the driver
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'C:/ProgramData/Miniconda3/'

    spark = SparkSession.builder.master("local[2]").appName("").config(
        "spark.sql.shuffle.partitions", 2).getOrCreate()

    # Load the person data (json reader assumed; the file name is not shown in the path)
    df = spark.read.json("../../datas/")

    # Get the name of the oldest person
    df.createOrReplaceTempView("persons")
    rsDf = spark.sql("""
        select name, age from persons where age = (select max(age) from persons)
    """)

    # Print the result to the console
    # rsDf.write.format("console").save()
    # rsDf.write.json("../../datas/result", mode="overwrite")
    # rsDf.write.mode(saveMode='overwrite').format("json").save("../../datas/result")
    # rsDf.write.mode(saveMode='overwrite').format("csv").save("../../datas/result1")
    # rsDf.write.mode(saveMode='overwrite').format("parquet").save("../../datas/result2")
    # rsDf.write.mode(saveMode='append').format("csv").save("../../datas/result1")
    # Saving as text to HDFS reported an error: the text source only supports a single string column
    # rsDf.write.mode(saveMode='overwrite').text("hdfs://bigdata01:9820/result")
    # rsDf.write.csv("hdfs://bigdata01:9820/result", mode="overwrite")
    rsDf.write.json("hdfs://bigdata01:9820/result", mode="overwrite")  # writer format assumed

    spark.stop()
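To sanity-check the result, the written files can be read back with the matching reader. A small hedged sketch reusing the HDFS path from above (json assumed, matching the writer used there):

# Read the result back from HDFS and inspect it
check_df = spark.read.json("hdfs://bigdata01:9820/result")
check_df.printSchema()
check_df.show()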
2. Save to a database (MySQL via JDBC)
Code demo:
import os
# Import pyspark modules
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Configure the environment
    os.environ['JAVA_HOME'] = 'D:/Download/Java/JDK'
    # Configure the Hadoop path (the directory that was unpacked earlier)
    os.environ['HADOOP_HOME'] = 'D:/bigdata/hadoop-3.3.1/hadoop-3.3.1'
    # Configure the path to the Python interpreter in the base environment
    os.environ['PYSPARK_PYTHON'] = 'C:/ProgramData/Miniconda3/'
    # Configure the path to the Python interpreter used by the driver
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'C:/ProgramData/Miniconda3/'

    spark = SparkSession.builder.master('local[*]').appName('').config(
        "spark.sql.shuffle.partitions", 2).getOrCreate()

    df5 = spark.read.format("csv").option("sep", "\t").load("../../datas/zuoye/") \
        .toDF('eid', 'ename', 'salary', 'sal', 'dept_id')
    df5.createOrReplaceTempView('emp')
    rsDf = spark.sql("select * from emp")

    # The driver class below is for MySQL Connector/J 8.x (use com.mysql.jdbc.Driver for 5.x)
    rsDf.write.format("jdbc") \
        .option("driver", "com.mysql.cj.jdbc.Driver") \
        .option("url", "jdbc:mysql://bigdata01:3306/mysql") \
        .option("user", "root") \
        .option("password", "123456") \
        .option("dbtable", "emp1") \
        .save(mode="overwrite")

    spark.stop()  # After use, remember to close the session
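Writing over JDBC requires the MySQL Connector/J jar to be on Spark's classpath (for example via the jars directory or spark.jars). To verify the write, the same connection options can be used to read the table back; a hedged sketch reusing the settings above:

# Read the freshly written table back from MySQL to verify the contents
check_df = spark.read.format("jdbc") \
    .option("driver", "com.mysql.cj.jdbc.Driver") \
    .option("url", "jdbc:mysql://bigdata01:3306/mysql") \
    .option("user", "root") \
    .option("password", "123456") \
    .option("dbtable", "emp1") \
    .load()
check_df.show()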
3. Save to Hive
Code demonstration:
import os
# Import pyspark modules
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Configure the environment
    os.environ['JAVA_HOME'] = 'D:/Download/Java/JDK'
    # Configure the Hadoop path (the directory that was unpacked earlier)
    os.environ['HADOOP_HOME'] = 'D:/bigdata/hadoop-3.3.1/hadoop-3.3.1'
    # Configure the path to the Python interpreter in the base environment
    os.environ['PYSPARK_PYTHON'] = 'C:/ProgramData/Miniconda3/'
    # Configure the path to the Python interpreter used by the driver
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'C:/ProgramData/Miniconda3/'
    os.environ['HADOOP_USER_NAME'] = 'root'

    spark = SparkSession \
        .builder \
        .appName("HiveAPP") \
        .master("local[2]") \
        .config("spark.sql.warehouse.dir", 'hdfs://bigdata01:9820/user/hive/warehouse') \
        .config('hive.metastore.uris', 'thrift://bigdata01:9083') \
        .config("spark.sql.shuffle.partitions", 2) \
        .enableHiveSupport() \
        .getOrCreate()

    df5 = spark.read.format("csv").option("sep", "\t").load("../../datas/zuoye/") \
        .toDF('eid', 'ename', 'salary', 'sal', 'dept_id')
    df5.createOrReplaceTempView('emp')
    rsDf = spark.sql("select * from emp")

    # Save the result as a Hive table (fill in the target table name, e.g. 'dbname.tablename')
    rsDf.write.saveAsTable("")

    spark.stop()  # After use, remember to close the session
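saveAsTable also accepts a save mode, and the written table can be read back through the metastore. A hedged sketch in which the table name yhdb.emp_result is made up for illustration:

# Overwrite the managed table if it already exists (table name is illustrative)
rsDf.write.mode("overwrite").saveAsTable("yhdb.emp_result")

# Read it back via the Hive metastore to confirm the write
spark.table("yhdb.emp_result").show()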
This concludes the article on how SparkSQL outputs data. For more content on SparkSQL data output, please search my earlier articles or continue browsing the related articles below. I hope you will keep supporting me!