Code
from pyspark.sql import Row
from pyspark.sql import SparkSession

# Build a SparkSession connected to the standalone cluster master.
spark = SparkSession.builder \
    .appName("1_test_dataframe") \
    .master("spark://spark-master:17077") \
    .getOrCreate()
sc = spark.sparkContext

# Sample data: each Row becomes one record in the DataFrame.
data = [Row(id=0, name='a', age=12, type='A', score=90, year=2012),
        Row(id=1, name='a', age=15, type='B', score=80, year=2013),
        Row(id=2, name='b', age=15, type='B', score=80, year=2014),
        Row(id=3, name='b', age=21, type='F', score=50, year=2015),
        Row(id=4, name='c', age=15, type='C', score=70, year=2016),
        Row(id=5, name='c', age=33, type='F', score=50, year=2017)]

# Distribute the rows as an RDD and convert it to a DataFrame.
spark_df = sc.parallelize(data).toDF()

# Register the DataFrame as a temporary view so it can be queried with SQL.
spark_df.createOrReplaceTempView("my_table")

# Run a SQL query against the temporary view and print the result.
spark_sql = spark.sql("SELECT * FROM my_table")
spark_sql.show()

spark.stop()
Result
+---+----+---+----+-----+----+
| id|name|age|type|score|year|
+---+----+---+----+-----+----+
| 0| a| 12| A| 90|2012|
| 1| a| 15| B| 80|2013|
| 2| b| 15| B| 80|2014|
| 3| b| 21| F| 50|2015|
| 4| c| 15| C| 70|2016|
| 5| c| 33| F| 50|2017|
+---+----+---+----+-----+----+
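Note
The same DataFrame can also be built without the explicit parallelize/toDF round trip. The sketch below is a minimal alternative that assumes the same Row list (data) and SparkSession (spark) as above; spark.createDataFrame infers the schema directly from the Row fields.

# Alternative: build the DataFrame directly from the Row list,
# letting Spark infer the schema from the Row fields.
spark_df = spark.createDataFrame(data)
spark_df.createOrReplaceTempView("my_table")

# Any Spark SQL can then be run against the view, for example an aggregation:
spark.sql("SELECT name, AVG(score) AS avg_score FROM my_table GROUP BY name").show()

For small test datasets like this one, createDataFrame is the more common approach, since it skips creating an intermediate RDD.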