from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

data = {
    'parent': [{
        'id': 'id_1',
        'category': 'category_1',
    }, {
        'id': 'id_2',
        'category': 'category_2',
    }]
}

# Wrap the dict in a list so it becomes a single Row;
# 'parent' is inferred as array<map<string,string>>
df = spark.createDataFrame([data])
df.printSchema()
df.show(truncate=False)

# explode() turns each element of the array into its own row
df = df.select(explode(df.parent))
df.printSchema()
df.show(truncate=False)
root
|-- parent: array (nullable = true)
| |-- element: map (containsNull = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
+----------------------------------------------------------------------------+
|parent |
+----------------------------------------------------------------------------+
|[{category -> category_1, id -> id_1}, {category -> category_2, id -> id_2}]|
+----------------------------------------------------------------------------+
root
|-- col: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
+------------------------------------+
|col |
+------------------------------------+
|{category -> category_1, id -> id_1}|
|{category -> category_2, id -> id_2}|
+------------------------------------+