data = {
    'parent': [{
        'id': 'id_1',
        'category': 'category_1',
    }, {
        'id': 'id_2',
        'category': 'category_2',
    }]
}
df = spark.createDataFrame(data)
df.printSchema()
Fail to execute line 49: df = spark.createDataFrame(data)
Traceback (most recent call last):
  File "/tmp/python16708257068745741506/zeppelin_python.py", line 162, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 49, in <module>
  File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/session.py", line 675, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
  File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/session.py", line 700, in _create_dataframe
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/session.py", line 512, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/session.py", line 439, in _inferSchemaFromList
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/session.py", line 439, in <genexpr>
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/types.py", line 1067, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'str'>
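The error happens because `createDataFrame` iterates over its input to collect rows, and iterating a plain dict yields its keys, not its values. Schema inference therefore receives the string `'parent'` instead of a row, which a minimal sketch makes visible:

```python
data = {
    'parent': [
        {'id': 'id_1', 'category': 'category_1'},
        {'id': 'id_2', 'category': 'category_2'},
    ]
}

# Iterating a dict yields its keys -- this is exactly what
# Spark's schema inference sees when the dict is passed directly.
for row in data:
    print(type(row), row)  # <class 'str'> parent
```

This is why the traceback ends in `_infer_schema` complaining about `<class 'str'>`: each "row" it was handed is a key string.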
Solution

Wrap the dict in a list so that `createDataFrame` receives a single row (the dict) rather than iterating over the dict's keys:
data = {
    'parent': [{
        'id': 'id_1',
        'category': 'category_1',
    }, {
        'id': 'id_2',
        'category': 'category_2',
    }]
}
df = spark.createDataFrame([data])
df.printSchema()
df.show(truncate=False)
root
|-- parent: array (nullable = true)
| |-- element: map (containsNull = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
+----------------------------------------------------------------------------+
|parent |
+----------------------------------------------------------------------------+
|[{category -> category_1, id -> id_1}, {category -> category_2, id -> id_2}]|
+----------------------------------------------------------------------------+