![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FZN8a7%2FbtrUmDwBnEK%2Ftlp8RczIK4aJR1nGfXhgv0%2Fimg.jpg)
Data Engineer
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FCEMzl%2FbtrTP339RcC%2FubKDXuz9uJIvB5ViK6Fj00%2Fimg.png)
[Spark] How to handle a List inside a Pyspark dataframe
from pyspark.sql.functions import explode

data = {
    'parent': [
        {'id': 'id_1', 'category': 'category_1'},
        {'id': 'id_2', 'category': 'category_2'},
    ]
}

df = spark.createDataFrame([data])
df.printSchema()
df.show(truncate=False)

# explode turns the array into one row per element
df = df.select(explode(df.parent))
df.printSchema()
df.show(truncate=False)

root
 |-- parent: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueCont..
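The excerpt stops at the exploded map column. As a follow-up, a minimal sketch of turning each exploded map entry into ordinary columns; the alias `item` is my own naming, and the key names are taken from the sample data above, with `spark` being the active SparkSession as in the post:

```python
from pyspark.sql.functions import explode, col

# Explode the array of maps, then pull each key out of the map into its own column.
exploded = spark.createDataFrame([data]).select(explode('parent').alias('item'))
flat = exploded.select(
    col('item').getItem('id').alias('id'),
    col('item').getItem('category').alias('category'),
)
flat.show(truncate=False)
```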
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcdna6B%2FbtrTPzP3JfT%2F0tAts8nY44dkN83wp1zMVk%2Fimg.png)
[Spark] How to fix TypeError: Can not infer schema for type: <class 'str'>
data = {
    'parent': [
        {'id': 'id_1', 'category': 'category_1'},
        {'id': 'id_2', 'category': 'category_2'},
    ]
}

# Passing the bare dict makes Spark iterate over its keys (plain strings),
# which triggers the TypeError below
df = spark.createDataFrame(data)
df.printSchema()

Fail to execute line 49: df = spark.createDataFrame(data)
Traceback (most recent call last):
  File "/tmp/python16708257068745741506/zeppelin_python.py", line 162, in exec(code, _zcUserQueryNameSpace)
  File "", line 49, in
  File "/usr/local..
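The excerpt is cut off before the fix. A minimal sketch of the usual workaround, consistent with the first post above: pass the dict as a one-element list so Spark treats it as a single record instead of iterating over its string keys.

```python
# Wrap the dict in a list: it becomes one row, and Spark can infer
# a schema from the dict's keys instead of failing on bare strings.
df = spark.createDataFrame([data])
df.printSchema()
df.show(truncate=False)
```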
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FkXZbJ%2FbtrTRHld7hp%2Fkk39FKZIukus47xqGh0u8K%2Fimg.png)
[Spark] How to handle a json List in Pyspark
from pyspark.sql.types import MapType, StringType

data = [
    {'id': 'id_1', 'category': 'category_1'},
    {'id': 'id_2', 'category': 'category_2'},
]

# Each dict is loaded as a map<string,string>; the single column is named "value"
schema = MapType(StringType(), StringType())
df = spark.createDataFrame(data, schema)
df.printSchema()
df.show(truncate=False)

# Promote the map entries to ordinary columns and drop the raw map
df.withColumn('id', df.value.id) \
  .withColumn('category', df.value.category) \
  .drop('value') \
  .show()
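An equivalent way to pull keys out of the map column, using getItem instead of attribute access. This is a stylistic alternative, not from the post; it avoids surprises when a key name collides with an existing Column method.

```python
from pyspark.sql.functions import col

# Same extraction as above, but with explicit getItem calls on the map column.
df.select(
    col('value').getItem('id').alias('id'),
    col('value').getItem('category').alias('category'),
).show()
```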
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb3FNCm%2FbtrTRG0Ue4E%2FJL1HDyASvxQkT4YCpY49gk%2Fimg.png)
[Spark] How to check List+Json in Pyspark
from pyspark.sql.types import StructType, StructField, StringType

data = [
    {'id': 'id_1', 'category': 'category_1'},
    {'id': 'id_2', 'category': 'category_2'},
]

# First let Spark infer the schema from the dicts
df = spark.createDataFrame(data)
df.printSchema()
df.show()

# Then declare the same schema explicitly
schema = StructType([
    StructField('id', StringType()),
    StructField('category', StringType())
])
df = spark.createDataFrame(data, schema)
df.printSchema()
df.show()
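One practical difference between the two runs above: with an explicit StructType the column order and types are fixed up front, and (assuming default nullability) a record that is missing a key should simply come back as null for that field. A small sketch with made-up extra data, not from the post:

```python
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField('id', StringType()),
    StructField('category', StringType())
])

# 'id_3' has no 'category' key; with the explicit schema the field is null.
partial = [{'id': 'id_3'}]
spark.createDataFrame(partial, schema).show()
```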
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd99qYb%2FbtrTRp59z0z%2FktWipVqek10GnE84DR70h0%2Fimg.png)
[Spark] How to use a simple StructType in Pyspark
from pyspark.sql.types import StructType, StructField, StringType

data = {
    'category': 'category_1',
    'id': 'id_1'
}

# Schema inferred from the dict
df = spark.createDataFrame([data])
df.printSchema()
df.show()

# Same data with an explicit StructType
schema = StructType([
    StructField('category', StringType()),
    StructField('id', StringType())
])
df = spark.createDataFrame([data], schema)
df.printSchema()
df.show()
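The post stays with a flat schema, but StructTypes can also be nested, which is how the 'parent' shape from the first two posts could be declared explicitly as an array of structs rather than letting Spark infer maps. A sketch, with the field layout assumed from the earlier sample data:

```python
from pyspark.sql.types import StructType, StructField, StringType, ArrayType

# 'parent' declared as an array of structs instead of an inferred array of maps.
nested_schema = StructType([
    StructField('parent', ArrayType(StructType([
        StructField('id', StringType()),
        StructField('category', StringType())
    ])))
])
nested_data = [{'parent': [{'id': 'id_1', 'category': 'category_1'},
                           {'id': 'id_2', 'category': 'category_2'}]}]
df = spark.createDataFrame(nested_data, nested_schema)
df.printSchema()
df.show(truncate=False)
```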