Data Engineer

Data Engineering/Spark

[Spark] How to handle a List inside a PySpark DataFrame

from pyspark.sql.functions import explode

data = {
    'parent': [
        {'id': 'id_1', 'category': 'category_1'},
        {'id': 'id_2', 'category': 'category_2'},
    ]
}

# One row whose 'parent' column holds the whole list
df = spark.createDataFrame([data])
df.printSchema()
df.show(truncate=False)

# explode() emits one row per element of the list
df = df.select(explode(df.parent))
df.printSchema()
df.show(truncate=False)

# Output of the first printSchema():
root
 |-- parent: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueCont..
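As a rough illustration of what the `explode` call above does, here is a minimal sketch. `explode_rows` is a plain-Python model written only for this note (it is not a Spark API), and `explode_and_flatten` is a hypothetical helper that assumes a DataFrame with an array-of-map column named `parent`, as in the excerpt. The only Spark-specific facts relied on are that `explode` names its output column `col` by default and that a map column can be indexed by key.

```python
data = {
    'parent': [
        {'id': 'id_1', 'category': 'category_1'},
        {'id': 'id_2', 'category': 'category_2'},
    ]
}


def explode_rows(record):
    """Plain-Python model of explode(): one output row per list element.

    Spark names the exploded column 'col' by default, so the model does too.
    """
    return [{'col': element} for element in record['parent']]


def explode_and_flatten(df):
    """Hypothetical PySpark helper (assumes an array<map> column 'parent').

    Explodes the array, then pulls the map keys out into real columns.
    """
    # Imported lazily so the plain-Python part runs without PySpark installed.
    from pyspark.sql.functions import col, explode

    items = df.select(explode(col('parent')).alias('item'))
    return items.select(
        col('item')['id'].alias('id'),
        col('item')['category'].alias('category'),
    )


exploded = explode_rows(data)
```

With a live session, `explode_and_flatten(spark.createDataFrame([data]))` should give a two-row DataFrame with `id` and `category` columns, assuming `spark` is an existing SparkSession.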

[Spark] How to fix TypeError: Can not infer schema for type: <class 'str'>

data = {
    'parent': [
        {'id': 'id_1', 'category': 'category_1'},
        {'id': 'id_2', 'category': 'category_2'},
    ]
}

# Passing the dict itself (not a list of rows) raises the error
df = spark.createDataFrame(data)
df.printSchema()

Fail to execute line 49: df = spark.createDataFrame(data)
Traceback (most recent call last):
  File "/tmp/python16708257068745741506/zeppelin_python.py", line 162, in
    exec(code, _zcUserQueryNameSpace)
  File "", line 49, in
  File "/usr/local..
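A likely reading of this error, sketched in plain Python: `createDataFrame` treats its first argument as a collection of rows, and iterating over a dict yields only its keys, so Spark ends up trying to infer a schema from the string `'parent'`. The variable names and the `create_df` helper below are made up for illustration, and `spark` is assumed to be an existing SparkSession.

```python
data = {
    'parent': [
        {'id': 'id_1', 'category': 'category_1'},
        {'id': 'id_2', 'category': 'category_2'},
    ]
}

# Iterating a dict yields its keys, so a bare dict looks like a list
# containing one "row" that is just the string 'parent'.
rows_from_bare_dict = list(data)

# Wrapping the dict in a list gives one row that is the dict itself.
rows_from_wrapped_dict = [data]


def create_df(spark):
    """Hypothetical helper: build the DataFrame from the wrapped dict.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.createDataFrame(rows_from_wrapped_dict)
```

This matches the working call in the other excerpts on this page, which pass `[data]` rather than `data`.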

[Spark] How to handle a JSON list in PySpark

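from pyspark.sql.types import MapType, StringType

data = [
    {'id': 'id_1', 'category': 'category_1'},
    {'id': 'id_2', 'category': 'category_2'},
]

# A non-struct schema puts each record into a single column named 'value'
schema = MapType(StringType(), StringType())
df = spark.createDataFrame(data, schema)
df.printSchema()
df.show(truncate=False)

# Pull each map key out into its own column, then drop the map
df.withColumn('id', df.value.id).withColumn('category', df.value.category).drop('value').show()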

[Spark] How to inspect List + JSON data in PySpark

from pyspark.sql.types import StructField, StructType, StringType

data = [
    {'id': 'id_1', 'category': 'category_1'},
    {'id': 'id_2', 'category': 'category_2'},
]

# Schema inferred from the dicts
df = spark.createDataFrame(data)
df.printSchema()
df.show()

# Explicit schema
schema = StructType([
    StructField('id', StringType()),
    StructField('category', StringType()),
])
df = spark.createDataFrame(data, schema)
df.printSchema()
df.show()

[Spark] How to use a simple StructType in PySpark

from pyspark.sql.types import StructField, StructType, StringType

data = {'category': 'category_1', 'id': 'id_1'}

# Schema inferred from the dict
df = spark.createDataFrame([data])
df.printSchema()
df.show()

# Explicit schema
schema = StructType([
    StructField('category', StringType()),
    StructField('id', StringType()),
])
df = spark.createDataFrame([data], schema)
df.printSchema()
df.show()

박경태
Posts in the 'All categories' category (Page 38)