Data Engineering

Data Engineering/Spark

[Spark] How to change the Spark master host in PySpark code

[Code before the change] from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName("2_test_sparksession") \ .getOrCreate() sc = spark.sparkContext print(sc.getConf().getAll()) spark.stop() [Result] [Code after the change] from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName("2_test_sparksession") \ .master('spark://spark-master:17077') \ .getOrCreate() sc = spark.sparkContext print(sc...

Data Engineering/Spark

[Spark] How to change the application name in PySpark code

[Code before the change] from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .getOrCreate() sc = spark.sparkContext print(sc.getConf().getAll()) spark.stop() [Result] [Code after the change] from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName("2_test_sparksession") \ .getOrCreate() sc = spark.sparkContext print(sc.getConf().getAll()) spark.stop() [Result]

Data Engineering/Spark

[Spark] How to print only the first row of a Spark DataFrame

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.functions import max, avg, sum, min spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', sc..

Data Engineering/Spark

[Spark] How to rename columns while applying groupBy() to a Spark DataFrame

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.functions import max, avg, sum, min spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', sc..

Data Engineering/Spark

[Spark] How to aggregate a Spark DataFrame with groupBy() and agg()

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.functions import max, avg, sum, min spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', sc..

Data Engineering/Spark

[Spark] How to use groupBy on a Spark DataFrame

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..

Data Engineering/Spark

[Spark] How to apply a filter twice to a Spark DataFrame

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..

Data Engineering/Spark

[Spark] How to extract the desired rows from a Spark DataFrame with a filter

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..

Data Engineering/Spark

[Spark] How to print selected columns of a Spark DataFrame

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..

Data Engineering/Spark

[Spark] How to print all the data in a Spark DataFrame

[Code] from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..

박경태
List of posts in the 'Data Engineering' category (Page 15)