Data Engineering/Spark

[Spark] How to adjust Spark memory in PySpark code

Code from pyspark.conf import SparkConf from pyspark.context import SparkContext conf = SparkConf().setAll([('spark.app.name', '2_test_sparksession'), ('spark.master', 'spark://spark-master:17077'), ('spark.driver.cores', '1'), ('spark.driver.memory','1g'), ('spark.executor.memory', '1g'), ('spark.executor.cores', '1'), ('spark.cores.max', '2')]) sc = SparkContext(conf = conf) sc.stop() Result C..
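The preview above is cut off mid-way; below is a minimal sketch of the full pattern it shows, assuming a standalone master at spark://spark-master:17077 and treating the 1g / 1-core values as placeholders to adapt, with a print added to confirm the settings.

from pyspark.conf import SparkConf
from pyspark.context import SparkContext

# Build the configuration before the context starts; memory settings such as
# spark.driver.memory only take effect if they are set before the driver JVM launches.
conf = SparkConf().setAll([
    ('spark.app.name', '2_test_sparksession'),
    ('spark.master', 'spark://spark-master:17077'),  # cluster address from the post
    ('spark.driver.cores', '1'),
    ('spark.driver.memory', '1g'),    # driver heap size
    ('spark.executor.memory', '1g'),  # per-executor heap size
    ('spark.executor.cores', '1'),
    ('spark.cores.max', '2'),
])

sc = SparkContext(conf=conf)
print(sc.getConf().getAll())  # confirm the effective settings
sc.stop()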

[Spark] How to change the Spark master host in PySpark code

Code before the change from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName("2_test_sparksession") \ .getOrCreate() sc = spark.sparkContext print(sc.getConf().getAll()) spark.stop() Result Code after the change from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName("2_test_sparksession") \ .master('spark://spark-master:17077') \ .getOrCreate() sc = spark.sparkContext print(sc...
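The tail of the preview is truncated; a minimal sketch of the changed version, assuming the same standalone master URL shown in the post (swap in your own host and port, or "local[*]" for a local run):

from pyspark.sql import SparkSession

# master() on the builder controls where the application is submitted.
spark = SparkSession \
    .builder \
    .appName("2_test_sparksession") \
    .master("spark://spark-master:17077") \
    .getOrCreate()

sc = spark.sparkContext
print(sc.getConf().getAll())  # 'spark.master' should now report the new host
spark.stop()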

[Spark] How to change the application name in PySpark code

Code before the change from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .getOrCreate() sc = spark.sparkContext print(sc.getConf().getAll()) spark.stop() Result Code after the change from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName("2_test_sparksession") \ .getOrCreate() sc = spark.sparkContext print(sc.getConf().getAll()) spark.stop() Result
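For reference, a minimal sketch of the changed version; appName() on the builder is what sets the name, and without it Spark falls back to a generated one.

from pyspark.sql import SparkSession

# appName() sets the name shown in the Spark UI and in spark.app.name.
spark = SparkSession \
    .builder \
    .appName("2_test_sparksession") \
    .getOrCreate()

sc = spark.sparkContext
print(sc.getConf().getAll())  # 'spark.app.name' should now be '2_test_sparksession'
spark.stop()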

[Spark] How to print only the first row of a Spark DataFrame

Code from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.functions import max, avg, sum, min spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', sc..
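The preview stops before the part that actually prints the row; a minimal sketch assuming first() is used (show(1) is an equivalent option), with the truncated sample values filled in to match the other previews in this list:

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("1_test_dataframe").getOrCreate()

# Sample rows copied from the preview; the cut-off values are assumptions.
data = [Row(name='a', age=12, type='A', score=90, year=2012),
        Row(name='a', age=15, type='B', score=80, year=2013),
        Row(name='b', age=15, type='B', score=80, year=2014)]
df = spark.createDataFrame(data)

print(df.first())  # first() returns the first Row object
df.show(1)         # alternatively, print only the first row in table form
spark.stop()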

[Spark] How to rename columns while doing a groupBy() on a Spark DataFrame

Code from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.functions import max, avg, sum, min spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', sc..
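The preview is cut off before the groupBy itself; a minimal sketch assuming alias() is applied to each aggregate inside agg(), which renames the result columns in the same step (the column choices below are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.functions import avg, max

spark = SparkSession.builder.appName("1_test_dataframe").getOrCreate()

# Sample rows copied from the preview; the cut-off values are assumptions.
data = [Row(name='a', age=12, type='A', score=90, year=2012),
        Row(name='a', age=15, type='B', score=80, year=2013),
        Row(name='b', age=15, type='B', score=80, year=2014)]
df = spark.createDataFrame(data)

# Without alias() the result columns get auto-generated names like 'avg(score)'.
df.groupBy('name') \
  .agg(avg('score').alias('avg_score'), max('age').alias('max_age')) \
  .show()

spark.stop()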

[Spark] How to groupBy() a Spark DataFrame using agg()

Code from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.functions import max, avg, sum, min spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', sc..
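Again the preview ends before the aggregation; a minimal sketch assuming agg() is called on the grouped data with several aggregate functions at once (the particular functions chosen are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.functions import max, avg, sum, min

spark = SparkSession.builder.appName("1_test_dataframe").getOrCreate()

# Sample rows copied from the preview; the cut-off values are assumptions.
data = [Row(name='a', age=12, type='A', score=90, year=2012),
        Row(name='a', age=15, type='B', score=80, year=2013),
        Row(name='b', age=15, type='B', score=80, year=2014)]
df = spark.createDataFrame(data)

# agg() lets one groupBy produce multiple aggregates in a single pass.
df.groupBy('name').agg(avg('score'), max('age'), min('year'), sum('score')).show()

spark.stop()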

[Spark] How to use groupBy on a Spark DataFrame

Code from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..
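A minimal sketch of the basic groupBy usage the title refers to; the last sample row is truncated in the preview, so its remaining values here are assumptions:

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("1_test_dataframe").getOrCreate()

data = [Row(name='a', age=12, type='A', score=90, year=2012),
        Row(name='a', age=15, type='B', score=80, year=2013),
        Row(name='b', age=15, type='B', score=80, year=2014),
        Row(name='b', age=21, type='C', score=70, year=2015)]  # cut-off values assumed
df = spark.createDataFrame(data)

# groupBy() returns GroupedData; a built-in aggregation turns it back into a DataFrame.
df.groupBy('name').count().show()
df.groupBy('name').avg('score').show()

spark.stop()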

[Spark] How to apply a filter twice to a Spark DataFrame

Code from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..
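A minimal sketch of chaining filter() twice; the preview cuts off before the filters themselves, so the conditions on age and score below are assumptions. The two calls together behave like a logical AND.

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("1_test_dataframe").getOrCreate()

data = [Row(name='a', age=12, type='A', score=90, year=2012),
        Row(name='a', age=15, type='B', score=80, year=2013),
        Row(name='b', age=15, type='B', score=80, year=2014),
        Row(name='b', age=21, type='C', score=70, year=2015)]  # cut-off values assumed
df = spark.createDataFrame(data)

# Each filter() narrows the rows further; chained filters act as an AND.
df.filter(df.age == 15).filter(df.score >= 80).show()

spark.stop()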

[Spark] How to apply a filter to a Spark DataFrame and extract the rows you want

Code from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..
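A minimal sketch of filtering rows out of the DataFrame; the actual condition is cut off in the preview, so the ones below are assumptions, showing both the column-expression and SQL-string forms:

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("1_test_dataframe").getOrCreate()

data = [Row(name='a', age=12, type='A', score=90, year=2012),
        Row(name='a', age=15, type='B', score=80, year=2013),
        Row(name='b', age=15, type='B', score=80, year=2014),
        Row(name='b', age=21, type='C', score=70, year=2015)]  # cut-off values assumed
df = spark.createDataFrame(data)

# filter() (alias: where()) keeps only the rows matching the condition.
df.filter(df.name == 'b').show()
df.filter("score > 80").show()  # SQL-style string condition works as well

spark.stop()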

[Spark] How to print selected columns of a Spark DataFrame

Code from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession\ .builder\ .appName("1_test_dataframe")\ .getOrCreate() sc = spark.sparkContext data = [Row(name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(name = 'b', age = 15, type = 'B', score = 80, year = 2014), Row(name = 'b', age = 21, typ..
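A minimal sketch of printing only the columns you want, assuming select() followed by show() (the specific columns chosen are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("1_test_dataframe").getOrCreate()

data = [Row(name='a', age=12, type='A', score=90, year=2012),
        Row(name='a', age=15, type='B', score=80, year=2013),
        Row(name='b', age=15, type='B', score=80, year=2014),
        Row(name='b', age=21, type='C', score=70, year=2015)]  # cut-off values assumed
df = spark.createDataFrame(data)

# select() keeps only the listed columns; show() prints them.
df.select('name', 'score').show()

spark.stop()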
