code
from pyspark.sql import SparkSession

# both masters are listed so the driver can fail over to the standby master
# (zookeeper-based HA, as set up in the earlier posts)
spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()
sc = spark.sparkContext

line_1 = 'i love you'
line_2 = 'you are my friend'
line_3 = 'my name is park'

# parallelize() turns a local Python list into a distributed RDD
lines_upper = sc.parallelize([line_1.upper(),
                              line_2.upper(),
                              line_3.upper()])
lines_lower = sc.parallelize([line_1.lower(),
                              line_2.lower(),
                              line_3.lower()])
print()
print(lines_upper)
print(lines_upper.collect())
print()
print(lines_lower)
print(lines_lower.collect())
print()
sc.stop()
result
ParallelCollectionRDD[0] at readRDDFromFile at PythonRDD.scala:274
['I LOVE YOU', 'YOU ARE MY FRIEND', 'MY NAME IS PARK']
ParallelCollectionRDD[1] at readRDDFromFile at PythonRDD.scala:274
['i love you', 'you are my friend', 'my name is park']
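print(lines_upper) only shows the RDD's string representation (ParallelCollectionRDD[...]) because RDDs are lazy; collect() is the action that actually ships the elements back to the driver as a Python list.

To see how parallelize() splits the list across partitions, getNumPartitions() and glom() are handy. A minimal sketch, assuming the same session settings as above (without an explicit numSlices, the layout would vary with the cluster's default parallelism):

code
lines = sc.parallelize(['i love you', 'you are my friend', 'my name is park'],
                       numSlices=3)   # ask for 3 partitions explicitly
print(lines.getNumPartitions())       # -> 3
print(lines.glom().collect())         # glom() groups elements per partition,
                                      # here one line per partition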