Data Engineering/Spark

[Spark] Docker, failed: port is already allocated

Creating zookeeper-navigator ... done
Creating spark-master-1 ... done
Creating spark-master-2 ... done
WARNING: The "spark-slave" service specifies a port on the host. If multiple containers for this service are created on a single host, the port will clash.
Creating zeppelin ...
Creating 3_spark-cluster-zookeeper_spark-slave_1 ...
Creating 3_spark-cluster-zookeeper_spark-slave_1 ... error
Crea..

Data Engineering/Spark

[Spark] pyspark RDD parallelize(number) union() map()

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()

sc = spark.sparkContext
data_1 = list(ra..
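
The excerpt cuts off at the data definition. A minimal sketch of the parallelize()/union()/map() flow the title describes, run against a local[2] master instead of the standalone cluster above (variable names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[2]').appName('rdd_union_map_sketch').getOrCreate()
sc = spark.sparkContext

data_1 = list(range(5))        # [0, 1, 2, 3, 4]
data_2 = list(range(5, 10))    # [5, 6, 7, 8, 9]

rdd_1 = sc.parallelize(data_1)
rdd_2 = sc.parallelize(data_2)

# union() concatenates the two RDDs; map() then transforms every element
rdd_mapped = rdd_1.union(rdd_2).map(lambda x: x * 10)
print(rdd_mapped.collect())    # [0, 10, 20, ..., 90]

spark.stop()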

Data Engineering/Spark

[Spark] pyspark RDD parallelize() number and union()

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()

sc = spark.sparkContext
data_1 = list(ra..
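
A small sketch of the same parallelize()/union() combination, this time showing how union() keeps the partitions of both inputs; it assumes a local[2] master rather than the cluster URL in the excerpt:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[2]').appName('rdd_union_sketch').getOrCreate()
sc = spark.sparkContext

rdd_1 = sc.parallelize(range(100), 2)   # 2 partitions
rdd_2 = sc.parallelize(range(100), 3)   # 3 partitions

rdd_union = rdd_1.union(rdd_2)
print(rdd_union.count())                # 200 elements in total
print(rdd_union.getNumPartitions())     # 5: union() keeps the partitions of both inputs

spark.stop()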

Data Engineering/Spark

[Spark] pyspark RDD parallelize() number

code

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()

sc = spark.sparkContext
data_1 = li..
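
The excerpt stops right after the SparkContext is obtained. A minimal sketch of parallelizing a list of numbers and checking how it is split into partitions, assuming a local[2] master (names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[2]').appName('rdd_parallelize_sketch').getOrCreate()
sc = spark.sparkContext

data_1 = list(range(10))            # [0, 1, ..., 9]
rdd_1 = sc.parallelize(data_1, 4)   # the second argument sets the number of partitions

print(rdd_1.getNumPartitions())     # 4
print(rdd_1.glom().collect())       # the elements grouped per partition

spark.stop()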

Data Engineering/Spark

[Spark] pyspark RDD count(), collect()

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()

sc = spark.sparkContext
line_1 = 'i love..
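
A short sketch of count() and collect() on a parallelized sentence, assuming a local[2] master; the sample sentence stands in for the truncated line_1 above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[2]').appName('rdd_count_collect_sketch').getOrCreate()
sc = spark.sparkContext

line_1 = 'i love spark'
rdd_words = sc.parallelize(line_1.split(' '))

print(rdd_words.count())     # 3 - the number of elements, computed on the executors
print(rdd_words.collect())   # ['i', 'love', 'spark'] - pulls every element back to the driver

spark.stop()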

Data Engineering/Spark

[Spark] pyspark RDD parallelize(), flatMap(), filter()

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()

sc = spark.sparkContext
line_1 = 'i love..
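
A minimal sketch of flatMap() and filter() on a few lines of text, assuming a local[2] master and illustrative input sentences:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[2]').appName('rdd_flatmap_filter_sketch').getOrCreate()
sc = spark.sparkContext

rdd_lines = sc.parallelize(['i love spark', 'spark runs on a cluster'])

# flatMap() splits each line and flattens everything into one RDD of words;
# filter() keeps only the words longer than four characters
rdd_long = rdd_lines.flatMap(lambda line: line.split(' ')).filter(lambda word: len(word) > 4)
print(rdd_long.collect())   # ['spark', 'spark', 'cluster']

spark.stop()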

Data Engineering/Spark

[Spark] pyspark RDD parallelize(), map(), flatMap()

code

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()

sc = spark.sparkContext
line_1 = 'i..
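
A small sketch contrasting map() and flatMap() on the same input, assuming a local[2] master and illustrative sentences:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[2]').appName('rdd_map_vs_flatmap_sketch').getOrCreate()
sc = spark.sparkContext

rdd_lines = sc.parallelize(['i love spark', 'rdd basics'])

# map() keeps one output element per input element (a list of words per line);
# flatMap() flattens those lists into a single RDD of words
print(rdd_lines.map(lambda line: line.split(' ')).collect())       # [['i', 'love', 'spark'], ['rdd', 'basics']]
print(rdd_lines.flatMap(lambda line: line.split(' ')).collect())   # ['i', 'love', 'spark', 'rdd', 'basics']

spark.stop()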

Data Engineering/Spark

[Spark] pyspark RDD upper(), lower()

code

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('0_test_rdd') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .config('spark.driver.cores', '2') \
    .config('spark.driver.memory', '2g') \
    .config('spark.executor.memory', '2g') \
    .config('spark.executor.cores', '2') \
    .config('spark.cores.max', '8') \
    .getOrCreate()

sc = spark.sparkContext
line_1 = 'i..
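
upper() and lower() here are plain Python string methods applied inside map(). A minimal sketch, assuming a local[2] master and illustrative words:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[2]').appName('rdd_upper_lower_sketch').getOrCreate()
sc = spark.sparkContext

rdd_words = sc.parallelize(['I', 'Love', 'Spark'])

print(rdd_words.map(lambda w: w.upper()).collect())   # ['I', 'LOVE', 'SPARK']
print(rdd_words.map(lambda w: w.lower()).collect())   # ['i', 'love', 'spark']

spark.stop()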

Data Engineering/Spark

[Spark] Securing high availability with a Spark standalone cluster + ZooKeeper cluster

docker-compose.yml

version: '2.1'
services:
  zookeeper-1:
    hostname: zookeeper-1
    container_name: zookeeper-1
    image: zookeeper:3.6
    restart: always
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181
    volumes:
      - type: bind
        source: ./zk-cluster/zookeeper-1/data
        target: /data
        read_only: fal..
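
With two standalone masters behind the ZooKeeper ensemble, the driver can list both masters in the URL and register with whichever one ZooKeeper has elected leader. A minimal smoke-test sketch, assuming the host names from the compose file above are resolvable from the driver:

from pyspark.sql import SparkSession

# both masters are listed; on failover the driver reconnects to the newly elected leader
spark = SparkSession.builder \
    .appName('ha_smoke_test') \
    .master('spark://spark-master-1:7077,spark-master-2:7077') \
    .getOrCreate()

print(spark.sparkContext.parallelize(range(100)).sum())   # 4950, just to confirm the cluster responds

spark.stop()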

박경태
List of posts in the 'Data Engineering/Spark' category (Page 4)