![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FSkKgQ%2FbtrDYd9c7NG%2F9kbTmNJcleG3BUHDHKbbKK%2Fimg.png)
Data Engineering
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb50KS0%2FbtrDZvVNAmC%2FCL56KVAQJg22AVXcDpuPk0%2Fimg.png)
[Spark] Achieving high availability with a Spark standalone cluster + ZooKeeper cluster
docker-compose.yml version: '2.1' services: zookeeper-1: hostname: zookeeper-1 container_name: zookeeper-1 image: zookeeper:3.6 restart: always ports: - 2181:2181 environment: ZOO_MY_ID: 1 ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181 volumes: - type: bind source: ./zk-cluster/zookeeper-1/data target: /data read_only: fal..
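The post sets up high availability on the cluster side; for context, a driver can take advantage of it by listing every master candidate in a single URL, so it can reconnect when the active master fails over. The sketch below is not from the post: the master host names and port are placeholder assumptions.

```python
from pyspark.sql import SparkSession

# Minimal sketch: with standalone HA, the driver lists all master candidates
# in one spark:// URL. Host names and the port here are assumptions.
spark = (
    SparkSession.builder
    .appName("ha_smoke_test")
    .master("spark://spark-master-1:7077,spark-master-2:7077")
    .getOrCreate()
)

print(spark.range(10).count())  # should still succeed after a master failover
spark.stop()
```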
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FY8kRY%2FbtrD2IGDwXw%2FkZB4IdRlSZaFtaivwFJd70%2Fimg.png)
[Zookeeper] How to build a ZooKeeper cluster with Docker
version: '2.1' services: zookeeper-1: hostname: zookeeper-1 container_name: zookeeper-1 image: zookeeper:3.6 ports: - 2181:2181 environment: ZOO_MY_ID: 1 ZOO_SERVERS: server.1=zookeeper-1:2888:3888;2181 server.2=zookeeper-2:2888:3888;2181 server.3=zookeeper-3:2888:3888;2181 volumes: - type: bind source: ./zk-cluster/zookeeper-1/data target: /data read_only: false zookeeper-2: hostname: zookeeper..
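As a quick way to confirm the ensemble is actually up, the sketch below connects with the kazoo Python client and writes a throwaway znode. The host names come from the compose file in the excerpt; the kazoo dependency and the /smoke-test path are my own illustrative choices.

```python
from kazoo.client import KazooClient

# Minimal sketch (assumes the kazoo package is installed and the three
# zookeeper-N containers from the compose file are reachable).
zk = KazooClient(hosts="zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181")
zk.start()

zk.ensure_path("/smoke-test")
zk.create("/smoke-test/node", b"hello", ephemeral=True)
value, stat = zk.get("/smoke-test/node")
print(value, stat.version)

zk.stop()
```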
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Ftjt1s%2FbtrDWhRe78Y%2FEJQzj8u7En6MBLt1sL0urK%2Fimg.png)
[Spark] How to set up memory in a Spark session
Code from pyspark.sql import SparkSession spark = SparkSession.builder \ .appName('3_test_sparksession') \ .master('spark://spark-master:17077') \ .config('spark.driver.cores', '1') \ .config('spark.driver.memory','1g') \ .config('spark.executor.memory', '1g') \ .config('spark.executor.cores', '2') \ .config('spark.cores.max', '2') \ .getOrCreate() sc = spark.sparkContext for setting in sc._conf..
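Since the excerpt is cut off, here is a completed version of the same pattern; the values are copied from the snippet above and the printing loop mirrors the other posts in this series, so treat it as a sketch rather than the post's exact code.

```python
from pyspark.sql import SparkSession

# Sketch reconstructed from the truncated excerpt above.
spark = (
    SparkSession.builder
    .appName("3_test_sparksession")
    .master("spark://spark-master:17077")
    .config("spark.driver.cores", "1")
    .config("spark.driver.memory", "1g")
    .config("spark.executor.memory", "1g")
    .config("spark.executor.cores", "2")
    .config("spark.cores.max", "2")
    .getOrCreate()
)

sc = spark.sparkContext
for setting in sc._conf.getAll():  # print the effective configuration
    print(setting)
sc.stop()
```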
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc9HYK8%2FbtrDXovkkjw%2FTeYQD9mu3k7xN4vJLOEbY1%2Fimg.png)
[Spark] How to use a global temporary view
Code from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder \ .appName("1_test_dataframe") \ .master('spark://spark-master:17077') \ .getOrCreate() sc = spark.sparkContext data = [Row(id = 0, name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(id = 1, name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(id = 2, name = 'b', age = 15, t..
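The key point of a global temporary view is that it lives in the reserved global_temp database and stays visible across sessions of the same application. A minimal self-contained sketch, using a local[*] master and tiny data set of my own so it runs without the cluster from the post:

```python
from pyspark.sql import SparkSession, Row

spark = (
    SparkSession.builder
    .appName("global_temp_view_demo")
    .master("local[*]")
    .getOrCreate()
)

df = spark.createDataFrame([Row(id=0, name="a", score=90),
                            Row(id=1, name="b", score=80)])
df.createOrReplaceGlobalTempView("people")

# Global temp views are queried through the reserved global_temp database
# and remain visible to other sessions of the same application.
spark.sql("SELECT * FROM global_temp.people WHERE score > 85").show()
spark.newSession().sql("SELECT count(*) FROM global_temp.people").show()

spark.stop()
```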
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbPdqFH%2FbtrDYLDBm0e%2FOIYMlreScUW3wqLnRrUg8k%2Fimg.png)
[Spark] Basic usage of Spark DataFrames with SQL queries
Code from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder \ .appName("1_test_dataframe") \ .master('spark://spark-master:17077') \ .getOrCreate() sc = spark.sparkContext data = [Row(id = 0, name = 'a', age = 12, type = 'A', score = 90, year = 2012), Row(id = 1, name = 'a', age = 15, type = 'B', score = 80, year = 2013), Row(id = 2, name = 'b', age = 15, t..
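For reference, the basic pattern is to register the DataFrame as a session-scoped temp view and query it with spark.sql. The sketch below reuses the first two rows from the excerpt; the local[*] master, view name, and query are illustrative assumptions so it runs standalone.

```python
from pyspark.sql import SparkSession, Row

spark = (
    SparkSession.builder
    .appName("1_test_dataframe")
    .master("local[*]")
    .getOrCreate()
)

data = [Row(id=0, name="a", age=12, type="A", score=90, year=2012),
        Row(id=1, name="a", age=15, type="B", score=80, year=2013)]
df = spark.createDataFrame(data)
df.createOrReplaceTempView("students")

# Query the registered view with plain SQL.
spark.sql("SELECT name, avg(score) AS avg_score FROM students GROUP BY name").show()
spark.stop()
```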
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbaqW4C%2FbtrD0rq5voq%2FngzAk02Wy00xmKb7MCXIi0%2Fimg.png)
[Spark] Comparing user settings in PySpark code
Code from pyspark.conf import SparkConf from pyspark.context import SparkContext conf = SparkConf().setAll([('spark.app.name', '2_test_sparkconf'), ('spark.master', 'spark://spark-master:17077')]) sc = SparkContext(conf = conf) print('first') for setting in sc._conf.getAll(): print(setting) sc.stop() conf = SparkConf().setAll([('spark.app.name', '2_test_sparkconf'), ('spark.master', 'spark://spa..
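The comparison in the excerpt boils down to building a context twice with different SparkConf contents and printing the result each time. A compact sketch of that idea; the print_conf helper and the extra settings in the second run are mine, not from the post.

```python
from pyspark.conf import SparkConf
from pyspark.context import SparkContext

def print_conf(label, pairs):
    # Hypothetical helper: build a context from the given pairs and dump the
    # settings that actually took effect.
    sc = SparkContext(conf=SparkConf().setAll(pairs))
    print(label)
    for setting in sc._conf.getAll():
        print(setting)
    sc.stop()

base = [('spark.app.name', '2_test_sparkconf'),
        ('spark.master', 'spark://spark-master:17077')]
print_conf('first', base)
print_conf('second', base + [('spark.executor.memory', '2g'),
                             ('spark.cores.max', '2')])
```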
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FB98YK%2FbtrDWmEkuDZ%2FKky2I5D4Zs6hdeBhGEshtk%2Fimg.png)
[Spark] Printing the default settings of a Spark session
Code from pyspark.sql import SparkSession spark = SparkSession.builder \ .appName('3_test_sparksession') \ .master('spark://spark-master:17077') \ .getOrCreate() sc = spark.sparkContext for setting in sc._conf.getAll(): print(setting) sc.stop() Result ('spark.driver.port', '39007') ('spark.master', 'spark://spark-master:17077') ('spark.sql.warehouse.dir', 'file:/home/spark/dev/spark-warehouse') ..
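A small side note: sc._conf is a private attribute, and the same listing can be produced with the public getConf() accessor. A sketch of that variant, with the master URL taken from the excerpt:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("3_test_sparksession")
    .master("spark://spark-master:17077")
    .getOrCreate()
)

# getConf() returns a copy of the SparkConf, so this prints the same
# key/value pairs as the sc._conf.getAll() loop in the post.
for key, value in spark.sparkContext.getConf().getAll():
    print(key, value)

spark.stop()
```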
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbzvwrZ%2FbtrDYfx7qA5%2FYD5LlTtSSqViM332mWk760%2Fimg.png)
[Spark] Printing Spark settings from PySpark code
Code from pyspark.conf import SparkConf from pyspark.context import SparkContext conf = SparkConf().setAll([('spark.app.name', '2_test_sparksession'), ('spark.master', 'spark://spark-master:17077'), ('spark.driver.cores', '1'), ('spark.driver.memory','1g'), ('spark.executor.memory', '1g'), ('spark.executor.cores', '2'), ('spark.cores.max', '2')]) sc = SparkContext(conf = conf) for setting in sc...
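A completed version of the truncated snippet above, with values copied from the excerpt; the trailing loop is assumed to match the other posts in this series.

```python
from pyspark.conf import SparkConf
from pyspark.context import SparkContext

conf = SparkConf().setAll([
    ('spark.app.name', '2_test_sparksession'),
    ('spark.master', 'spark://spark-master:17077'),
    ('spark.driver.cores', '1'),
    ('spark.driver.memory', '1g'),
    ('spark.executor.memory', '1g'),
    ('spark.executor.cores', '2'),
    ('spark.cores.max', '2'),
])
sc = SparkContext(conf=conf)

for setting in sc._conf.getAll():  # print every effective setting
    print(setting)
sc.stop()
```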
![](https://img1.daumcdn.net/thumb/R750x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fk9X5B%2FbtrDXb97Fz9%2FVDTX9kbef8FoPaKGeuI2f1%2Fimg.png)
[Spark] How to adjust Spark memory in PySpark code
Code from pyspark.conf import SparkConf from pyspark.context import SparkContext conf = SparkConf().setAll([('spark.app.name', '2_test_sparksession'), ('spark.master', 'spark://spark-master:17077'), ('spark.driver.cores', '1'), ('spark.driver.memory','1g'), ('spark.executor.memory', '1g'), ('spark.executor.cores', '1'), ('spark.cores.max', '2')]) sc = SparkContext(conf = conf) sc.stop() Result C..
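The memory-related keys only take effect when they are set before the context is created, so a natural check is to read them back afterwards. A sketch along those lines, reusing the values from the excerpt; the read-back prints are my addition.

```python
from pyspark.conf import SparkConf
from pyspark.context import SparkContext

conf = SparkConf().setAll([
    ('spark.app.name', '2_test_sparksession'),
    ('spark.master', 'spark://spark-master:17077'),
    ('spark.driver.cores', '1'),
    ('spark.driver.memory', '1g'),
    ('spark.executor.memory', '1g'),
    ('spark.executor.cores', '1'),
    ('spark.cores.max', '2'),
])
sc = SparkContext(conf=conf)

# Read the settings back to confirm they were applied.
print(sc.getConf().get('spark.driver.memory'))    # expected: 1g
print(sc.getConf().get('spark.executor.memory'))  # expected: 1g
sc.stop()
```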