Here is how to do it with the findspark package from within a JupyterLab session, without having to start a Spark shell with the options preloaded.
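# findspark locates the local Spark installation and adds pyspark to sys.path.
# It is optional: without it, pyspark must already be importable.
try:
    import findspark
    findspark.init()
except ImportError:
    pass

import os

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

# Build (or reuse) a local session that runs on all available cores.
spark = (
    SparkSession
    .builder
    .master("local[*]")
    .appName("exercises_notebook")
    # Keep the catalog in memory instead of using a Hive metastore.
    .config("spark.sql.catalogImplementation", "in-memory")
    # Store managed tables under the current working directory.
    .config("spark.sql.warehouse.dir", os.getcwd())
    .getOrCreate()
)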
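If the same options are needed in several notebooks, one possible variation is to keep them in a plain dictionary and apply them to the builder in a loop. The sketch below is only an illustration: the helper names `spark_options`, `apply_options` and `make_session` are hypothetical and not part of findspark or PySpark. It relies only on the `config(key, value)` builder method already used above. pyspark is imported inside `make_session` so that the helpers can be defined even before Spark is available.

```python
import os


def spark_options(warehouse_dir=None):
    """Return the Spark options used above as a plain dict (hypothetical helper)."""
    return {
        "spark.sql.catalogImplementation": "in-memory",
        "spark.sql.warehouse.dir": warehouse_dir or os.getcwd(),
    }


def apply_options(builder, options):
    """Call builder.config(key, value) for every option and return the builder."""
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder


def make_session(app_name="exercises_notebook"):
    """Create (or reuse) a local SparkSession configured with spark_options()."""
    # Imported lazily: pyspark is only required when a session is requested.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    return apply_options(builder, spark_options()).getOrCreate()
```

In a notebook cell, `spark = make_session()` would then take the place of the explicit builder chain, assuming findspark has already been initialised as shown earlier.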
Let's talk!
I'm Carlo Nicolini. I am interested in the reliability of AI reasoning systems (interpretability, inference-time methods, probabilistic language programming) and in quantitative portfolio optimization (I am a maintainer of skfolio). If you're working on something in these areas and think we might collaborate, chat, or discuss, I'm happy to talk about it!
The best way to reach me is via DM on LinkedIn.