Here is how to initialize PySpark inside a JupyterLab session with the findspark package, without having to start a Spark shell with the required options.

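# findspark adds a local Spark installation to sys.path.
# It is optional: if pyspark is already importable, skip it.
try:
    import findspark
    findspark.init()
except ImportError:
    pass

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
import os

# Build (or reuse) a local session that uses all available cores.
spark = (
    SparkSession
    .builder
    .master("local[*]")
    .appName("exercises_notebook")
    # Keep the catalog in memory instead of using a Hive metastore.
    .config("spark.sql.catalogImplementation", "in-memory")
    # Store managed tables under the current working directory.
    .config("spark.sql.warehouse.dir", os.getcwd())
    .getOrCreate()
)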
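
For context, `findspark.init()` works roughly by locating the Spark installation and putting its Python bindings on `sys.path`, so that `import pyspark` succeeds in a plain Python process. The sketch below illustrates that idea with the standard library only. It is a simplified approximation, not findspark's actual code: the function names `spark_python_paths` and `add_spark_to_path` are hypothetical, and it assumes the usual layout where Spark ships its Python package under `python/` and a py4j archive under `python/lib/`.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the entries a Spark install needs on sys.path (hypothetical helper)."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: py4j is bundled as a zip archive under python/lib.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_path(spark_home):
    """Export SPARK_HOME and prepend any missing Spark entries to sys.path."""
    os.environ["SPARK_HOME"] = spark_home
    # Insert in reverse so the final order matches spark_python_paths().
    for path in reversed(spark_python_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

Calling `add_spark_to_path` with the directory of your own Spark installation before importing `pyspark` would have a similar effect to `findspark.init()`, although findspark also takes care of discovering that directory for you.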

I'm Carlo Nicolini. I am interested in the reliability of AI reasoning systems (interpretability, inference-time methods, probabilistic language programming) and in quantitative portfolio optimization (I am a maintainer of skfolio). If you're working on something in these areas and think we might collaborate, chat, or discuss, I'm happy to talk about it!

The best way to reach me is via DM on LinkedIn.

PySpark initialization outside the PySpark shell

Let's talk!
