Spark
Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
Document loaders
PySpark
PySparkDataFrameLoader loads data from a PySpark DataFrame.
See a usage example.
from langchain_community.document_loaders import PySparkDataFrameLoader
API Reference: PySparkDataFrameLoader
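A minimal sketch of how the loader might be used, assuming `pyspark`, `psutil`, and `langchain-community` are installed and a local Java runtime is available. The function name `load_team_documents`, the sample rows, and the column names `Team` and `Wins` are hypothetical and only serve the illustration.

```python
def load_team_documents(page_content_column="Team"):
    """Build a tiny PySpark DataFrame and load it as LangChain documents."""
    # Imports are deferred so that defining this function does not require
    # pyspark or langchain-community to be importable.
    from pyspark.sql import SparkSession
    from langchain_community.document_loaders import PySparkDataFrameLoader

    # Local single-threaded session; adjust master/appName for a real cluster.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("loader-sketch")
        .getOrCreate()
    )
    try:
        # Hypothetical sample data, not taken from the documentation.
        df = spark.createDataFrame(
            [("Nationals", 98), ("Reds", 97)],
            ["Team", "Wins"],
        )
        # One column supplies the document text; the loader is expected to
        # place the remaining columns in each document's metadata.
        loader = PySparkDataFrameLoader(
            spark, df, page_content_column=page_content_column
        )
        # load() materializes the documents before the session is stopped.
        return loader.load()
    finally:
        spark.stop()
```

Calling `load_team_documents()` should return one document per row, with the `Team` value as the page content and the other columns as metadata. The loader's default for `page_content_column` is, as far as I know, `"text"`, so naming the column explicitly is usually necessary.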