class ShuffleDependency[K, V, C] extends Dependency[Product2[K, V]]
Represents a dependency on the output of a shuffle stage. Note that in the case of shuffle, the RDD is transient since we don't need it on the executor side.
- Annotations
- @DeveloperApi()
- Source
- Dependency.scala
- Linear Supertypes
- Dependency[Product2[K, V]], Serializable, AnyRef, Any
Instance Constructors
- new ShuffleDependency(_rdd: RDD[_ <: Product2[K, V]], partitioner: Partitioner, serializer: Serializer = SparkEnv.get.serializer, keyOrdering: Option[Ordering[K]] = None, aggregator: Option[Aggregator[K, V, C]] = None, mapSideCombine: Boolean = false, shuffleWriterProcessor: ShuffleWriteProcessor = new ShuffleWriteProcessor)(implicit arg0: ClassTag[K], arg1: ClassTag[V], arg2: ClassTag[C])
- _rdd
the parent RDD
- partitioner
partitioner used to partition the shuffle output
- serializer
Serializer to use. If not set explicitly, the default serializer, as specified by the spark.serializer config option, will be used.
- keyOrdering
key ordering for RDD's shuffles
- aggregator
map/reduce-side aggregator for RDD's shuffle
- mapSideCombine
whether to perform partial aggregation (also known as map-side combine)
- shuffleWriterProcessor
the processor to control the write behavior in ShuffleMapTask
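In user code these dependencies are normally created internally by shuffle transformations rather than by hand, but since the constructor is public (as a DeveloperApi), a minimal sketch of building one directly might look like the following. The local[2] master, the sample pair RDD, and the choice of HashPartitioner are assumptions for illustration; only _rdd and partitioner are supplied, so the remaining parameters take their defaults.

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}

object ShuffleDependencySketch {
  def main(args: Array[String]): Unit = {
    // Assumed local setup, purely for illustration.
    val sc = new SparkContext(
      new SparkConf().setAppName("shuffle-dep-sketch").setMaster("local[2]"))

    // A pair RDD (elements are Product2[String, Int]) acting as the parent `_rdd`.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // K = String, V = Int, C = Int: no aggregator is supplied, so the combiner
    // type simply mirrors the value type. serializer, keyOrdering, mapSideCombine
    // and shuffleWriterProcessor all keep their defaults.
    val dep = new ShuffleDependency[String, Int, Int](pairs, new HashPartitioner(4))

    println(s"shuffleId = ${dep.shuffleId}, mapSideCombine = ${dep.mapSideCombine}")
    sc.stop()
  }
}
```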
Value Members
- val aggregator: Option[Aggregator[K, V, C]]
- val keyOrdering: Option[Ordering[K]]
- val mapSideCombine: Boolean
- val partitioner: Partitioner
- def rdd: RDD[Product2[K, V]]
- Definition Classes
- ShuffleDependency → Dependency
- val serializer: Serializer
- val shuffleHandle: ShuffleHandle
- val shuffleId: Int
- val shuffleWriterProcessor: ShuffleWriteProcessor
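The value members above can also be inspected on the ShuffleDependency that a shuffle transformation installs on its output RDD. A short sketch, assuming a local SparkContext and a reduceByKey shuffle:

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}

object InspectShuffleDependency {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("inspect-shuffle-dep").setMaster("local[2]"))

    // reduceByKey introduces a shuffle, so the resulting RDD's dependency list
    // contains a ShuffleDependency describing it.
    val reduced = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3))).reduceByKey(_ + _)

    reduced.dependencies.collect { case d: ShuffleDependency[_, _, _] => d }.foreach { dep =>
      println(s"shuffleId      = ${dep.shuffleId}")
      println(s"partitioner    = ${dep.partitioner}")
      println(s"mapSideCombine = ${dep.mapSideCombine}")  // true: reduceByKey combines map-side
      println(s"keyOrdering    = ${dep.keyOrdering}")     // None: no key ordering required
      println(s"serializer     = ${dep.serializer}")
    }

    sc.stop()
  }
}
```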