
  • package root
  • package org
  • package apache
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.

    Java programmers should reference the package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

  • package mllib

    RDD-based machine learning APIs (in maintenance mode).

    The spark.mllib package is in maintenance mode as of the Spark 2.0.0 release to encourage migration to the DataFrame-based APIs under the package. While in maintenance mode,

    • no new features in the RDD-based spark.mllib package will be accepted, unless they block implementing new features in the DataFrame-based package;
    • bug fixes in the RDD-based APIs will still be accepted.

    The developers will continue adding features to the DataFrame-based APIs in the 2.x series to reach feature parity with the RDD-based APIs. Once feature parity is reached, this package will be deprecated.

    See also

    SPARK-4591 to track the progress of feature parity

  • package recommendation
  • ALS
  • MatrixFactorizationModel
  • Rating

class MatrixFactorizationModel extends Saveable with Serializable with Logging

Model representing the result of matrix factorization.


If you create the model directly using the constructor, be aware that fast prediction requires cached user/product features and their associated partitioners.

Linear Supertypes
Logging, Serializable, Saveable, AnyRef, Any

Instance Constructors

  1. new MatrixFactorizationModel(rank: Int, userFeatures: RDD[(Int, Array[Double])], productFeatures: RDD[(Int, Array[Double])])


    rank: Rank for the features in this model.

    userFeatures: RDD of tuples where each tuple represents the userId and the features computed for this user.

    productFeatures: RDD of tuples where each tuple represents the productId and the features computed for this product.
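
    As a minimal sketch of the note above about direct construction, the following builds a rank-2 model from hand-written feature vectors that are partitioned and cached before being passed to the constructor. The IDs, feature values, partition count, and the local[2] master are illustrative assumptions, not values taken from this page.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

object DirectModelSketch {
  // Builds a tiny rank-2 model from hand-written (illustrative) feature vectors.
  def buildModel(sc: SparkContext): MatrixFactorizationModel = {
    val partitioner = new HashPartitioner(2)
    // Partition and cache both feature RDDs, as the class note recommends.
    val userFeatures = sc
      .parallelize(Seq((1, Array(1.0, 0.0)), (2, Array(0.0, 1.0))))
      .partitionBy(partitioner)
      .cache()
    val productFeatures = sc
      .parallelize(Seq((10, Array(2.0, 3.0)), (20, Array(4.0, 5.0))))
      .partitionBy(partitioner)
      .cache()
    // Each feature array has length 2, matching the rank argument.
    new MatrixFactorizationModel(2, userFeatures, productFeatures)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("DirectModelSketch").setMaster("local[2]"))
    try {
      val model = buildModel(sc)
      println(s"rank = ${model.rank}, users = ${model.userFeatures.count()}")
    } finally {
      sc.stop()
    }
  }
}
```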


Type Members

  1. implicit class LogStringContext extends AnyRef

Value Members

  1. def predict(usersProducts: JavaPairRDD[Integer, Integer]): JavaRDD[Rating]

    Java-friendly version of MatrixFactorizationModel.predict.

  2. def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating]

    Predict the rating of many users for many products. The output RDD has one element for each element in the input RDD (including all duplicates), unless a user or product is missing from the training set.

    usersProducts: RDD of (user, product) pairs.

    returns: RDD of Ratings.
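
    The behaviour described above (duplicates kept, unknown users or products dropped) can be sketched as follows. The model is built directly from made-up feature vectors so the example stays small; user 99 is deliberately absent from the model, and all IDs and values are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}

object BatchPredictSketch {
  def predictAll(sc: SparkContext): Array[Rating] = {
    // Illustrative rank-2 features for two users and two products.
    val userFeatures =
      sc.parallelize(Seq((1, Array(1.0, 0.0)), (2, Array(0.0, 1.0)))).cache()
    val productFeatures =
      sc.parallelize(Seq((10, Array(2.0, 3.0)), (20, Array(4.0, 5.0)))).cache()
    val model = new MatrixFactorizationModel(2, userFeatures, productFeatures)

    // (1, 10) appears twice; user 99 is not part of the model.
    val pairs = sc.parallelize(Seq((1, 10), (2, 20), (1, 10), (99, 10)))
    model.predict(pairs).collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("BatchPredictSketch").setMaster("local[2]"))
    try {
      predictAll(sc).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

    With this input, the duplicate (1, 10) pair should yield two ratings, while the pair for user 99 should produce no output element.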

  3. def predict(user: Int, product: Int): Double

    Predict the rating of one user for one product.
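
    In a matrix factorization model, the predicted rating for a (user, product) pair is the dot product of the user's and the product's feature vectors. The plain-Scala sketch below reproduces only that arithmetic; it is not Spark's implementation, and the object name and the vectors are hypothetical.

```scala
object DotProductSketch {
  // Dot product of two latent-factor vectors of equal length (the rank).
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same rank")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  def main(args: Array[String]): Unit = {
    val userVector = Array(1.0, 0.5)    // hypothetical features for one user
    val productVector = Array(2.0, 4.0) // hypothetical features for one product
    // 1.0 * 2.0 + 0.5 * 4.0
    println(dot(userVector, productVector))
  }
}
```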

  4. val productFeatures: RDD[(Int, Array[Double])]
  5. val rank: Int
  6. def recommendProducts(user: Int, num: Int): Array[Rating]

    Recommends products to a user.

    user: the user to recommend products to

    num: how many products to return. The number returned may be less than this.

    returns: Rating objects, each of which contains the given user ID, a product ID, and a "score" in the rating field. Each represents one recommended product, and they are sorted by score, decreasing. The first returned is the one predicted to be most strongly recommended to the user. The score is an opaque value that indicates how strongly recommended the product is.
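
    A short usage sketch for recommendProducts, again using a directly constructed model with made-up feature vectors (all IDs and values are assumptions). It asks for two products for user 1 and then for more products than the model contains, to show that fewer than num results may come back.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}

object RecommendProductsSketch {
  def topProducts(sc: SparkContext, user: Int, num: Int): Array[Rating] = {
    // Illustrative rank-2 features: two users, three products.
    val userFeatures =
      sc.parallelize(Seq((1, Array(1.0, 0.0)), (2, Array(0.0, 1.0)))).cache()
    val productFeatures = sc.parallelize(Seq(
      (10, Array(2.0, 3.0)),
      (20, Array(4.0, 5.0)),
      (30, Array(1.0, 1.0)))).cache()
    val model = new MatrixFactorizationModel(2, userFeatures, productFeatures)
    model.recommendProducts(user, num)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RecommendProductsSketch").setMaster("local[2]"))
    try {
      // Highest score first.
      topProducts(sc, user = 1, num = 2).foreach(println)
      // Only three products exist, so at most three ratings are returned.
      println(topProducts(sc, user = 1, num = 10).length)
    } finally {
      sc.stop()
    }
  }
}
```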

  7. def recommendProductsForUsers(num: Int): RDD[(Int, Array[Rating])]

    Recommends top products for all users.

    num: how many products to return for every user.

    returns: an RDD of (Int, Array[Rating]) tuples, where each tuple contains a userId and an array of Rating objects, each holding that same userId, a recommended productId, and a "score" in the rating field. The score has the same semantics as in recommendProducts.
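
    The shape of the result can be seen in a small sketch: one (userId, Array[Rating]) tuple per user, each array holding at most num entries. The model below is built from illustrative feature vectors, and every ID and value is an assumption.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}

object RecommendForAllUsersSketch {
  def topForEveryUser(sc: SparkContext, num: Int): Array[(Int, Array[Rating])] = {
    // Illustrative rank-2 features: two users, two products.
    val userFeatures =
      sc.parallelize(Seq((1, Array(1.0, 0.0)), (2, Array(0.0, 1.0)))).cache()
    val productFeatures =
      sc.parallelize(Seq((10, Array(2.0, 3.0)), (20, Array(4.0, 5.0)))).cache()
    val model = new MatrixFactorizationModel(2, userFeatures, productFeatures)
    // The result is an RDD; collect() is only reasonable here because it is tiny.
    model.recommendProductsForUsers(num).collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RecommendForAllUsersSketch").setMaster("local[2]"))
    try {
      topForEveryUser(sc, num = 1).foreach { case (user, recs) =>
        println(s"user $user -> ${recs.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```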

  8. def recommendUsers(product: Int, num: Int): Array[Rating]

    Recommends users to a product. That is, this returns users who are most likely to be interested in a product.

    product: the product to recommend users to

    num: how many users to return. The number returned may be less than this.

    returns: Rating objects, each of which contains a user ID, the given product ID, and a "score" in the rating field. Each represents one recommended user, and they are sorted by score, decreasing. The first returned is the one predicted to be most strongly recommended to the product. The score is an opaque value that indicates how strongly recommended the user is.
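
    Conceptually, the result described above amounts to scoring every user against the product and keeping the num highest scores in decreasing order. The plain-Scala sketch below illustrates that idea on an in-memory map; it is not Spark's implementation, and Scored, topUsers, and the feature values are hypothetical names and data introduced only for this example.

```scala
object TopUsersSketch {
  // Hypothetical stand-in for Rating, to keep the sketch free of Spark.
  final case class Scored(user: Int, product: Int, score: Double)

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Scores every user against one product and keeps the `num` best, highest first.
  def topUsers(
      product: Int,
      productVector: Array[Double],
      userFeatures: Map[Int, Array[Double]],
      num: Int): Seq[Scored] =
    userFeatures.toSeq
      .map { case (user, vec) => Scored(user, product, dot(vec, productVector)) }
      .sortBy(s => -s.score)
      .take(num) // returns fewer than `num` when there are fewer users

  def main(args: Array[String]): Unit = {
    val users = Map(
      1 -> Array(1.0, 0.0),
      2 -> Array(0.0, 1.0),
      3 -> Array(1.0, 1.0))
    topUsers(product = 10, productVector = Array(2.0, 3.0), users, num = 2)
      .foreach(println)
  }
}
```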

  9. def recommendUsersForProducts(num: Int): RDD[(Int, Array[Rating])]

    Recommends top users for all products.

    num: how many users to return for every product.

    returns: an RDD of (Int, Array[Rating]) tuples, where each tuple contains a productId and an array of Rating objects, each holding a recommended userId, that same productId, and a "score" in the rating field. The score has the same semantics as in recommendUsers.

  10. def save(sc: SparkContext, path: String): Unit

    Save this model to the given path.

    This saves:

    • human-readable (JSON) model metadata to path/metadata/
    • Parquet formatted data to path/data/

    The model may be loaded using Loader.load.

    sc: Spark context used to save model data.

    path: Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
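
    A minimal save-and-reload sketch, assuming a local file system and a model built from made-up feature vectors. Because save fails when the target directory already exists, the example points at a not-yet-created child of a fresh temporary directory; the directory names are arbitrary. Loading goes through the load method on the MatrixFactorizationModel companion object.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

object SaveLoadSketch {
  def roundTrip(sc: SparkContext): MatrixFactorizationModel = {
    // Illustrative rank-2 features: two users, two products.
    val userFeatures =
      sc.parallelize(Seq((1, Array(1.0, 0.0)), (2, Array(0.0, 1.0)))).cache()
    val productFeatures =
      sc.parallelize(Seq((10, Array(2.0, 3.0)), (20, Array(4.0, 5.0)))).cache()
    val model = new MatrixFactorizationModel(2, userFeatures, productFeatures)

    // "model" does not exist yet inside the new temporary directory.
    val path = Files.createTempDirectory("mfm-sketch").resolve("model").toString
    model.save(sc, path) // writes path/metadata/ and path/data/
    MatrixFactorizationModel.load(sc, path)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("SaveLoadSketch").setMaster("local[2]"))
    try {
      val loaded = roundTrip(sc)
      println(s"reloaded model with rank ${loaded.rank}")
    } finally {
      sc.stop()
    }
  }
}
```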

  11. val userFeatures: RDD[(Int, Array[Double])]