Type Parameters:
T - the type of the materialized records

public class ParquetInputFormat<T> extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,T>

Uses a ReadSupport to materialize the records. The requestedSchema will control how the original records get projected by the loader. It must be a subset of the original schema. Only the columns needed to reconstruct the records with the requestedSchema will be scanned.
| Modifier and Type | Field and Description |
|---|---|
| static String | DICTIONARY_FILTERING_ENABLED: key to configure whether row group dictionary filtering is enabled |
| static String | FILTER_PREDICATE: key to configure the filter predicate |
| static String | READ_SUPPORT_CLASS: key to configure the ReadSupport implementation |
| static String | RECORD_FILTERING_ENABLED: key to configure whether record-level filtering is enabled |
| static String | SPLIT_FILES: key to turn off file splitting |
| static String | STATS_FILTERING_ENABLED: key to configure whether row group stats filtering is enabled |
| static String | STRICT_TYPE_CHECKING: key to configure type checking for conflicting schemas (default: true) |
| static String | TASK_SIDE_METADATA: key to turn task-side metadata loading on or off (default: true); if true, metadata is read on the task side and some tasks may finish immediately |
| static String | UNBOUND_RECORD_FILTER: key to configure the filter |
| Constructor and Description |
|---|
| ParquetInputFormat(): Hadoop will instantiate using this constructor |
| ParquetInputFormat(Class<S> readSupportClass): constructor for subclasses, such as AvroParquetInputFormat, or wrappers |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<Void,T> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) |
| static FilterCompat.Filter | getFilter(org.apache.hadoop.conf.Configuration conf): returns a non-null Filter, which is a wrapper around either a FilterPredicate, an UnboundRecordFilter, or a no-op filter |
| List<Footer> | getFooters(org.apache.hadoop.conf.Configuration configuration, Collection<org.apache.hadoop.fs.FileStatus> statuses): the footers for the files |
| List<Footer> | getFooters(org.apache.hadoop.conf.Configuration configuration, List<org.apache.hadoop.fs.FileStatus> statuses) |
| List<Footer> | getFooters(org.apache.hadoop.mapreduce.JobContext jobContext) |
| GlobalMetaData | getGlobalMetaData(org.apache.hadoop.mapreduce.JobContext jobContext) |
| static Class<?> | getReadSupportClass(org.apache.hadoop.conf.Configuration configuration) |
| static <T> ReadSupport<T> | getReadSupportInstance(org.apache.hadoop.conf.Configuration configuration) |
| List<ParquetInputSplit> | getSplits(org.apache.hadoop.conf.Configuration configuration, List<Footer> footers): deprecated, split planning using file footers will be removed |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) |
| static Class<?> | getUnboundRecordFilter(org.apache.hadoop.conf.Configuration configuration): deprecated |
| protected boolean | isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename) |
| static boolean | isTaskSideMetaData(org.apache.hadoop.conf.Configuration configuration) |
| protected List<org.apache.hadoop.fs.FileStatus> | listStatus(org.apache.hadoop.mapreduce.JobContext jobContext) |
| static void | setFilterPredicate(org.apache.hadoop.conf.Configuration configuration, FilterPredicate filterPredicate) |
| static void | setReadSupportClass(org.apache.hadoop.mapreduce.Job job, Class<?> readSupportClass) |
| static void | setReadSupportClass(org.apache.hadoop.mapred.JobConf conf, Class<?> readSupportClass) |
| static void | setTaskSideMetaData(org.apache.hadoop.mapreduce.Job job, boolean taskSideMetadata) |
| static void | setUnboundRecordFilter(org.apache.hadoop.mapreduce.Job job, Class<? extends UnboundRecordFilter> filterClass) |
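As a sketch of typical use, a MapReduce job might wire these methods together as follows. This assumes the example GroupReadSupport from parquet-hadoop's example package; the input path is illustrative only.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ParquetJobSetup {
    public static Job configure() throws Exception {
        Job job = Job.getInstance();
        // Use ParquetInputFormat as the job's input format
        job.setInputFormatClass(ParquetInputFormat.class);
        // Tell the input format how to materialize records
        ParquetInputFormat.setReadSupportClass(job, GroupReadSupport.class);
        // Input paths come from the inherited FileInputFormat helpers
        // ("/data/input.parquet" is a made-up path)
        FileInputFormat.addInputPath(job, new Path("/data/input.parquet"));
        return job;
    }
}
```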
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

public static final String READ_SUPPORT_CLASS
public static final String UNBOUND_RECORD_FILTER
public static final String STRICT_TYPE_CHECKING
public static final String FILTER_PREDICATE
public static final String RECORD_FILTERING_ENABLED
public static final String STATS_FILTERING_ENABLED
public static final String DICTIONARY_FILTERING_ENABLED
public static final String TASK_SIDE_METADATA
public static final String SPLIT_FILES
public ParquetInputFormat()
public ParquetInputFormat(Class<S> readSupportClass)
Subclasses and wrappers may use this constructor to set the ReadSupport class that will be used when reading instead of requiring the user to set the read support property in their configuration.
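For instance, a wrapper in the style of AvroParquetInputFormat can fix the ReadSupport in its no-arg constructor. This is a sketch using the example Group model from parquet-hadoop; the class name GroupParquetInputFormat is invented here.

```java
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;

// Hypothetical wrapper: fixes the ReadSupport so users need not set
// the read support property in their configuration.
public class GroupParquetInputFormat extends ParquetInputFormat<Group> {
    public GroupParquetInputFormat() {
        super(GroupReadSupport.class);
    }
}
```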
readSupportClass - a ReadSupport subclass

public static void setTaskSideMetaData(org.apache.hadoop.mapreduce.Job job, boolean taskSideMetadata)
public static boolean isTaskSideMetaData(org.apache.hadoop.conf.Configuration configuration)
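A minimal sketch of toggling task-side metadata on a job and reading the setting back through the companion getter:

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.hadoop.ParquetInputFormat;

public class TaskSideMetadataExample {
    public static boolean roundTrip() throws Exception {
        Job job = Job.getInstance();
        // Read footers on the task side rather than during split planning
        ParquetInputFormat.setTaskSideMetaData(job, true);
        // isTaskSideMetaData reads the same key back from the configuration
        return ParquetInputFormat.isTaskSideMetaData(job.getConfiguration());
    }
}
```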
public static void setReadSupportClass(org.apache.hadoop.mapreduce.Job job,
Class<?> readSupportClass)
public static void setUnboundRecordFilter(org.apache.hadoop.mapreduce.Job job,
Class<? extends UnboundRecordFilter> filterClass)
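A hedged sketch of registering a record filter class. MyRecordFilter is hypothetical, and the assumption here is the classic org.apache.parquet.filter API, where UnboundRecordFilter.bind returns a RecordFilter whose isMatch decides record by record:

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.column.ColumnReader;
import org.apache.parquet.filter.RecordFilter;
import org.apache.parquet.filter.UnboundRecordFilter;
import org.apache.parquet.hadoop.ParquetInputFormat;

// Hypothetical no-op filter: accepts every record.
public class MyRecordFilter implements UnboundRecordFilter {
    @Override
    public RecordFilter bind(Iterable<ColumnReader> readers) {
        // keep all records; a real filter would inspect the column readers
        return new RecordFilter() {
            @Override
            public boolean isMatch() { return true; }
        };
    }

    public static void register(Job job) {
        ParquetInputFormat.setUnboundRecordFilter(job, MyRecordFilter.class);
    }
}
```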
@Deprecated
public static Class<?> getUnboundRecordFilter(org.apache.hadoop.conf.Configuration configuration)
See Also: getFilter(Configuration)

public static void setReadSupportClass(org.apache.hadoop.mapred.JobConf conf,
Class<?> readSupportClass)
public static Class<?> getReadSupportClass(org.apache.hadoop.conf.Configuration configuration)
public static void setFilterPredicate(org.apache.hadoop.conf.Configuration configuration,
FilterPredicate filterPredicate)
public static FilterCompat.Filter getFilter(org.apache.hadoop.conf.Configuration conf)
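A sketch of installing a filter predicate built with the FilterApi builder and retrieving the resulting wrapper via getFilter. The column name "x" and the value 7 are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetInputFormat;

public class FilterPredicateExample {
    public static FilterCompat.Filter install() {
        Configuration conf = new Configuration();
        // Keep only records where int column "x" equals 7 ("x" is made up)
        FilterPredicate pred = FilterApi.eq(FilterApi.intColumn("x"), 7);
        ParquetInputFormat.setFilterPredicate(conf, pred);
        // getFilter always returns a non-null Filter wrapping the predicate
        return ParquetInputFormat.getFilter(conf);
    }
}
```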
public org.apache.hadoop.mapreduce.RecordReader<Void,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException
Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<Void,T>
Throws: IOException, InterruptedException

public static <T> ReadSupport<T> getReadSupportInstance(org.apache.hadoop.conf.Configuration configuration)
Parameters: configuration - to find the configuration for the read support

protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename)
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
Overrides: getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,T>
Throws: IOException

@Deprecated
public List<ParquetInputSplit> getSplits(org.apache.hadoop.conf.Configuration configuration, List<Footer> footers) throws IOException
Parameters: configuration - the configuration to connect to the file system; footers - the footers of the files to read
Throws: IOException

protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
Overrides: listStatus in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,T>
Throws: IOException

public List<Footer> getFooters(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
Parameters: jobContext - the current job context
Throws: IOException

public List<Footer> getFooters(org.apache.hadoop.conf.Configuration configuration, List<org.apache.hadoop.fs.FileStatus> statuses) throws IOException
Throws: IOException

public List<Footer> getFooters(org.apache.hadoop.conf.Configuration configuration, Collection<org.apache.hadoop.fs.FileStatus> statuses) throws IOException
Parameters: configuration - to connect to the file system; statuses - the files to open
Throws: IOException

public GlobalMetaData getGlobalMetaData(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException
Parameters: jobContext - the current job context
Throws: IOException

Copyright © 2018 The Apache Software Foundation. All rights reserved.