public class AvroParquetInputFormat<T> extends ParquetInputFormat<T>
InputFormat for Parquet files.

Fields inherited from class org.apache.parquet.hadoop.ParquetInputFormat:
DICTIONARY_FILTERING_ENABLED, FILTER_PREDICATE, READ_SUPPORT_CLASS, RECORD_FILTERING_ENABLED, SPLIT_FILES, STATS_FILTERING_ENABLED, STRICT_TYPE_CHECKING, TASK_SIDE_METADATA, UNBOUND_RECORD_FILTER

| Constructor and Description |
|---|
| AvroParquetInputFormat() |
| Modifier and Type | Method and Description |
|---|---|
| static void | setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job, Class<? extends AvroDataSupplier> supplierClass). Uses an instance of the specified AvroDataSupplier class to control how the SpecificData instance that is used to find Avro specific records is created. |
| static void | setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema). Override the Avro schema to use for reading. |
| static void | setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection). Set the subset of columns to read (projection pushdown). |
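As a sketch of how these configuration methods fit together, a job that reads Avro records from Parquet with a column projection might be set up as follows. The input path, record name, and field names here are invented for illustration, and the read schema is simply reused from the projection; only the class and method names come from this page.

```java
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class AvroParquetReadJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "read-users");
        job.setInputFormatClass(AvroParquetInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/data/users")); // hypothetical path

        // Hypothetical projection: read only the "id" and "name" columns.
        Schema projection = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");
        AvroParquetInputFormat.setRequestedProjection(job, projection);

        // Optional: override the read schema; it must be compatible with the projection.
        AvroParquetInputFormat.setAvroReadSchema(job, projection);
    }
}
```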
Methods inherited from class org.apache.parquet.hadoop.ParquetInputFormat:
createRecordReader, getFilter, getFooters, getFooters, getFooters, getGlobalMetaData, getReadSupportClass, getReadSupportInstance, getSplits, getSplits, getUnboundRecordFilter, isSplitable, isTaskSideMetaData, listStatus, setFilterPredicate, setReadSupportClass, setReadSupportClass, setTaskSideMetaData, setUnboundRecordFilter

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

setRequestedProjection

public static void setRequestedProjection(org.apache.hadoop.mapreduce.Job job,
                                          org.apache.avro.Schema requestedProjection)
Set the subset of columns to read (projection pushdown). This is useful if the full schema is large and you only want to read a few columns, since it saves time by not reading unused columns.

If a requested projection is set, then the Avro schema used for reading must be compatible with the projection. For instance, a column that is not included in the projection must either be absent from the read schema or be optional in it. Use setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema) to set a read schema, if needed.
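To illustrate the compatibility rule, the schemas below are invented examples (record and field names are hypothetical). A projection that reads only two columns could look like:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
```

A compatible read schema can still mention a column excluded from the projection, provided it is optional there, with a default so Avro schema resolution can fill it in:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```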
Parameters: job, requestedProjection
See Also: setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)

setAvroReadSchema

public static void setAvroReadSchema(org.apache.hadoop.mapreduce.Job job,
                                     org.apache.avro.Schema avroReadSchema)
Override the Avro schema to use for reading. Differences between the read and write schemas are resolved using Avro's schema resolution rules.

Parameters: job, avroReadSchema
See Also: setRequestedProjection(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)

setAvroDataSupplier

public static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job,
                                       Class<? extends AvroDataSupplier> supplierClass)

Uses an instance of the specified AvroDataSupplier class to control how the SpecificData instance that is used to find Avro specific records is created.

Parameters: job, supplierClass

Copyright © 2018 The Apache Software Foundation. All rights reserved.