PMIPairSort (OpenIMAJ master project 1.3.10 API)

java.lang.Object
- org.openimaj.hadoop.mapreduce.stage.Stage<org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<INPUT_KEY,INPUT_VALUE>,org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<OUTPUT_KEY,OUTPUT_VALUE>,INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>
- - org.openimaj.hadoop.mapreduce.stage.helper.SequenceFileTextStage<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
- - - org.openimaj.hadoop.tools.twitter.token.mode.pointwisemi.sort.PMIPairSort

public class PMIPairSort
extends SequenceFileTextStage<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>

Sorts pairs by PMI within time periods.

Author: Sina Samangooei (ss@ecs.soton.ac.uk)

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`MINP_KEY` The minimum PMI
`static String`	`MINPAIRCOUNT_KEY` The minimum number of pairs
`static String`	`PAIRMI_LOC` The location of the pairmi
`static String`	`PMI_NAME` the output name

Constructor Summary

Constructors
Constructor and Description
`PMIPairSort(double minp, int minPairCount, org.apache.hadoop.fs.Path outpath)`
`PMIPairSort(double minp, org.apache.hadoop.fs.Path outpath)`

Method Summary

Modifier and Type	Method and Description
`Class<? extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>>`	`mapper()` By default this method returns the `IdentityMapper` class.
`String`	`outname()`
`static IndependentPair<Long,Double>`	`parseTimeBinary(byte[] bytes)` read time and pmi from a byte array.
`static IndependentPair<Long,Double>`	`parseTimeBinary(byte[] bytes, int start, int len)` use a `ByteArrayInputStream` and a `DataInputStream` to read a byte[]
`Class<? extends org.apache.hadoop.mapreduce.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>>`	`reducer()` By default this method returns the `IdentityReducer` class.
`void`	`setup(org.apache.hadoop.mapreduce.Job job)` Add any final adjustments to the job's config
`static byte[]`	`timePMIBinary(long timet, double pmi)` write the time and PMI to a byte array

Methods inherited from class org.openimaj.hadoop.mapreduce.stage.Stage
combiner, finished, lzoCompress, setCombinerClass, setMapperClass, setReducerClass, stage

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - MINP_KEY
```
public static final String MINP_KEY
```
    The minimum PMI
    
    See Also:
    
    Constant Field Values
  - PMI_NAME
```
public static final String PMI_NAME
```
    the output name
    
    See Also:
    
    Constant Field Values
  - MINPAIRCOUNT_KEY
```
public static final String MINPAIRCOUNT_KEY
```
    The minimum number of pairs
    
    See Also:
    
    Constant Field Values
  - PAIRMI_LOC
```
public static final String PAIRMI_LOC
```
    The location of the pairmi
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - PMIPairSort
```
public PMIPairSort(double minp,
                   org.apache.hadoop.fs.Path outpath)
```
    Parameters:
    
    minp - the minimum PMI value
    
    outpath - for loading the PMIStats file
  - PMIPairSort
```
public PMIPairSort(double minp,
                   int minPairCount,
                   org.apache.hadoop.fs.Path outpath)
```
    Parameters:
    
    minp - the minimum PMI value
    
    minPairCount - the minimum number of pairs to emit
    
    outpath - for loading the PMIStats file
- Method Detail
  - mapper
```
public Class<? extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>> mapper()
```
    Description copied from class: Stage
    
    By default this method returns the IdentityMapper class. This mapper outputs the values handed as they are.
    
    Overrides:
    
    mapper in class Stage<org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>,org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
    
    Returns:
    
    the class of the mapper to use
  - reducer
```
public Class<? extends org.apache.hadoop.mapreduce.Reducer<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>> reducer()
```
    Description copied from class: Stage
    
    By default this method returns the IdentityReducer class. This reducer outputs the values handed as they are.
    
    Overrides:
    
    reducer in class Stage<org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>,org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
    
    Returns:
    
    the class of the reducer to use
  - outname
```
public String outname()
```
    Overrides:
    
    outname in class Stage<org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>,org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
    
    Returns:
    
    the name of the output directory of this stage. If the name is null the directory itself is used.
  - setup
```
public void setup(org.apache.hadoop.mapreduce.Job job)
```
    Description copied from class: Stage
    
    Add any final adjustments to the job's config
    
    Overrides:
    
    setup in class Stage<org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable>,org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
  - timePMIBinary
```
public static byte[] timePMIBinary(long timet,
                                   double pmi)
                            throws IOException
```
    Write the time and PMI to a byte array.
    
    Parameters:
    
    timet - the time
    
    pmi - the PMI value
    
    Returns:
    
    a byte array encoding of time and pmi
    
    Throws:
    
    IOException
  - parseTimeBinary
```
public static IndependentPair<Long,Double> parseTimeBinary(byte[] bytes)
                                                    throws IOException
```
    Read the time and PMI from a byte array. Calls parseTimeBinary(byte[], int, int) with start = 0 and len = bytes.length.
    
    Parameters:
    
    bytes - the bytes to parse
    
    Returns:
    
    time and pmi pair
    
    Throws:
    
    IOException
  - parseTimeBinary
```
public static IndependentPair<Long,Double> parseTimeBinary(byte[] bytes,
                                                           int start,
                                                           int len)
                                                    throws IOException
```
    Uses a ByteArrayInputStream and a DataInputStream to read the time and PMI from a region of a byte[].
    
    Parameters:
    
    bytes - the bytes to parse
    
    start - offset into bytes
    
    len - length to read
    
    Returns:
    
    the time pmi pair
    
    Throws:
    
    IOException
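
The round trip between timePMIBinary and parseTimeBinary can be sketched with plain JDK streams. This is a minimal illustration, not the OpenIMAJ source: the class `TimePmiCodecSketch`, its `encode` and `decode` methods, and the byte layout (an 8-byte long for the time followed by an 8-byte double for the PMI, as written by `DataOutputStream`) are assumptions. `Map.Entry` stands in for `IndependentPair` so the example needs no OpenIMAJ or Hadoop dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical stand-in for the PMIPairSort byte helpers; not OpenIMAJ code. */
public class TimePmiCodecSketch {

    /** Assumed layout: the time as a long, then the PMI as a double. */
    public static byte[] encode(long time, double pmi) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeLong(time);
            dos.writeDouble(pmi);
        }
        return baos.toByteArray();
    }

    /** Reads a (time, PMI) pair from bytes[start .. start + len). */
    public static Map.Entry<Long, Double> decode(byte[] bytes, int start, int len)
            throws IOException {
        try (DataInputStream dis =
                new DataInputStream(new ByteArrayInputStream(bytes, start, len))) {
            long time = dis.readLong();
            double pmi = dis.readDouble();
            return new SimpleImmutableEntry<>(time, pmi);
        }
    }

    /** Convenience overload: start = 0 and len = bytes.length. */
    public static Map.Entry<Long, Double> decode(byte[] bytes) throws IOException {
        return decode(bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encode(1300000000000L, 2.5);

        // Simulate a larger backing buffer in which the record starts at offset 4.
        byte[] padded = new byte[encoded.length + 4];
        System.arraycopy(encoded, 0, padded, 4, encoded.length);

        Map.Entry<Long, Double> pair = decode(padded, 4, encoded.length);
        System.out.println(pair.getKey() + " " + pair.getValue());
    }
}
```

The start and len arguments matter when the bytes come from a reused backing buffer that may be longer than the valid data, which is a common situation with Hadoop's `BytesWritable`; in that case only the valid region should be passed to the parser.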
