Mahout Recommendations with Data Sets containing Alpha Numeric Item Ids

June 27, 2016 by S4

Filed under Hadoop, Mahout

Last modified June 27, 2016

With real-world data we can't always guarantee that the input used to generate recommendations contains only integer (long) values for user and item IDs. If either of these is not an integer, the default data models that Mahout provides can't process the data.

Let us consider the case where the item IDs are strings. Here we define a custom data model and override one method so that it reads each item ID as a string, converts it to a long, and returns that long value as the item's ID.
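To make the string-to-long conversion concrete before the Mahout classes, here is a minimal, dependency-free sketch of the idea. It is not Mahout's code: the class name StringIdMapper and its methods are hypothetical, and hashing with MD5 and keeping the first 8 bytes is an assumption chosen to resemble, as far as I know, what AbstractIDMigrator.toLongID does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: maps string IDs to stable long IDs
// and remembers the reverse mapping so results can be shown as strings again.
class StringIdMapper {

    private final Map<Long, String> longToString = new HashMap<>();

    // Hash the string with MD5 and fold the first 8 digest bytes into a long.
    // The same string always yields the same long.
    static long toLongId(String stringId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(stringId.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Convert the string ID and store the reverse mapping the first time it is seen.
    long register(String stringId) {
        long longId = toLongId(stringId);
        longToString.putIfAbsent(longId, stringId);
        return longId;
    }

    // Returns the original string, or null if the long ID was never registered.
    String toStringId(long longId) {
        return longToString.get(longId);
    }
}
```

Because the long is derived from a hash, two different strings could in principle map to the same value, although this is very unlikely with 64 bits. The reverse map is what lets you print recommended items with their original string IDs.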

Data Model Class

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;

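public class AlphaItemFileDataModel extends FileDataModel {

    // Created lazily and deliberately left without an initializer: the
    // FileDataModel constructor can parse the file, and so call
    // readItemIDFromString, before this class's field initializers would run.
    private ItemMemIDMigrator memIdMigtr;

    public AlphaItemFileDataModel(File dataFile) throws IOException {
        super(dataFile);
    }

    public AlphaItemFileDataModel(File dataFile, boolean transpose) throws IOException {
        super(dataFile, transpose);
    }

    // Converts a string item ID to a long and remembers the reverse mapping.
    @Override
    protected long readItemIDFromString(String value) {
        if (memIdMigtr == null) {
            memIdMigtr = new ItemMemIDMigrator();
        }
        long retValue = memIdMigtr.toLongID(value);
        if (null == memIdMigtr.toStringID(retValue)) {
            try {
                memIdMigtr.singleInit(value);
            } catch (TasteException e) {
                e.printStackTrace();
            }
        }
        return retValue;
    }

    // Returns the original string item ID, or null if it was never read.
    String getItemIDAsString(long itemId) {
        return memIdMigtr == null ? null : memIdMigtr.toStringID(itemId);
    }
}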

Class that stores the mapping from long values back to the original strings

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.AbstractIDMigrator;

public class ItemMemIDMigrator extends AbstractIDMigrator {

    // Maps each generated long ID back to its original string ID.
    private final FastByIDMap<String> longToString;

    public ItemMemIDMigrator() {
        this.longToString = new FastByIDMap<String>(100);
    }

    @Override
    public void storeMapping(long longID, String stringID) {
        synchronized (longToString) {
            longToString.put(longID, stringID);
        }
    }

    @Override
    public String toStringID(long longID) {
        synchronized (longToString) {
            return longToString.get(longID);
        }
    }

    // Computes the long ID for a single string and stores the mapping.
    public void singleInit(String stringID) throws TasteException {
        storeMapping(toLongID(stringID), stringID);
    }

}

In your Recommender implementation, you can use this data model class instead of the default file data model to accept input that contains alphanumeric item IDs. Similarly, you can devise a data model that accommodates alphanumeric user IDs as well.