Hello folks! Could anyone advise on the following, please?
I’m currently working on an AutoML pipeline built with the step-based SageMaker Pipelines Python SDK.
One of the steps is a hyperparameter tuning step, which looks like this:
```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import HyperparameterTuner, IntegerParameter
from sagemaker.workflow.steps import TuningStep

# pipeline_session, instance_type, estimator_path, role and
# step_preprocess_input_data are defined earlier in the pipeline.
image_uri = image_uris.retrieve(
    framework="xgboost",
    region=pipeline_session.boto_region_name,
    version="1.0-1",
    py_version="py3",
    instance_type=instance_type,
)

tuner_hpo = HyperparameterTuner(
    estimator=Estimator(
        image_uri=image_uri,
        instance_type=instance_type,
        instance_count=1,
        output_path=estimator_path,
        role=role,
        sagemaker_session=pipeline_session,
        hyperparameters={
            "eval_metric": "rmse",
            "objective": "reg:squarederror",
            "num_round": 10,
            "eta": 0.2,
        },
    ),
    objective_metric_name="validation:rmse",
    hyperparameter_ranges={
        "max_depth": IntegerParameter(10, 11),
    },
    objective_type="Minimize",
    max_jobs=2,
    max_parallel_jobs=2,
)

step_tuning = TuningStep(
    name="HPOTuning",
    tuner=tuner_hpo,
    inputs={
        "train": TrainingInput(
            s3_data=step_preprocess_input_data.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_preprocess_input_data.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
    # cache_config=cache_config,
)
```
Afterwards, I want to train the best estimator from the tuning job on another dataset. My idea is to create another estimator and use a `TrainingStep` like this:
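```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

best_estimator = Estimator(
    image_uri=image_uri,
    role=role,
    output_path=trained_model_path,
    instance_type=instance_type,
    instance_count=1,
    sagemaker_session=pipeline_session,
    hyperparameters=...,  # ? best hyperparameters from step_tuning
    model_uri=...,  # ? model artifact of the best training job
)

# With a PipelineSession, fit() does not start a job; it returns step arguments.
training_step_args = best_estimator.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_preprocess_input_data.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,  # train -> full_history
            content_type="text/csv",
        ),
    }
)

step_train = TrainingStep(
    name="TrainBestEstimator",
    step_args=training_step_args,
)
```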
The problems are:
• `model_uri` doesn't accept a `Join` object as input, because `Join` has no `decode` method.
• `model_uri` doesn't accept a `String` object as input either, because it also has no `decode` method.
• To set the hyperparameters explicitly, I need to get them from the tuning job somehow, and I don't see a way to do that yet.
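For the hyperparameters, outside the pipeline definition (for example, in a script run after an execution), the best job's values can be read from the `DescribeHyperParameterTuningJob` response. In that response, `BestTrainingJob.TunedHyperParameters` maps each tuned name to a string value. The sketch below assumes that response shape and merges the tuned values over the fixed ones from the tuning estimator. `best_hyperparameters` and the sample response are hypothetical. Within the pipeline itself, I believe the counterpart is the runtime property `step_tuning.properties.BestTrainingJob.TunedHyperParameters`, which is not a plain dict at definition time.

```python
from typing import Any, Dict


def best_hyperparameters(
    tuning_job_description: Dict[str, Any],
    static_hyperparameters: Dict[str, Any],
) -> Dict[str, str]:
    """Merge fixed hyperparameters with the best training job's tuned ones.

    Assumes the shape of a DescribeHyperParameterTuningJob response, where
    BestTrainingJob.TunedHyperParameters maps names to string values.
    """
    tuned = tuning_job_description["BestTrainingJob"]["TunedHyperParameters"]
    # SageMaker passes hyperparameters as strings, so normalize the fixed ones too.
    merged = {name: str(value) for name, value in static_hyperparameters.items()}
    merged.update(tuned)  # tuned values override any fixed value with the same name
    return merged


# Hypothetical response fragment; in practice it could come from
# boto3.client("sagemaker").describe_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName=...)
description = {
    "BestTrainingJob": {
        "TrainingJobName": "hpo-job-002-def456",
        "TunedHyperParameters": {"max_depth": "11"},
    }
}
static = {"eval_metric": "rmse", "objective": "reg:squarederror", "num_round": 10, "eta": 0.2}
print(best_hyperparameters(description, static))
# {'eval_metric': 'rmse', 'objective': 'reg:squarederror', 'num_round': '10', 'eta': '0.2', 'max_depth': '11'}
```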