expect_column_values_to_be_of_type for dtype=object

An integer column has been automatically inferred as dtype object. How can I check that its actual data type is Int64?

When I check with int, 18% of the values fail:

batch.expect_column_values_to_be_of_type(column="departmentId", type_="int")

{
  "success": false,
  "result": {
    "element_count": 29987,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 5411,
    "unexpected_percent": 18.044485943909027,
    "unexpected_percent_nonmissing": 18.044485943909027,
    "partial_unexpected_list": [
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288",
      "288"
    ]
  },
  "meta": {},
  "exception_info": null
}

When I check with int64, 100% of the values fail:

batch.expect_column_values_to_be_of_type(column="departmentId", type_="int64")

{
  "success": false,
  "result": {
    "element_count": 29987,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 29987,
    "unexpected_percent": 100.0,
    "unexpected_percent_nonmissing": 100.0,
    "partial_unexpected_list": [
      287,
      288,
      287,
      287,
      288,
      288,
      287,
      287,
      288,
      288,
      288,
      288,
      288,
      288,
      288,
      288,
      288,
      288,
      288,
      288
    ]
  },
  "meta": {},
  "exception_info": null
}

You can use a related expectation, expect_column_values_to_be_in_type_list, which lets you specify a list of accepted types instead of a single type.
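Before choosing the list, it can help to see which Python types the object column actually holds. The snippet below is a minimal sketch using plain pandas only; the sample values are hypothetical and merely mimic the mix of integers and strings seen in the output above.

```python
import pandas as pd

# Hypothetical sample: mostly ints, with a few values stored as strings.
df = pd.DataFrame({"departmentId": [287, 288, "288", 287, "288"]})

# Mixed value types force pandas to store the column as dtype object.
print(df["departmentId"].dtype)

# Count how many values of each Python type are stored in the column.
type_counts = df["departmentId"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

The type names that show up in the counts are the candidates for the type_list argument.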

Thanks. We realized that for a str column we end up with the expectation below.

expect_column_values_to_be_in_type_list(column="test_col", type_list=["str","float64","int64"])

Is that what's recommended? It is a str column, but in some small datasets it contains only int/float values. I was expecting the check to succeed with type str alone, since an int can be parsed into a str.


@jasonlu This expectation checks “storage types” - types the storage engine (e.g., a database or Pandas) uses to store the data. Yes, verifying against a list of types is recommended in this case.
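If a single storage type is what you want to assert, one option (an assumption on my part, not something stated in this thread) is to normalize the column in pandas before validating it. A minimal sketch with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample: integers mixed with numeric strings, stored as object.
raw = pd.Series([287, 288, "288", 287, "288"], name="departmentId")

# Parse every value as a number; errors="raise" fails loudly on non-numeric values.
converted = pd.to_numeric(raw, errors="raise")

# The stored dtype changes from object to a real integer dtype.
print(raw.dtype, converted.dtype)
```

After the conversion every value shares one storage type, so a single-type check is no longer fighting the mixed contents of an object column.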

thanks @eugene.mandel