Avoid Memory Errors: Techniques to Reduce DataFrame Memory Usage
Let's follow the STAR (Situation, Task, Action, Result) approach to structure this article.
SITUATION
I was working on a machine learning project on a system with 4 GB of RAM. The project required memory-intensive computation, and the dataset was large enough to hang my system.
TASK
Can I still complete a project that requires memory-intensive computation with 4 GB of RAM?
How can I avoid memory errors?
How can I reduce the memory used by the variables in a program (taking DataFrame objects as the example here)?
ACTION
I have written a Jupyter notebook that shows techniques to reduce DataFrame size, by as much as 98% in some cases. For a detailed explanation of the different memory reduction scenarios and the complete code, please refer to the Jupyter notebook.
Here I am pasting only the 4 most important lines of code for reference, i.e. 4 techniques to reduce DataFrame size:
- Change in int datatype
## Action: conversion of dtype from "int32" to "uint8" (valid only when every value lies in 0..255)
import numpy as np
converted_df_age = df_age.astype(np.uint8)
- Change in float datatype
## Action: conversion of dtype from "float64" to "float16" (float16 keeps roughly 3 significant decimal digits, so expect rounding)
converted_df_query_doc = df_query_doc.astype('float16')
- Change from object to category datatype
## Action: conversion of dtype from "object" to "category"
converted_df_day_of_week = df_day_of_week.astype('category')
- Convert to Sparse DataFrame
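## Action: conversion of a dense DataFrame to a sparse one
## (DataFrame.to_sparse() was removed in pandas 1.0; the line below assumes float data with NaN as the fill value)
import pandas as pd
df_sparse = df_dense.astype(pd.SparseDtype("float", np.nan))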
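To see how much each of the four conversions saves, the sketch below builds small synthetic DataFrames and compares their memory before and after. The data, the column names ("age", "score", "day", "flag") and the helper `mem_bytes` are assumptions made up for illustration; they are not taken from the notebook. The sparse step uses `astype` with `pd.SparseDtype`, which is how recent pandas versions express a sparse conversion.

```python
import numpy as np
import pandas as pd


def mem_bytes(df):
    """Total memory of a DataFrame in bytes, including object contents."""
    return int(df.memory_usage(deep=True).sum())


rng = np.random.default_rng(0)
n = 10_000  # illustrative row count

# 1. Integers: check the value range before downcasting to uint8 (0..255).
df_age = pd.DataFrame({"age": rng.integers(0, 100, size=n).astype("int32")})
assert df_age["age"].between(0, np.iinfo(np.uint8).max).all()
converted_df_age = df_age.astype(np.uint8)

# 2. Floats: float16 is lossy, so measure the rounding error it introduces.
df_query_doc = pd.DataFrame({"score": rng.random(n)})
converted_df_query_doc = df_query_doc.astype("float16")
max_abs_error = float(
    (df_query_doc["score"] - converted_df_query_doc["score"].astype("float64"))
    .abs()
    .max()
)

# 3. Strings with few distinct values: category stores each label only once.
days = np.array(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], dtype=object)
df_day_of_week = pd.DataFrame({"day": rng.choice(days, size=n)})
converted_df_day_of_week = df_day_of_week.astype("category")

# 4. Mostly-zero data: a sparse dtype stores only the values that differ
#    from the fill value (0.0 here).
dense_values = np.zeros(n)
dense_values[::100] = 1.0
df_dense = pd.DataFrame({"flag": dense_values})
df_sparse = df_dense.astype(pd.SparseDtype("float64", fill_value=0.0))

report = {
    "int32 -> uint8": (df_age, converted_df_age),
    "float64 -> float16": (df_query_doc, converted_df_query_doc),
    "object -> category": (df_day_of_week, converted_df_day_of_week),
    "dense -> sparse": (df_dense, df_sparse),
}
for name, (before, after) in report.items():
    saved = 100 * (1 - mem_bytes(after) / mem_bytes(before))
    print(f"{name}: {mem_bytes(before)} -> {mem_bytes(after)} bytes ({saved:.1f}% saved)")
print(f"largest float16 rounding error: {max_abs_error:.6f}")
```

The savings depend heavily on the data: category helps only when a column has few distinct values, and a sparse dtype helps only when most entries equal the fill value. The range check and the rounding-error measurement are worth keeping in real code, since an out-of-range integer or an unacceptable loss of float precision is otherwise easy to miss.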
RESULT
Hurray! You have now seen how to reduce DataFrame size in several different scenarios, assuming you have gone through the Jupyter notebook completely :).
Please clap if this article helped you, and share it with your friends as well.
Happy learning!