Filling Missing Values in Time Series with Python
Bookmarked for reference. Source: "Time Series Missing Value Filling"
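from datetime import datetime

import pandas as pd

# These three constants are not defined in the original post; the values are assumed.
time_format_sample_time = "%Y-%m-%d %H:%M:%S"
column_name_time = "time"
column_name_data = "data"


def fill_source(source, start_time, end_time):
    """
    Fill missing values on an hourly grid using linear interpolation.
    Values that interpolation cannot compute (leading gaps) are back-filled.
    @param source: DataFrame with a time column and a data column
    @param start_time: start timestamp, str format
    @param end_time: end timestamp, str format
    @return: DataFrame sorted by time, with missing values filled
    """
    if source is None or len(source) <= 0:
        return source
    # Convert the start and end times from strings
    start_time = datetime.strptime(start_time, time_format_sample_time)
    end_time = datetime.strptime(end_time, time_format_sample_time)
    # Add a row for every hour in the range; hours absent from source get NaN
    source[column_name_time] = pd.to_datetime(source[column_name_time])
    helper = pd.DataFrame({column_name_time: pd.date_range(start_time, end_time, freq=pd.Timedelta(hours=1))})
    source = pd.merge(source, helper, on=column_name_time, how="outer").sort_values(column_name_time)
    # Linear interpolation
    source[column_name_data] = source[column_name_data].interpolate(method="linear")
    # Back-fill the values that are still missing (other methods are listed in the official docs)
    source[column_name_data] = source[column_name_data].bfill()
    return source


source = pd.DataFrame()
time_list = ["2022-01-04 04:00:00", "2022-01-04 06:00:00", "2022-01-04 00:00:00", "2022-01-04 02:00:00"]
data_list = [4, 6, 0, 2]
source[column_name_time] = time_list
source[column_name_data] = data_list
source = fill_source(source, "2022-01-03 23:00:00", "2022-01-04 07:00:00")
print(source)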
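
The function above fills gaps by linear interpolation. If a moving-average fill with a window of 5 is wanted instead, one possible sketch is shown below. The helper name fill_with_rolling_mean and the choice of a centered window are assumptions made for illustration and are not part of the original post.

```python
import pandas as pd


def fill_with_rolling_mean(series: pd.Series, window: int = 5) -> pd.Series:
    """Fill NaNs with a centered moving average of the neighbouring observed values."""
    # min_periods=1: compute the mean from whatever non-NaN values fall inside the window.
    rolling_mean = series.rolling(window=window, center=True, min_periods=1).mean()
    # Observed values are kept; the rolling mean is used only where data is missing.
    return series.fillna(rolling_mean)


# Same sample data as above, placed on an hourly grid.
observed = pd.Series(
    [0.0, 2.0, 4.0, 6.0],
    index=pd.to_datetime(
        ["2022-01-04 00:00:00", "2022-01-04 02:00:00", "2022-01-04 04:00:00", "2022-01-04 06:00:00"]
    ),
)
hourly_index = pd.date_range("2022-01-03 23:00:00", "2022-01-04 07:00:00", freq=pd.Timedelta(hours=1))
hourly = observed.reindex(hourly_index)  # hours without an observation become NaN

filled = fill_with_rolling_mean(hourly)
print(filled)
```

A gap wider than the window has no observed neighbours inside it and stays NaN, so this approach may still need a fallback such as a back-fill for long outages.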
