I am trying to learn matplotlib and am stuck on an annoying issue. I have this code:
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
current_dir = os.path.dirname(os.path.abspath(__file__))
csv_path = os.path.join(current_dir, "CSV", "MainData.csv")
# DataFrame.append was removed in pandas 2.0, so read the CSV directly
df = pd.read_csv(csv_path)
periodB4 = "'2023-05-10' AND '2023-05-13'"
def makeStartEndDates(x):
    start_date, end_date = x.split(' AND ')
    start_date = start_date.strip()
    end_date = end_date.strip()
    return [start_date, end_date]
start_date_b4, end_date_b4 = makeStartEndDates(periodB4)
# Take a copy so the column assignment below does not trigger SettingWithCopyWarning
selected_df = df.iloc[:-5, :].copy()
selected_df['date'] = pd.to_datetime(selected_df['date'], format='%Y-%m-%d')
b4period = selected_df.loc[selected_df['date'].between(start_date_b4, end_date_b4)]
# print(b4period)
plt.bar(b4period['date'], b4period['dau'])
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.xticks(rotation=90)
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Bar Chart Example')
plt.tight_layout()
plt.savefig('chart.png')
So basically I get an extra date, 2023-05-09, on the x-axis, and all the other dates are duplicated. This only happens in the chart; I can't see any of that in the CSV or in df.
How can I avoid that, so that the x-axis shows only the dates from '2023-05-10' to '2023-05-13', each exactly once?
The somewhat convoluted date handling is needed because this code is used together with other scripts that work with BigQuery and SQL.
Here is a sample of the CSV data:
Output of print(b4period.head(10).to_dict('list')):
{'date': [Timestamp('2023-05-10 00:00:00'), Timestamp('2023-05-11 00:00:00'), Timestamp('2023-05-12 00:00:00'), Timestamp('2023-05-13 00:00:00')], 'new_users': [2885.0, 2954.0, 3160.0, 4086.0], 'dau': [8627.0, 9112.0, 9318.0, 9327.0], 'wau': [28542.0, 28542.0, 28542.0, 28542.0]}
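For reference, here is a minimal sketch of one possible way to get one tick per day, using only the sample rows printed above. It assumes the duplicates come from matplotlib's automatic date locator placing ticks at sub-daily intervals, which the `'%Y-%m-%d'` formatter then renders as repeated dates (and the first bar extending slightly left of 2023-05-10 would explain the stray 2023-05-09 label). The `Agg` backend, the `fig`/`ax` names, and the `tick_labels` helper are illustrative choices, not part of the original script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the printed output above (only the columns being plotted)
b4period = pd.DataFrame({
    "date": pd.to_datetime(["2023-05-10", "2023-05-11", "2023-05-12", "2023-05-13"]),
    "dau": [8627.0, 9112.0, 9318.0, 9327.0],
})

fig, ax = plt.subplots()
ax.bar(b4period["date"], b4period["dau"])

# Ask for one major tick per calendar day instead of the automatic tick spacing
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.tick_params(axis="x", rotation=90)
fig.tight_layout()

# Illustrative helper: turn the tick positions back into date strings for inspection
tick_labels = [mdates.num2date(t).strftime("%Y-%m-%d") for t in ax.get_xticks()]
print(tick_labels)
```

If this assumption holds for the real data, adding the `set_major_locator` line before the existing `set_major_formatter` call in the original script should be enough; the `plt.savefig('chart.png')` step can stay as it is.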