Photo by Isaac Smith on Unsplash

*This blog post is intended for newer data scientists who know the basics of time series and are starting to get their feet wet with modeling.*

Time series analysis can quickly become confusing when you first dive in, so it helps to gather as many useful tools as possible. ARIMA and SARIMA are among the most popular modeling techniques for time series analysis. These models require a handful of parameters that must be chosen well to produce an accurate model. There are different methods that can be used to find the most optimal…


Photo by Sebastian Herrmann on Unsplash

Working with real-world data can be quite frustrating. Data points are often missing, some values may have been entered incorrectly, and some types of data might be underrepresented. All of this makes for very messy data. One issue that comes up in classification problems on real-world data is “class imbalance”. This article specifically addresses ways we can combat this issue in our datasets.

So what is class imbalance? Class imbalance occurs when one class appears far more or far less often than the other classes in a dataset. This often…
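To make the definition concrete, here is a minimal sketch of how class imbalance could be measured before deciding how to address it. The helper names (`class_distribution`, `imbalance_ratio`) and the toy "legit"/"fraud" labels are made up for illustration and do not come from the article.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels: 95 samples of one class, 5 of the other.
labels = ["legit"] * 95 + ["fraud"] * 5

print(class_distribution(labels))
print(imbalance_ratio(labels))
```

A ratio close to 1 suggests roughly balanced classes, while a large ratio like the one in this toy example is a sign that the minority class may need special handling.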


Photo by Lukas Blazek on Unsplash

Modeling is a useful tool in nearly every industry. It can be used to see which factors are important, to examine trends in a dataset, and even to predict future outcomes. Since modeling is so important, every data scientist needs to learn the most basic model: linear regression.

So what is linear regression? Linear regression is a linear approach to modeling that tries to fit a “best fit” line through a set of data points. By placing a best fit line, a linear trend between the…
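As a rough illustration of a best fit line, the sketch below fits a straight line to a handful of points. It assumes "best fit" means ordinary least squares, which the excerpt above does not state explicitly, and the function name `fit_line` and the sample data are invented for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the shared 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real, noisy data the points will not fall exactly on the line; the fitted slope and intercept are then the values that minimize the sum of squared vertical distances to the points.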


*If you know SQL and are comfortable with its syntax, you might find this blog post helpful when working with pandas DataFrames.*

Gathering data from multiple pandas DataFrames can be a headache, especially if you are new to Python and unfamiliar with pandas. However, there is a neat package called pandasql that can simplify it all. Like a translator, pandasql takes queries written in SQL syntax and applies them to pandas DataFrames.

Setting up pandasql

First things first, pandasql needs to be installed. To do this, you can install it from the terminal with pip (for example, pip install pandasql).

Steven Kyle

25-year-old Texan in the midst of a career change into data science.
