This article was first published on Python – Hutsons-hacks, and kindly contributed to python-bloggers.

This assumes you know how to programme in
Python and know a little about n-dimensional arrays and how to work with them in numpy (don't worry if you don't, I've got you covered). PyTorch is a Pythonic way of building deep learning neural networks from scratch. This is something I have been learning over the last 2 years, as historically my go-to deep learning framework was TensorFlow.

For beginners, PyTorch can be daunting at first, as it pushes you towards building Python classes, inheritance, and tensor and array programming. However, once you start to work with it you start to appreciate the power of PyTorch and how much control it gives you over the creation process of deep neural networks. I have since used it for Natural Language Processing, Computer Vision, Transformers and audio classification.

In this tutorial we will build an MLP from the ground up and I will teach you what each step of my network is doing. If you are ready, then let's dive in! Open your mind and prepare to explore the wonderful and strange world of PyTorch.

With this tutorial we will use a dataset from the MLDataR project on classifying whether a person has thyroid disease or not. This uses a number of variables to indicate if thyroid disease is present. There are two flavours:

We are going to need a number of different functions and packages. The first stage is to get all these imported. If they are not installed, do not worry, as I have a requirements.txt file on hand to help you in the supporting GitHub repository:

Here we are using numpy (for array processing), pandas (for working with data frames and series), sklearn (for encoding labels and building model metrics), torch utilities for working with the input data, torch tensors and other elements of our MLP stack, and the time module for timing how long our training loops take.

The data we are going to be using for this PyTorch tutorial has already been preprocessed and consists of all the fields where I have stripped off the row headers. This is linked to the thyroid data discussed and contains:

The link to the dataset is: https://raw.githubusercontent.com/StatsGary/Data/main/thyroid_raw.csv.

The first stage of the process is to take the data and create a PyTorch-readable data object. These are called data loaders and tell PyTorch how to work with the data. This will be defined in the next steps, but to read more about data loaders, see the
official tutorial site: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html. This is where things get interesting, and we will go chunk by chunk through what is happening under the hood.

Creating the data loader to pull in CSV files

Firstly we need to create a dataset class that inherits from Dataset, a PyTorch class for working with various types of data. Because we have tabular data, we will need to declare a reader to read in the file from the link above (the raw data stored on GitHub) and then we will do some conversions:

    class ThyroidCSVDataset(Dataset):
        # Constructor for initially loading
        def __init__(self, path):
            df = read_csv(path, header=None)
            # Store the inputs and outputs
            self.X = df.values[:, :-1]
            self.y = df.values[:, -1]  # Assumes the outcome variable is in the last column
            self.X = self.X.astype('float32')
            # Label encode the target as values 1 and 0 or sick and not sick
            self.y = LabelEncoder().fit_transform(self.y)
            self.y = self.y.astype('float32')
            self.y = self.y.reshape((len(self.y), 1))

So there are a number of things to explain here:
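To make the constructor's conversions concrete, here is a minimal plain-Python sketch of the same three steps: take the last column as the target, label encode it, and reshape it into a column. This is not how sklearn implements LabelEncoder; it only mirrors its behaviour of numbering the distinct labels in sorted order. The helper names and the "sick"/"negative" label strings are assumptions made up for illustration.

```python
def split_features_target(rows):
    """Treat the last column as the target and everything before it as inputs."""
    X = [row[:-1] for row in rows]
    y = [row[-1] for row in rows]
    return X, y


def label_encode(labels):
    """Map each distinct label to an integer, numbering labels in sorted order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# Hypothetical rows: two features followed by the outcome label
rows = [
    [0.84, 0.75, "sick"],
    [-0.12, 0.10, "negative"],
    [0.33, -0.41, "negative"],
]

X, y_raw = split_features_target(rows)
codes, classes = label_encode(y_raw)

# Mirror astype('float32') and reshape((len(y), 1)): one float per row, in a column
y = [[float(code)] for code in codes]

print(classes)  # ['negative', 'sick']
print(y)        # [[1.0], [0.0], [0.0]]
```

The column shape matters later on: the network emits one value per row, so the targets need the same (n, 1) layout for the loss to line up.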
Magic methods (dunder) in our class

Still within the class, we build two more methods. These are known as dunder (magic) methods, which a Python class can define to hook into built-in behaviour such as len() and indexing:

        def __init__(self, path):
            ...  # as defined above

        # Get the number of rows in the dataset
        def __len__(self):
            return len(self.X)

        # Get a row at an index
        def __getitem__(self, idx):
            return [self.X[idx], self.y[idx]]

Here we have overridden the length method to provide the number of rows in our dataset, and we have used the __getitem__ method to pull the item at the relevant index position (idx). This will allow us to slice and retrieve elements.

Building our custom method to split our data into train and test splits

Looking at this code, did you notice that when we declare self in the function parameter block we always declare it first? Also, a tip: it does not have to be called self. You can call it myself, gary or colin, so long as it is consistent. However, it is standard practice to use the self naming convention.

        def split_data(self, split_ratio=0.2):
            test_size = round(split_ratio * len(self.X))
            train_size = len(self.X) - test_size
            return random_split(self, [train_size, test_size])

People familiar with SKLearn will be aware of the train_test_split function. Here I have replicated the same functionality. This is what the function does:
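The size arithmetic in split_data can be checked without PyTorch at all. The sketch below repeats the same calculation and then shuffles row indices with the standard library as a stand-in for random_split. It is only an analogue, not what torch does internally, and the names split_sizes, random_index_split and the seed parameter are assumptions for illustration.

```python
import random


def split_sizes(n_rows, split_ratio=0.2):
    """Same arithmetic as split_data: round the test share, the rest is train."""
    test_size = round(split_ratio * n_rows)
    train_size = n_rows - test_size
    return train_size, test_size


def random_index_split(n_rows, split_ratio=0.2, seed=0):
    """Shuffle the row indices and cut them into train and test index lists."""
    train_size, _ = split_sizes(n_rows, split_ratio)
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    return indices[:train_size], indices[train_size:]


print(split_sizes(100, 0.2))   # (80, 20)
print(split_sizes(1000, 0.1))  # (900, 100)

train_idx, test_idx = random_index_split(10, split_ratio=0.3)
print(len(train_idx), len(test_idx))  # 7 3
```

Every row lands in exactly one of the two lists, which is the property you want from a train/test split: no row is used for both training and evaluation.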
If you made it this far then well done, as we have covered quite a bit of ground already. That said, we have only really told PyTorch how to read the data. What we do next is tell the network how we want to model the data. The full implementation of the class we just defined is below:

    # Imports needed by this class
    from pandas import read_csv
    from sklearn.preprocessing import LabelEncoder
    from torch.utils.data import Dataset, random_split

    # Create a custom CSVDataset loader
    class ThyroidCSVDataset(Dataset):
        # Constructor for initially loading
        def __init__(self, path):
            df = read_csv(path, header=None)
            # Store the inputs and outputs
            self.X = df.values[:, :-1]
            self.y = df.values[:, -1]  # Assumes the outcome variable is in the last column
            self.X = self.X.astype('float32')
            # Label encode the target as values 1 and 0 or sick and not sick
            self.y = LabelEncoder().fit_transform(self.y)
            self.y = self.y.astype('float32')
            self.y = self.y.reshape((len(self.y), 1))

        # Get the number of rows in the dataset
        def __len__(self):
            return len(self.X)

        # Get a row at an index
        def __getitem__(self, idx):
            return [self.X[idx], self.y[idx]]

        # Create a custom method (as opposed to the dunder methods above)
        def split_data(self, split_ratio=0.2):
            test_size = round(split_ratio * len(self.X))
            train_size = len(self.X) - test_size
            return random_split(self, [train_size, test_size])

Step two – defining our multi-layer perceptron ANN

Still with me? If you are, then thanks. If not, come back after a hot coffee, or tea (so British of me!).

A wee bit of theory

We are going to define a multilayer perceptron class, and we will implement a forward pass through its layers; we call this a feed-forward structure:

The idea is that we feed in our inputs, in this case the independent variables, and then build what are known as hidden layers. These layers pass values between them and can start to work out patterns in the data that we, as mere mortals, would struggle to analyse. The more layers we add, the longer the network takes to train, due to the increased complexity and number of connections.
Weights are applied at each layer of the network and are then updated through something called backpropagation. One full pass over the training data is called an epoch. How big an update is made to the weights at each step is controlled by the learning rate, and the loss is minimised a little further with each pass through the network. How the nodes are activated in the network is controlled by an activation function, which decides which nodes fire and which do not. The network is loosely inspired by how the human cognitive process learns.

I have reached the end of the theory part of this blog. Let's get on with implementing it.

Implementing the layers in our network

Firstly we will declare our class and specify the number of inputs to the network:

    # Create model
    class ThyroidMLP(Module):
        def __init__(self, n_inputs):
            super(ThyroidMLP, self).__init__()
            # First hidden layer
            self.hidden1 = Linear(n_inputs, 20)
            kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
            self.act1 = ReLU()
            # Second hidden layer
            self.hidden2 = Linear(20, 10)
            kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
            self.act2 = ReLU()
            # Third layer: the output layer, a single unit with a sigmoid activation
            self.hidden3 = Linear(10, 1)
            xavier_uniform_(self.hidden3.weight)
            self.act3 = Sigmoid()

Lots to explain here:
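One way to get a feel for the layer sizes above is to count how many numbers the network has to learn. Each Linear(n_in, n_out) layer holds an n_in by n_out weight matrix plus, by default, one bias per output. The sketch below does that arithmetic in plain Python; count_parameters is a hypothetical helper name, and the 26 and 34 input widths are the ones used later in this tutorial.

```python
def count_parameters(layer_sizes):
    """Total weights and biases for a stack of fully connected layers.

    layer_sizes lists the width of the input followed by each layer's output,
    e.g. [26, 20, 10, 1] for inputs -> 20 -> 10 -> 1.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # one bias per output unit
    return total


print(count_parameters([26, 20, 10, 1]))  # 761
print(count_parameters([34, 20, 10, 1]))  # 921
```

Only the first layer changes when the number of input columns changes, which is why swapping datasets later only needs a different n_inputs value.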
Create the forward passing mechanism

You will see this in a number of PyTorch scripts: the inputs are passed forward through the network, and then the updates to the weights (i.e. the optimization) are done by backpropagation. Let's put the final piece of our network together.

        def forward(self, X):
            # Input to the first hidden layer
            X = self.hidden1(X)
            X = self.act1(X)
            # Second hidden layer
            X = self.hidden2(X)
            X = self.act2(X)
            # Output layer
            X = self.hidden3(X)
            X = self.act3(X)
            return X

We use X to keep passing the output of each layer to the next until it has gone through all of our layers and activation functions. We have now created our model class, which we will work with in the training loop. The full implementation of the class is below:

    # Imports needed by this class
    from torch.nn import Module, Linear, ReLU, Sigmoid
    from torch.nn.init import kaiming_uniform_, xavier_uniform_

    class ThyroidMLP(Module):
        def __init__(self, n_inputs):
            super(ThyroidMLP, self).__init__()
            # First hidden layer
            self.hidden1 = Linear(n_inputs, 20)
            kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
            self.act1 = ReLU()
            # Second hidden layer
            self.hidden2 = Linear(20, 10)
            kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
            self.act2 = ReLU()
            # Third layer: the output layer, a single unit with a sigmoid activation
            self.hidden3 = Linear(10, 1)
            xavier_uniform_(self.hidden3.weight)
            self.act3 = Sigmoid()

        def forward(self, X):
            # Input to the first hidden layer
            X = self.hidden1(X)
            X = self.act1(X)
            # Second hidden layer
            X = self.hidden2(X)
            X = self.act2(X)
            # Output layer
            X = self.hidden3(X)
            X = self.act3(X)
            return X

The fun part is coming up: the definition of our training loop.

Step three – the training loop

This is how the model will train and update. Training loops can be as complex or as simple as you want to make them. I have tried to aim for the middle, as an overly simplified loop would not be useful, and one that is too complicated might fry your brain at this stage, especially if you are a beginner.
    import time
    import torch
    from torch.nn import BCELoss
    from torch.optim import SGD

    # Create training loop based off our custom class
    def train_model(train_dl, model, epochs=100, lr=0.01, momentum=0.9,
                    save_path='thyroid_best_model.pth'):
        # Define the loss and the optimiser that reduces it as gradients
        # are propagated back through the network
        start = time.time()
        criterion = BCELoss()
        optimizer = SGD(model.parameters(), lr=lr, momentum=momentum)

        for epoch in range(epochs):
            print('Epoch {}/{}'.format(epoch + 1, epochs))
            print('-' * 10)
            model.train()
            # Iterate through training data loader
            for i, (inputs, targets) in enumerate(train_dl):
                optimizer.zero_grad()
                outputs = model(inputs)
                # The original also called torch.max(outputs.data, 1) here; the result
                # was unused (and is always index 0 for a single output), so it is dropped
                loss = criterion(outputs, targets)
                loss.backward()
                optimizer.step()

        # Save the trained model once training has finished
        torch.save(model, save_path)
        time_delta = time.time() - start
        print('Training complete in {:.0f}m {:.0f}s'.format(
            time_delta // 60, time_delta % 60
        ))
        return model

Here we go:
The next step is the loop that iterates over each epoch and trains the model:
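To make the loss in that loop less of a black box, here is a small plain-Python sketch of binary cross-entropy, the quantity BCELoss averages over a batch. It is an illustration of the formula rather than PyTorch's implementation; in particular the eps clipping is an assumption of this sketch to keep log() away from zero.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over the batch."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0 (sketch-only guard)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(probs)


# A confident, correct prediction is cheap; a confident, wrong one is expensive
print(binary_cross_entropy([0.9], [1.0]))
print(binary_cross_entropy([0.1], [1.0]))

# An undecided prediction of 0.5 costs log(2) whichever class is true
print(binary_cross_entropy([0.5], [1.0]))
```

This is why the output layer uses a sigmoid: the loss expects each prediction to be a probability between 0 and 1.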
Note: to use the GPU you would need to move the model to the device with model.to(device), for example model.to("cuda"), and move each batch of inputs and targets to the same device.

There we go, we have created the training loop. Next we need to decide how we are going to evaluate the model.

Step four – evaluating the performance of our network

The next function will be used to evaluate our PyTorch model to see if it is any good, or if we have been wasting our time for the last 20 minutes. The function is below and, as always, I will add my view of what is happening in each step:

    import math
    from numpy import vstack
    from sklearn.metrics import (accuracy_score, average_precision_score,
                                 confusion_matrix, f1_score, precision_score,
                                 recall_score, roc_auc_score)

    def evaluate_model(test_dl, model, beta=1.0):
        preds = []
        actuals = []
        for (i, (inputs, targets)) in enumerate(test_dl):
            # Evaluate the model on the test set
            yhat = model(inputs)
            # Detach from the graph to get the predicted probabilities
            # as an ndarray instead of a tensor
            yhat = yhat.detach().numpy()
            actual = targets.numpy()
            actual = actual.reshape((len(actual), 1))
            # Round to get the class value i.e. sick vs not sick
            yhat = yhat.round()
            # Store the predictions in the lists initialised at the start of the function
            preds.append(yhat)
            actuals.append(actual)
        # Stack the predictions and actual arrays vertically
        preds, actuals = vstack(preds), vstack(actuals)
        # Calculate metrics
        cm = confusion_matrix(actuals, preds)
        # Unpack the counts of tn, fp, fn, tp
        tn, fp, fn, tp = cm.ravel()
        total = sum(cm.ravel())
        metrics = {
            'accuracy': accuracy_score(actuals, preds),
            'AU_ROC': roc_auc_score(actuals, preds),
            'f1_score': f1_score(actuals, preds),
            'average_precision_score': average_precision_score(actuals, preds),
            'f_beta': ((1+beta**2) * precision_score(actuals, preds) * recall_score(actuals, preds)) / (beta**2 * precision_score(actuals, preds) + recall_score(actuals, preds)),
            'matthews_correlation_coefficient': (tp*tn - fp*fn) / math.sqrt((tp+fp)*(tp+fn)*(tn+fp)*(tn+fn)),
            'precision': precision_score(actuals, preds),
            'recall': recall_score(actuals, preds),
            'true_positive_rate_TPR': recall_score(actuals, preds),
            'false_positive_rate_FPR': fp / (fp + tn),
            'false_discovery_rate': fp / (fp + tp),
            'false_negative_rate': fn / (fn + tp),
            'negative_predictive_value': tn / (tn + fn),
            'misclassification_error_rate': (fp + fn) / total,
            'sensitivity': tp / (tp + fn),
            'specificity': tn / (tn + fp),
            #'confusion_matrix': confusion_matrix(actuals, preds),
            'TP': tp,
            'FP': fp,
            'FN': fn,
            'TN': tn
        }
        return metrics, preds, actuals

There is a lot to go over here:
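Most of the entries in that dictionary are simple ratios of the four confusion-matrix counts, so they can be reproduced by hand. The sketch below does this in plain Python, using the TP, FP, FN and TN counts this tutorial reports further down for the thyroid test set. confusion_metrics is a hypothetical helper, it has no guard against division by zero, and the majority-class baseline is an extra I have added for comparison rather than something the original function returns.

```python
import math


def confusion_metrics(tp, fp, fn, tn, beta=1.0):
    """Derive common classification metrics from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        'accuracy': (tp + tn) / total,
        'precision': precision,
        'recall': recall,
        'specificity': tn / (tn + fp),
        'f_beta': (1 + beta**2) * precision * recall / (beta**2 * precision + recall),
        'mcc': (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # Accuracy of always predicting the more common class (added for comparison)
        'majority_baseline': max(tp + fn, fp + tn) / total,
    }


# Counts reported for the thyroid test set later in this tutorial
m = confusion_metrics(tp=12, fp=42, fn=183, tn=588)
for name, value in m.items():
    print('{:<20s}{:.6f}'.format(name, value))
```

With these counts the accuracy comes out at about 0.727 while always predicting "not sick" would score about 0.764, which is a quick way to see that accuracy alone flatters a model on imbalanced data.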
Step five – creating the prediction routine

This routine is relatively simple compared to the functions we have covered above. It takes in the row (a new list of data) as well as the relevant model and returns a prediction from the model, yhat. Finally, we return a detached numpy array:

    from torch import Tensor

    def predict(row, model):
        row = Tensor([row])
        yhat = model(row)
        # Get numpy array
        yhat = yhat.detach().numpy()
        return yhat

This will give us a prediction for each input we pass into the model. The next step is to prepare our data ready for working with the model.

Step six – prepare the data to use with our model

We are now at the point where we will be using our custom model structure to run our model, but first we need an additional helper function to allow us to prepare our dataset:

    from torch.utils.data import DataLoader

    def prepare_thyroid_dataset(path):
        dataset = ThyroidCSVDataset(path)
        train, test = dataset.split_data(split_ratio=0.1)
        # Prepare data loaders
        train_dl = DataLoader(train, batch_size=32, shuffle=True)
        test_dl = DataLoader(test, batch_size=1024, shuffle=False)
        return train_dl, test_dl

This function takes the path of where the csv file is stored. In our case this is on GitHub: https://raw.githubusercontent.com/StatsGary/Data/main/thyroid_raw.csv. Then the following happens:
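The two batch_size values above decide how many chunks each pass over the data is cut into. As a rough sketch, assuming the DataLoader default of keeping the final, smaller batch, the batch sizes for a given number of rows can be worked out in plain Python. batch_sizes is a hypothetical helper, and the row counts in the examples are made up for illustration.

```python
def batch_sizes(n_rows, batch_size):
    """Sizes of the batches one pass over n_rows yields, keeping the last partial batch."""
    full, remainder = divmod(n_rows, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


# 100 training rows in batches of 32: three full batches and one of 4
print(batch_sizes(100, 32))   # [32, 32, 32, 4]

# A test set smaller than 1024 rows arrives as a single batch
print(batch_sizes(64, 1024))  # [64]
```

A small training batch means many weight updates per epoch, while the large test batch simply gets evaluation over with in as few forward passes as possible.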
Using our custom classes to train the model

We have prepared all the groundwork needed to build our supervised machine learning classifier. So let's step through and call the relevant functions.

Loading the dataset

We will fetch the thyroid dataset from the GitHub link above. This dataset is highly imbalanced and is a Kaggle classification project, so I would expect the model to do well in predicting the negative examples and not so well in picking up whether a patient is sick. You would need to have some imbalanced label strategies in your back pocket, such as SMOTE and ROSE, but these are beyond the scope of this tutorial. Let's load the data:

    train_dl, test_dl = prepare_thyroid_dataset('https://raw.githubusercontent.com/StatsGary/Data/main/thyroid_raw.csv')

Training the model

To create the model we pass one input: the number of independent variables, or predictor variables, to use with the model. I know that the thyroid dataset has 26, so this is the input we would choose. The only thing you would need to change is this value:

    # Specify the number of input dimensions
    model = ThyroidMLP(26)

To configure the training run we will use the train_model function we created:

    train_model(train_dl, model, save_path='data/thyroid_model.pth', epochs=150, lr=0.01)

Here I pass:
You will see the epochs running and then you will get a printout of the model structure (the network's directed acyclic graph of layers).

Evaluating how well the model performs with the test data

Previously we built an evaluate_model function, which returns a long dictionary of model results. We are going to pass our test data loader to this to get the results:

    results = evaluate_model(test_dl, model, beta=1)
    model_metrics = results[0]
    metrics_df = pd.DataFrame.from_dict(model_metrics, orient='index', columns=['metric'])
    metrics_df.index.name = 'metric_type'
    metrics_df.reset_index(inplace=True)
    metrics_df.to_csv('confusion_matrix_thyroid.csv', index=False)

To decode this:
The results are below:

    #                          metric_type      metric
    # 0                           accuracy    0.727273
    # 1                             AU_ROC    0.497436
    # 2                           f1_score    0.096386
    # 3            average_precision_score    0.235493
    # 4                             f_beta    0.096386
    # 5   matthews_correlation_coefficient   -0.008809
    # 6                          precision    0.222222
    # 7                             recall    0.061538
    # 8             true_positive_rate_TPR    0.061538
    # 9            false_positive_rate_FPR    0.066667
    # 10              false_discovery_rate    0.777778
    # 11               false_negative_rate    0.938462
    # 12         negative_predictive_value    0.762646
    # 13      misclassification_error_rate    0.272727
    # 14                       sensitivity    0.061538
    # 15                       specificity    0.933333
    # 16                                TP   12.000000
    # 17                                FP   42.000000
    # 18                                FN  183.000000
    # 19                                TN  588.000000

As suspected, we have a massively imbalanced dataset, so the MLP is struggling to classify the true positives, as the scarcity of positive labels is apparent. Let's poke our model and see what prediction we get, but looking at these results it is most likely to output that the classification of thyroid disease is negative.

Make a prediction against the model

This would be the part where, if you were happy, you would push the model into production and then new unseen observations would be scored against the model. I will poke it with one observation to see how it performs.

    row = [0.8408678952719717,0.7480132415430958,-0.3366221139379705,-0.0938130059640389,-0.1101874782051067,-0.2098160394213988,-0.1260114177378201,-0.1118651062104989,-0.1274917875477927,-0.240146053214037,-0.2574472174396955,-0.0715198539852151,-0.0855764265990022,-0.1493202733578882,-0.0190692517849118,-0.2590488060984638,0.0,-0.1753175780014474,0.0,-0.9782211033008232,0.0,-1.3237957945784953,0.0,-0.6384998731458282,0.0,-1.209042232192488]
    # predict returns a (1, 1) array; .item() extracts the single probability as a float
    yhat = predict(row, model).item()
    print('Predicted: %.3f (class=%d)' % (yhat, round(yhat)))

I have used a row in the dataframe where I know the patient is sick to test the label value of the model. I can see that my suspicions about the imbalance in the model are true:
I would never deploy this model. My next step would be to try some class rebalancing techniques.

Running the model against a balanced dataset

For the sake of time, I will use a dataset called Ionosphere that I know is well balanced and will show our model in a better light than this example. We will do the data prep and training in one cell and then poke the model. The ionosphere data is stored here: https://raw.githubusercontent.com/StatsGary/Data/main/ion.csv.

Prepare and train

I will add the ionosphere data in one code cell, which shows how to prepare the data and train the model; evaluating and predicting follow below:

    # Get the ionosphere data
    train_dl, test_dl = prepare_thyroid_dataset('https://raw.githubusercontent.com/StatsGary/Data/main/ion.csv')
    # Specify the number of input dimensions
    model = ThyroidMLP(34)
    # Train the model
    train_model(train_dl, model, save_path='data/ionsphere_model.pth', epochs=150, lr=0.01)

The only differences here are that I loaded a different dataset and specified a different number of input dimensions.

Evaluate the model

I have written a little eval_model wrapper function here, as there are a couple of steps to converting the stored dictionary into a data frame:

    # Evaluate the model
    def eval_model(test_dl, model, cm_out_name='confusion_mat.csv', beta=1, export_index=False):
        results = evaluate_model(test_dl, model, beta)
        model_metrics = results[0]
        metrics_df = pd.DataFrame.from_dict(model_metrics, orient='index', columns=['metric'])
        metrics_df.index.name = 'metric_type'
        metrics_df.reset_index(inplace=True)
        metrics_df.to_csv(cm_out_name, index=export_index)
        print(metrics_df)
        return metrics_df, model_metrics, results

    results = eval_model(test_dl, model)
    print(results[0])

This returns the metrics data frame, the model metrics as their raw dictionary and the results of the evaluate_model function.
Let's see how our model performs:

    #                          metric_type     metric
    # 0                           accuracy   0.923810
    # 1                             AU_ROC   0.882353
    # 2                           f1_score   0.946667
    # 3            average_precision_score   0.898734
    # 4                             f_beta   0.946667
    # 5   matthews_correlation_coefficient   0.829016
    # 6                          precision   0.898734
    # 7                             recall   1.000000
    # 8             true_positive_rate_TPR   1.000000
    # 9            false_positive_rate_FPR   0.235294
    # 10              false_discovery_rate   0.101266
    # 11               false_negative_rate   0.000000
    # 12         negative_predictive_value   1.000000
    # 13      misclassification_error_rate   0.076190
    # 14                       sensitivity   1.000000
    # 15                       specificity   0.764706
    # 16                                TP  71.000000
    # 17                                FP   8.000000
    # 18                                FN   0.000000
    # 19                                TN  26.000000

This dataset is much more balanced, and much less realistic, than the first scenario presented. However, it makes for good practice in implementing the model with different datasets.

Predict with the model

As the final step, we will make a prediction with the model. I will choose a row that I know should be a positive class and let's see how our model performs:

    # Make prediction against model
    row = [1,0,1,-0.18829,0.93035,-0.36156,-0.10868,-0.93597,1,-0.04549,0.50874,-0.67743,0.34432,-0.69707,-0.51685,-0.97515,0.05499,-0.62237,0.33109,-1,-0.13151,-0.45300,-0.18056,-0.35734,-0.20332,-0.26569,-0.20468,-0.18401,-0.19040,-0.11593,-0.16626,-0.06288,-0.13738,-0.02447]
    # predict returns a (1, 1) array; .item() extracts the single probability as a float
    yhat = predict(row, model).item()
    print('Predicted: %.3f (class=%d)' % (yhat, round(yhat)))
This model does so much better at predicting the right class label.

That is that! You have done excellently!

That is it! You have reached the end of this tutorial. I hope by working through this you feel confident to implement your own PyTorch module, or just use this code for your projects. Please reach out if you need any help adapting this code. I have really enjoyed putting this together and I continue to develop in this toolset, as I love the flexibility. All there is left to say is: