
Improving NN - Weight Initialization

Purpose

We try to improve the performance of a neural network (NN) through better weight initialization.

Dataset

(Figure: the donut-shaped dataset)

It is a binary classification dataset whose two classes form a donut shape (concentric circles).
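
For reference, a dataset of this shape can be generated with scikit-learn's make_circles. The snippet below is a minimal sketch, not necessarily how this post's train_X/train_Y were produced; the noise and factor values are assumptions, and only the (features, examples) layout matches the code that follows.

import numpy as np
from sklearn.datasets import make_circles

# Generate a donut-shaped binary classification dataset.
# noise/factor are illustrative values, not the course's exact settings.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=1)

# Transpose to the (2, m) / (1, m) layout used by the model below.
train_X = X.T
train_Y = y.reshape(1, -1)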

Model

The model is the 3-layer NN built in the previous part of this series.

import numpy as np
import matplotlib.pyplot as plt

# forward_propagation, compute_loss, backward_propagation, update_parameters,
# and predict are the helper functions built in the previous part of this series.

def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.
    
    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent 
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros","random" or "he")
    
    Returns:
    parameters -- parameters learnt by the model
    """
        
    grads = {}
    costs = [] # to keep track of the loss
    m = X.shape[1] # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]
    
    # Initialize parameters dictionary.
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Loop (gradient descent)

    for i in range(num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        a3, cache = forward_propagation(X, parameters)
        
        # Loss
        cost = compute_loss(a3, Y)

        # Backward propagation.
        grads = backward_propagation(X, Y, cache)
        
        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)
        
        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)
            
    # plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per thousands)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    
    return parameters

Weight Initialization Methods

These methods initialize the weights W[1] … W[L-1], W[L] and biases b[1] … b[L-1], b[L]. For layer l, W[l] has shape (layers_dims[l], layers_dims[l-1]) and b[l] has shape (layers_dims[l], 1).

  • Zero initialization

    Use np.zeros() to initialize every parameter to 0 in the shape each layer requires.

      def initialize_parameters_zeros(layers_dims):
          parameters = {}
          L = len(layers_dims)  # number of layers, including the input layer
          for l in range(1, L):
              # Every weight and bias starts at exactly 0.
              parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
              parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
          return parameters
    
      parameters = model(train_X, train_Y, initialization = "zeros")
      print ("On the train set:")
      predictions_train = predict(train_X, train_Y, parameters)
      print ("On the test set:")
      predictions_test = predict(test_X, test_Y, parameters)
    
      Cost after iteration 0: 0.6931471805599453
      Cost after iteration 1000: 0.6931471805599453
      Cost after iteration 2000: 0.6931471805599453
      Cost after iteration 3000: 0.6931471805599453
      Cost after iteration 4000: 0.6931471805599453
      Cost after iteration 5000: 0.6931471805599453
      Cost after iteration 6000: 0.6931471805599453
      Cost after iteration 7000: 0.6931471805599453
      Cost after iteration 8000: 0.6931471805599453
      Cost after iteration 9000: 0.6931471805599453
      Cost after iteration 10000: 0.6931471805599455
      Cost after iteration 11000: 0.6931471805599453
      Cost after iteration 12000: 0.6931471805599453
      Cost after iteration 13000: 0.6931471805599453
      Cost after iteration 14000: 0.6931471805599453
        
      On the train set:
      Accuracy: 0.5
      On the test set:
      Accuracy: 0.5
    

    (Figure: training result with zero initialization)

    With all weights initialized to 0, every pre-activation is $z = Wx + b = 0$, so every hidden unit outputs

    $a = \mathrm{ReLU}(z) = \max(0, z) = 0.$

    Applying the sigmoid at the output layer then gives

    $\hat{y} = \frac{1}{1 + e^{0}} = \frac{1}{2}.$

    Plugging this into the loss function

    $\mathcal{L}(\hat{y}, y) = -y \ln(\hat{y}) - (1 - y) \ln(1 - \hat{y}),$

    if $y = 1$:

    $\mathcal{L} = -1 \cdot \ln(1/2) - 0 \cdot \ln(1 - 1/2) = -\ln(1/2) \approx 0.693,$

    and if $y = 0$:

    $\mathcal{L} = -\ln(1/2) \approx 0.693.$

    Since the prediction can only ever be 0.5, every example yields the same loss of 0.693, which is exactly the cost printed above. Moreover, because all neurons in a layer start out identical, their gradients are identical too; the network never breaks symmetry, so the weights never change.

    → Never initialize the weights to zero.
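
    A minimal numerical check of this argument (a sketch that hard-codes the same 2-10-5-1 LINEAR→ReLU→LINEAR→ReLU→LINEAR→sigmoid forward pass with all-zero parameters):

      import numpy as np

      def relu(z):
          return np.maximum(0, z)

      def sigmoid(z):
          return 1 / (1 + np.exp(-z))

      # All-zero parameters for a 2-10-5-1 network.
      dims = [2, 10, 5, 1]
      W = [np.zeros((dims[l], dims[l-1])) for l in range(1, 4)]
      b = [np.zeros((dims[l], 1)) for l in range(1, 4)]

      x = np.random.randn(2, 1)          # any input gives the same output
      a1 = relu(W[0] @ x + b[0])         # all zeros
      a2 = relu(W[1] @ a1 + b[1])        # all zeros
      y_hat = sigmoid(W[2] @ a2 + b[2])  # sigmoid(0) = 0.5

      print(y_hat)          # [[0.5]]
      print(-np.log(0.5))   # 0.6931471805599453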

  • Random initialization

    Initialize W randomly with np.random.randn(shape), and initialize b with zeros.

      # np.random.randn samples from the standard normal distribution;
      # np.random.rand samples from the uniform distribution on [0, 1).

      def initialize_parameters_random(layers_dims):
          np.random.seed(3)
          parameters = {}
          L = len(layers_dims)
          for l in range(1, L):
              # Multiplying by 10 deliberately makes the initial weights very large.
              parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
              parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
          return parameters
    
      parameters = model(train_X, train_Y, initialization = "random")
      print ("On the train set:")
      predictions_train = predict(train_X, train_Y, parameters)
      print ("On the test set:")
      predictions_test = predict(test_X, test_Y, parameters)
    
      Cost after iteration 0: inf
      Cost after iteration 1000: 0.6247924745506072
      Cost after iteration 2000: 0.5980258056061102
      Cost after iteration 3000: 0.5637539062842213
      Cost after iteration 4000: 0.5501256393526495
      Cost after iteration 5000: 0.5443826306793814
      Cost after iteration 6000: 0.5373895855049121
      Cost after iteration 7000: 0.47157999220550006
      Cost after iteration 8000: 0.39770475516243037
      Cost after iteration 9000: 0.3934560146692851
      Cost after iteration 10000: 0.3920227137490125
      Cost after iteration 11000: 0.38913700035966736
      Cost after iteration 12000: 0.3861358766546214
      Cost after iteration 13000: 0.38497629552893475
      Cost after iteration 14000: 0.38276694641706693
        
      On the train set:
      Accuracy: 0.83
      On the test set:
      Accuracy: 0.86
    

    The initial cost of inf appears because the weights start out very large: the output-layer pre-activations are huge, the sigmoid saturates to a numerically exact 0 or 1, and the $\ln(0)$ term in the cross-entropy loss evaluates to infinity. Large random weights also slow down optimization, so it is important to scale the random values by a small factor instead.
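
    A quick sketch of why the cost overflows: with weights on the order of 10, pre-activations easily reach the range where the float64 sigmoid saturates to exactly 1.0 (the values below are illustrative, not the model's actual activations).

      import numpy as np

      def sigmoid(z):
          return 1 / (1 + np.exp(-z))

      z = np.array([1.0, 10.0, 40.0])    # increasingly large pre-activations
      y_hat = sigmoid(z)
      print(y_hat)                       # [0.73105858 0.9999546  1.        ]

      # If the true label is 0 but y_hat has saturated to exactly 1.0,
      # the cross-entropy term ln(1 - y_hat) = ln(0) gives -inf.
      print(-np.log(1 - y_hat[2]))       # inf (with a divide-by-zero warning)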

    (Figures: cost curve and decision boundary with large random initialization)

  • He initialization

    This method is similar to Xavier initialization. Both draw random values from a normal distribution; Xavier scales them by sqrt(1./layers_dims[l-1]), while He initialization scales them by

    sqrt(2./layers_dims[l-1]),

    which is the variant recommended for ReLU activations (see the variance sketch at the end of this section).

      def initialize_parameters_he(layers_dims):
          np.random.seed(3)
          parameters = {}
          L = len(layers_dims) - 1  # number of weight layers

          for l in range(1, L + 1):
              # Scale by sqrt(2 / fan-in), as He initialization recommends for ReLU.
              parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
              parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

          return parameters
        
    
      parameters = model(train_X, train_Y, initialization = "he")
      print ("On the train set:")
      predictions_train = predict(train_X, train_Y, parameters)
      print ("On the test set:")
      predictions_test = predict(test_X, test_Y, parameters)
    
      Cost after iteration 0: 0.8830537463419761
      Cost after iteration 1000: 0.6879825919728063
      Cost after iteration 2000: 0.6751286264523371
      Cost after iteration 3000: 0.6526117768893805
      Cost after iteration 4000: 0.6082958970572938
      Cost after iteration 5000: 0.5304944491717495
      Cost after iteration 6000: 0.4138645817071794
      Cost after iteration 7000: 0.3117803464844441
      Cost after iteration 8000: 0.23696215330322562
      Cost after iteration 9000: 0.1859728720920684
      Cost after iteration 10000: 0.15015556280371808
      Cost after iteration 11000: 0.12325079292273551
      Cost after iteration 12000: 0.09917746546525937
      Cost after iteration 13000: 0.08457055954024283
      Cost after iteration 14000: 0.07357895962677366
        
      On the train set:
      Accuracy: 0.9933333333333333
      On the test set:
      Accuracy: 0.96
    

    (Figures: cost curve and decision boundary with He initialization)
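
    To see what the sqrt(2/fan-in) factor buys, the sketch below (an illustration, not part of the course code) pushes random data through a deep stack of ReLU layers and compares the scale of the activations under three weight scalings:

      import numpy as np

      np.random.seed(0)

      def forward_std(scale_fn, dims, x):
          """Std of the activations after passing x through a stack of ReLU layers."""
          a = x
          for l in range(1, len(dims)):
              W = np.random.randn(dims[l], dims[l-1]) * scale_fn(dims[l-1])
              a = np.maximum(0, W @ a)
          return a.std()

      dims = [512] * 10                 # ten layers of width 512
      x = np.random.randn(512, 1000)

      print(forward_std(lambda n: 1.0, dims, x))               # explodes (huge std)
      print(forward_std(lambda n: 0.01, dims, x))              # vanishes toward 0
      print(forward_std(lambda n: np.sqrt(2.0 / n), dims, x))  # stays at order 1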

Conclusion

All three experiments use the same 3-layer NN: ReLU activations in the hidden layers, a sigmoid output layer, and the binary cross-entropy loss.

Initialization      Train accuracy   Assessment
Zero init           50%              Same loss for every example; symmetry never breaks, so the weights cannot update
Large random init   83%              Overly large weights saturate the sigmoid and slow optimization
He init             99%              The most effective of the three
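
The whole comparison can be reproduced with one loop (a sketch that assumes the model and predict functions above and the train/test splits from the dataset section):

for init in ["zeros", "random", "he"]:
    print("Initialization:", init)
    parameters = model(train_X, train_Y, initialization=init)
    print("On the train set:")
    predict(train_X, train_Y, parameters)
    print("On the test set:")
    predict(test_X, test_Y, parameters)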
